WEBVTT Kind: captions; language: en-us NOTE Treffsikkerhet: 80% (H?Y) 00:00:02.100 --> 00:00:09.200 Okay, diagnostic performance, 1 and 2, because we are going to spend two lectures on diagnostic 00:00:09.200 --> 00:00:15.450 performance. And we talked a little bit about things around it or things that are linked to this. 00:00:15.450 --> 00:00:20.700 We talked about what screening is, because that is when you need diagnostic performance. We will talk 00:00:20.700 --> 00:00:26.400 about the CATs quickly, the critical appraisal tools. We've seen those already, but now in relation 00:00:26.400 --> 00:00:31.900 to screening. Then we start talking about diagnostic performance and I think NOTE Treffsikkerhet: 91% (H?Y) 00:00:31.900 --> 00:00:37.600 that's about where we will stop for today. But then we will also talk later on about 00:00:37.600 --> 00:00:44.700 cutoff points. If you use a screen to decide whether someone is at risk or not of a certain 00:00:44.700 --> 00:00:50.500 condition, of a disease, you need to decide when someone fails your screen and needs further 00:00:50.500 --> 00:00:56.500 assessment. Then we talk in a little more detail about diagnostic performance, and we end with 00:00:56.500 --> 00:01:01.849 error, accuracy, and precision. All of that is two lectures. NOTE Treffsikkerhet: 90% (H?Y) 00:01:01.849 --> 00:01:05.500 But for now, we will start with screening. NOTE Treffsikkerhet: 90% (H?Y) 00:01:05.500 --> 00:01:11.700 So, what is screening? There are many, many different screenings with different rules, for different 00:01:11.700 --> 00:01:19.300 reasons, and with different purposes. But in general, a screening intervention is designed to identify a 00:01:19.300 --> 00:01:27.100 disease, and you want to identify it before symptoms are visible. You want to identify it ASAP, as 00:01:27.100 --> 00:01:35.100 soon as possible, so you can intervene before it spreads. 
So you're looking for unrecognized cases of 00:01:35.100 --> 00:01:35.400 a disease NOTE Treffsikkerhet: 84% (H?Y) 00:01:35.400 --> 00:01:42.900 or condition and you want to apply it usually to the whole population. Usually, not always; we 00:01:42.900 --> 00:01:48.000 talk about that a bit later. So the tests are done, as I said, in those who have not got the 00:01:48.000 --> 00:01:53.300 symptoms yet. Because if you've got the symptoms, why screen? You know already you've got the 00:01:53.300 --> 00:02:00.300 disease. So screenings are not designed to be diagnostic. That's an important thing. They 00:02:00.300 --> 00:02:05.350 are actually meant so that if you fail the screen, you are referred for further assessment. NOTE Treffsikkerhet: 86% (H?Y) 00:02:05.350 --> 00:02:12.000 That's all: it's only at risk or not at risk. So you've got the whole population or whatever you 00:02:12.000 --> 00:02:17.600 decide is your target population. You screen them; those who fail, you refer for further 00:02:17.600 --> 00:02:24.899 assessment, and that can be for support, a treatment or whatever is available for that disease. 00:02:24.899 --> 00:02:32.500 There are several factors that influence whether you want to 00:02:32.500 --> 00:02:35.350 screen or not. There are the costs, of course: NOTE Treffsikkerhet: 78% (H?Y) 00:02:35.350 --> 00:02:41.200 if you want to screen the whole population of Norway, that's about 5 million. And it's about 00:02:41.200 --> 00:02:47.900 scope: how broad? Are you going to screen all age groups? Are you going to screen people that are more at 00:02:47.900 --> 00:02:55.500 risk? And it is how much time it will cost. All of that is important to decide before implementing a 00:02:55.500 --> 00:03:02.200 screen. Now, screening criteria in general are things like: it should be of importance to public health 00:03:02.200 --> 00:03:05.300 or public education, or there should be a significant burden of disease. 
NOTE Treffsikkerhet: 83% (H?Y) 00:03:05.300 --> 00:03:11.000 You should be able to recognize it at an early stage; if you can't 00:03:11.000 --> 00:03:20.100 recognize it, no screening. So you must be able to diagnose before onset of symptoms. You must 00:03:20.100 --> 00:03:25.300 have a good test available with good diagnostic performance. We will talk about that a bit later. 00:03:25.300 --> 00:03:33.000 Once you say these people are at risk of a certain behavior, you must have resources to confirm a 00:03:33.000 --> 00:03:35.500 diagnosis. It must have benefit. NOTE Treffsikkerhet: 84% (H?Y) 00:03:35.500 --> 00:03:43.400 I mean, if you diagnose people, if you identify people at risk early, but there's no 00:03:43.400 --> 00:03:49.300 benefit in finding them, then why would you screen? A benefit could be that you've got an 00:03:49.300 --> 00:03:55.900 intervention ready: you can do something about it, improve it or support symptoms or make life 00:03:55.900 --> 00:04:03.200 easier. So you want an effective treatment and you want to have a good prognosis. All of that can be 00:04:03.200 --> 00:04:05.250 of importance before deciding NOTE Treffsikkerhet: 85% (H?Y) 00:04:05.250 --> 00:04:12.500 whether we are going to screen all children or not, or adults or whatever. So, different purposes. One is, for 00:04:12.500 --> 00:04:19.899 instance, case detection: people are screened for their own benefit. Then you can also 00:04:19.899 --> 00:04:26.000 screen people to control a disease. You are screening because, as with AIDS, you want to 00:04:26.000 --> 00:04:30.000 control the disease, also for the benefit of others. NOTE Treffsikkerhet: 87% (H?Y) 00:04:30.000 --> 00:04:35.500 It could be for research purposes, when you want to understand how this disease 00:04:35.500 --> 00:04:42.250 or this condition develops. And for education, it can be about public awareness, if you all need to be 00:04:42.250 --> 00:04:49.500 aware of a certain disease or a certain risk. 
Now, if we look at screening for case detection, then 00:04:49.500 --> 00:04:56.400 it is about identification of unrecognized disease, which does not arise from the patient. It's not the 00:04:56.400 --> 00:04:59.900 patient that is asking to be screened. For instance, NOTE Treffsikkerhet: 82% (H?Y) 00:04:59.900 --> 00:05:06.400 neonatal screening is an important one. It is initiated by medical and public health personnel. 00:05:06.400 --> 00:05:12.300 And for example, if we talk about health, we talk about breast cancer, diabetes, but we have also got 00:05:12.300 --> 00:05:18.200 hearing in young kids, and we can do that now earlier and earlier, and it has got good 00:05:18.200 --> 00:05:24.400 consequences for child development if we know the child has got a hearing problem. 00:05:24.400 --> 00:05:30.000 Now, different types, and we go to mass screening. So, mass screening is to screen NOTE Treffsikkerhet: 91% (H?Y) 00:05:30.000 --> 00:05:36.450 all, irrespective of the particular risk: everybody. Then you have high-risk or selective screening. 00:05:36.450 --> 00:05:44.800 Then you only apply it in those at high risk; you select a certain group. Now, multi-purpose screening. 00:05:44.800 --> 00:05:50.600 That means you apply two or more tests in combination to a large number of people at one time. So you combine 00:05:50.600 --> 00:05:56.800 them. And that means that, for instance, you can have sequential screening, a two-stage screening, which 00:05:56.800 --> 00:05:59.800 means the first test is conducted. NOTE Treffsikkerhet: 87% (H?Y) 00:05:59.800 --> 00:06:05.900 And those who fail the first one, only those get the second test. That means the overall process will 00:06:05.900 --> 00:06:13.000 increase specificity, but reduce sensitivity. We talk about those two terms, two 00:06:13.000 --> 00:06:15.300 concepts, a bit later. NOTE Treffsikkerhet: 87% (H?Y) 00:06:16.000 --> 00:06:22.600 Then you've got a diagnostic test, and that one is performed after a positive screen. 
A screen tells 00:06:22.600 --> 00:06:29.200 you that you are at risk; then a diagnostic test can confirm this and give you a definitive 00:06:29.200 --> 00:06:33.150 diagnosis. Screening does not do that. NOTE Treffsikkerhet: 79% (H?Y) 00:06:33.150 --> 00:06:38.600 Now, if we go to screening test versus diagnostic test, what are the differences? Because there are 00:06:38.600 --> 00:06:46.400 differences. For instance, screening is done in apparently healthy people, people without symptoms or 00:06:46.400 --> 00:06:52.500 children without any characteristics. A diagnostic test you do in a group that is identified as 00:06:52.500 --> 00:07:00.700 being at risk. Screening is in groups; a diagnostic test is in a single patient. Screening is not 00:07:00.700 --> 00:07:03.450 black and white. It's not confirmed. You may still NOTE Treffsikkerhet: 75% (MEDIUM) 00:07:03.450 --> 00:07:09.750 not have the disease or the condition. And a diagnosis is based on an evaluation of a set of signs. 00:07:09.750 --> 00:07:15.600 Screening is based on a single criterion and a cut-off point: you decide when you think it is useful to 00:07:15.600 --> 00:07:25.800 refer a child for further dyslexia assessment, or you name it. But diagnostic tests are based 00:07:25.800 --> 00:07:32.400 on evaluation of signs. And on the left, screening: it's less accurate, it's less expensive, and it's not a basis for treatment. NOTE Treffsikkerhet: 74% (MEDIUM) 00:07:33.450 --> 00:07:39.000 On the right, the diagnostic test is more accurate, but it's more expensive, and the outcome of 00:07:39.000 --> 00:07:47.000 a diagnostic test can be used as a basis for treatment. And the final bit is that screening is something you 00:07:47.000 --> 00:07:52.549 initiate, whereas a diagnostic test can be initiated by the child, or the parents, 00:07:52.549 --> 00:07:57.800 or patients. They have complaints, they come to you and you need to... So it's really a 00:07:57.800 --> 00:08:03.549 different thing. 
Screening is supposed to be quick, easy and cheap, and it can be NOTE Treffsikkerhet: 91% (H?Y) 00:08:03.549 --> 00:08:06.950 for any sort of condition. NOTE Treffsikkerhet: 86% (H?Y) 00:08:06.950 --> 00:08:13.600 Now, there are some criteria before you implement a screening. So we talk about a disease or 00:08:13.600 --> 00:08:18.200 condition again; all these screens always talk about disease, but we do exactly the same in 00:08:18.200 --> 00:08:26.250 teaching and education. There must be some serious consequence: a developmental consequence, an 00:08:26.250 --> 00:08:33.700 educational goal where, if you screen early, you can intervene. So, the disease should be 00:08:33.700 --> 00:08:37.600 serious. There should be a high prevalence of the NOTE Treffsikkerhet: 89% (H?Y) 00:08:37.600 --> 00:08:45.100 stage, meaning, if it is very rare, very, very rare, then it may not be useful to screen. You must 00:08:45.100 --> 00:08:51.800 understand how it develops. There must be a test available, etc., etc. If we talk about diagnostic 00:08:51.800 --> 00:08:57.300 tests, you want one that is sensitive and specific. That one should still be simple and cheap, 00:08:57.300 --> 00:09:02.600 because it's still a relatively large group that you probably use it for. It should be reliable, which 00:09:02.600 --> 00:09:07.550 means if you repeat it, you want the same results, and of course it should be safe. NOTE Treffsikkerhet: 76% (H?Y) 00:09:07.550 --> 00:09:13.600 Now, that also has got to do with diagnosis and treatment. If you screen or you've got a 00:09:13.600 --> 00:09:19.000 diagnostic test, and you can't treat it, that has got a different consequence than if you know, 00:09:19.000 --> 00:09:25.800 well, if I treat them early or diagnose early or identify them early, I can actually 00:09:25.800 --> 00:09:33.100 intervene. All of these criteria are important before deciding that you want to screen for a certain 00:09:33.100 --> 00:09:37.550 condition or disease. 
We know if you screen NOTE Treffsikkerhet: 83% (H?Y) 00:09:37.550 --> 00:09:44.300 very early, in very young kids, for hearing impairment, and you identify them early, the 00:09:44.300 --> 00:09:49.000 outcome is much better and you can intervene. So it's important to do that early, so all the young 00:09:49.000 --> 00:09:56.600 kids are being screened early. Now, screening has also got to do with CATs, critical appraisal 00:09:56.600 --> 00:10:03.500 tools. Now, we talked about CATs in the past, and you also need to know, 00:10:03.500 --> 00:10:07.500 if you talk about selection of a screen, which one do you NOTE Treffsikkerhet: 76% (H?Y) 00:10:07.500 --> 00:10:14.400 select? And we said, well, you look for an article that describes the test, the screen; you 00:10:14.400 --> 00:10:21.050 look at the diagnostic performance of that screen, and then you can implement it. But if the quality 00:10:21.050 --> 00:10:26.900 of the study reporting on a screen is poor, you are no longer interested in that particular 00:10:26.900 --> 00:10:34.100 screen. If the diagnostic performance of that screen is poor, you're again not interested. Only if study 00:10:34.100 --> 00:10:37.350 quality and diagnostic performance are okay, NOTE Treffsikkerhet: 90% (H?Y) 00:10:37.350 --> 00:10:44.500 then you can implement, or consider implementing, that screen in clinics and research. Now, we said 00:10:44.500 --> 00:10:51.000 already, CATs are what you need to use when you want 00:10:51.000 --> 00:10:59.600 to assess the methodological quality of a screen. And you've got CATs specifically for 00:10:59.600 --> 00:11:06.200 diagnostic accuracy studies. And this was a simple one. This was the Cochrane one. We've seen that one 00:11:06.200 --> 00:11:08.500 already, quickly or briefly. NOTE Treffsikkerhet: 74% (MEDIUM) 00:11:08.500 --> 00:11:15.500 The Cochrane one has got nine criteria. Six criteria are about validity. 
One criterion is 00:11:15.500 --> 00:11:23.900 about generalizability of your results, of your screen. And two criteria are about reliability, meaning, 00:11:23.900 --> 00:11:30.700 how precisely can you repeat it and get the same results? Validity: do you actually 00:11:30.700 --> 00:11:35.200 measure what you think you're measuring? Generalizability: can I use it in other 00:11:35.200 --> 00:11:38.500 populations? Reliability is about consistency. NOTE Treffsikkerhet: 77% (H?Y) 00:11:38.500 --> 00:11:44.000 Another example is QUADAS, and QUADAS-2, because, as I said, there was a QUADAS 1. 00:11:44.000 --> 00:11:49.900 You can have a look at the website and check that one. It's got criteria. It combines 00:11:49.900 --> 00:11:57.000 description, signaling questions, risk of bias and concerns regarding applicability per column, and 00:11:57.000 --> 00:12:02.200 the columns are patient selection, index test, reference standard, and flow and timing. Well, you can 00:12:02.200 --> 00:12:07.150 have a look in more detail, and then you always get these beautiful pictures where, generally, NOTE Treffsikkerhet: 75% (MEDIUM) 00:12:07.150 --> 00:12:16.400 red is bad, green is good. You can see how each screen is performing on these different items, 00:12:16.400 --> 00:12:26.400 different domains. Now, one more: there's also STARD 2015, and that one has also got 00:12:26.400 --> 00:12:31.500 guidelines for reporting on these screens. And it's got to do with the abstract, introduction, methods. 00:12:31.500 --> 00:12:37.500 You can see it follows again the IMRAD structure, and it gives you criteria on what you NOTE Treffsikkerhet: 82% (H?Y) 00:12:37.500 --> 00:12:45.500 need to report on and what should be reported if you read an article that is describing a screen. 00:12:45.500 --> 00:12:53.300 And that goes on, and in the end you will have the results, the analysis, results of the participants, 00:12:53.300 --> 00:13:01.800 the test results and the discussion. 
Usually these CATs follow the IMRAD 00:13:01.800 --> 00:13:07.400 structure, which helps you. It's like a recipe. I always say writing an article NOTE Treffsikkerhet: 75% (MEDIUM) 00:13:07.400 --> 00:13:14.000 is a standard recipe. It's not too creative. We're not writing novels, but it helps us 00:13:14.000 --> 00:13:19.800 understand research more easily: you know where to look in the article. Now, STARD also comes with a 00:13:19.800 --> 00:13:26.800 flow chart, and you can see the flow chart shows what happens with patients. So they start here: 00:13:26.800 --> 00:13:33.800 they've got a number of patients enrolled, then you need to explain why you exclude them; these are 00:13:33.800 --> 00:13:37.500 potentially eligible, these are not eligible. Then you've got your NOTE Treffsikkerhet: 79% (H?Y) 00:13:37.500 --> 00:13:44.400 index test. That is the test you want to check. And here you can see negative, 00:13:44.400 --> 00:13:51.000 positive, inconclusive, etc., etc. And then you compare it with a reference test. 00:13:51.000 --> 00:13:57.200 And the reference test is your gold standard, and there should not be too much time between your index 00:13:57.200 --> 00:14:04.200 test and your reference test, because, of course, you don't want change in your results. And if 00:14:04.200 --> 00:14:07.450 patients change, if their condition NOTE Treffsikkerhet: 91% (H?Y) 00:14:07.450 --> 00:14:14.600 changes or a child matures... then that doesn't help very much. All of that, that whole flow chart, is 00:14:14.600 --> 00:14:21.200 part, as I said, of STARD. Okay, let's move on. If you want to know more about CATs, and this is all 00:14:21.200 --> 00:14:28.300 I'm going to say about CATs for now, then go to the website. But there are several CATs. That's all 00:14:28.300 --> 00:14:35.150 I'm saying, and they all have got pros and cons. But what we really want to talk about today is 00:14:35.150 --> 00:14:37.200 diagnostic performance. 
NOTE Treffsikkerhet: 89% (H?Y) 00:14:37.200 --> 00:14:45.400 So, let's see how we're doing there. Diagnostic performance: it's got to do with reference tests and 00:14:45.400 --> 00:14:51.200 index tests. Now, you must know the correct disease status prior to the calculation, 00:14:51.200 --> 00:14:57.400 meaning, you need to know a few things. You need to know about the 00:14:57.400 --> 00:15:03.000 symptoms. You need to know how to identify it, but you need a gold standard test, or so-called 00:15:03.000 --> 00:15:07.400 reference test, and that is the test in a certain area NOTE Treffsikkerhet: 86% (H?Y) 00:15:07.400 --> 00:15:16.000 that all experts agree on: this is the best way to assess, evaluate, diagnose a certain 00:15:16.000 --> 00:15:22.600 condition. Now, if I talk about my area, swallowing disorders, we talk about a recording, an 00:15:22.600 --> 00:15:29.750 endoscopic or a video fluoroscopic recording of swallowing; we say that is the best you can do. 00:15:29.750 --> 00:15:35.700 Okay, but because we don't want to send every single patient or child into the hospital for a video 00:15:35.700 --> 00:15:37.450 fluoroscopic recording, NOTE Treffsikkerhet: 89% (H?Y) 00:15:37.450 --> 00:15:43.000 we've got simple screening tests like water swallows that you can do anywhere. Well, in education, we've also got 00:15:43.000 --> 00:15:51.600 many screening tests. So we want to compare a screen with a gold standard, because the screen is 00:15:51.600 --> 00:15:58.800 easier, quicker, cheaper, etc. But sometimes you do not have a gold standard; that's a little bit 00:15:58.800 --> 00:16:03.800 tricky. We talk about that a bit later, because how can you compare your screen if there's no gold 00:16:03.800 --> 00:16:07.349 standard in your area? For instance, if you want to screen NOTE Treffsikkerhet: 77% (H?Y) 00:16:07.349 --> 00:16:13.600 for shyness, there will not be a gold standard for shyness. 
And I give that example 00:16:13.600 --> 00:16:19.300 because we did a review on shyness. But anyhow, what you do: you've got your gold standard, you've 00:16:19.300 --> 00:16:26.600 got your screening test, and then you use both in the same population and you use a cross 00:16:26.600 --> 00:16:33.600 tabulation, a two-by-two table, to compare results. So, at the same moment, almost the same 00:16:33.600 --> 00:16:37.400 moment, you screen the population and you use your gold standard, NOTE Treffsikkerhet: 90% (H?Y) 00:16:37.400 --> 00:16:45.400 and then you can compare and you can see how well your screen is performing. That is the 00:16:45.400 --> 00:16:53.300 basis of screening. Okay, but sometimes you get lost because there's a lot out there. This is 00:16:53.300 --> 00:16:59.100 not even all, but this is what I'm going to talk about. All of this is part of diagnostic 00:16:59.100 --> 00:17:03.900 performance. So you've got prevalence. We know what prevalence is: five people out of a 00:17:03.900 --> 00:17:07.400 hundred have got the flu. I always take that example because it happens. NOTE Treffsikkerhet: 90% (H?Y) 00:17:07.400 --> 00:17:13.000 So that means a flu prevalence of 5%. It's cross-sectional: it's just one moment and I measure it. 00:17:13.000 --> 00:17:18.599 But we've also got sensitivity and specificity. We've got predictive values, positive and negative. We've 00:17:18.599 --> 00:17:24.750 got likelihood ratios, positive and negative. We've got more, but this is where we stop. NOTE Treffsikkerhet: 86% (H?Y) 00:17:24.750 --> 00:17:32.600 Okay, prevalence. We've done that one, an easy one. We've got a group of people: the proportion of subjects 00:17:32.600 --> 00:17:37.400 in a population having a disease. But I want you to get used to the cross tabs; that's why I give it 00:17:37.400 --> 00:17:42.500 now also. Ignore the screen. Just look at the gold standard, the reference 00:17:42.500 --> 00:17:48.200 test at this moment. 
We say we've got people that have got the condition and we've got people that 00:17:48.200 --> 00:17:54.150 have not got the condition. Now, that is my gold standard. NOTE Treffsikkerhet: 76% (H?Y) 00:17:54.150 --> 00:18:00.700 If I take the population of Portugal, it's about 10 million. The prevalence is 16%, we 00:18:00.700 --> 00:18:08.400 know that, so I know the number of people at risk of swallowing problems. I look at my test, or in 00:18:08.400 --> 00:18:13.000 this case I look at my reference test, and I know that those are the people with a swallowing 00:18:13.000 --> 00:18:20.400 problem. That is my total. Then that is the difference. So, of course, the total should always be 00:18:20.400 --> 00:18:24.199 the sum of this cell and that cell. NOTE Treffsikkerhet: 81% (H?Y) 00:18:24.199 --> 00:18:30.700 So this would be the gold standard, and then you say diseased, healthy. Now, here is my index test. We'll have a 00:18:30.700 --> 00:18:32.400 look at that one later. NOTE Treffsikkerhet: 85% (H?Y) 00:18:32.400 --> 00:18:39.800 So now, if we go to the cross tabs, we can see we've got true positives and true negatives. These 00:18:39.800 --> 00:18:47.700 ones, according to my reference test, are ill, whatever the condition is, and my 00:18:47.700 --> 00:18:54.800 screen says, yeah, I agree. So, green is good. Here as well: these are healthy according to 00:18:54.800 --> 00:18:59.600 my reference test and healthy according to the screening test. So you can imagine I would actually 00:18:59.600 --> 00:19:03.350 prefer all my subjects in the green cells. NOTE Treffsikkerhet: 76% (H?Y) 00:19:03.350 --> 00:19:11.050 That's not life. We know we've got false negatives. We know we've got false positives. So true 00:19:11.050 --> 00:19:20.699 positives are correctly identified patients, persons with the condition. 
True negatives are identified 00:19:20.699 --> 00:19:27.400 healthy persons. Then the problems with my screen, because the gold standard is always right, that's 00:19:27.400 --> 00:19:32.699 what we assume: the problems with my screen are in the red cells. NOTE Treffsikkerhet: 83% (H?Y) 00:19:32.699 --> 00:19:40.600 So false negatives are missed patients, or children with the condition that you missed. False 00:19:40.600 --> 00:19:49.600 positives are healthy persons identified as patients: a typically developing child, but 00:19:49.600 --> 00:19:57.100 you identify the child as having a condition. This is the basics of diagnostic performance. So you 00:19:57.100 --> 00:20:02.100 need to understand these cells. What you would like is, of course, that everything is in the green. 00:20:02.699 --> 00:20:07.000 Again, green is good. Red is bad. That is what you need to remember. NOTE Treffsikkerhet: 89% (H?Y) 00:20:07.000 --> 00:20:13.100 So back to this one, prevalence again: the proportion of subjects in a population having a disease. 00:20:13.100 --> 00:20:19.100 We're going to have a look now at these cross tabs, and this is a ratio. That's why I've got the 00:20:19.100 --> 00:20:26.800 column there. So this divided by that; let's have a look. These are, according to my gold 00:20:26.800 --> 00:20:34.449 standard, the people that are ill or have the condition, NOTE Treffsikkerhet: 80% (H?Y) 00:20:34.449 --> 00:20:43.500 divided by the total group. So, these two cells divided by that: that is prevalence. NOTE Treffsikkerhet: 87% (H?Y) 00:20:43.500 --> 00:20:52.300 Now, I give you an example, or a question actually. So if you look at my beautiful cross tabs, under 00:20:52.300 --> 00:20:58.600 what circumstances would you want to minimize the false positives? So minimize, and these are the 00:20:58.600 --> 00:21:05.100 acronyms that you will see a lot, the FPs, false positives. 
00:21:05.100 --> 00:21:13.050 You want to minimize those, so you don't want too many healthy persons identified as patients. 00:21:13.050 --> 00:21:14.199 When is that a problem? NOTE Treffsikkerhet: 91% (H?Y) 00:21:14.199 --> 00:21:16.800 Is that important? NOTE Treffsikkerhet: 80% (H?Y) 00:21:16.800 --> 00:21:22.900 Now, it's such a small group, so I'm not going to tease the few that are there, but the reason is, one 00:21:22.900 --> 00:21:32.300 reason could be if the costs or risks of follow-up therapy are very high, and, for instance, 00:21:32.300 --> 00:21:40.000 if the disease is not life-threatening... So you've got high costs, and something like flu is not 00:21:40.000 --> 00:21:47.050 life-threatening, so why would you test for flu? So NOTE Treffsikkerhet: 91% (H?Y) 00:21:47.050 --> 00:21:53.900 in case of high costs, and/or if the disease is not life-threatening, then why would you bother? NOTE Treffsikkerhet: 90% (H?Y) 00:21:53.900 --> 00:21:59.500 So healthy persons identified as patients: that was what we were looking for. Now, false 00:21:59.500 --> 00:22:05.900 negatives: under what circumstances would you want to minimize the false negatives? So we're talking 00:22:05.900 --> 00:22:15.200 now about when you want to reduce these numbers, the missed patients. 00:22:15.200 --> 00:22:21.500 Well, the answer would be, of course, if the disease does not have symptoms and it is serious, so you may 00:22:21.500 --> 00:22:24.150 miss it; if it progresses quickly NOTE Treffsikkerhet: 86% (H?Y) 00:22:24.150 --> 00:22:31.200 and you could treat it more effectively at early stages; or if it is very contagious, if 00:22:31.200 --> 00:22:38.300 it spreads easily from one person to the other. Then you do not want to miss a patient. You do want to 00:22:38.300 --> 00:22:42.400 reduce the numbers in this particular cell. NOTE Treffsikkerhet: 91% (H?Y) 00:22:42.400 --> 00:22:51.100 I hope that's clear. Okay. Now, we will talk another time about COSMIN. 
But COSMIN has got to do 00:22:51.100 --> 00:22:56.800 with criterion validity, which is one of the nine psychometric properties within the psychometric 00:22:56.800 --> 00:23:03.900 framework of the COSMIN group. And criterion validity is the degree to which scores of a 00:23:03.900 --> 00:23:14.950 health-related... this is just a PROM, an instrument for patients... are an adequate reflection of a gold standard. 00:23:14.950 --> 00:23:21.300 Now, that is actually what we are doing. So we have the criterion or reference test. That is 00:23:21.300 --> 00:23:27.800 my gold standard, and I compare my index test with that gold standard. So we actually are looking at 00:23:27.800 --> 00:23:33.800 criterion validity when we are comparing the test with the gold standard; that is one of the 00:23:33.800 --> 00:23:39.100 psychometric properties. Okay, now we use different terms when we talk about diagnostics and diagnostic 00:23:39.100 --> 00:23:42.950 performance. We talk, for instance, about sensitivity. NOTE Treffsikkerhet: 77% (H?Y) 00:23:42.950 --> 00:23:50.650 Sometimes people call it recall, but sensitivity is much more common. Or specificity. Very common 00:23:50.650 --> 00:23:59.400 terms in diagnostic performance. Now, let's first have a look at sensitivity. Sensitivity is the 00:23:59.400 --> 00:24:05.800 proportion of reference test positive... We're talking about diseased subjects, and 00:24:05.800 --> 00:24:11.300 diseased also means having a condition or whatever; otherwise I have to keep repeating it. It's the 00:24:11.300 --> 00:24:13.200 same, you know what I mean. NOTE Treffsikkerhet: 80% (H?Y) 00:24:13.200 --> 00:24:21.700 So diseased subjects who test positive with the screen. So this is actually about how 00:24:21.700 --> 00:24:28.900 well the screen tests for presence of disease, and you can see the beautiful formula. 00:24:28.900 --> 00:24:39.100 How does that look in my ratio? 
So that divided by that one: that is my true positives divided 00:24:39.100 --> 00:24:43.650 by the sum of the true positives and false negatives. NOTE Treffsikkerhet: 87% (H?Y) 00:24:43.650 --> 00:24:50.900 So this is what my gold standard says: both are diseased. You can see the plus. Everything 00:24:50.900 --> 00:24:58.600 here, the total here, both cells are part of what the reference test says is diseased. 00:24:58.600 --> 00:25:06.900 My screen only finds these. So this ratio, that cell divided by what actually is true 00:25:06.900 --> 00:25:13.000 according to my gold standard, that is sensitivity. So sensitivity is true positives, NOTE Treffsikkerhet: 81% (H?Y) 00:25:13.000 --> 00:25:22.250 that cell, divided by true positives and false negatives. So it is how well my screen 00:25:22.250 --> 00:25:29.000 picks up on those that are diseased, and this is what it picks up. So it's how well the 00:25:29.000 --> 00:25:37.800 screen tests for presence of disease. I hope that's clear. Now, if I look at specificity, then I've got a 00:25:37.800 --> 00:25:43.300 similar reasoning, but now it is the proportion of reference test negative, NOTE Treffsikkerhet: 76% (H?Y) 00:25:43.300 --> 00:25:50.500 healthy, typically developing subjects who test negative with the screening test. We're talking about how well 00:25:50.500 --> 00:25:57.000 the screening test tests for absence of disease. Again, beautiful formula. How does it look in my 00:25:57.000 --> 00:26:04.500 cross tabs? It is the true negatives... So these have been identified by my screening test, but that 00:26:04.500 --> 00:26:12.950 is what actually is healthy: the true negatives divided by the sum. This is the total identified by my NOTE Treffsikkerhet: 91% (H?Y) 00:26:12.950 --> 00:26:22.700 criterion, by my reference test, but I only found this with my screen. So the ratio is specificity: 00:26:22.700 --> 00:26:29.800 true negatives divided by the sum of true negatives and false positives. 
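The cross-tab ratios described so far can be sketched in a few lines of Python. This is only a minimal illustration and not part of the lecture slides: the function name `diagnostic_ratios` and the cell counts in the usage line are hypothetical, chosen to make the arithmetic easy to follow.

```python
def diagnostic_ratios(tp, fn, fp, tn):
    """Prevalence, sensitivity and specificity from a two-by-two table.

    The reference test (gold standard) decides who is diseased or healthy;
    the screen (index test) decides who is positive or negative:
      tp = diseased, screen positive    fp = healthy, screen positive
      fn = diseased, screen negative    tn = healthy, screen negative
    """
    total = tp + fn + fp + tn
    return {
        # Everyone the reference test calls diseased, over the whole group.
        "prevalence": (tp + fn) / total,
        # Diseased subjects the screen picks up, over all diseased subjects.
        "sensitivity": tp / (tp + fn),
        # Healthy subjects the screen passes, over all healthy subjects.
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, for illustration only.
ratios = diagnostic_ratios(tp=40, fn=10, fp=30, tn=120)
print(ratios)
```

Note that sensitivity only uses the diseased column and specificity only uses the healthy column, which is why the two can move independently of each other.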
How well does 00:26:29.800 --> 00:26:38.449 my screen test for absence of disease? So that's the difference between specificity and sensitivity. NOTE Treffsikkerhet: 87% (H?Y) 00:26:38.449 --> 00:26:42.300 You need to stop me if you've got a question. NOTE Treffsikkerhet: 88% (H?Y) 00:26:43.600 --> 00:26:53.900 Okay, so in general, high-sensitivity tests have low specificity. In other words, they are good at 00:26:53.900 --> 00:26:59.800 catching actual cases of the disease, but they also come with a fairly high 00:26:59.800 --> 00:27:06.500 rate of false positives. Now, here we've got an example: we published something on the prevalence of 00:27:06.500 --> 00:27:13.400 drooling, swallowing and feeding problems in cerebral palsy, and prevalence is something that is usually identified 00:27:13.400 --> 00:27:14.650 with a screen. NOTE Treffsikkerhet: 91% (H?Y) 00:27:14.650 --> 00:27:20.500 And we found that drooling, swallowing and feeding problems are very common. These are high 00:27:20.500 --> 00:27:27.000 percentages, we knew that, but it's really pretty high, about 50%, and one of the screens that people 00:27:27.000 --> 00:27:33.900 use is a so-called water swallow test. So, let me tell you about the water swallow test. This is what 00:27:33.900 --> 00:27:37.200 it looks like; it can't be much easier. NOTE Treffsikkerhet: 81% (H?Y) 00:27:37.300 --> 00:27:41.100 She is being instructed in Dutch here. NOTE Treffsikkerhet: 91% (H?Y) 00:27:42.200 --> 00:27:45.700 I got rid of the sound. NOTE Treffsikkerhet: 86% (H?Y) 00:27:45.700 --> 00:27:52.250 She drinks. Any disruption, a cough or whatever, means she fails. NOTE Treffsikkerhet: 83% (H?Y) 00:27:52.250 --> 00:27:59.000 And that is an obvious fail of a water swallow. 
So, the only thing that you do, you ask them to 00:27:59.000 --> 00:28:06.300 drink 90 CC and you check for any disruption, like coughing, choking, not being able 00:28:06.300 --> 00:28:13.300 to finish it in one minute, simple things like that are being used to identify at risk or not. 00:28:13.300 --> 00:28:20.400 This actually at risk or not of a swallowing problem. Now, we're going to look at diagnostic. So the gold 00:28:20.400 --> 00:28:22.600 standard is endoscopy, that means you put NOTE Treffsikkerhet: 88% (H?Y) 00:28:22.600 --> 00:28:30.700 scope into the mouth or your nose and then you can see if someone is leaking anything 00:28:30.700 --> 00:28:36.600 of the water into the lungs. And we are talking about prediction of aspiration. So we're talking 00:28:36.600 --> 00:28:44.300 about, can my screen predict that food or liquid goes into my lungs, because that's exactly what you 00:28:44.300 --> 00:28:51.100 do not want. And the three ounce water swallow has got a sensitivity of 96% and is specificity of 49%. 00:28:52.400 --> 00:28:58.100 So what does that mean? Is that good? Is that bad? So that is actually what we want to know, because 00:28:58.100 --> 00:29:03.800 I can give you numbers, but how to interpret them. Now again this test. So we said this is 00:29:03.800 --> 00:29:13.500 sensitivity, how well does the screen test for the presence of disease. Sensitivity 96%. Meaning 96% 00:29:13.500 --> 00:29:22.600 of your patient, almost everybody is identified. That is very good. That is a NOTE Treffsikkerhet: 91% (H?Y) 00:29:22.600 --> 00:29:30.000 fantastic sensitivity. Specificity, how well does the screening test for absence of the disease? The 00:29:30.000 --> 00:29:38.200 specificity was 49 percent. That means you've got a lot of false positives. 
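NOTE
Editor's sketch of what 96% sensitivity and 49% specificity mean in head counts. The group sizes (100 people who aspirate, 900 who do not) are a hypothetical assumption chosen only for illustration; the lecture gives no population for the water swallow test.
```python
def expected_counts(n_diseased: int, n_healthy: int, sens: float, spec: float) -> dict:
    """Expected cell counts of the cross tabs for given group sizes."""
    tp = sens * n_diseased   # diseased people the screen flags
    fn = n_diseased - tp     # diseased people the screen misses
    tn = spec * n_healthy    # healthy people the screen clears
    fp = n_healthy - tn      # healthy people referred anyway
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}
# Hypothetical group sizes; sensitivity and specificity as quoted in the lecture.
counts = expected_counts(n_diseased=100, n_healthy=900, sens=0.96, spec=0.49)
referred = counts["TP"] + counts["FP"]
print({name: round(value) for name, value in counts.items()}, "referred:", round(referred))
```
Under these assumed group sizes only a handful of cases are missed, but roughly half of the healthy group is referred for further assessment, which is the overload problem the lecture goes on to describe.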
So that means all these 00:29:38.200 --> 00:29:44.800 false positives, children, adults that you think might be at risk, but actually do not 00:29:44.800 --> 00:29:51.500 have the disease or condition, you refer them all for further assessment and that is a problem. 00:29:51.500 --> 00:29:52.500 Because you will boom, NOTE Treffsikkerhet: 71% (MEDIUM) 00:29:52.500 --> 00:29:58.600 you will overload the healthcare system or educational system with that. So although 00:29:58.600 --> 00:30:04.500 sensitivity is fantastic, patient safety very good, identification of disease or condition is fantastic, 00:30:04.500 --> 00:30:13.500 you probably have got a problem dealing with such a low specificity. That is a lot. 00:30:13.500 --> 00:30:21.699 So it is actually probably not feasible to have low numbers like that for specificity. NOTE Treffsikkerhet: 90% (H?Y) 00:30:21.699 --> 00:30:26.750 Now I've got something. This is the final bit I want to do before the break. 00:30:26.750 --> 00:30:35.400 So, we're going to talk about deer and cows, sensitivity and specificity. 00:30:35.400 --> 00:30:41.600 So we start with reading this little bit. So specificity 00:30:41.600 --> 00:30:49.300 for the outdoorsmen and women, if there are deer, so there 00:30:49.300 --> 00:30:51.449 is the one on the left for those who don't know, NOTE Treffsikkerhet: 82% (H?Y) 00:30:51.449 --> 00:30:59.400 that's a deer, if there are deer and cows in the field, a nearsighted but knowledgeable naturalist will 00:30:59.400 --> 00:31:07.100 not mistake a cow for a deer. He knows which one is which, but there may be some deer he doesn't see. 00:31:07.100 --> 00:31:15.600 He is like a specific test. A deer hunter may see all the deer but also mistakenly shoot a few cows. 00:31:15.600 --> 00:31:19.150 He is like a sensitive test that is not very specific. NOTE Treffsikkerhet: 75% (MEDIUM) 00:31:19.150 --> 00:31:25.050 I'm sure you're lost now. So let's have a better look at these cows and deer.
Here they are. NOTE Treffsikkerhet: 87% (H?Y) 00:31:25.050 --> 00:31:32.600 Okay, here we go. The aim is to identify the deer, hunting deer. 00:31:32.600 --> 00:31:40.100 A nearsighted but knowledge naturalist, so he only sees what is in the circle, will not mistake 00:31:40.100 --> 00:31:47.900 a cow for a deer. So all the cows are true negatives. But there may be some deer he doesn't see 00:31:47.900 --> 00:31:53.600 because he was nearsighted and that is got to do with sensitivity, how well does the screen test for 00:31:53.600 --> 00:31:55.300 presence of disease? NOTE Treffsikkerhet: 90% (H?Y) 00:31:55.300 --> 00:32:03.700 So, this guy/woman is like a specific test that is not very sensitive. So sensitivity was 00:32:03.700 --> 00:32:08.200 how well does he screen for the presence of disease, well he's missing quite a few deer there. 00:32:08.200 --> 00:32:15.900 But how well does the screen test for absence of the disease? 00:32:15.900 --> 00:32:15.400 Well, that was pretty good. NOTE Treffsikkerhet: 91% (H?Y) 00:32:15.400 --> 00:32:23.900 Now if you go to the other guy, a deer hunter may see all the deer, how well does the screening test 00:32:23.900 --> 00:32:30.100 for presence of disease. So he's good at that. But they also mistakenly shoots a few cows, false 00:32:30.100 --> 00:32:35.900 positives. That has got to do with how well does the screen test for absence of disease. He's like a 00:32:35.900 --> 00:32:39.400 sensitive test, but not very specific. NOTE Treffsikkerhet: 81% (H?Y) 00:32:39.400 --> 00:32:49.400 Okay, again, this was sensitivity and specificity. I'm actually drilling it in, 00:32:49.400 --> 00:32:55.400 isn't it? But just to keep it in your mind, sensitivity got to do with presence of the disease and 00:32:55.400 --> 00:32:59.400 specificity has got to with absence of the disease. How well are they doing it? NOTE Treffsikkerhet: 88% (H?Y) 00:32:59.400 --> 00:33:08.900 Few more slides and then you run for coffee. Okay? Snout. 
It's about sensitive test, sensitivity of a 00:33:08.900 --> 00:33:14.400 true positive rate --also that's it a different way of calling it-- of a test, proportion of people 00:33:14.400 --> 00:33:21.000 with the disease who will have a positive results. In other words, is the test 00:33:21.000 --> 00:33:23.900 correctly identify the patient with the disease. NOTE Treffsikkerhet: 87% (H?Y) 00:33:23.900 --> 00:33:31.300 That means if a test is a hundred percent sensitive, it will identify all patients who have the disease. 00:33:32.500 --> 00:33:43.700 Is it 90% sensitive, you will identify 90%, but you will miss 10%. Okay, a highly sensitive test can 00:33:43.700 --> 00:33:50.700 be used for ruling out a disease. So if a person has a negative result, that means the acronym they use is 00:33:50.700 --> 00:33:59.200 high sensitivity, rule out the disease, Snout. So in a highly sensitive test is very useful 00:33:59.200 --> 00:34:02.800 for ruling out a disease. So if the person has got a NOTE Treffsikkerhet: 91% (H?Y) 00:34:02.800 --> 00:34:07.150 negative result, he is pretty sure he doesn't have the disease. NOTE Treffsikkerhet: 70% (MEDIUM) 00:34:07.150 --> 00:34:18.900 But SPIN, specificity of the test 00:34:18.900 --> 00:34:25.400 is the proportion of people without the disease who will have a negative result. 00:34:25.400 --> 00:34:33.900 That means a test that has got hundred percent specificity, will identify hundred percent of patients 00:34:33.900 --> 00:34:37.100 who do not have the disease. NOTE Treffsikkerhet: 76% (H?Y) 00:34:37.100 --> 00:34:44.899 A test has 90% specifically will identify 90% of patients who do not have the disease and miss 10%. NOTE Treffsikkerhet: 73% (MEDIUM) 00:34:44.899 --> 00:34:51.300 Test with the highest specificity are most useful when the result is positive, a highly specific 00:34:51.300 --> 00:34:58.500 tests can be useful for ruling in patients who have a certain disease. 
The acronym is SPIN, high 00:34:58.500 --> 00:35:09.800 specificity rule in a disease. So that is the Snout and Spin approach. To quantify the diagnostic test's 00:35:09.800 --> 00:35:15.200 ability: Snout, rule disease out, very sensitive test. NOTE Treffsikkerhet: 74% (MEDIUM) 00:35:15.200 --> 00:35:24.700 A plus result is not helpful. But a minus result is useful. Spin, rule disease in, very specific test. 00:35:24.700 --> 00:35:32.600 A minus result is not very helpful, but the plus result is very helpful. That is where I'm going to stop recording now. 00:35:32.600 --> 00:35:36.100 Let's see how we stop that thing. NOTE Treffsikkerhet: 91% (H?Y) 00:35:39.300 --> 00:35:49.000 Okay, so how to calculate specificity and sensitivity. Of course the rest is also interesting, but 00:35:49.000 --> 00:35:51.400 let's first do this. NOTE Treffsikkerhet: 79% (H?Y) 00:35:51.500 --> 00:36:00.600 And I'll give you an example. So assume we've got a thousand people, a thousand children, a thousand God 00:36:00.600 --> 00:36:08.200 knows what. A hundred have got a disease or a condition, 900 are typically developing, no problem 00:36:08.200 --> 00:36:09.700 whatsoever. NOTE Treffsikkerhet: 82% (H?Y) 00:36:09.700 --> 00:36:18.300 This is how I populate my table, my cross tabs. I've got the true characteristics and I've got my screen. 00:36:18.300 --> 00:36:25.300 This is, you know, the gold standard, my criterion. And this is yes disease, no disease. 00:36:25.300 --> 00:36:30.200 And as I said already, this is condition, whatever it is, and this is typically developing, the same 00:36:30.200 --> 00:36:37.100 thing. Positive means it's disease. Now, I said a thousand, a thousand goes into that cell, that's your total. 00:36:37.100 --> 00:36:40.300 A hundred have a disease. I'm talking about NOTE Treffsikkerhet: 88% (H?Y) 00:36:40.300 --> 00:36:46.300 the gold standard. A hundred have a disease, 900 not.
Meaning, here is my 100, it's a total as 00:36:46.300 --> 00:36:55.000 identified by my criteria in my reference test, here is the 900 who do not have the disease. 00:36:55.000 --> 00:37:02.300 The sum, of course, is a thousand. Now if I give you these other data, if these are the results 00:37:02.300 --> 00:37:10.250 from my screen... Meaning, I see now already on my NOTE Treffsikkerhet: 71% (MEDIUM) 00:37:10.250 --> 00:37:15.700 reference test and then I screen them, I compare them and I could populate all these cells. NOTE Treffsikkerhet: 90% (H?Y) 00:37:15.700 --> 00:37:22.400 Now based on this cross tabs, we can of course identify the true positives and the true 00:37:22.400 --> 00:37:23.950 negatives. NOTE Treffsikkerhet: 91% (H?Y) 00:37:23.950 --> 00:37:33.100 And we've got a formula and it's relatively easy then to if we add all the names, here are the true 00:37:33.100 --> 00:37:40.400 positives and true negatives, false negatives, false positives. And you use the 00:37:40.400 --> 00:37:49.000 formula, then that is sensitivity, that is 80 true positives divided by the sum -- 80, you can also 00:37:49.000 --> 00:37:54.300 use that one -- Then a hundred and if I look at specificity, NOTE Treffsikkerhet: 88% (H?Y) 00:37:54.300 --> 00:38:03.000 I've got the true negatives, 800, that we are divided by the sum of true negatives and false positives. 00:38:03.000 --> 00:38:09.650 The sum of those is of course, bit easier is just that one, this cell divided by that cell. 00:38:09.650 --> 00:38:20.600 And I've got 80% and 89%. Again, sensitivity, how well this is screen for presence. Specificity, how 00:38:20.600 --> 00:38:24.350 well this is screen test for absence. So this is NOTE Treffsikkerhet: 75% (MEDIUM) 00:38:24.350 --> 00:38:30.100 because if you look at presence that is here, the true positives, sensitivity is about true 00:38:30.100 --> 00:38:37.500 positives and specificity true negatives. How well is it doing? 
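NOTE
Editor's sketch of the worked example above. The lecture states 80 true positives out of 100 with the disease and 800 true negatives out of 900 without it; the 20 false negatives and 100 false positives are not spoken aloud and are derived here from those column totals.
```python
# Cells of the cross tabs (reference test in the columns, screen in the rows).
tp, fn = 80, 20     # reference says disease: 100 in total
tn, fp = 800, 100   # reference says no disease: 900 in total
sens = tp / (tp + fn)   # how well the screen tests for presence of disease
spec = tn / (tn + fp)   # how well the screen tests for absence of disease
print(f"sensitivity = {sens:.1%}")
print(f"specificity = {spec:.1%}")
```
This gives the 80% and (rounded) 89% quoted in the lecture.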
And you can see rather high 00:38:37.500 --> 00:38:42.900 percentage is I would be quite pleased with numbers like that. That is pretty good. I would use 00:38:42.900 --> 00:38:52.500 that one. Okay, I'll give you another example about populating cross tabs. So sensitivity 00:38:52.500 --> 00:38:54.400 and specificity of a low-cost NOTE Treffsikkerhet: 87% (H?Y) 00:38:54.400 --> 00:38:59.900 screening protocol for identifying children at risk for language disorders. That's what we're 00:38:59.900 --> 00:39:07.000 looking at. And the objectives were to compare the diagnostic accuracy of a low-cost screening test for 00:39:07.000 --> 00:39:13.700 identifying children at risk for language disorders with that specific language assessment. That is 00:39:13.700 --> 00:39:20.000 typically what you are doing, you've got a screen low-cost, and you want to know compared with a gold 00:39:20.000 --> 00:39:24.300 standard, in this case a full language assessment. That is why you use screens. 00:39:25.500 --> 00:39:32.600 Here is my cross tabs, here is again that's always the same, never by the way swap columns or 00:39:32.600 --> 00:39:40.000 anything because it will get you totally confused, always populate the cross step this way. 00:39:40.000 --> 00:39:47.400 Otherwise you really get confused with cells and formulas and all of that. So again green is good 00:39:47.400 --> 00:39:53.400 and red is bad. There is my speech language assessment. My gold standard in this case. And this is 00:39:53.400 --> 00:39:55.550 the low-cost screen. NOTE Treffsikkerhet: 85% (H?Y) 00:39:55.550 --> 00:40:01.700 So that means you've got a language disorder or you do not have a language disorder based on your 00:40:01.700 --> 00:40:10.450 gold standard. Same thing. Now, this is the text, the study included a thousand Brazilian kids, 00:40:10.450 --> 00:40:17.100 aged between zero and five years. 
All kids were screened, ASHA, and assessed using a speech language 00:40:17.100 --> 00:40:25.300 assessment. Whatever it is. The ABFW is currently the most reliable test for Brazilian. NOTE Treffsikkerhet: 82% (H?Y) 00:40:25.300 --> 00:40:33.700 So this is my ABFW, that's my gold standard. In total we're talking about... 00:40:33.700 --> 00:40:40.100 This is of course that cell, a thousand in total, 00:40:40.100 --> 00:40:48.600 108 kids failed the screening test, a total of 120 children failed the ABFW test, 00:40:48.600 --> 00:40:56.500 99 children failed both the screening test and the ABFW. So now you're going to try to populate this table. NOTE Treffsikkerhet: 91% (H?Y) 00:40:56.500 --> 00:40:59.300 Have a look at it. NOTE Treffsikkerhet: 80% (H?Y) 00:40:59.400 --> 00:41:10.200 And so that is my ABFW and this is how it looks if I empty it. Now, I want to put those numbers 00:41:10.200 --> 00:41:17.900 in these cells, in these crosstabs. That means if I look at that thousand, these kids go to 00:41:17.900 --> 00:41:25.800 that corner. Yeah, that's the total population. The 108 that failed go here. They failed the 00:41:25.800 --> 00:41:32.050 screen. This is the screen. They failed because they were identified, there is my 108. NOTE Treffsikkerhet: 75% (MEDIUM) 00:41:32.050 --> 00:41:36.700 Then the next cell that I've got is, I've got 120 00:41:36.700 --> 00:41:46.700 kids failing the ABFW, there we are, the 120. And then it says, 99 failed both the 00:41:46.700 --> 00:41:52.900 language test and ASHA. That's the information I've got. But based on this information I can 00:41:52.900 --> 00:41:58.700 complete the full crosstabs. I know these numbers, because the total should be 108, 00:41:58.700 --> 00:41:59.899 so that's the nine. NOTE Treffsikkerhet: 85% (H?Y) 00:41:59.899 --> 00:42:03.399 The total should be 120, so that's a 21.
NOTE Treffsikkerhet: 83% (H?Y) 00:42:03.399 --> 00:42:09.000 Well, the next cell I can complete then is that cell, because I know the sum 00:42:09.000 --> 00:42:14.800 should be a thousand, this cell as well, sum should be a thousand, there they are. So I've got 00:42:14.800 --> 00:42:21.600 now, these cells, that means it's a piece of cake to complete the last cell. Based on these data 00:42:21.600 --> 00:42:27.600 I can determine any diagnostic performance. So you just need some data, you fill it in 00:42:27.600 --> 00:42:31.600 in your crosstabs, and you can determine diagnostic performance. NOTE Treffsikkerhet: 90% (H?Y) 00:42:31.600 --> 00:42:38.850 This is the same crosstabs. Same numbers. I just added the labels, true positive etc., to it. 00:42:38.850 --> 00:42:43.400 Exactly the same crosstabs. Here are my formulas. NOTE Treffsikkerhet: 90% (H?Y) 00:42:43.700 --> 00:42:53.700 And very easy again, I completed it, the true positives are always on top. 00:42:53.700 --> 00:43:03.000 So 99 divided by the sum of this column is sensitivity. And the other one is true negatives, that is a lot, 871, 00:43:03.000 --> 00:43:12.600 divided by 880 and look at these numbers, that is high. Yeah, that is really high sensitivity and specificity. 00:43:12.600 --> 00:43:20.800 That is a fantastic screen. I would absolutely use the screen. So, again, sensitivity, how well does it screen 00:43:20.800 --> 00:43:26.100 for presence. Because we're talking about those on top. And the other one for absence because we're 00:43:26.100 --> 00:43:32.900 talking about these on top. Okay, so that's a very good screen. 00:43:32.900 --> 00:43:42.100 This is the results. So if you say, 82.5% sensitivity, that means... NOTE Treffsikkerhet: 83% (H?Y) 00:43:43.900 --> 00:43:51.600 And that is specificity, I'm not going to read all. So that means both are fantastic.
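NOTE
Editor's sketch of how the full cross tabs follows from the four figures quoted from the study text (1000 children, 108 failed the screen, 120 failed the ABFW, 99 failed both). The function name and argument names are illustrative assumptions.
```python
def complete_crosstab(total: int, screen_pos: int, ref_pos: int, both_pos: int):
    """Fill the four cells from the grand total, the two marginals and one cell."""
    tp = both_pos                # failed screen and failed reference test
    fp = screen_pos - tp         # failed screen only
    fn = ref_pos - tp            # failed reference test only
    tn = total - tp - fp - fn    # passed both
    return tp, fp, fn, tn
tp, fp, fn, tn = complete_crosstab(total=1000, screen_pos=108, ref_pos=120, both_pos=99)
sens = tp / (tp + fn)
spec = tn / (tn + fp)
print(tp, fp, fn, tn)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```
One known cell plus the row total, the column total and the grand total is enough; every other cell is a subtraction.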
Yes, if you 00:43:51.600 --> 00:43:56.800 would have many false negatives, you would have many missing patient, but this is actually very good 00:43:56.800 --> 00:44:04.800 and you don't have too many false negatives. So this is actually good sensitivity, good specificity. NOTE Treffsikkerhet: 88% (H?Y) 00:44:05.500 --> 00:44:11.000 Okay, I'll give you another example. Is this clear so far? NOTE Treffsikkerhet: 76% (H?Y) 00:44:12.500 --> 00:44:15.750 And I'll give you another example. NOTE Treffsikkerhet: 90% (H?Y) 00:44:15.750 --> 00:44:22.700 Have a look at this one and see if you can complete it. Just take a moment to 00:44:22.700 --> 00:44:25.300 understand how it works. NOTE Treffsikkerhet: 87% (H?Y) 00:44:37.100 --> 00:44:41.900 Could everybody complete this cross tabs? NOTE Treffsikkerhet: 91% (H?Y) 00:44:43.000 --> 00:44:46.600 Anybody running into trouble? NOTE Treffsikkerhet: 88% (H?Y) 00:44:48.200 --> 00:44:55.900 I don't hear anything. So I expect it's going well. So that means if I go to the cell, this is how 00:44:55.900 --> 00:45:02.900 you complete. Of course, 240 and 60 must be 300. This is 300 and that means 00:45:02.900 --> 00:45:10.150 340 and so you can complete this relatively easy. Next step that you can do is you use your formulas 00:45:10.150 --> 00:45:17.300 Keep everything the same order, and there is my sensitivity, that means 240 divided by the 00:45:17.300 --> 00:45:18.700 total and the other one is 600 NOTE Treffsikkerhet: 91% (H?Y) 00:45:18.700 --> 00:45:24.950 divided by the total of that column total, and again, beautiful. NOTE Treffsikkerhet: 84% (H?Y) 00:45:24.950 --> 00:45:32.600 Again, now I knew it that I was going to do it quicker than I actually planned, but let me stop for now. 00:45:32.600 --> 00:45:40.800 And let me stop the recording cord. NOTE Treffsikkerhet: 71% (MEDIUM) 00:45:40.800 --> 00:45:51.850 So cut-offs, different story. Cut-offs, where are we? 
So what is good enough and that's got to do with, 00:45:51.850 --> 00:45:58.900 we did a systematic review and we talked about a sensitivity and specificity of 70 versus 60 percent. 00:45:58.900 --> 00:46:07.400 Now I said that time already, that is arbitrary. Because what about 71 or 72? When do you 00:46:07.400 --> 00:46:11.350 decide that a cut-off is good enough? NOTE Treffsikkerhet: 81% (H?Y) 00:46:11.350 --> 00:46:20.150 So cut-offs are arbitrary and it's 60, 70 percent is kind of what is common, but it is not black and white. 00:46:20.150 --> 00:46:27.500 So, let's have a talk about cutoffs. So you've got a frequency and condition, and on 00:46:27.500 --> 00:46:33.700 the left is healthy, no languages disorder on the right is language disorder. Now, this is the healthy 00:46:33.700 --> 00:46:41.250 distribution and there is my disease or my language distribution and this is NOTE Treffsikkerhet: 89% (H?Y) 00:46:41.250 --> 00:46:50.500 painful bit. Where do I use a cut-off? When do I decide that something that is belongs to the 00:46:50.500 --> 00:46:55.800 healthy distribution or a certain score belongs to the disease distribution? Because there's an 00:46:55.800 --> 00:47:07.800 overlap. So cut-offs are important. Now these are the false positives. This little corner here, 00:47:07.800 --> 00:47:10.950 this triangle are the false positives. NOTE Treffsikkerhet: 84% (H?Y) 00:47:10.950 --> 00:47:17.000 Those are the persons without the disease and a positive test. NOTE Treffsikkerhet: 83% (H?Y) 00:47:17.100 --> 00:47:24.200 Now, that makes on the other side. I've got the false negatives, meaning, if I were to use that 00:47:24.200 --> 00:47:31.050 cutoff, then these persons that actually have the disease, actually belong to the red distribution, 00:47:31.050 --> 00:47:40.700 but I decide well it's unlikely, so I consider them to be as a negative test. So you use this 00:47:40.700 --> 00:47:46.400 cut off to decide. 
Well, even though this little bit is actually, the red distribution, NOTE Treffsikkerhet: 91% (H?Y) 00:47:46.400 --> 00:47:52.200 I add them to the green one, which is an error actually. And this is the bit that I miss from 00:47:52.200 --> 00:47:57.900 the green distribution, false positives, but somewhere I need to draw the line. 00:47:57.900 --> 00:48:06.700 Now, that is the complexity is like, okay, so where to decide to draw that line. 00:48:06.700 --> 00:48:14.400 Because now if I draw it here then these are my false negatives, but if I move that one 00:48:15.350 --> 00:48:22.600 to there, then these are my false positives. 00:48:22.600 --> 00:48:33.750 You can see if I move that line from this, it goes from there to there, 00:48:33.750 --> 00:48:40.800 this increases of course a lot and false negatives are smaller. The bigger the overlap here, the more 00:48:40.800 --> 00:48:44.450 difficult it is to use your cutoff of wisely. NOTE Treffsikkerhet: 91% (H?Y) 00:48:44.450 --> 00:48:52.000 So different cut-offs, different cut points yield different sensitivities and specificities. 00:48:52.000 --> 00:48:58.450 That is actually the most important thing that you can see happening here. Now I'm going to give you an example. NOTE Treffsikkerhet: 81% (H?Y) 00:48:58.450 --> 00:49:04.000 So we've got subjects are screened and I've got a questionere 00:49:04.000 --> 00:49:10.300 and I use different cut-offs, different cut points. So on the left, 00:49:10.300 --> 00:49:18.650 I use a low cut point and on the right, I use a high cut point. That means if I use a low cut point, NOTE Treffsikkerhet: 87% (H?Y) 00:49:18.650 --> 00:49:26.100 then I get this, so you got cancer. So I've got here disease and no disease, disease, no disease. 00:49:26.100 --> 00:49:32.400 This is my cut-off and here is my cut-off there, which means of course in real life they are 00:49:32.400 --> 00:49:36.900 mixed, the two groups, disease and not disease. But this is easier. 
It looks a little bit like a 00:49:36.900 --> 00:49:44.250 cross tabs. So what I do, I can now, I've got my cut off, I've got that 3 goes there, 00:49:44.250 --> 00:49:48.800 this 17 goes there. NOTE Treffsikkerhet: 77% (H?Y) 00:49:48.800 --> 00:49:56.500 So you can see that if I compare it and do the color coding, then you can see where 00:49:56.500 --> 00:50:04.600 they go. So, this is my cut-off. So, anything above my screen says is identified as having the 00:50:04.600 --> 00:50:11.900 disease and this is below. Now if I compare that to my gold standard, you can see because actually 00:50:11.900 --> 00:50:18.750 in real life, these do not have the disease. So 14 should not have the disease. NOTE Treffsikkerhet: 83% (H?Y) 00:50:18.750 --> 00:50:26.900 17 are correctly identified, my true positives and I've got here six, my true 00:50:26.900 --> 00:50:34.600 negatives. But this is pretty bad. You can see sensitivity if I calculate that one, that is 00:50:34.600 --> 00:50:42.600 pretty good. But my specificity if I were to calculate that one is only 40%. Now if I use a high 00:50:42.600 --> 00:50:48.750 cut-off like that, then these numbers here I've got only have five true positives and NOTE Treffsikkerhet: 79% (H?Y) 00:50:48.750 --> 00:50:56.200 I have a lot of true negatives, but I still have got some false positives there and these other false ones. 00:50:56.200 --> 00:51:00.500 So you put them in the numbers, you calculate, you do your trick with sensitivity and 00:51:00.500 --> 00:51:12.200 specificity and then you get a low sensitivity but a very high specificity. So changing these 00:51:12.200 --> 00:51:18.950 cut-offs gives you totally different calculations, different sensitivity and specificity. NOTE Treffsikkerhet: 88% (H?Y) 00:51:18.950 --> 00:51:25.100 So, different cut points, you'll have different sensitivities and specificities. That's for sure. 
00:51:25.100 --> 00:51:31.900 And the cutoff determines how many subjects will be considered as having the disease. NOTE Treffsikkerhet: 72% (MEDIUM) 00:51:32.400 --> 00:51:42.900 The cut point that identifies more true negatives will also identify more false negatives and that 00:51:42.900 --> 00:51:51.600 you can see the true negatives are those. So a cut point that has more true negatives will also 00:51:51.600 --> 00:52:00.450 identify more false negatives. That goes hand in hand. You can't change that. NOTE Treffsikkerhet: 70% (MEDIUM) 00:52:00.450 --> 00:52:07.600 But the cut point that identifies more true positives... There we go, way more true positives 00:52:07.600 --> 00:52:17.000 compared to that one, will also identify more false positives, that's life, nothing to do about. 00:52:17.000 --> 00:52:19.250 So that's the consequence. NOTE Treffsikkerhet: 84% (H?Y) 00:52:19.250 --> 00:52:26.500 So where to draw the line? That is the question. What can we do about this? NOTE Treffsikkerhet: 91% (H?Y) 00:52:26.800 --> 00:52:34.800 Well, if the diagnostic test is expensive or invasive, we said already we want to minimize the false 00:52:34.800 --> 00:52:44.800 positives. We use a cut point with high specificity. Remember, specificity how well this screen 00:52:44.800 --> 00:52:53.800 test for absence of disease. If the penalty for missing this is high, for instance you die 00:52:53.800 --> 00:52:56.649 if they don't identify you, you want to maximize NOTE Treffsikkerhet: 91% (H?Y) 00:52:56.649 --> 00:53:05.000 the true positives, you need to use a cut point with high sensitivity. There we go, sensitivity, 00:53:05.000 --> 00:53:13.200 you want to be sure you test well for presence of disease, so that will influence where you will use 00:53:13.200 --> 00:53:15.200 your cut-offs. NOTE Treffsikkerhet: 86% (H?Y) 00:53:15.200 --> 00:53:26.000 You need to balance therefore severity of false positives against false negatives. 
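NOTE
Editor's sketch of how moving the cut-off trades sensitivity against specificity. The questionnaire scores below are invented for illustration (they are not the numbers on the slide), and the sketch assumes that a score at or above the cut-off counts as a positive screen.
```python
# Hypothetical questionnaire scores, grouped by what the gold standard says.
diseased_scores = [4, 5, 6, 6, 7, 8, 8, 9]
healthy_scores = [1, 2, 3, 3, 4, 5, 6, 7]
def sens_spec(cutoff: int):
    """Sensitivity and specificity when score >= cutoff means 'fails the screen'."""
    tp = sum(score >= cutoff for score in diseased_scores)
    tn = sum(score < cutoff for score in healthy_scores)
    return tp / len(diseased_scores), tn / len(healthy_scores)
for cutoff in (4, 6, 8):
    sens, spec = sens_spec(cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```
With these made-up scores the low cut-off catches every case but clears only half of the healthy group, and the high cut-off does the reverse, which is the balance between false positives and false negatives described above.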
Now, this 00:53:26.000 --> 00:53:35.600 is a ROC curve. What is a ROC curve? A ROC curve is, as you can see, sensitivity as a function of a hundred minus 00:53:35.600 --> 00:53:42.500 specificity. This is a hundred because this is percentage. You can do sensitivity and specificity, 00:53:42.500 --> 00:53:45.600 usually you see it in percentages. NOTE Treffsikkerhet: 84% (H?Y) 00:53:45.600 --> 00:53:51.400 And that's why they use a 100. Sometimes they divide by 100 but 00:53:51.400 --> 00:54:00.100 percentages are most common. Now ROC stands for Receiver Operating Characteristic. I did 00:54:00.100 --> 00:54:07.200 not think of that, it's a fact, ROC. And it's an evaluation of classifier performance, that's what it is. 00:54:07.200 --> 00:54:15.649 Well, what does that mean? It decides on whether your test is actually useful or not. 00:54:15.649 --> 00:54:24.800 So it's a trade-off between sensitivity and specificity. Again, the x-axis is 100 minus specificity, 00:54:24.800 --> 00:54:30.500 and the y-axis is just sensitivity. Okay. I'm going to give you examples because what does this mean 00:54:30.500 --> 00:54:35.000 actually and how to interpret it. But this is what you do. NOTE Treffsikkerhet: 89% (H?Y) 00:54:35.000 --> 00:54:39.100 I'm going to give you an example. NOTE Treffsikkerhet: 86% (H?Y) 00:54:39.100 --> 00:54:47.200 So you've got, these are the thresholds. Remember, we can change thresholds. We did that with the 00:54:47.200 --> 00:54:56.800 previous examples. So here is my ROC curve. If I complete these numbers, I add these there. 00:54:56.800 --> 00:55:05.850 If I've got a threshold of one, I get a sensitivity of zero and a specificity of 100, 00:55:05.850 --> 00:55:09.500 that's this point. If I've got a NOTE Treffsikkerhet: 81% (H?Y) 00:55:09.500 --> 00:55:21.200 sensitivity of 50, specificity is 75 so that would make 100 minus specificity 25.
00:55:21.200 --> 00:55:31.300 So then you've got 50, 25, this is sensitivity 75, specificity and 00:55:31.300 --> 00:55:39.750 100 minus specificity, then you've got 75, 50. 75, 50, 100 and 0, 100 and 0 NOTE Treffsikkerhet: 91% (H?Y) 00:55:39.750 --> 00:55:47.000 and you draw the lines. Now you've got a curve and you can see the 00:55:47.000 --> 00:55:51.900 different cut-offs and then you can decide well, okay, how does that look? What are we going to do 00:55:51.900 --> 00:55:58.300 with it, how to interpret it? But first, this is how you create a ROC curve. Now, what does this mean? 00:55:58.300 --> 00:56:06.900 Okay, here we go. I've got a ROC curve of the random classifier, if it's here, it cuts the plot into two 00:56:09.350 --> 00:56:17.000 symmetric halves, then it is totally random. It doesn't matter. On the right is 00:56:17.000 --> 00:56:27.300 a perfect classifier. That is what you want. Now in real life this is what happens. So, if I am going 00:56:27.300 --> 00:56:35.400 to have a look at that one, that is random separation, whatever cut-off you give it doesn't 00:56:35.400 --> 00:56:39.399 help anything. If I use that one, NOTE Treffsikkerhet: 89% (H?Y) 00:56:39.399 --> 00:56:47.800 that is good separation. Fantastic. It goes into the corner. Good sensitivity and specificity. If I 00:56:47.800 --> 00:56:55.400 go to that one, the green one, that is reasonably poor separation because it's actually getting 00:56:55.400 --> 00:57:01.400 closer to this line. And that one is reasonable and very often you've got something like that. 00:57:01.400 --> 00:57:05.500 This is fantastic. If you've got a ROC curve like that, man, you've got beautiful 00:57:05.500 --> 00:57:09.450 sensitivity and specificity. Now, if I link that to this, NOTE Treffsikkerhet: 73% (MEDIUM) 00:57:09.450 --> 00:57:12.200 these distributions again, NOTE Treffsikkerhet: 85% (H?Y) 00:57:12.200 --> 00:57:21.000 then area under the curve. Here we go. So, this is my ROC curve.
Suppose it is that line, then the 00:57:21.000 --> 00:57:27.400 area under the curve here is 50%. I've got 50% below and I've got 50% above. So 0.5, 00:57:27.400 --> 00:57:35.100 this is 0.5. Now the area here, you've got more under the curve, 0.6 and here 0.7, the more under 00:57:35.100 --> 00:57:42.000 the curve, the better it is. Now if it is at random that means your distributions are NOTE Treffsikkerhet: 78% (H?Y) 00:57:42.000 --> 00:57:43.800 totally overlapping. NOTE Treffsikkerhet: 85% (H?Y) 00:57:43.800 --> 00:57:50.200 If I'm looking here, I've got here my healthy ones and my condition ones. There's kind of 00:57:50.200 --> 00:57:58.600 an overlap. This one has got minimal overlap. So I've got my cut-off, that is beautiful. So this does 00:57:58.600 --> 00:58:07.050 not distinguish at all between having the disease or not having the disease. This is kind of okay. 00:58:07.050 --> 00:58:13.100 This is much, much, much better. You've got only minimal overlap with your screen. NOTE Treffsikkerhet: 73% (MEDIUM) 00:58:13.100 --> 00:58:28.900 That is really a good one. I would absolutely go for this screen. Now, that one is ideal. 00:58:28.900 --> 00:58:36.250 That one is a nightmare and that one is usually what it looks like. NOTE Treffsikkerhet: 89% (H?Y) 00:58:36.250 --> 00:58:45.900 Okay, now I'm really going to stop because I'm going way too fast. I know that.
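NOTE
Editor's sketch of the area under the curve for the four threshold points read out earlier in the lecture (sensitivity 0, 50, 75, 100 against 100 minus specificity 0, 25, 50, 100), written here as fractions. Joining the points with straight lines and using the trapezoidal rule is an assumption of this sketch; the lecture does not say how the area is computed.
```python
# (1 - specificity, sensitivity) pairs, one per threshold, as fractions of 1.
roc_points = [(0.0, 0.0), (0.25, 0.50), (0.50, 0.75), (1.0, 1.0)]
def auc_trapezoid(points) -> float:
    """Area under straight-line segments joining the ROC points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
auc = auc_trapezoid(roc_points)
print(f"AUC = {auc:.3f}")
```
The diagonal of a random classifier gives 0.5 with the same function, and a curve through the top-left corner gives 1.0, so this example sits between "random" and "ideal".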