WEBVTT Kind: captions; language: en-us NOTE Treffsikkerhet: 84% (HØY) 00:00:01.700 --> 00:00:10.600 There we are. Okay, instrument development. So today is about how to develop a measure and we will 00:00:10.600 --> 00:00:18.400 have three different topics. We will talk about what is measurement, 00:00:18.400 --> 00:00:24.400 what is the process of instrument development and we will talk briefly about how to select a screen or a measure. 00:00:24.400 --> 00:00:29.500 We talked about that, so it's kind of a repeat that I will just summarize again. 00:00:29.500 --> 00:00:32.250 So, to start with measurement, NOTE Treffsikkerhet: 81% (HØY) 00:00:32.250 --> 00:00:40.500 first of all, why would one measure, why even bother? Because you need an accurate measurement, you need 00:00:40.500 --> 00:00:47.300 something that's much more valid than all these expert opinions. Expert opinions can be biased, they are 00:00:47.300 --> 00:00:53.700 usually biased. It's one person only and even though it may be an expert, it's still not an 00:00:53.700 --> 00:00:59.700 objective measurement. We want to determine the outcome ourselves. Now, of course 00:00:59.700 --> 00:01:01.900 there's always a problem, some phenomena, NOTE Treffsikkerhet: 81% (HØY) 00:01:01.900 --> 00:01:09.200 some domains are very difficult to measure. Quality of life, yes we've got quality of life 00:01:09.200 --> 00:01:16.400 measures but for example measuring the height of a person is easier than 00:01:16.400 --> 00:01:24.800 measuring shyness. So some things are very difficult to measure but still we give it a try. 00:01:24.800 --> 00:01:29.400 Now if you've got your research going, you've got your idea, you design it, and then you need to think 00:01:29.400 --> 00:01:31.949 about what measures you are going NOTE Treffsikkerhet: 91% (HØY) 00:01:31.949 --> 00:01:38.600 to use to objectify the outcome, to get your results. 
So first, you've got your research design, then you 00:01:38.600 --> 00:01:43.500 need to think about your measures. And when you think about measures you immediately need to think 00:01:43.500 --> 00:01:49.200 about the psychometrics of your measures. And we talked about that, that is about the psychometric 00:01:49.200 --> 00:01:56.200 properties of your measures. So we've got these domains, validity, reliability, responsiveness. 00:01:56.200 --> 00:02:01.950 You don't use a measure with poor psychometrics. So it's all linked to each other. NOTE Treffsikkerhet: 91% (HØY) 00:02:01.950 --> 00:02:08.800 Now, pictures like this are not really new. You must have seen them before. So, we all talk 00:02:08.800 --> 00:02:14.550 about evidence-based practice and in the middle you've got evidence-based practice and that is 00:02:14.550 --> 00:02:22.600 actually based on science, scientific knowledge, on expertise, clinical experience, 00:02:22.600 --> 00:02:28.950 the circumstances of the target population and practice, all of that feeds into evidence-based practice. 00:02:28.950 --> 00:02:31.800 Now, measurement is the foundation. NOTE Treffsikkerhet: 74% (MEDIUM) 00:02:31.800 --> 00:02:37.700 Without objective measurement, how would you know what is best practice? And it gives 00:02:37.700 --> 00:02:43.600 clinicians and researchers and educationalists and anybody else you want to fill in there, it gives 00:02:43.600 --> 00:02:52.300 them a tool to accurately observe, record, describe, objectify, determine, etc., the outcome. 00:02:52.300 --> 00:03:00.750 That's what you want to do. That's why you need good measures. Now, what is measurement? 00:03:00.750 --> 00:03:01.850 It's a process NOTE Treffsikkerhet: 84% (HØY) 00:03:01.850 --> 00:03:10.300 of observing and recording observations. And you do that by assigning numbers to a particular 00:03:10.300 --> 00:03:16.800 attribute. 
Particular attributes of a set of things, and you are interested in the relationship 00:03:16.800 --> 00:03:23.800 between numbers that reflect the relationship of the attributes being measured. 00:03:23.800 --> 00:03:30.750 So what you do, you observe something, you want to assign a number to the observation, 00:03:31.750 --> 00:03:38.400 a very high or a low number, and you're going to have a look at relationships between those numbers and the 00:03:38.400 --> 00:03:45.200 attribute. And you collect all your data and that is part of a research effort, part of your 00:03:45.200 --> 00:03:53.800 thesis or whatever you are going to do. Now, why does it matter? The way you choose to measure 00:03:53.800 --> 00:03:59.700 something must be relevant, accurate and meaningful. And this is actually a little bit 00:03:59.700 --> 00:04:03.850 like an open door, but of course it should be relevant. NOTE Treffsikkerhet: 90% (HØY) 00:04:03.850 --> 00:04:06.700 But sometimes measures have got items that are not relevant and that is a problem. 00:04:06.700 --> 00:04:12.400 Because if you've got only a total score and too many irrelevant items, that is 00:04:12.400 --> 00:04:18.200 actually adding noise to your outcome. So, it should be relevant to the construct, the concept, the 00:04:18.200 --> 00:04:24.500 domain you want to measure, to what we want to know. It should be relevant for reality. It should be 00:04:24.500 --> 00:04:30.300 accurate. So you don't want error, you want to minimize any error because that is also adding noise 00:04:30.300 --> 00:04:31.900 to your data. NOTE Treffsikkerhet: 74% (MEDIUM) 00:04:31.900 --> 00:04:39.300 It should be precise. It should be meaningful. It should tell us something beyond abstract relationships 00:04:39.300 --> 00:04:46.600 between numbers, how to interpret it, give it an educational or clinical meaning. And that has also got 00:04:46.600 --> 00:04:54.350 to do with interpretability, giving meaning to these data, to these numbers. 
NOTE Treffsikkerhet: 86% (HØY) 00:04:54.350 --> 00:05:03.000 Now, we talked already about differences between discrete versus continuous data. This is 00:05:03.000 --> 00:05:09.400 just a repeat, you've got discrete data, they can only take certain values, 00:05:09.400 --> 00:05:15.600 you can't split the unit of measurement, whereas continuous data can take any value within a 00:05:15.600 --> 00:05:21.700 certain range. So examples of discrete data were numbers of students, or you use a die. 00:05:21.700 --> 00:05:24.900 It can't be three and a half. NOTE Treffsikkerhet: 78% (HØY) 00:05:24.900 --> 00:05:32.700 It's 1 to 6 or if you've got two dice, it's 2 to 12. So continuous data, that can be height, 00:05:32.700 --> 00:05:39.500 that can be time. These are differences between discrete and continuous data, but we also discussed 00:05:39.500 --> 00:05:43.900 levels of measurement and we said, you've got different levels of measurement. You've got on 00:05:43.900 --> 00:05:50.000 top the nominal ones, and if you talk about two categories, then you call it dichotomous. 00:05:50.000 --> 00:05:54.250 If you only talk about men and women, that is a dichotomous variable. NOTE Treffsikkerhet: 82% (HØY) 00:05:54.250 --> 00:06:01.900 Ordinal, interval and ratio. Now the differences were, nominal you can't sum them up. You can't do 00:06:01.900 --> 00:06:08.200 anything with it. It's attributes. It could be colors. It could be professions, but you can't sum 00:06:08.200 --> 00:06:15.800 them or divide them. Ordinal, there is a ranking. Interval, that means the distance between the 00:06:15.800 --> 00:06:22.100 different levels is meaningful. And ratio, you've got that 0 existing. So from top, you've got 00:06:22.100 --> 00:06:24.450 things like eye color. 
If you look at NOTE Treffsikkerhet: 77% (HØY) 00:06:24.450 --> 00:06:32.300 ordinal, for example a patient satisfaction survey, 00:06:32.300 --> 00:06:38.800 very satisfied, a little bit satisfied, not at all satisfied... That is ordinal. There's an order in 00:06:38.800 --> 00:06:45.300 the ranking. There's a ranking. Now interval, that is for instance, temperature, where every degree is 00:06:45.300 --> 00:06:53.000 the same distance. The distance from one to two degrees or from 15 to 16 degrees is always the same distance. 00:06:53.000 --> 00:06:54.400 That is interval. NOTE Treffsikkerhet: 90% (HØY) 00:06:54.400 --> 00:07:03.200 And ratio, there is that absolute zero that makes it ratio, like my bank account. 00:07:03.200 --> 00:07:09.000 Although you could go into the minus. But usually we talk about an absolute zero, that is the ratio 00:07:09.000 --> 00:07:15.500 level. You can see, there is this hierarchy, on top, the ratio level. That is the highest level of 00:07:15.500 --> 00:07:22.000 measurement and the lowest one is the attributes one, the nominal or dichotomous one. And here, 00:07:22.000 --> 00:07:24.400 again, you've got these examples, the ones we NOTE Treffsikkerhet: 89% (HØY) 00:07:24.400 --> 00:07:31.000 already discussed and I hope that is clear. I'm sure you've got a lot like this during your math 00:07:31.000 --> 00:07:38.700 classes or your stats. Now, why do we care? Why is measurement important? Well, if you look at it 00:07:38.700 --> 00:07:44.600 clinically or educationally, you need to have the ability to label or describe what's 00:07:44.600 --> 00:07:51.900 happening for your clients or students or kids. You need to have something that helps you 00:07:51.900 --> 00:07:54.350 determine the change. You want to know if NOTE Treffsikkerhet: 89% (HØY) 00:07:54.350 --> 00:08:01.400 this child is changing in performance. 
And you want to be precise in the measurement and you want 00:08:01.400 --> 00:08:05.800 to know which one is most successful, for instance if you are comparing different 00:08:05.800 --> 00:08:14.100 interventions. Now, if we look at research, then it's about quantifying phenomena. It is about how 00:08:14.100 --> 00:08:19.600 to communicate findings between different studies. You want a common language, and if 00:08:19.600 --> 00:08:24.450 everybody designs their own measure, it's difficult to compare. NOTE Treffsikkerhet: 87% (HØY) 00:08:24.450 --> 00:08:32.900 You need to agree and discuss how to operationalize a successful outcome. Meaning, what do we 00:08:32.900 --> 00:08:40.400 define as success? And in stats we ask for instance, is your finding significant or not, or is it 00:08:40.400 --> 00:08:47.000 only a trend, or maybe not even a trend, and with all of that, then a study 00:08:47.000 --> 00:08:53.950 is useful and contributes to evidence-based practice. So then you can determine what is best practice. NOTE Treffsikkerhet: 88% (HØY) 00:08:53.950 --> 00:09:00.600 Now, these are frameworks that are out there, this is a development framework. So you start on top. 00:09:00.600 --> 00:09:08.600 You have this idea. A concept, conceptualization. You're going to think about test construction. 00:09:08.600 --> 00:09:15.200 What should it look like? How many items, how many subscales? What is the idea? You need to 00:09:15.200 --> 00:09:20.400 try it out, to do a trial to see whether people understand what you are asking 00:09:20.400 --> 00:09:23.849 them or whether people can complete your survey or they get lost. NOTE Treffsikkerhet: 84% (HØY) 00:09:23.849 --> 00:09:29.500 You do some item analysis. You use statistics. You're going 00:09:29.500 --> 00:09:35.000 to have a look at: are all items being used the way I think they're going to be used, are there 00:09:35.000 --> 00:09:43.900 floor and ceiling effects. 
Can they discriminate between high and poor performance? All of that and 00:09:43.900 --> 00:09:50.100 then after your stats you may delete a few items because nobody uses them or 00:09:50.100 --> 00:09:53.800 everybody uses only the highest level, like everybody NOTE Treffsikkerhet: 81% (HØY) 00:09:53.800 --> 00:10:00.200 agrees, it does not differentiate within your population, etc. And then you go back and you 00:10:00.200 --> 00:10:07.300 try to reconceptualize and you can do the same thing once more. This is, in short, a development 00:10:07.300 --> 00:10:12.300 framework, but we will go into way more detail. This is just an example. NOTE Treffsikkerhet: 89% (HØY) 00:10:12.300 --> 00:10:18.900 Now measures can have different functions. If we talk about prognosis we talk about whether your measure can 00:10:18.900 --> 00:10:27.700 predict a later outcome. Can it predict that children will end up 00:10:27.700 --> 00:10:37.200 having reading problems, or can it predict that someone will be at risk of not finishing 00:10:37.200 --> 00:10:41.900 school, that they leave earlier, all of that, school adherence. NOTE Treffsikkerhet: 73% (MEDIUM) 00:10:41.900 --> 00:10:48.200 So it can determine suitability for a particular intervention. It can determine the amount of intervention 00:10:48.200 --> 00:10:56.200 you may need and it may report on responsiveness to a particular intervention and we discussed that 00:10:56.200 --> 00:11:02.800 responsiveness is whether a measure is sensitive to change or not. So you want to know, you've got an 00:11:02.800 --> 00:11:08.800 intervention, and can your measure give you feedback on whether there is any change in the condition 00:11:08.800 --> 00:11:12.000 of the child. If you look at a more analytical NOTE Treffsikkerhet: 72% (MEDIUM) 00:11:12.000 --> 00:11:20.200 level then we are looking at, for example, can we classify subgroups. Can we look at relations 00:11:20.200 --> 00:11:29.200 between factors? 
Can we distinguish within-subject change or subgroup differences? So you need stats here. 00:11:29.200 --> 00:11:36.050 Can you compare groups of interest to other populations, subgroups or norms or the general 00:11:36.050 --> 00:11:41.950 population? So those are all more analytical questions that use statistical NOTE Treffsikkerhet: 91% (HØY) 00:11:41.950 --> 00:11:50.400 analysis where you try to confirm or identify, find the answer to your questions. NOTE Treffsikkerhet: 76% (HØY) 00:11:50.400 --> 00:11:57.700 Now, if you talk about measures, you talk about psychometrics, in this case we talked 00:11:57.700 --> 00:12:03.250 about the COSMIN framework. We said, well, they came up with this international 00:12:03.250 --> 00:12:10.800 consensus-based framework in psychometrics. And this was the website and in short we had this 00:12:10.800 --> 00:12:17.500 framework of reliability, validity and responsiveness. We said there are three domains if you talk 00:12:17.500 --> 00:12:19.750 about psychometrics, you need to cover NOTE Treffsikkerhet: 77% (HØY) 00:12:19.750 --> 00:12:25.500 all three domains, you can't just say my measure is very reliable. Yeah, and what about validity 00:12:25.500 --> 00:12:30.400 and responsiveness? Or this is very valid. Yeah, but you still need to cover the other 00:12:30.400 --> 00:12:37.800 domains. And we also know that the domains have different measurement properties within them. 00:12:37.800 --> 00:12:43.600 So reliability: internal consistency, reliability, measurement error. There are several 00:12:43.600 --> 00:12:49.400 types of validity, content validity among them, and then there is responsiveness. All of that is 00:12:49.900 --> 00:12:56.100 psychometrics. That is the whole framework of COSMIN and we discussed this already. And on the 00:12:56.100 --> 00:13:02.800 left here, you can see, as we said, interpretability, and that one gives qualitative meaning to the 00:13:02.800 --> 00:13:12.100 numbers. 
So interpretability is important as well. It's just not a psychometric property. 00:13:12.100 --> 00:13:18.650 Let's move on to what I would like to focus on. If you are talking about instrument development, 00:13:18.650 --> 00:13:19.650 we are in fact NOTE Treffsikkerhet: 81% (HØY) 00:13:19.650 --> 00:13:26.900 talking about content validity, and content validity according to COSMIN was the degree to which the 00:13:26.900 --> 00:13:32.600 content of your instrument is an adequate reflection of what you are measuring. It was the most 00:13:32.600 --> 00:13:38.600 important one because if your instrument fails to measure what you want to measure, because you ask 00:13:38.600 --> 00:13:44.300 the wrong questions, because it's not comprehensive enough, because the children don't 00:13:44.300 --> 00:13:49.700 understand your items, you've got poor content validity. Yeah, content validity NOTE Treffsikkerhet: 90% (HØY) 00:13:49.700 --> 00:13:55.700 is step number one, that is instrument development. Any questions so far? NOTE Treffsikkerhet: 81% (HØY) 00:13:57.000 --> 00:14:05.900 Melissa: When we're doing our proposal... Renee: I'm going to stop sharing. NOTE Treffsikkerhet: 91% (HØY) 00:14:08.800 --> 00:14:14.500 So we talked about content validity and we said, well, if you talk about instrument development, you 00:14:14.500 --> 00:14:20.900 talk in fact about content validity because that is where you need to make sure that you are in fact 00:14:20.900 --> 00:14:27.400 measuring what you would like to measure. Now, focus on instrument development, we talked about what 00:14:27.400 --> 00:14:32.800 is measurement, what type of measurements or type of variables and let's move on to instrument 00:14:32.800 --> 00:14:38.950 development, content validity. So what happens if we want to construct a test? Now, this is out of NOTE Treffsikkerhet: 87% (HØY) 00:14:38.950 --> 00:14:47.800 Cohen and I want to clarify something. Here's my cursor. This is Cohen 2007. 
00:14:47.800 --> 00:14:57.500 You've got Cohen 2018 I believe. So, chapter 19 is called tests. But in your book, it is chapter 20 something, 00:14:57.500 --> 00:15:03.600 but the title should be the same. Yeah, so don't panic if this is wrong. 00:15:03.600 --> 00:15:08.800 It's just the old version of the same chapter. So this comes out of that book. So just NOTE Treffsikkerhet: 88% (HØY) 00:15:08.800 --> 00:15:14.500 to give you an example like, constructing a test, and what is important? Well, first 00:15:14.500 --> 00:15:22.100 of all, the purpose of the test for answering your questions because you want to know that it is 00:15:22.100 --> 00:15:28.900 measuring the right thing. The type of test, and then we talk about diagnostic, achievement, 00:15:28.900 --> 00:15:35.000 aptitude, criterion-referenced, norm-referenced, etc. Now, to give you a little bit more on that, so we 00:15:35.000 --> 00:15:39.650 have norm-referenced, criterion-referenced and domain-referenced. NOTE Treffsikkerhet: 75% (MEDIUM) 00:15:39.650 --> 00:15:48.100 Norm-referenced means that it compares a student's achievement relative to other students' achievement. 00:15:48.100 --> 00:15:55.300 So you compare them. Then criterion-referenced is where the student needs to fulfill a 00:15:55.300 --> 00:16:01.700 preset number of criteria. Before you take the test, you already define this is what the 00:16:01.700 --> 00:16:08.849 student must do, must meet. And the third one is that you have the domain, a particular field or NOTE Treffsikkerhet: 84% (HØY) 00:16:08.849 --> 00:16:15.300 area of a subject that is being tested, and the test items are selected for full depth and breadth of that 00:16:15.300 --> 00:16:21.600 domain. And the students' achievements are computed to yield a proportion of the maximum score. 
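The three reference types described above can be contrasted in a short sketch. This is only an illustration, not something taken from Cohen's chapter: the function names, the use of a percentile rank for the norm-referenced case, and the example data are all assumptions made here.

```python
from bisect import bisect_left


def norm_referenced(score, cohort_scores):
    """Percentage of the cohort scoring strictly below this student.

    Assumption: a percentile rank stands in for 'relative to other students'.
    """
    ranked = sorted(cohort_scores)
    return 100.0 * bisect_left(ranked, score) / len(ranked)


def criterion_referenced(criteria_met):
    """True only if every criterion that was set before the test is met."""
    return all(criteria_met.values())


def domain_referenced(points_earned, max_points):
    """Achievement as a proportion of the maximum score for the domain."""
    return points_earned / max_points


# Hypothetical data, for illustration only.
cohort = [50, 60, 70, 80]
percentile = norm_referenced(70, cohort)
meets_all = criterion_referenced({"reads aloud": True, "retells story": False})
proportion = domain_referenced(62, 100)
```

The same raw score can therefore lead to three different statements: where the student stands in the group, whether the preset criteria are met, and how much of the domain is mastered.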
00:16:21.600 --> 00:16:27.500 So you've got a hundred items covering a certain domain and you say well you should 00:16:27.500 --> 00:16:36.200 at least have sixty percent of all questions correct, so the pass mark is a certain percentage. 00:16:36.200 --> 00:16:39.250 So, different ways of constructing a test, NOTE Treffsikkerhet: 79% (HØY) 00:16:39.250 --> 00:16:47.400 back to where we were. So purposes, the type of test you want to do, the objectives, and that means that 00:16:47.400 --> 00:16:53.600 you use specific terms. So that the content of the test items can be seen to relate to specific 00:16:53.600 --> 00:16:59.200 objectives of a curriculum for instance, of an educational curriculum. So it should be linked 00:16:59.200 --> 00:17:05.800 to it, should cover that bit. So what do you want to be tested and what are the test items? 00:17:05.800 --> 00:17:08.500 What does that measure actually look like? NOTE Treffsikkerhet: 85% (HØY) 00:17:08.500 --> 00:17:15.099 Then we've got a little bit of the construction and that's got to do with item analysis. The items 00:17:15.099 --> 00:17:20.599 need to discriminate between students that perform very well and students that perform not very 00:17:20.599 --> 00:17:26.599 well. You've got a bad test or poor test for instance, if 00:17:26.599 --> 00:17:33.700 all students pass my exam and they've all got A's. Then I set a very bad exam because 00:17:33.700 --> 00:17:38.800 actually, you want the high achievers, you want to discriminate high achievers NOTE Treffsikkerhet: 88% (HØY) 00:17:38.800 --> 00:17:44.100 from poor achievers. So someone who is really excellent should get an A and someone who's not that 00:17:44.100 --> 00:17:50.000 excellent should maybe get a B instead. If your test does not discriminate between any of that, 00:17:50.000 --> 00:17:59.600 then there is low item discriminability. 00:17:59.600 --> 00:18:07.200 Items should differentiate between levels of difficulty. 
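To make item difficulty and item discriminability concrete, here is a minimal sketch. It assumes items are scored 1 (correct) or 0 (wrong) and uses the upper-lower group index with a 27% cut-off, which is one common convention and not necessarily the one used in the lecture or in Cohen's chapter; all names and data are hypothetical.

```python
def item_difficulty(item_scores):
    """Proportion of students who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Proportion correct in the top group minus the bottom group.

    Groups are formed from the total test score; `fraction` is the share
    of students placed in each group (assumed 27% here).
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low


# Hypothetical total scores for ten students.
totals = [2, 3, 5, 6, 8, 9, 4, 7, 1, 10]

# An item everybody gets right: easy, and it cannot separate anyone.
item_everyone = [1] * 10
# An item only the stronger students (total of 6 or more) get right.
item_strong_only = [1 if t >= 6 else 0 for t in totals]

d_everyone = discrimination_index(item_everyone, totals)
d_strong_only = discrimination_index(item_strong_only, totals)
```

The first item corresponds to the exam where everybody gets an A: its difficulty is 1.0 and its discrimination is 0.0, so it tells you nothing about who the high achievers are. The second item separates the top group from the bottom group completely.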
NOTE Treffsikkerhet: 73% (MEDIUM) 00:18:07.900 --> 00:18:17.000 The format, instructions, you need to have a manual. Its layout, is it going to be oral, is it written 00:18:17.000 --> 00:18:24.700 instruction? How are you going to do it? Nature of the piloting of the test, validity and reliability, 00:18:24.700 --> 00:18:31.800 all of that is being discussed in more detail within Cohen's chapter, but I just can't help 00:18:31.800 --> 00:18:38.100 myself, validity and reliability is one thing, but you know there is also responsiveness, so he doesn't 00:18:38.100 --> 00:18:38.850 mention it, but should NOTE Treffsikkerhet: 91% (HØY) 00:18:38.850 --> 00:18:45.650 add it in his next version of the book. And here we are talking about, 00:18:45.650 --> 00:18:51.200 actually I repeated it already, but the manuals of instructions, if you want it more standardized 00:18:51.200 --> 00:18:56.300 then there is a standardized instruction of what to do. Very often, if you take an exam 00:18:56.300 --> 00:19:03.500 it tells you, you've got 10 items, you've got so much time, you need to 00:19:03.500 --> 00:19:08.800 achieve at least so many out of so many points to pass. You have an idea NOTE Treffsikkerhet: 85% (HØY) 00:19:08.800 --> 00:19:15.800 of whether all items have the same weight or what. They give you a little bit of 00:19:15.800 --> 00:19:22.400 information. Now, there are a lot of things you need to think of when constructing a test. I just 00:19:22.400 --> 00:19:28.600 wanted to link briefly to it because it's your book, your Cohen book, and Cohen also says in the end 00:19:28.600 --> 00:19:35.700 so what to do when planning a test? And I don't think it's that simple. I hope that 00:19:35.700 --> 00:19:38.850 is actually my main topic here, constructing a test NOTE Treffsikkerhet: 78% (HØY) 00:19:38.850 --> 00:19:45.600 is not simple or constructing a survey is not simple. 
You need to identify your purposes, 00:19:45.600 --> 00:19:54.000 describe that, test specifications, the content, the layout, the form. All of that. Feasibility is 00:19:54.000 --> 00:20:01.000 important. We are in instrument development and one of our instruments has got over a hundred items 00:20:01.000 --> 00:20:07.400 and that is just the draft measure, but that takes way too much time and it's not feasible to 00:20:07.400 --> 00:20:08.750 implement in NOTE Treffsikkerhet: 90% (HØY) 00:20:08.750 --> 00:20:14.900 education or clinics. So you need to keep that in mind. So you want your measure to be comprehensive, but at 00:20:14.900 --> 00:20:21.000 the same time you don't want it to be overwhelming and have 200 items because no one will complete your 00:20:21.000 --> 00:20:30.700 survey. Okay, back to instrument development, away from Cohen, that was just Cohen's chapter. If we 00:20:30.700 --> 00:20:35.800 talk about instrument development, then there are a couple of phases and I'll talk you 00:20:35.800 --> 00:20:38.850 through it. The first phase is if you've got NOTE Treffsikkerhet: 82% (HØY) 00:20:38.850 --> 00:20:45.000 an area, you've got your research proposal, you want to measure something, then you need to 00:20:45.000 --> 00:20:50.900 decide, are there existing measures? So you need to know, are there existing 00:20:50.900 --> 00:20:55.800 measures that are good enough or should I develop a new measure? NOTE Treffsikkerhet: 81% (HØY) 00:20:55.800 --> 00:21:02.300 If you develop a new measure, you need to conduct a so-called development study. We talk 00:21:02.300 --> 00:21:08.500 about all of this later on. Next you need to conduct a content validity study. Then you need to 00:21:08.500 --> 00:21:14.900 determine all the psychometrics and that is it and that gives you your measure. But let's 00:21:14.900 --> 00:21:22.500 go back to where we started. 
So Phase 1, determine the status of existing measures, because why would you 00:21:22.500 --> 00:21:26.100 develop a new measure, which takes a lot of time and effort, NOTE Treffsikkerhet: 83% (HØY) 00:21:26.100 --> 00:21:31.500 if maybe there are good measures out there already? So we're going to have a look at 00:21:31.500 --> 00:21:39.000 existing measures. Now, how do you do that? You should probably already know that, there they are. 00:21:39.000 --> 00:21:43.700 You do a systematic literature search. And that's what we're doing at the moment. We are 00:21:43.700 --> 00:21:50.100 looking, for instance, at measures that caregivers use for children with 00:21:50.100 --> 00:21:55.900 intellectual disabilities. And we are now just conducting a systematic review, only to retrieve the 00:21:56.100 --> 00:22:02.100 measures that are out there. So you go to the databases, you run your search strategies, you 00:22:02.100 --> 00:22:09.300 identify all published measures, and then you need to do a mapping exercise, meaning you need 00:22:09.300 --> 00:22:14.700 to have a look at all the measures and say "This is more about maybe health and this is more 00:22:14.700 --> 00:22:22.400 about education and this is more about social pressure..." So you try to map and group them and see 00:22:22.400 --> 00:22:26.150 what constructs are actually covered by these existing measures. NOTE Treffsikkerhet: 75% (MEDIUM) 00:22:26.150 --> 00:22:33.700 Then you need to have a look at the methodological study quality and determine what 00:22:33.700 --> 00:22:40.100 is known about psychometric properties of your measure. So this is 00:22:40.100 --> 00:22:46.300 where COSMIN pops in. So, this is actually the CAT. 
Now remember, the CAT, and that is the 00:22:46.300 --> 00:22:52.400 COSMIN checklist if we talk about psychometrics, and you need to retrieve from the literature, 00:22:52.400 --> 00:22:56.100 do we know anything about reliability, NOTE Treffsikkerhet: 87% (HØY) 00:22:56.100 --> 00:23:03.600 responsiveness, validity, what do we know about these measures? And that is the best way to decide: 00:23:03.600 --> 00:23:09.300 Okay, I've got that measure, I want to use that, or you say, well actually either the content is not 00:23:09.300 --> 00:23:16.600 what I want to use or it's not the right target population, or maybe it's just a psychometrically 00:23:16.600 --> 00:23:22.800 poor measure. So then we need to move on and develop a new measure and that brings us to the next stage. 00:23:22.800 --> 00:23:26.100 And the next stage or next phase NOTE Treffsikkerhet: 91% (HØY) 00:23:26.100 --> 00:23:33.600 is about developmental studies. Now, how does that work? That means you conduct a developmental 00:23:33.600 --> 00:23:41.050 study and that is, you need to start by thinking about item generation. So what 00:23:41.050 --> 00:23:48.200 items should be included in your study, and usually for that you just go to the literature. You use 00:23:48.200 --> 00:23:55.700 existing measures, you select items and then you also need to try that in a pilot. You've got your 30 00:23:56.050 --> 00:24:01.200 items or whatever you think should be included there, you need to pilot that one. Is it working? 00:24:01.200 --> 00:24:07.450 But the items should be relevant, and the pilot test should have a look at 00:24:07.450 --> 00:24:14.700 comprehensiveness. Meaning, are all topics covered. 
What we just discussed during the break, 00:24:14.700 --> 00:24:20.650 that we said, well if you come up with a survey and you want to trial it, you want to pilot test it, 00:24:20.650 --> 00:24:26.050 you ask the target, you ask representatives from the target population: NOTE Treffsikkerhet: 91% (HØY) 00:24:26.050 --> 00:24:32.700 is anything missing, is it comprehensive, did we miss certain topics that you think are very important? And the 00:24:32.700 --> 00:24:39.000 other one, does your target population understand your items the way you formulated them? 00:24:39.000 --> 00:24:47.300 So, relevance, comprehensiveness, and comprehensibility. Now, if you look at relevance and we've got 00:24:47.300 --> 00:24:54.600 comprehensiveness, and comprehensibility. All three terms are being used by COSMIN. 00:24:54.600 --> 00:24:56.050 Then the first one, relevance, is, NOTE Treffsikkerhet: 88% (HØY) 00:24:56.050 --> 00:25:01.700 all items you include in your measure should be relevant and they should be relevant for the 00:25:01.700 --> 00:25:07.800 construct of interest in that particular target population. So that's why it's very different if 00:25:07.800 --> 00:25:12.900 you design a measure for adults or children. Also, your language should be very different. 00:25:12.900 --> 00:25:20.300 And that's also got to do with comprehensiveness, are all key aspects of the construct included? 00:25:20.300 --> 00:25:26.100 And comprehensibility, of course especially if you are working with children, you need to adjust your NOTE Treffsikkerhet: 90% (HØY) 00:25:26.100 --> 00:25:34.400 language. And also, is the measure for teachers or parents? You need to trial your measure in 00:25:34.400 --> 00:25:40.000 the target population that should actually complete these measures in the future. Relevance, 00:25:40.000 --> 00:25:46.700 comprehensiveness and comprehensibility, all three are important. So, back to this one. 
So if you do 00:25:46.700 --> 00:25:52.300 your item generation, that is very often from the literature. And you ask experts, is this 00:25:52.300 --> 00:25:55.850 relevant, etc. We talk about that a bit later. Then you've got your pilot NOTE Treffsikkerhet: 85% (HØY) 00:25:55.850 --> 00:26:03.200 testing about comprehensiveness and comprehensibility. In the end that brings you to a rating scale. 00:26:03.200 --> 00:26:11.000 Now, all of that is content validity. Now, if we go back to COSMIN, COSMIN had these criteria, 00:26:11.000 --> 00:26:18.300 10 criteria for good content validity. And you can see the top ones, relevance and 00:26:18.300 --> 00:26:24.000 comprehensiveness, that bit you can determine by running a Delphi study. 00:26:24.000 --> 00:26:26.050 We talk about that later. NOTE Treffsikkerhet: 90% (HØY) 00:26:26.050 --> 00:26:32.100 And when piloting in your target population you can ask: Well, is it comprehensible? Do you understand it? 00:26:32.100 --> 00:26:37.700 Of course, also comprehensiveness can be there, but in the pilot study you absolutely need to ask 00:26:37.700 --> 00:26:45.500 are the items understood by the population of interest. So this is what we did in one of our 00:26:45.500 --> 00:26:51.550 ongoing Delphi studies, and this is how we decided to develop our measures. 00:26:51.550 --> 00:26:56.000 So just in brief, what is the Delphi technique? The Delphi technique is NOTE Treffsikkerhet: 82% (HØY) 00:26:56.000 --> 00:27:02.900 all about achieving international expert consensus. Now, this is for the ones who want to know 00:27:02.900 --> 00:27:10.700 where the term Delphi came from. It's from the Greek, but the Delphi technique means you invite experts, 00:27:10.700 --> 00:27:18.700 international experts usually, I only run international Delphis, you have repeated online surveys 00:27:18.700 --> 00:27:25.400 where you every time ask for feedback from the experts. 
It is anonymous, which is important, and 00:27:25.400 --> 00:27:26.000 you agree, NOTE Treffsikkerhet: 89% (H?Y) 00:27:26.000 --> 00:27:33.300 you decide: if 70% of all statements, whatever is in your Delphi, is accepted by your experts, you say 00:27:33.300 --> 00:27:39.900 we achieved consensus. You will never have a hundred percent. So 70% is pretty good. Okay, our 00:27:39.900 --> 00:27:45.600 Delphi again. If I look at this picture, you've got your problem, you invite your 00:27:45.600 --> 00:27:51.700 participants. You have your initial questionnaire. Then that questionnaire goes, for instance, I did 00:27:51.700 --> 00:27:55.950 that based on the literature, that one goes to my expert panel. NOTE Treffsikkerhet: 75% (MEDIUM) 00:27:55.950 --> 00:28:02.500 I ask them for feedback. I summarize it back. I check for consensus. If not, it goes back in 00:28:02.500 --> 00:28:08.100 the second round. You do that a couple of times. So the red one is your Delphi and then you've got 00:28:08.100 --> 00:28:17.100 your final results. Same thing here. There is my Delphi. So you facilitate the expert discussion. 00:28:17.100 --> 00:28:24.500 That's what you do. So you've got all these repeated surveys, and usually 2 is minimum, 3 is 00:28:24.500 --> 00:28:26.000 normal, you usually need NOTE Treffsikkerhet: 87% (H?Y) 00:28:26.000 --> 00:28:34.200 3 rounds before you come to international consensus, and that is also, after every round, to give 00:28:34.200 --> 00:28:40.500 feedback to the experts. You say, okay, we agreed now on this, on this bit there is consensus, but 00:28:40.500 --> 00:28:48.100 these new items still lack consensus, you revise items based on expert feedback. And again, you ask 00:28:48.100 --> 00:28:54.200 the experts, can we come to agreement now? Also they will say: oh, this is a stupid item.
Don't 00:28:54.200 --> 00:28:55.950 want it in, it's not relevant, NOTE Treffsikkerhet: 73% (MEDIUM) 00:28:55.950 --> 00:29:02.700 I don't understand how you formulate it. All of that happens in the Delphi. And we always 00:29:02.700 --> 00:29:09.600 have these open boxes where we say: Well, any other comments that you would like to add? 00:29:09.600 --> 00:29:17.000 The time between designing a Delphi and publishing is easily a year. 00:29:17.000 --> 00:29:23.700 Anonymity is very important in the Delphi method because otherwise you get that 00:29:23.700 --> 00:29:25.950 one or two experts can really dominate NOTE Treffsikkerhet: 90% (H?Y) 00:29:25.950 --> 00:29:31.700 the whole discussion and you don't want that, you want international consensus and not one or two 00:29:31.700 --> 00:29:38.300 persons that can take over the whole discussion. So, anonymity is important so that people can say 00:29:38.300 --> 00:29:42.800 whatever they want and they know that they're not going to be punished for it or that they don't get 00:29:42.800 --> 00:29:49.900 the whole group against them. It is anonymous. So what is a Delphi? It's a technique to 00:29:49.900 --> 00:29:55.750 achieve expert consensus from a panel and that is on a NOTE Treffsikkerhet: 83% (H?Y) 00:29:55.750 --> 00:30:02.100 well-defined topic, and usually this is used, for instance, in instrument development. You've 00:30:02.100 --> 00:30:08.199 got your draft measure, you send all these items, that is always more than you want to end up with, 00:30:08.199 --> 00:30:13.900 you send all these draft items to your experts and you ask for feedback. Do I need to reformulate it? 00:30:13.900 --> 00:30:20.600 Do I need to delete items? Do I need to add items? And even if you still have got too many 00:30:20.600 --> 00:30:26.350 items, that's okay. Because you can later on do statistical analysis to reduce numbers. NOTE Treffsikkerhet: 89% (H?Y) 00:30:26.350 --> 00:30:32.900 But anyhow, so with this you go to your group.
You facilitate the group discussion by having different 00:30:32.900 --> 00:30:39.200 rounds and giving everybody feedback, and that is very useful at early stages of instrument 00:30:39.200 --> 00:30:44.900 development. All our instruments that we are developing at the moment are starting with Delphi 00:30:44.900 --> 00:30:53.700 studies. Now one more time, first you go to the literature, define your topic, ask experts for 00:30:53.700 --> 00:30:55.600 feedback already there, NOTE Treffsikkerhet: 83% (H?Y) 00:30:55.600 --> 00:31:02.500 identify, invite your experts. So in case of instrument development, in step 1 I retrieve 00:31:02.500 --> 00:31:08.300 measures in the area in which I want to develop a new measure, and I make sure that I've 00:31:08.300 --> 00:31:15.500 got all these items covered and all these areas covered. But I then identify my experts, I invite 00:31:15.500 --> 00:31:22.750 them. Usually step one is about definitions. Do you agree with these definitions? Do you agree with 00:31:22.750 --> 00:31:25.800 all these concepts that we want items on? NOTE Treffsikkerhet: 77% (H?Y) 00:31:25.800 --> 00:31:31.400 Do we agree that we are talking about the same thing? Again in my area, we talked about dysphagia, 00:31:31.400 --> 00:31:36.700 and we said: Do you agree that drooling is not dysphagia? Do we agree? So I've got a 00:31:36.700 --> 00:31:42.000 definition of this and ask for feedback from the experts. In the next round I will come with the 00:31:42.000 --> 00:31:50.500 revised definition and ask: Okay, do you now agree? Etc. Then I summarize all the answers. I will give 00:31:50.500 --> 00:31:55.900 them feedback. I say, well, this is what most of you agree on and this is what we don't agree on. NOTE Treffsikkerhet: 79% (H?Y) 00:31:55.900 --> 00:32:00.900 So I'll revise these ten items. Could you please have another look at it? And again it goes back to them. 00:32:00.900 --> 00:32:06.200 You keep doing that.
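The round-by-round consensus check described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the lecturer's own Delphi studies: it assumes that "70%" is applied per statement (a statement is accepted when at least 70% of the experts agree with it), and the function name `split_by_consensus`, the item labels and the votes are all hypothetical.

```python
from typing import Dict, List, Tuple

# The 70% cut-off mentioned in the lecture; applying it per statement is an assumption.
CONSENSUS_THRESHOLD = 0.70


def split_by_consensus(
    votes: Dict[str, List[bool]],
    threshold: float = CONSENSUS_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Split statements into those that reached consensus and those to revise.

    votes maps a statement label to one True/False vote per expert
    (True = the expert accepts the statement as formulated).
    """
    accepted: List[str] = []
    revise: List[str] = []
    for statement, statement_votes in votes.items():
        if not statement_votes:
            # No feedback received: the statement goes back into the next round.
            revise.append(statement)
            continue
        agreement = sum(statement_votes) / len(statement_votes)
        if agreement >= threshold:
            accepted.append(statement)
        else:
            revise.append(statement)
    return accepted, revise


# Hypothetical votes from a panel of ten experts in round one.
round_one = {
    "definition_1": [True] * 8 + [False] * 2,  # 8 of 10 agree
    "definition_2": [True] * 6 + [False] * 4,  # 6 of 10 agree
    "definition_3": [True] * 7 + [False] * 3,  # 7 of 10 agree
}
accepted, revise = split_by_consensus(round_one)
print("Consensus reached:", accepted)
print("Revise and send back in the next round:", revise)
```

The statements in `revise` are the ones you would reformulate from the open comments and send back to the panel in the next round.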
You keep revising, you keep asking for feedback and in the end, but 00:32:06.200 --> 00:32:11.000 that's not necessarily round two, that could be round three and if you're unlucky, you've got 00:32:11.000 --> 00:32:18.300 round four or maybe even round five. You need to be nice to your participants. If your Delphi takes 00:32:18.300 --> 00:32:24.700 several hours to complete you will lose many, many participants during the process. 00:32:24.700 --> 00:32:25.900 So an ideal Delphi NOTE Treffsikkerhet: 80% (H?Y) 00:32:25.900 --> 00:32:31.200 should not take longer than an hour in the ideal world. I must admit I was invited to a very 00:32:31.200 --> 00:32:37.900 complex psychometric Delphi that took hours and I did not finish the second round. It was just a 00:32:37.900 --> 00:32:46.500 nightmare. Then what you also do, you pilot it, you've got 00:32:46.500 --> 00:32:51.600 semi-structured interviews or cognitive interviewing, you take your measure, you've got 00:32:51.600 --> 00:32:55.850 something like 10 or more persons from the target population NOTE Treffsikkerhet: 91% (H?Y) 00:32:55.850 --> 00:33:04.200 and you ask feedback on every single item: anything missing, does this apply to your 00:33:04.200 --> 00:33:13.900 situation? Try to understand from their perspective, maybe change the order of the items. 00:33:13.900 --> 00:33:20.700 Do they understand it? Again, remember comprehensibility is important. Anything missing? 00:33:20.700 --> 00:33:25.900 That is comprehensiveness. Okay, cognitive interviewing techniques, NOTE Treffsikkerhet: 90% (H?Y) 00:33:25.900 --> 00:33:30.900 you probably know more about that than I do. But that is what you do with a selected group of 00:33:30.900 --> 00:33:37.900 patients or students or children. Now, what happens? So first you give them items. And the first 00:33:37.900 --> 00:33:45.550 thing is comprehension.
They interpret the question, you discuss: are there terms or anything that 00:33:45.550 --> 00:33:53.300 could lead to unclarity? Then retrieval: they search memory for relevant information, is there 00:33:53.300 --> 00:33:55.850 recall difficulty? Then judgment. NOTE Treffsikkerhet: 85% (H?Y) 00:33:55.850 --> 00:34:03.700 You respond, evaluate, estimate the response. It could be biased or sensitive. I mean, if you ask: do you 00:34:03.700 --> 00:34:09.100 drink a lot? You know, you get biased answers there. So you need to think, well, is this question useful? 00:34:09.100 --> 00:34:17.400 Should I rephrase it? And then you need to make sure, is the format of your response okay? 00:34:17.400 --> 00:34:24.500 Or are there incomplete response options? If you say yes or no, maybe that is an item that 00:34:24.500 --> 00:34:25.900 should not be dichotomous, NOTE Treffsikkerhet: 88% (H?Y) 00:34:25.900 --> 00:34:32.500 you want an ordinal scale. That too is part of the whole cognitive interviewing and discussion 00:34:32.500 --> 00:34:35.650 in your pilot. NOTE Treffsikkerhet: 81% (H?Y) 00:34:35.650 --> 00:34:45.300 Okay, all of that, the Delphi, the piloting in your target population, is considered the development study. 00:34:45.300 --> 00:34:55.300 Now the next phase is to conduct a content validity study. Okay, here we go. So if we conduct a 00:34:55.300 --> 00:35:00.600 content validity study, we've got, again, these ten criteria. It's still got to do with content 00:35:00.600 --> 00:35:05.600 validity, but you've seen this one already. I move on. What to do when you do a content validity study? 00:35:06.200 --> 00:35:12.100 Now, again, you've got target populations and professionals. And that depends, of course, on for whom 00:35:12.100 --> 00:35:18.700 you designed your measure.
But if your target population is, for instance, in my case, patients or 00:35:18.700 --> 00:35:25.200 persons with swallowing problems, or shy children, then the professionals could be clinicians that 00:35:25.200 --> 00:35:32.900 actually work with these children or complete the questionnaire, maybe, or the professional could 00:35:32.900 --> 00:35:36.149 also be the teacher. Now, both target NOTE Treffsikkerhet: 84% (H?Y) 00:35:36.149 --> 00:35:43.000 population and professionals need to agree on relevance, need to agree on comprehensiveness, and 00:35:43.000 --> 00:35:48.700 the target population, the ones that are going to complete your measure, need to agree on 00:35:48.700 --> 00:35:56.800 comprehensibility. And then that gives you your rating scale development. NOTE Treffsikkerhet: 91% (H?Y) 00:35:57.000 --> 00:36:04.700 Again, that is very short, but first you have the development of your tool, then you run 00:36:04.700 --> 00:36:09.800 it in bigger numbers and then you decide this and that needs 00:36:09.800 --> 00:36:15.900 changing, you do that. And then you've got your final measure. Now, once you've got your final 00:36:15.900 --> 00:36:23.000 measure, I hope you understand by now why it takes a long time to design a measure or why you don't 00:36:23.000 --> 00:36:26.050 just do that over the weekend, then you need NOTE Treffsikkerhet: 79% (H?Y) 00:36:26.050 --> 00:36:32.050 to look into psychometrics and that is when COSMIN or any other psychometric framework comes in. 00:36:32.050 --> 00:36:37.300 So then you've got your measure but you don't know yet whether it's valid or reliable or 00:36:37.300 --> 00:36:44.000 responsive. So there come your psychometric studies, and your psychometric studies need to, of course, 00:36:44.000 --> 00:36:49.550 cover the reliability, we know there's internal consistency, reliability and measurement error.
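Internal consistency, the first of the reliability properties just listed, is commonly summarized with Cronbach's alpha. The sketch below shows the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), using only the Python standard library. The function name `cronbach_alpha` and the response matrix are hypothetical, and no particular cut-off for an acceptable alpha is assumed here.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[r][i] is the score of respondent r on item i.
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_respondents = len(scores)
    n_items = len(scores[0]) if scores else 0
    if n_respondents < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")

    # Variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Variance of the total (sum) score across respondents.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents, three items that rank them identically.
responses = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

In this toy matrix every item orders the respondents in exactly the same way, so alpha comes out at its maximum; real questionnaire data will give lower values.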
00:36:49.550 --> 00:36:55.200 It needs to cover validity, with content validity, but you covered that one already, construct validity 00:36:55.200 --> 00:36:56.100 and criterion validity, NOTE Treffsikkerhet: 87% (H?Y) 00:36:56.100 --> 00:37:06.600 and we had responsiveness. Now that asks for two types of statistical analysis. 00:37:06.600 --> 00:37:13.200 Now you have been trained, as far as I know, in classical test theory, that is about testing. That is 00:37:13.200 --> 00:37:21.100 about the t-test, things that are yes or no significant. We talk about that a little bit later. 00:37:21.100 --> 00:37:26.050 And on the other hand, you've got item response theory or Rasch analysis, that is NOTE Treffsikkerhet: 91% (H?Y) 00:37:26.050 --> 00:37:33.000 different, but I will get back to that. So what you ticked off already is the content validity 00:37:33.000 --> 00:37:39.900 because that was the whole idea. You've done that already, and cross-cultural validity is missing in 00:37:39.900 --> 00:37:45.700 this one, because that is only interesting if you translate a measure into another language. So what 00:37:45.700 --> 00:37:51.200 you still need to do is the whole reliability bit, the construct validity and the criterion bit, 00:37:51.200 --> 00:37:56.250 and the responsiveness bit, meaning, you need to use your measure in an intervention NOTE Treffsikkerhet: 82% (H?Y) 00:37:56.250 --> 00:38:03.900 or in a change of a population, before and after surgery, before and after dyslexia training, whatever 00:38:03.900 --> 00:38:10.300 you do, but there needs to be an intervention, something needs to happen in between. Now just this, 00:38:10.300 --> 00:38:16.300 you do that. You need a lot of statistical analysis. You've seen that, and what you also will do, 00:38:16.300 --> 00:38:22.600 when you determine all the psychometrics, we will also have a look at the ordinary statistics that is on 00:38:22.600 --> 00:38:26.050 this side as well as item response theory.
And I only have NOTE Treffsikkerhet: 81% (H?Y) 00:38:26.050 --> 00:38:33.500 one slide on this because what is the difference? We are trained in this. We all are, it's 00:38:33.500 --> 00:38:41.100 relatively simple. It talks about the whole test, it looks not at item level, but 00:38:41.100 --> 00:38:46.900 at the test level. So you use the whole test and then the outcome of classical test theory is: yes, 00:38:46.900 --> 00:38:53.500 significant, or not significant. Why do we need something like item response theory? 00:38:53.500 --> 00:38:56.000 Because it can look at items, NOTE Treffsikkerhet: 89% (H?Y) 00:38:56.000 --> 00:39:05.500 it doesn't have the test as the unit of analysis, but items, so it can actually say this item is 00:39:05.500 --> 00:39:13.300 not working. This item is not being used the way it should be used. For instance, everybody says 00:39:13.300 --> 00:39:20.200 maximum agreement on an item, that is not differentiating at all. So it's not a useful item. So we 00:39:20.200 --> 00:39:26.000 want to get rid of that, or maybe there's an item and everybody says totally disagree, or they don't NOTE Treffsikkerhet: 79% (H?Y) 00:39:26.000 --> 00:39:34.300 understand it. That is a problem. In classical test theory, I can't see which item is giving me a problem, 00:39:34.300 --> 00:39:41.150 in item response I can look at items and delete specific items. You will also use, by the way, 00:39:41.150 --> 00:39:47.700 stats to get rid of things like redundancy, if you've got too many overlapping items. Item response 00:39:47.700 --> 00:39:56.150 can see: do participants use all scales? If you've got a NOTE Treffsikkerhet: 84% (H?Y) 00:39:56.150 --> 00:40:02.200 five-point scale and they only used the fourth and fifth level, then you can collapse scales, you can 00:40:02.200 --> 00:40:08.800 collapse levels. So it does different things. Unfortunately, we are not really trained in that, 00:40:08.800 --> 00:40:14.700 but you need both. You need to have a look at with ...
00:40:14.700 --> 00:40:20.700 that's on the left. And then you need to look more at the construct being measured and you 00:40:20.700 --> 00:40:25.950 use then the item analysis, the item as unit of analysis. NOTE Treffsikkerhet: 86% (H?Y) 00:40:25.950 --> 00:40:31.600 I will not teach you any further on that. And that is the whole thing of instrument 00:40:31.600 --> 00:40:38.100 development. Now, we've got 20 minutes still left. So there is time to finish it and some time for 00:40:38.100 --> 00:40:43.300 questions at the end. So I would like to see, okay, so how to select your screens and your 00:40:43.300 --> 00:40:48.200 measures? We said what is measurement? We said, well, how to 00:40:48.200 --> 00:40:55.100 develop, and do you actually need to develop? So if you want to know whether you need to develop a 00:40:55.100 --> 00:40:55.950 new tool or not, NOTE Treffsikkerhet: 80% (H?Y) 00:40:55.950 --> 00:41:04.400 that depends on things like diagnostics and psychometrics. So first have a look at screens. 00:41:04.400 --> 00:41:12.650 Now, I teased you with diagnostic performance. So if we talk about screens, is a screen good or not? 00:41:12.650 --> 00:41:18.500 Then we talked about how you look at study quality, describing the screen, and diagnostic 00:41:18.500 --> 00:41:25.900 performance. And if those two are bad you don't want the screen. If they pass both, NOTE Treffsikkerhet: 85% (H?Y) 00:41:25.900 --> 00:41:31.000 then you can have a look at whether you want to implement it in research, clinics, education, 00:41:31.000 --> 00:41:38.200 anywhere. So how to select your screen and assessments? Then you need to look at "what 00:41:38.200 --> 00:41:45.750 and who" and quality assessment. Let me explain that. Screening, what and who. That would be: 00:41:45.750 --> 00:41:52.400 what is the construct you're interested in, is it just swallowing again? What is your target population?
00:41:52.400 --> 00:41:56.050 And the other thing is then, what is the NOTE Treffsikkerhet: 88% (H?Y) 00:41:56.050 --> 00:42:01.200 study methodology, and then you need your critical appraisal tool, which can be of different types. But 00:42:01.200 --> 00:42:07.400 you need a CAT and you look at your screening tool and then you look at diagnostic performance. You 00:42:07.400 --> 00:42:14.600 look at content validity, reliability, and feasibility. That all needs to be covered, actually, 00:42:14.600 --> 00:42:21.300 if you want to select a screen. Of course, if I start here, it should be 00:42:21.300 --> 00:42:25.950 the construct you're interested in. It should be useful for your age group, or whatever you are NOTE Treffsikkerhet: 86% (H?Y) 00:42:25.950 --> 00:42:31.900 looking at, and this is all about quality assessment. If it is about a screening tool with poor 00:42:31.900 --> 00:42:37.700 diagnostic performance, it's not useful. If the content validity is poor, that is linked to your 00:42:37.700 --> 00:42:43.300 construct up there. If the reliability is not good, well, why would you bother? And feasibility, if a screen 00:42:43.300 --> 00:42:49.100 takes half an hour, well, that sounds more like an assessment to me. So all of this needs to 00:42:49.100 --> 00:42:51.050 kind of be covered. NOTE Treffsikkerhet: 89% (H?Y) 00:42:51.050 --> 00:42:57.300 Now, if I look at the assessments, we've got a kind of similar chart. Assessment, we said, first you 00:42:57.300 --> 00:43:02.100 look at a CAT and that was the COSMIN checklist, the methodological quality assessment of that 00:43:02.100 --> 00:43:08.300 measure. Then we said, you look at quality criteria. That is the psychometric quality assessment of 00:43:08.300 --> 00:43:14.100 these measurement properties per study. You had predefined quality criteria. How high should a 00:43:14.100 --> 00:43:19.800 correlation be before I think it's okay.
How high should a Cronbach's alpha be before I say 00:43:19.800 --> 00:43:21.000 internal consistency is okay. NOTE Treffsikkerhet: 77% (H?Y) 00:43:21.000 --> 00:43:28.600 And then we said, you combine all the data you've got per measurement property, per assessment, 00:43:28.600 --> 00:43:36.100 and you look at the different levels of evidence. Not going to go in deep, 00:43:36.100 --> 00:43:41.800 we've done this before, but the first step is just a CAT, critical appraisal tool. The second one 00:43:41.800 --> 00:43:47.600 is evaluation of psychometric properties of that measure based on literature. 00:43:50.500 --> 00:43:57.649 Third one is where you summarize any detail, any psychometric evidence. And you look at which measure 00:43:57.649 --> 00:44:03.150 is the best one. If you've got the luxury that you can choose between measures, 00:44:03.150 --> 00:44:09.800 take the one with the most robust or most promising psychometric properties. Now, doing that again in 00:44:09.800 --> 00:44:16.600 one final slide. How does that work? On top, "what and who" refers to the construct, to your 00:44:16.600 --> 00:44:21.050 target population, but also to the respondent, because the screen is NOTE Treffsikkerhet: 86% (H?Y) 00:44:21.050 --> 00:44:26.900 usually for the teacher or so, but it could also be that you've got a survey for a child or 00:44:26.900 --> 00:44:33.300 for the clinician or for the teacher. So who's responding? And then if you go to quality assessment, 00:44:33.300 --> 00:44:39.000 well, that should be of course a little bit familiar by now, that is your COSMIN risk of bias 00:44:39.000 --> 00:44:44.700 checklist. And if we look at the assessment, then we look at all the measurement properties. 00:44:44.700 --> 00:44:50.550 But again, also feasibility and reliability are of course included here. You've got validity, 00:44:50.550 --> 00:44:50.950 reliability and NOTE Treffsikkerhet: 89% (H?Y) 00:44:50.950 --> 00:44:57.600 responsiveness.
And that together gives you the psychometric properties. Again, a measure, 00:44:57.600 --> 00:45:03.700 like what we are doing now in instrument development, a measure with over a hundred items, 00:45:03.700 --> 00:45:12.600 ninety-nine times out of a hundred that is too many, you need to reduce time. You can't ask children to complete 00:45:12.600 --> 00:45:18.700 a hundred items. They are gone after 10 items or whatever. Do I really need all 00:45:18.700 --> 00:45:21.050 these items to get a good idea of the NOTE Treffsikkerhet: 89% (H?Y) 00:45:21.050 --> 00:45:28.350 construct I want to measure? So, get rid of any redundancy in your measure. Now that is about it. 00:45:28.350 --> 00:45:36.300 But just to list some do's and don't... So look at good methodological quality. Look at good diagnostic 00:45:36.300 --> 00:45:45.100 performance, good psychometric properties, or maybe unknown properties, but not poor. That is out. 00:45:45.100 --> 00:45:50.600 If they've got poor psychometric properties, do not use that. You don't know what you're measuring. 00:45:50.950 --> 00:45:57.600 Use interventions based on the highest level of evidence and please don't do that. Now, you've got already some 00:45:57.600 --> 00:46:02.900 references. I just click that through. And this was actually the literature that was planned, 00:46:02.900 --> 00:46:09.500 but that one is going to be replaced by exam preparation. So just the instrument development, that 00:46:09.500 --> 00:46:20.100 could be that chapter on tests by Cohan, and that is where I would like to stop sharing, and stop recording.
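The item-level checks mentioned earlier in this session (an item on which everybody gives the same answer does not differentiate, and response levels that nobody uses can be collapsed) can be previewed with a simple frequency count before any formal analysis. The sketch below is only a descriptive check and not an item response theory or Rasch analysis. It assumes a five-point scale coded 1 to 5, and the function name `screen_item` and the responses are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def screen_item(responses: List[int], levels: range = range(1, 6)) -> Dict[str, object]:
    """Describe how respondents used the response levels of one item.

    levels defaults to a five-point scale coded 1..5 (an assumption).
    """
    counts = Counter(responses)
    used = [level for level in levels if counts[level] > 0]
    unused = [level for level in levels if counts[level] == 0]
    return {
        "used_levels": used,
        "unused_levels": unused,            # candidates for collapsing
        "no_differentiation": len(used) <= 1,  # everybody gave the same answer
    }


# Hypothetical responses from six participants to three draft items.
draft_items = {
    "item_a": [5, 5, 5, 5, 5, 5],  # everybody says maximum agreement
    "item_b": [4, 5, 4, 4, 5, 5],  # only the fourth and fifth level are used
    "item_c": [1, 3, 2, 5, 4, 3],  # all five levels are used
}
report = {name: screen_item(values) for name, values in draft_items.items()}
for name, result in report.items():
    print(name, result)
```

An item flagged with `no_differentiation` is a candidate for deletion, and an item with several `unused_levels` is a candidate for fewer response options; the actual decision would rest on the psychometric analysis, not on this count alone.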