episode 7: Lively discussion about tests with Elena Plante: why we use them, what makes one valid, is more better, how are they different now?
Note: If you listen to the podcast on this website (seehearspeak), it's best to use the Chrome internet browser. The best listening experience comes through the apps listed below because you can 'follow' the podcast to receive notifications of new episodes and see time stamps that correspond with the podcast transcripts.
For the Episode 7 Transcript, Click "Read More" below
Tiffany Hogan: Welcome to See Hear Speak Podcast Episode 7. In this Episode I talk with Elena Plante, Professor of Speech, Language and Hearing Sciences at the University of Arizona and current American Speech-Language-Hearing Association Vice President for Science and Research. We have a lively discussion about the use of tests. As educators and clinicians we use tests often but how often do we stop and consider why we use certain tests, how tests are created, what makes a test valid, is it better to administer more than one test to determine a child’s ability, and how do test results play into our decisions for treatment? We talk about the importance of theory-driven assessment, key concepts for determining risk for dyslexia to be in compliance with dyslexia laws that are sweeping the nation state by state and how current tests differ from past tests.
Per usual, we end our conversation with Elena describing her current most exciting project and favorite book.
Thank you for listening! And don’t forget to check out www.seehearspeakpodcast.com to sign up for email alerts for new episodes and content, read a transcript of this podcast, access articles and resources that we discussed, and find more information about our guests. Also don’t forget to subscribe to the podcast in apple podcast or wherever you are listening.
Tiffany Hogan: Welcome to see, hear, speak podcast. Elena Plante, I'll have you start by introducing yourself.
Elena Plante: I'm Elena Plante and I am a certified and licensed speech language pathologist. I am a professor of speech language and hearing sciences at the University of Arizona.
Tiffany Hogan: And you have a new position with ASHA recently, correct?
Elena Plante: Yes! If you're an ASHA member, I am your vice president for science and research.
Tiffany Hogan: Oh, fantastic. So you've spent a big portion of your career thinking about psychometrics and the appropriate use of tests in SLP practice. You said you're teaching evaluation right now, so you've got your head in it even more this semester. So what are the main purposes for using tests, and how do those purposes inform which tests SLPs choose?
Elena Plante: You know, this is a really, really important concept that many speech language pathologists aren't really getting in depth in their graduate programs. And that is that a test isn't valid or invalid. It's only valid relative to our purpose. And in speech language pathology, we actually have a lot of different purposes that we give tests for. So we may give one to screen people who are at risk for having a disorder. We may give one to identify a disorder. We may give a test because we already know that the child or adult has a disorder, like aphasia. We don't need to give a test to determine whether someone has aphasia; we pretty much know when they walk in the door. But we might want to know how severe it is, or what the profile is, what the strengths and weaknesses are. These are all completely different purposes, and they require completely different sources of evidence to prove validity and to provide that evidence base for using the test in that way.
Tiffany Hogan: So if I hear clinicians say, "Oh, my favorite test is X, Y, Z, and I use it all the time for everything," what's the likelihood that test is appropriate for every purpose?
Elena Plante: Slim to none. And here's the reason: when you're asking different questions, sometimes the evidence to support one kind of diagnostic purpose is diametrically opposed to the evidence you need for another diagnostic purpose. So let's take the issue of ranking severity against identifying a disorder. When we want to identify the disorder, we're essentially answering a yes/no question. Is this person impaired or is this person normal? In an optimal world, you would have something like: if you do "this," you're impaired 100% of the time; if you do "that," you're normal 100% of the time. So the distribution is actually not a normal curve. It's two lines. You can think of this in terms of a medical test. You know, if you have this pathology in you, you have this disease. And if you don't, you don't. So that's really kind of a binning exercise. You're one thing or you're the other thing. With severity ranking, what you're trying to do is determine where a score falls in a very wide distribution of scores. So for one case you need this wide distribution, and for the other case you need a distribution of one point, optimally. Now, of course in our field it doesn't work out quite that cleanly, but you can see from the example that a test can be really good at ranking people within the normal range. When you develop that test, you're going to develop it in a way that it really does that well. But conversely, that might make this same test not so good at answering the yes/no question.
Tiffany Hogan: Yes, that makes a lot of sense. And then I think there's also the purpose of wanting to better understand what treatment targets to use. So then that would be even a different purpose.
Elena Plante: Oh, absolutely. And I did a study with a master's student 20 years ago where we looked at tests that had the same content. So we were looking at morphosyntax. Test A would test third person agreement and "is" plus verb-ing and plural "s." Test B would test the same things. But we looked at how often, if you fail plural "s" on the first test, you would also fail it on the second test. And the answer was not that frequently. And the reason why is that the context in which those tests embed those items will make them more or less difficult. Here's a great example. This was an actual test, although it is not on the market anymore, so I won't name it. But one of the test items was the pronoun "me," and "me" is acquired at about, oh, age two.
Elena Plante: Okay. But the way they tested "me" was an item that said, "You want a cookie. But the woman gave me a ___." The woman is holding a piece of pie, and then there's a thought bubble over her head that has cookies in it. And you're supposed to say, "but the woman gave me a piece of pie." I give this test to grad students all the time, and two thirds of the class failed the item! And they should have acquired "me" at two. So just by embedding it in a context that makes it difficult, you can pass or fail that item on any given test. Now, that's an extreme example, but the reality is there are lots of things that will affect performance that are much more subtle. For example, do you have a four picture array or a three picture array? That turns out to make a difference. If you're using colors, the palette of the color and the relative intensity of the colors across the pictures can draw a response even when the test taker knows the correct answer. So there are lots and lots of subtle ways in which this plays out, such that we as clinicians can't know whether the item is actually acquired, or whether it's just in a context that is masking true performance to some degree or another. That's why you don't take therapy targets straight off of tests. You really need to use the test as a starting point. You can say, okay, we're looking at this; these are the kinds of items that are being passed and failed; let me follow up on that. Maybe I should get a little language sample analysis and see if they're able to use them spontaneously. Maybe I should get in there with some clinical probes and see what happens if I set things up so that there's an obligatory context for the child to use the item. Those sorts of things. But just going straight from a test right to therapy targets is usually going to be really problematic.
Tiffany Hogan: Right. And I can even imagine, in the extreme case of the example you gave, that if that test was given, the clinician could erroneously think, oh, they don't understand that concept of "me," and then maybe even work on it. I mean, it could go even that extreme. It really could.
Elena Plante: Right, right. But in the master's thesis we did with Andy Merrill, what we found was that even for tests that didn't have extreme examples like that, again, kids could pass an item on one test and fail it on the other.
Tiffany Hogan: Wow. And then you can see how that would be used erroneously in treatment. But I also think clinicians are told, "you need to use standardized norm-referenced assessments." And they're told, you know, these homemade assessments aren't good or even functional. But there is that balance between choosing targets that are functional and using appropriate standardized assessments.
Elena Plante: Well again, this gets back to purpose. So you need to be really clear about why you're using that particular test. And it's probably not to select therapy targets. So under the IDEA, which all school-based clinicians are constrained by in their practice, the IDEA does not mandate giving any standardized tests. You can qualify a child under the IDEA without giving a single standardized test. So what clinicians think they're supposed to do isn't always actually the case. It certainly isn't the case under federal law. What the IDEA does say is that if you use an assessment method, it must be valid for the purpose for which it is used. And that's the kicker: for the purpose for which it is used. So again, you might use a standardized test to make the identification decision, but use other methods to decide what treatment is needed. So one of my favorite things to do for school-aged kids is just go get their worksheets, and then look at the content of the worksheets and say, okay, to get a good grade on this worksheet, what skills do I need to have in language? And if you go over a whole bunch of worksheets, the skills that are showing up in the schoolwork that the kids can't do will come to the surface. You will see over and over again that when a poor grade is happening, what the kid needs to be able to do is phonological decoding or metalinguistics or semantics, you know, or relational semantics. You can even get that specific just by looking at the content. So clinicians can use what they already know about the nature of language and language disorders to look at other materials to determine where the need is. And that's completely appropriate under the IDEA. And in fact, the IDEA mandates that the children have to have an educationally handicapping condition. And by looking at their work, you're already linking it directly to their educational needs.
Tiffany Hogan: And I like that too, as someone who, like you, thinks often about the connection between oral and written language. If you want to make assessment something that's functional in the school setting, you can't deny that connection between oral and written language. It is critical for academic success.
Elena Plante: Absolutely. And we see this all the time: the things that we've cleaned up as clinicians, you know, therapy works, that's the good thing. We can clean up early morpheme omissions and things like that and wacky morphosyntax. But it will show back up in the writing. And then it will show back up as a weakness in the phonological system, in decoding. So, you know, this is not rocket science. You just have to give up the idea that language in its written form is somehow unique and special.
Tiffany Hogan: Absolutely. And also embrace the knowledge we have about language and how it directly applies to written language. I like what you mentioned about cleaning up certain aspects and feeling like we've made improvement. That's true. But I love that Hollis Scarborough has a paper where she talks about illusory recovery. And we know that language disorder is a lifelong disability that needs to be addressed and considered over time, in its different manifestations.
Elena Plante: Yeah. You know, there's a really nice chart that came out of the Department of Ed that shows there is a systematic decrease in the use of speech language impaired as the handicapping condition label over time in schools. And simultaneously, there's an increase in the use of learning disabilities as the handicapping label. That's basically a reflection of the crossover point, which is exactly where you'd expect it: where children start having to use literacy as an educational tool and their language disorder is now impacting their academic progress very obviously. And you start seeing kids who had been in speech therapy for unintelligible speech. We cleaned all that up in therapy, and all of a sudden their previously undetected language deficits are now impacting reading and writing, and then they come back into the special ed system under that LD label.
Tiffany Hogan: Yeah, that's really interesting. And I think it also relates to, as you said, the purpose of the assessment, but also to having a theory and scientific basis behind your assessment protocol. That's something I've been really hitting at with reading: you can't detect a certain aspect that's difficult in reading if you don't measure it or you don't look for it. You have to have a theory that says these are the key components, and then systematically assess those key components to uncover the deficits. And then address them appropriately, whether through functional materials or, if it's appropriate, by tying in a comprehensive assessment.
Elena Plante: Yeah. Let me just comment about the word theory there, because I think the public really thinks about a theory as a guess or conjecture. In science, a theory is the very best thing you can have, because it is an explanation that ties together a plethora of known facts and then allows you to predict future facts. So that's much, much different than a guess or a conjecture. And in fact it is a tool that allows you to say, when you've got a child in front of you, what things should I be looking for here? If I see this, should I also be looking for that? If I see deficits in oral language, should I be keeping my eye on literacy? Why, yes, yes you should. So these are very important things. The current crop of tests that are coming out these days are different than the tests that used to come out when I was coming up through the profession as a young speech language pathologist.
Elena Plante: So back then, tests were largely built on models of normal language. They didn't perform that well for identifying language disorders. And the reason why is that language disordered kids are not bad across the board at all aspects of language. They're in fact differentially bad at certain things. And so as time has gone on, you've seen tests that are more oriented to loading the test with items that are more sensitive to disorders, rather than items that capture the breadth of language in typical kids. And generally speaking, those kinds of tests are going to be better identifiers. Now, the side effect of that is that the models you learned when you came out of school may not reflect the best practice today. So for one example, when I was raised as a speech language pathologist, we were really taught expressive versus receptive. And we were told that the vast majority of kids with language disorders have expressive deficits only, and then there's a subgroup that have receptive plus expressive. Well, that turns out to be not true. We've got about five or so really large scale studies that have looked at how language skills cluster. And you know what? They don't cluster that way. Receptive versus expressive is not the divide. The divide is things that happen at the sound or the word level, and things that happen at the sentence level. And, if you test it, things that happen at the discourse level. Some of these studies tested discourse and some didn't. But that's very, very different than what was in the field 30 years ago. And so if you're an old dog like me, you really have to stay up with what the current conceptualization is, what the current data says is going on, in order to do your best in terms of having your theory guide your assessment.
Tiffany Hogan: Right. I was raised in the same way, so it's receptive, expressive, what type of impairment they have. And of course the DSM, I think, still has receptive-expressive tied into it. But ultimately, with these new assessments, and the way I do my research studies, you can collapse across receptive and expressive tasks if you're looking at a certain domain, like you said, at the word level, the sound level, the sentence level. It's really just what is the appropriate assessment for that child's age and the task that you're completing. It doesn't have to do with receptive versus expressive. It's tapping the same construct in just slightly different ways.
Elena Plante: You know, someone said something that has stuck with me that's just a logical argument about this. Unless you have a mental model of what you're trying to produce expressively, you're not going to be able to produce it expressively. So the fact that you have an expressive deficit implies to a very large degree that you also have some problem at the mental model level, which would be receptive.
Tiffany Hogan: Absolutely. Just like that paper Larry Leonard wrote, I believe, that talks about the receptive-expressive distinction and tries to tie this together and say there really is not a distinction; the research is not playing out in that way. So that paper is a good resource.
Elena Plante: Yeah. That sounds like it probably was him that said it!
Tiffany Hogan: Maybe it was! It might have been in that paper, because he goes systematically through the evidence showing that there should not be this distinction, that the research just does not support it. So that's a good resource. Speaking of diagnoses and purposes, I was asked recently by a colleague who is an editor of a reading journal, and she's getting more work on DLD, language impairment. And she said to me, okay, so what is the standard criterion to diagnose childhood language disorders? And of course, you know what I did? I said we should talk, and I shared the paper that you wrote with Tammy Spaulding, the 2006 LSHSS paper "Eligibility criteria for language impairment: Is the low end of normal always appropriate?" Because I told her, there's really no way to give a single standard criterion. It has to do with the distribution and the test itself. So can you tell us about that paper and what the main take home for clinicians is?
Elena Plante: Yeah. If you're an ASHA member and you were reading the recent issues of the ASHA Leader, you also saw quite a few letters to the editor pushing back on a paper that said the same thing. Is there a standard? Um, well, no. Okay. So what we found was that when we looked across all child language tests that had been published by that time, I think there were like 42 or 43 of them, the average score difference between a normal group and an impaired group was 1.2 standard deviations. Which means half of them were above that and half below. I mean, that's what the average means. So for half of the tests, the impaired group's scores lie between 1.2 standard deviations below the mean and the mean itself. Um, so a single standard cutoff is just fundamentally not supported at the population level of tests. And one of the things that probably feeds into this gets back to something I brought up earlier, and that is how sensitive the items are to the disorder.
Elena Plante: So are they testing language broadly? Or are they testing language in a way that really hones in on those aspects that reflect the disorder? The more the test hones in on the aspects reflecting the disorder, the more sensitive and specific it will get. So the more accurate it will be in saying you are impaired or you are not, because normal kids will pass those certain items and impaired people will fail them to a much greater degree. This is kind of one of those things that people thought back in the seventies and eighties: that if you're language impaired, you're just going to be at the low end of normal, and that's how we would identify you on a test. But as I said, the data doesn't actually support that. So you actually need a different piece of evidence, and that is sensitivity and specificity.
Elena Plante: So sensitivity is the rate at which a test calls people with impairment impaired. And specificity is the opposite: it's the rate at which people without a disorder are identified by the test as not having a disorder. So you need to know both of those things, so that you don't under-identify disorder and you don't over-identify normal as disordered. When I came into the field, none of the tests had this. None of them did. Now, as newer tests are coming out, I'm pretty routinely seeing it. Not always, but more often than not. So right now, if you take all the tests that might be on a shelf in a university clinic, probably about a third of them have it, but two thirds of them don't. That's the problem: two thirds of the time, if you just randomly pick a test, it will not have the evidence you need to determine whether you can correctly diagnose somebody.
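Elena's two definitions reduce to simple ratios. A minimal sketch in Python, using invented counts purely for illustration (they do not come from any real test manual):

```python
def sensitivity(true_positives, false_negatives):
    """Rate at which the test calls people with impairment impaired."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Rate at which the test clears people who have no disorder."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical validation sample: of 50 children known to have a language
# disorder, the test flags 45; of 50 typically developing children, it clears 42.
sens = sensitivity(true_positives=45, false_negatives=5)   # 45/50 = 0.90
spec = specificity(true_negatives=42, false_positives=8)   # 42/50 = 0.84

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A test can score well on one metric and poorly on the other, which is why a manual needs to report both.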
Elena Plante: Now, getting back to the question: is there a standard? No, there is not. There is a unique score for every single test that will maximally differentiate normal from disorder, and it's not the same score across tests. It can be anywhere. It could be negative one standard deviation for one test and negative three standard deviations for another test. So I'll give you an example. One of our master's students is out in a school placement, and she said to me that her supervisor at the school had a test in her hand and said, "this test always over-identifies." It turns out that their school system was using a standard criterion across all tests of negative 1.5 standard deviations, and everybody was impaired. My response to that was: that's because the cut score is somewhere else for that test. If you're routinely over-identifying, it means that the cut score of 1.5 standard deviations is probably too high for that test. The test manual should provide you this information. If the test manual doesn't provide it and you've just ordered the test, send it back. If you've had the test for a while, you could use it to prop up tables that rock or something like that. But otherwise, you don't have any evidence to justify using it for identification. Maybe it's good for another purpose, but certainly not for identification. So that gets to a related question, which I'm sure the listeners are already asking in their minds: What if my school district makes me use the cut score? Well, first of all, they're probably not making you. Where that cut score originally came from was probably you or your colleagues 30 years ago, with less information about diagnostic practices.
Elena Plante: So it's up to you to borrow a phrase from the late Tom Hixon to keep your corner of the cave clean. Yes. Okay. This is your practice. You need to be abiding by the ethical principles of your profession and it's up to you to keep things up to date. Now, this may seem really daunting, but it's not. It truly is not very often in school districts, the directors are not speech language pathologists. So they are waiting for us to come and say to them, you know, "This is really badly out of date. We've got to update these." So in our state we did this. We went to the state director of special education and said, you know, your best practice guidelines are really out of date, and response was not "well that's the way we do it." The response was, "oh really, perhaps we should put together working meeting on this." And you know, it took a while for us to work through all the issues and get all everybody on board and headed in the same direction. But we got it done. And so now our state guidelines specifically say that using an arbitrary cutoff score across tests is not appropriate. So if the school district is still doing that, they can change it tomorrow and still be in line with the state best practices. There is no barrier to doing that.
Tiffany Hogan: And what I do for my class when we talk about this article in particular is, I have little pieces of paper and I give them out. I have 10 people stand up in front of the class and I give them scores. Each person has a score and I say, okay, you're now in this district and their cut score is 1.5. So then I go through and say, now we're going to cut here. So all of you below this, you're impaired, and all of you aren't, by this district's standard. Okay, now you move to a different district and that cutoff score is 1.2. So I say, okay, if you gave that same assessment in the new district, now you aren't impaired anymore, and you are. That creates real confusion if students are moving or parents are checking in with peers from another district. You can go to one district and be impaired and in another one not. The child is not changing; it's the criteria. And that just doesn't make any sense.
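Tiffany's classroom demonstration can be sketched in a few lines of code. The ten z-scores (standard deviations from the mean) below are invented for illustration:

```python
# Ten hypothetical children's standard scores, as z-scores.
scores = [-2.1, -1.8, -1.6, -1.4, -1.3, -1.1, -0.9, -0.5, 0.0, 0.4]

def classify(zs, cut):
    """Label each score 'impaired' if it falls at or below the cut score."""
    return ["impaired" if z <= cut else "not impaired" for z in zs]

district_a = classify(scores, cut=-1.5)  # district A cuts at -1.5 SD
district_b = classify(scores, cut=-1.2)  # district B cuts at -1.2 SD

# Same children, same scores, but two of them change label between districts:
flipped = [z for z, a, b in zip(scores, district_a, district_b) if a != b]
print(flipped)  # -> [-1.4, -1.3]
```

The children whose scores fall between the two arbitrary cut points qualify for services in one district and not in the other, which is exactly the inconsistency described above.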
Elena Plante: Well, absolutely. Absolutely. There are two ramifications of this. Let's see if I can remember both. Okay. One is that the procedure for both of those school districts is not only not evidence based. It's not like there's just no evidence pro or con; there's actually evidence against doing it. So if you went to due process, or God forbid landed in court, you would lose big time. Big time. And I find that school officials are really responsive to the legal ramifications of bad practice. The other ramification is that what we're doing is basically setting cut points that assure that only the most severe kids get treatment. The other half of that is that the kids we couldn't move from under-performing, the kids at risk for not graduating, with the low-SES consequences for their lives, those people who really could shift into a middle class life, they get nothing. Right. And the tax base those people could generate, versus the taxes they're going to use because we couldn't move them into a functional range because they weren't eligible, is enormous. So the societal consequences of these practices are quite strong.
Tiffany Hogan: Absolutely. And the shame associated with having difficulties and yet being told, well, you're not trying hard enough, you're lazy, because you haven't been identified as requiring services. The ramifications of all these decisions are strong. Listeners might be interested to know that in longitudinal studies, in larger studies, even in smaller studies, we find that only about 30% of children with language impairment are even being identified in current practice. So there's a real problem with identifying these children and getting them services. They're just not being seen. They're missed. And I see this all the time in the studies I do. I talk to parents and they'll say, I knew something was going on. The things they are told about their child when they bring forward concerns and try to fight for services are really demeaning. Things like, you know, your child's just not academically inclined. And it's ridiculous. And it has to do with these practices. It's frustrating. I tell my students, in that example, that if they're in a district with a certain cut point, they have to find a test where that cut point actually has good sensitivity and specificity.
Elena Plante: Yeah. And that's a hard job to do. I mean, occasionally you can do it, but it would be much more functional, and you'd have a lot more options in front of you, if you and your colleagues would just take care of the problem. Just take care of it.
Tiffany Hogan: I love hearing your story that that worked in Arizona. I think that's really empowering, and fantastic that you've been able to move the dial there and get more evidence based when it comes to diagnosis. We're really dealing with this in my work with dyslexia. There's been this nationwide grassroots movement to "say dyslexia" in schools. So teachers are getting educated on dyslexia and schools are being mandated to identify dyslexia early. We have a new law in Massachusetts, for instance, saying that this coming fall, 2019, schools need to determine risk for dyslexia in all kindergartners. But what I find is that teachers, schools, administrators don't have the education in the appropriate use of assessment and in basic data interpretation. And so there are concepts such as false positives, false negatives, sensitivity, specificity, as you mentioned, and positive predictive value, which I know you discuss in your classes as well. It's important to know how all these tie together, and that's something I've been trying to get the word out about: what do these mean, and how do they inform your decisions? Can you tell us about the relationship between these variables? What do they mean, and how would they inform a mandate such as having to determine risk for dyslexia in kindergarten?
Elena Plante: Yeah. When you're talking about determining risk, you're really talking about a screening. Okay. So you're not asking whether this person has the impairment or not. You're really trying to identify a cohort of people who deserve a closer look. Now, when you're dealing with screenings, this is one thing that students struggle with, and clinicians struggle with it as well: they're not really clear about what constitutes a screening. A screening is not a test that is short. Okay. Short has nothing to do with it. Think about mammography. That is a screening. Okay. And it's certainly not short. Right? It not only takes your time, but then there's this whole back end where a radiologist is reading these things. It's a screening.
Elena Plante: Why? Because it's very good at telling physicians who needs a closer look, a biopsy or an ultrasound or some other diagnostic procedure to determine what's going on. So, if it's not short, what is it? It's how accurately it determines who's likely to fail a diagnostic test. So, um, when you're screening, the issues of sensitivity and specificity still come into play, because you want to identify everybody who's likely to fail that later diagnostic, because those are the people who need help. Okay. And to get that, you're willing to tolerate some over-referral. So for a diagnostic test, you want the best balance between sensitivity and specificity, and you want both of them to be high. For a screening, you can give a little bit on the specificity, but the sensitivity has gotta be high, because you don't want to miss people and not have them get the services they need.
Elena Plante: There's a third metric, positive predictive value, which is based on sensitivity and specificity but also takes into account how frequent the disorder is in the population. This helps you gauge how good your screening is. So basically, the concept is: the more frequent the disorder is, the easier it is to find it. The more rare the disorder is, the more you're searching for that needle in the haystack. That is going to affect your overall accuracy of correct referrals and over-referrals. But again, it all boils down to the foundation, which is still sensitivity and specificity. So if those things aren't high, particularly sensitivity, you're not going to have a functioning screener. The idea that you can sort of just make up some items and they'll serve as a screener is frankly ludicrous. You know, as a researcher, I can tell you that when I'm trying to build a new experiment, I can be wrong seven, eight times before the thing works.
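The base-rate effect Elena describes falls out of Bayes' rule. A minimal sketch, where the 85% sensitivity and specificity figures are assumptions for illustration only, not values from any published screener:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disorder | positive screen): of everyone the screener flags,
    what fraction actually has the disorder?"""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# The same screener accuracy, applied to a common vs. a rare disorder:
common = positive_predictive_value(0.85, 0.85, prevalence=0.10)
rare = positive_predictive_value(0.85, 0.85, prevalence=0.01)

print(f"prevalence 10%: PPV = {common:.2f}")  # roughly 0.39
print(f"prevalence  1%: PPV = {rare:.2f}")    # roughly 0.05
```

With an identical instrument, the rarer the disorder, the larger the share of positive screens that are over-referrals, which is why prevalence has to enter the calculation.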
Elena Plante: So, you know, the idea that I could just come up with something and it'll work the first time with the 3,000 kids that I'm going to screen is just ridiculous. You really do need a tool that is built as carefully as a full diagnostic test. The bad news is that right now we don't have a lot of great options for screeners. But one of the things that we do know works pretty darn well is parent and teacher report. There is a screener — it's not one of mine — the Student Language Scale, that has north of 80% sensitivity and specificity. It takes about three minutes for a teacher to fill out, and about that long for a parent too. It's really accurate compared to every other screener on the market right now. So in the next 10 years, we're going to see better and better screeners coming out. But you have to know how to determine which is a good screener and which is a poor one. Sensitivity, specificity, and positive predictive value are the three main elements. If I don't see those, I don't even bother looking at reliability, because if it's not accurate, why do I care if it's reliable, you know? Many of us, when we went through school, were taught to look at a checklist of psychometric criteria. I don't do that. I go for the main evidence. And if that's not there, I return the test to the publisher, because all of them will allow you to do that. All of them. Or, as I said, I use it to fix rocking tables, or as a doorstop, or something like that. If there's no evidence to support my use of it, I'm not in compliance with the federal law at that point.
Tiffany Hogan: That’s right. And when you think about the sensitivity being the test's ability to detect impairment, you have to have an outside measure. I think what's not always understood, and what I've appreciated more and more as I've done it myself, is that you have to have a sure diagnosis to get that sensitivity and specificity data. Right? And that sure diagnosis has to be given separately from the screener, and it has to take in lots of information, so you feel very confident about the diagnosis. Then you give the screener and look at how well the screener matches up. If the screener says risk and the child has the impairment, that's the sensitivity. Versus, if the full battery says, nope, the child's typically developing, and the screener says the risk is low, that's specificity. But like you said, when you make these decisions about sensitivity and specificity, you're willing, in screening, to have a few more false positives, because there's a kind of yin and yang. There are some great online tools where you can change your sensitivity and specificity and see how the curve moves.
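[Editor's note: the bookkeeping described here — comparing each screener flag against the gold-standard diagnosis — is easy to make concrete. The ten children below are hypothetical.]

```python
def sensitivity_specificity(gold, flagged):
    """gold[i]: full diagnostic battery found impairment (the 'sure diagnosis');
    flagged[i]: the screener said 'at risk'."""
    pairs = list(zip(gold, flagged))
    tp = sum(1 for g, f in pairs if g and f)          # impaired, caught
    fn = sum(1 for g, f in pairs if g and not f)      # impaired, missed
    tn = sum(1 for g, f in pairs if not g and not f)  # typical, cleared
    fp = sum(1 for g, f in pairs if not g and f)      # typical, false alarm
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cohort of 10 children:
gold = [True, True, True, True, False, False, False, False, False, False]
flagged = [True, True, True, False, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(gold, flagged)
# sens = 0.75 (one impaired child missed); spec ~ 0.83 (one false alarm)
```

The point of the sketch is the dependency: without the `gold` column — the independent, confident diagnosis — neither number can be computed at all.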
Tiffany Hogan: So if you change sensitivity, specificity usually takes a hit, and so forth. If you want good sensitivity, your specificity is going to go down and you're going to have more false positives. That means some children you said were at risk really aren't, but you'd rather over-identify; it's just a red flag, and then you follow up with them further. I often use the example of cholesterol. If you go in and you're above this cut point — which is arbitrary as well — 200 for cholesterol, it's not automatic that you're going to have a heart attack. It means that you are at risk and you need to further evaluate your food intake, your exercise, your genetics. And from there, you typically do some response to intervention. So if you think, oh, I had a lot of ice cream before I did this test, I think that's what it is, then you take some time to get your diet in check and look at your cholesterol again. If it goes down, well, then you have a little bit more of a causal story going on. But what if you do all that — you're someone who exercises, you eat well — and you have a family history of heart attack? Your doctor's probably going to skip over some of those other tweaks and say, oh, we need to get you right on some medicine or something, you know? I try to think of these more medical examples because we all live them — we all get checkups, right? Or mammograms are a great example. We live these things every day, and because of those experiences we go, oh yeah, of course, if I have over 200 cholesterol, it doesn't mean I'm automatically gonna have a heart attack. There's something I can do about it, and I can figure it out further.
Elena Plante: Right. That's a really good example of what risk is. And again, when you're screening, you're really talking about risk; you're not talking about whether someone has the diagnosis or not.
Tiffany Hogan: Absolutely. So one piece of advice I hear often, thinking again along this line of dyslexia risk, is that it's better to determine risk using multiple measures. There was a recent piece from the International Dyslexia Association by Richard Wagner, and I'll quote what he says: "Individuals with dyslexia are commonly misdiagnosed or even missed entirely. Part of the problem is unreliability in diagnosis that occurs for definitions that feature a single indicator. A promising solution to this problem is use of hybrid models that combine multiple indicators or criteria, improving reliability of diagnosis." So is it always the case that if you add more measures, you're going to increase your sensitivity and specificity?
Elena Plante: No! Absolutely not. There are actually two issues at play here, but let's deal with the sensitivity and specificity issue first. The problem with the sensitivity and specificity metrics is this: say you have one test and it is bang-on good — 100% sensitive and 100% specific. Okay, I'd love to know about that test; I don't know of one, but let's say it exists. You give that test and you get an answer. Is your answer gonna be any better by adding a second test? Well, let's take a second test that is more realistic. It has 85% sensitivity and — just so I can do the math in my head — 85% specificity. Now, if you say you have to fail test one, that's 100% and 100%, and test two, that's 85% and 85%, what happens is that the errors — the over- and under-identification rates — add. For those of you who had any programming back in the olden days — I actually took a BASIC course; the good thing is it teaches logic — when you say "this AND this," you're combining those two things, and the error combines. So your total error is now 15% when it was 0% before. And if I had two tests that were 85 and 85, and 85 and 85, the total possible error is now 30%. Now your sensitivity can be as poor as 70%, and that's below what I'm comfortable using clinically. That's just too much error. So if you are arbitrarily combining tests, this is a real problem; you're better off using just the best test and going with that than trying to say, oh, I've got to fail this one and I've got to fail this one.
Elena Plante: And I've got to say, I still see this a lot in district eligibility criteria, where they say you've got to fail two tests at some arbitrary cutoff point. Those districts are even worse off than the ones that just say you have to fail one test at an arbitrary cutoff point. There's another point here, though, that's a little bit more subtle in the Wagner quote, and that is the idea of trying to stick to the phenotype. It is the case that with dyslexia, again, they're not bad at all aspects of reading. They're bad at some aspects of reading. And they also have some other deficits that are known to exist but are not quite so prominent, like working memory. So if you develop a test that combines those aspects of the phenotype — one test that combines them — you will do better than just testing reading very broadly, in terms of what typical readers do.
Elena Plante: The way that these different aspects are handled is where the devil is in the details. Because if you make a separate test for each of them and say you've got to fail this and you've got to fail this, you're into the error-adding problem. But if you say, okay, we gave this battery of tests and we determined what scores in combination predict being dyslexic, that's fine. What that ends up doing, though, is weighting the different test scores. So you get test score A, which maybe is a decoding test, and you weight it by a factor of, say, five. Then you get a working memory test — they may or may not be bad on that, but it isn't as big a part of the phenotype — so it gets a weighting of two. And you get a vocabulary comprehension test; some dyslexics have problems with that, many do not, so it gets a weighting of one. When you combine all those scores multiplied by their weightings, what that means is that if you are dyslexic and you're really bad at decoding, you're going to get identified. But if you're a little bit bad at decoding, a little bit bad at working memory, and a little bit bad at vocabulary, you're also going to get identified. So in those cases, paying attention to the phenotype really does help in how the test is developed. But it takes a very different kind of scoring to get accurate sensitivity and specificity.
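[Editor's note: the weighted-combination scoring described here can be sketched as follows. The weights, deficit scale, and cut score are hypothetical placeholders; in a real instrument they would come from a norming study that optimizes sensitivity and specificity.]

```python
# Hypothetical weights mirroring the example in the conversation.
WEIGHTS = {"decoding": 5, "working_memory": 2, "vocabulary": 1}
CUT_SCORE = 10.0  # hypothetical

def composite(deficit):
    """deficit[skill]: 0 = typical, larger = worse (hypothetical scale)."""
    return sum(WEIGHTS[skill] * deficit.get(skill, 0.0) for skill in WEIGHTS)

# A severe decoding deficit alone crosses the cut...
severe = composite({"decoding": 2.0})
# ...and so does a milder deficit spread across the whole phenotype.
mixed = composite({"decoding": 1.2, "working_memory": 1.5, "vocabulary": 1.0})
at_risk = [score >= CUT_SCORE for score in (severe, mixed)]
```

Contrast this with the "fail every test" rule: the `mixed` child would fail none of the three tests individually at a severe cutoff, yet the weighted composite still identifies them.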
Tiffany Hogan: That makes sense to me in terms of thinking about structural equation modeling, right? You have a construct and multiple indicators, but the model pulls in only the shared variance and leaves the error out. So it speaks to this a bit, right? You're pulling in what's shared in that phenotype, and you're giving some leeway through that weighting, and that gives you better diagnostic accuracy. It can't just be carte blanche — "give multiple tests and you'll get better sensitivity and specificity" — I think.
Elena Plante: No, no. And the thing is, there's also a kind of groundswell of using latent variable approaches to identify dyslexia, and I'm sure we'll see that bleed over into other fields of behavioral disorders. But it's not good enough to just say you're high on this latent variable that represents dyslexia. You also need to know how accurate that is. So it still comes down to sensitivity and specificity. You can have a latent variable and a cut score for how much of that latent variable you have to have, but in the end you need to be able to say how accurate that method is.
Tiffany Hogan: Absolutely. And I think what I'm seeing is some states saying, okay, to determine risk you have to test these three constructs that we know are critical, but without a focus on the sensitivity and specificity — even though, theoretically, yes, those are the phenotypic aspects that are key. It still has to come down to, like you said, the sensitivity and specificity. That's what determines whether a specific measure works, because you can have a lot of different measures of phonological processing, for instance, but that doesn't mean they're going to be good for screening and have the diagnostic accuracy you're interested in. So I think that's the tricky part of it, tying it all together, but really keeping the central part that sensitivity and specificity.
Elena Plante: Just to reinforce something we already covered: one test of phonological decoding is not equivalent to another. You can't just go to the publisher and say, well, there's a decoding test, that's the one I'll take. You need to know something about how well it performs in detecting known decoding problems, which is sensitivity.
Tiffany Hogan: Well, I'm an eternal optimist, so I'm actually quite excited about these laws, because I think they're going to force these issues more. It's where the rubber meets the road: I'm sitting with districts now, and they have all this data, and you do have to ask, where's the cut point? What do I do? It makes you think. Or when you're figuring out how many children are at risk, you have to start thinking about sensitivity and specificity. So I think a really nice byproduct could be that we start to see more focus on this evidence base. That's my positive view. I also have to say that I've been involved in something that has driven some of this positive view, called the tools chart, funded by the Department of Education. It's been going on for a decade now. When we first got together, it was about 10 of us sitting in a room in Washington, DC, trying to think of a user-friendly definition for screening. It took us a full day to write what amounted to two sentences. You can imagine, right? It's since evolved and been revised many times. But for 10 years, what we've done is build a website that works a bit like Consumer Reports: a full bubble means all the evidence is there, good job; a half bubble or an empty bubble is not as good. And you can scan across the chart. What we did is solicit publishers to send us information about all of these aspects — sensitivity, specificity, reliability, validity, all of it — and it was all driven by purpose. So there's a tools chart for screening, and you can imagine that the criteria we look at for screening are very different. We look at things like: how did you determine the child had the impairment in the first place? How does that tie to your sensitivity and specificity? Where are your cut points? What's your base rate?
And what we found initially is that many of the tests did not have good ratings. As you can imagine, there were a lot of empty bubbles. So what we would do is send feedback to the publishers to say, here's why you didn't score well. And in the 10 years we've had this, the publishers have improved their assessments so much, because they want to be on the tools chart — the chart is used by administrators and state Departments of Education to decide which tests to purchase. So I've seen this evolve over the decade, this back-and-forth between science and publishers, and it's made me feel more optimistic. I'm hoping we'll see the same kind of evolution with these laws in place. I don't know.
Elena Plante: Yeah, I’m also an optimist. So from your lips to God's ears. I love the idea of these tools charts, because you don't have to go search publishers' websites — and a lot of times the publishers don't actually provide the critical information you need on their website. You end up calling them, or at least I end up calling them, which does save some time in terms of having them ship something that I'm just going to have to ship back. So tools charts are a good thing. That's a great thing. I'd like to see some of that done for speech-language pathology, just because it would be such a handy tool to have. And publishers are very responsive to consumers. So, you know, the chart is one form of consumer pressure, but the other form of consumer pressure is when you return the test: stick a note on it saying why. One of the questions my students always ask me when they discover how bad many tests are is, "Well, how can they sell these?" And my point to them is that publishers are not in the business of ethical practice. That's your business. Okay? They're in the business of selling things, and as long as people like you buy them, keep them, and don't return them, they will keep selling them in exactly the form that you are buying them. Making tests better is costly. Before I developed my first test, I always wondered why in the world tests cost so much. I don't wonder anymore. I spent 13 years on the TILLS, and other people were on that team longer than I was. By the time we were ready to publish, we were millions into the development — if you count our time, our salaries, data collection costs, all the costs of the tryouts. Yeah, easily. But the point is that publishers will publish whatever will sell. Ethical practice is not their business; it's the practitioner's business. It's your responsibility to make sure that you are doing evidence-based practice. And if the test is not providing that evidence, don't use it.
Tiffany Hogan: Absolutely. I just saw something on NPR, and the headline was something like, do we need an FDA for educational practices? And I've heard you use the term educational malpractice. I think we sometimes assume — like your students asking how they can sell these tests — that there must be someone watching over this. No, that's our job. We are the ones who need to do this. We don't have an FDA for our tests and the practices we use; it's our ethical code and our knowledge base. Although I did like the idea of an FDA for educational practices — I thought, that's pretty good, that would be nice. And actually, speaking about the tools chart, I'm thinking, hmm, I think I'm talking to someone who might be able to do something like that for ASHA.
Elena Plante: Haha, yeah, hm. Well, actually, we do have a relatively new committee that is specifically designed to put out information on evidence-based practice to our professions. One of the things they're doing is developing practice guides, so I think this is possibly one of the areas they should add to. And, you know, we're saying here that there are a lot of bad tests in child language. Guess what? There are a lot of bad tests in motor speech, and a lot of bad tests in aphasia. But in each of these areas, I'm starting to see good tests too. So there are choices. If people are throwing up their hands and saying, well, all tests are bad — no. I know that in child language there is at least one choice at every age. Every age. So, you know, you just have to get the right test and get rid of the tests that are not doing it. And sometimes, if you have a hard time tracking down the information you need in the test manual, it's either because it is not there or because it's being disguised in a way that suggests they don't want you to know the full story. Those are the moments when red flags should go up for you. It is not that time-intensive to check. I would say that, of the clinicians I speak to, 90% never open the technical manual. For me, it's the only thing I open when I order a test. If the test doesn't get past that, I don't even look at how to give it or what its content is.
Tiffany Hogan: That makes a lot of sense. And I think that having these resources out there — your article, what's being driven by these practice guidelines — it's there. And hearing your empowering story about changing the procedures in Arizona, I think it shows the future's bright and we can be our own advocates. Well, I know we're running out of time, but I want to ask the two questions I always ask my guests. The first one is: what are you working on now that you're most excited about?
Elena Plante: Oh gosh, I work on a lot of things. Probably what I'm most excited about right now is a line of research that's designed to make treatments better, more effective, and faster by using principles of learning grounded in how children and adults learn implicitly, which is a very rapid form of learning. We're just wrapping up an article — our study is in for resubmission; I submitted it yesterday — showing that we can cut treatment time in half, from 30 minutes to 15 minutes, and still get kids giving us the same levels of performance. And again, this is a rapid form of learning that doesn't require you to ask five-year-olds to think about, well, this is a boy, so we say "him." There are no metalinguistic skills required. You don't have to ask them to think about a rule and apply it. It's very rapid, very input-driven, and fun to do. We like it, and the students love it. So I'm very excited about that. I also have a couple of new tests in the pipeline. Yeah! I have two that I'm working on now, one a little bit further along than the other. We're probably about a year off for one of them and maybe a couple of years off for the other.
Tiffany Hogan: That’s fantastic — I know how long it takes to get through that pipeline! And I'll circle back to the statistical learning just to say that I very much appreciated the Language, Speech, and Hearing Services in Schools clinical forum that Mary Alt led on statistical learning. You had an article in there, correct?
Elena Plante: I did! I teamed up with a developmental psychologist to talk about the basic, fundamental principles of learning that can be incorporated into treatment. And it's not just one kind of treatment — it's any kind of treatment. If you are doing conversation-based treatment with a four-year-old, it'll work! And if you are doing worksheet-based treatment with a school-age child, it will still work. We were really excited to put all of that together in one place and give it some structure so that people could get their heads around it.
Tiffany Hogan: Yeah, I'm excited. I've read maybe three of the articles — I'm working through the forum — and I just absolutely love those principles. Having been at Arizona with you several years ago, it was really mind-blowing to think about the input you're providing and how children can pick up on grammatical structures, for instance, or vocabulary, based on the types of stimuli you're choosing. And it doesn't require meta-awareness, which is quite problematic for children with difficulties anyway. So I'm excited to attach that to the podcast resources; I think it'll be great to shine a light on. The last question I always ask is: what is your favorite book, from childhood or now?
Elena Plante: I’m going to say Pride and Prejudice. I started reading it when I was a teenager, and I probably reread it every other year or so, often around Christmas. I don't know why!
Tiffany Hogan: Most people choose something more from their young childhood. I like that you're choosing from your adolescence!
Elena Plante: Yeah! And of course, you know, I've read all of Jane Austen's works. There are other really good ones, but yeah, that really is my favorite.
Tiffany Hogan: Fantastic. Well, I appreciate you spending the time chatting. It was really nice, I look forward to getting it out to the podcast listeners.
Elena Plante: Okay. It was my pleasure! Thanks.
Tiffany Hogan: Thank you!
Tiffany Hogan: Check out www.seehearspeakpodcast.com for helpful resources associated with this podcast, including, for example, the podcast transcript, research articles, and speakers' bios. You can also sign up for email alerts on the website or subscribe to the podcast on Apple Podcasts or any other listening platform, so you will be the first to hear about new episodes.
Thank you for listening and good luck to you, making the world a better place by helping one child at a time.
Tiffany P. Hogan,
Professor, Communication Sciences and Disorders
Director, SAiL Literacy Lab