Reality check about the Bogus “Validation Studies” – Just Use your Common Sense
Start with this “common sense” CONCEPT. If you were going to test your marksmanship with a rifle, with the target placed 50 yards away, do you think you would land MORE shots within the RINGS of a target 8 inches in diameter, or within the rings of a BIGGER target 10 inches in diameter? Does this REALLY take much analysis? Not for anyone looking for the truth.
Anyone not brain dead knows that the shooter would have more “hits” within the rings of the target when the area of the target is LARGER. The exact same principle applies to trying to identify drinking subjects at a LOWER BAC level than the higher “over the limit” BAC level used in 1977 and 1981. Yet, between 1995 and 1998, NHTSA sanctioned these bogus “validation” studies in three states: Colorado, Florida and California. No mandatory laboratory oversight or independent observation was utilized for these police sobriety tests. No double-blind testing was done. Officers were told that their “arrests” were being monitored to determine how often they made the RIGHT decision. Consequently, the officers tended to arrest ONLY the people who were obliterated by alcohol. In one of the three studies, the median BAC level was about double the legal limit of 0.08 grams percent. In another study, the police were permitted to use hand-held breathalyzers, which was specifically forbidden by the stated rules pertaining to these validation studies. Guess who got these contracts and oversaw the “validations”? Dr. Marcelline Burns.
The “findings” of the police officers, who KNEW they were being studied, yielded reliability numbers in the 90% or better range. So (putting logic aside), on the more difficult “target,” the officers in the three “validation studies” were darn near perfect. Because these three taxpayer-funded reports were sanctioned by the Government, this passes as being “scientific.” Anyone who believes this should be trying to buy the Brooklyn Bridge, or some Jack-in-the-Beanstalk beans. Other REAL scientists who reviewed the raw numbers from the field sobriety test validation studies have pointed out the “loaded deck” given to the officers who were part of the testing. Burns and the other apologists claim that these incredible and “fixed” numbers were the result of highly trained DUI task force officers making the arrests. Complete hogwash! Yet, police trainers across America have been brainwashing officers into believing that, due to their training, they can perform field tests at nearly 100% reliability.
Allen Trapp, a brilliant Carrollton, Georgia DUI trial attorney who passed away in September of 2015, did a comprehensive analysis of the “cooked” numbers of the so-called “validation studies” for one of the author’s seminars about 10 years ago. These important points made by Trapp show specifically how NHTSA’s lack of oversight and complete abandonment of the scientific method allowed this fraud to be perpetrated on the American driving public:
However, the most interesting statistics from the 1981 study, as discussed by Cole and Nowaczyk (The Champion, August 1995), involve the “dosing differential” of the subjects tested. Most of the subjects (78 percent) were dosed to either a high BAC (about 0.15) or a low BAC (0.05 and below). (Source: 1981 report, page 15, Table 4) These should have been easy decisions: as a practical matter, it is easy for an officer to score an individual as being above 0.10 BAC when that person is at 0.15 or higher, and just as easy to score someone at 0.05 or below as being under the limit. NHTSA claims an overall accuracy rate of 0.80 when using the three-test battery; however, that 0.80 figure is questionable when more than three-quarters of the subjects (78 percent) should be considered “gimmies” (dosed either high or low, hence the “dosing differential”). In other words, the accuracy rate for the individuals dosed between 0.05 and 0.15 would undoubtedly have been much lower, but that data is unavailable (see the arithmetic sketch following this excerpt). Cole and Nowaczyk opine that one factor in the “improvement” of the false arrest numbers (47 percent in 1977 down to 32 percent in 1981) could be the dosing differential.
The number of subjects dosed in the mid-range (0.05 to 0.15) went down from 27 percent in the 1977 study (Source: 1977 report, page 19, Figure 5) to 22 percent in the 1981 study. In other words, only 22 percent of the subjects in the 1981 study were in the harder-to-judge range between 0.05 and 0.15 BAC. The 1981 study also claims a “reliability study” as part of the research. Reliability basically refers to consistency, or the ability to get the same results each time. The reliability portion consisted of asking 145 of the subjects back for retesting two weeks after the original study. The “reliability factor” was 0.77. This “reliability correlation coefficient” is measured on a scale running from roughly zero to 1.00. It is interesting to note that a correlation coefficient of 0.9 or above is expected for academic reading tests such as the SAT. The inter-rater reliability coefficient was only 0.57 (Source: Page 35, Table 14) when the retesting was scored by different officers. So, when different officers tested the same subjects at the same dose level, the reliability was dismal, and far below scientific acceptability. Dr. Spurgeon Cole states that the scientific community expects reliability coefficients in the high 0.80s or 0.90s for a test to be scientifically reliable. This statistic is quite significant and is one of the reasons that judges should not allow an officer to testify that the accused “failed” a particular test.
The age and gender of the subjects used in the 1981 project, as with the 1977 study, are highly significant when interpreting the results. In the 1981 study, a whopping 80% of the subjects were between the ages of 21 and 34. Again, as with the 1977 study, about two-thirds of them were male. (Source: 1981 report, page 14, Table 2) The use of a predominantly male population between the ages of 21 and 34 means that we should question the applicability of the test results to the population as a whole.
Source: Demolishing Police Testimony about SFST Reliability and Accuracy, Allen M. Trapp, Jr., 2006
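To see how the dosing differential Trapp describes can inflate an overall accuracy figure, here is a minimal arithmetic sketch in Python. The 78 percent share of “easy” subjects and the claimed 0.80 overall accuracy come from the figures quoted above; the assumed accuracy on the easy cases is purely hypothetical.

```python
# A minimal sketch of how a "dosing differential" props up an overall
# accuracy figure. The 0.80 overall accuracy and the 78% share of subjects
# dosed well above or well below the limit come from the discussion above;
# the assumed accuracy on those easy cases is hypothetical.

def implied_midrange_accuracy(overall, easy_share, easy_accuracy):
    """Solve overall = easy_share*easy_accuracy + (1 - easy_share)*mid_accuracy."""
    return (overall - easy_share * easy_accuracy) / (1.0 - easy_share)

OVERALL = 0.80      # overall accuracy claimed for the three-test battery
EASY_SHARE = 0.78   # share of subjects dosed far from the 0.10 cutoff

for easy_acc in (0.90, 0.95, 0.99):
    mid_acc = implied_midrange_accuracy(OVERALL, EASY_SHARE, easy_acc)
    print(f"If the easy cases were scored {easy_acc:.0%} correctly, the "
          f"0.05-0.15 mid-range accuracy would have to be only {mid_acc:.0%}")
```

Under any plausible assumption about the easy cases, the implied accuracy for subjects dosed near the limit collapses, which is exactly why the missing mid-range data matters.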
This is the introduction written by Burns for the Florida “study” (which is undated):
During the years 1975 – 1981, a battery of field sobriety tests was developed under funding by the National Highway Traffic Safety Administration (NHTSA), U.S. Department of Transportation (Burns and Moskowitz, 1977; Tharp, Burns, and Moskowitz, 1981). The tests include Walk-and-Turn (WAT), One-Leg Stand (OLS), and Horizontal Gaze Nystagmus (HGN). NHTSA subsequently developed a training curriculum for the three-test field sobriety test battery, and initiated training programs nationwide. Traffic officers in all 50 states now have been trained to administer the Standardized Field Sobriety Tests (SFSTs) to individuals suspected of impaired driving and to score their performance of the tests.
At the time the SFSTs were developed, the statutory blood alcohol concentration (BAC) for driving was 0.10% throughout the United States. The limit now has been lowered in a number of states to 0.08% for the general driving population. “Zero tolerance” is in effect in some jurisdictions for drivers under age 21, and commercial drivers risk losing their licenses at a BAC of 0.04%. It is likely that additional states will enact stricter statutory limits for driving. In light of these changes, a re-examination of the battery was undertaken by McKnight et al. (1995). They reported that the test battery is valid for detection of low BACs and that no other measures or observations offer greater validity for BACs of 0.08% and higher.
Source: http://www.drugdetection.net/NHTSA%20docs/Burns%20Florida%20Study.pdf
Despite Dismal Reliability, the Field Sobriety Test “Battery” was put in Print and Distributed multiple times in the NHTSA Field Sobriety Test Manuals.
Although six field sobriety tests were used as part of the 1977 NHTSA study, none were selected as being indicators of anything, let alone indicators of alcohol intoxication. Some interesting statistics came out of the 1977 study. Of primary significance was the error rate of the 10 officers involved in the study. Their error rate (false positives, meaning an arrest decision was made even though the person was under 0.10 BAC) was an astounding 47 percent! You have to read the report, because you will not find this in any NHTSA training manual used by law enforcement in giving police sobriety tests. That is to say, in the 1977 study the officers made the decision to “arrest” a total of 101 people. Of those people “arrested,” 47 percent had a BAC under 0.10 percent. (Source: 1977 report, page 25) This high false positive percentage was totally unacceptable, even according to the author(s) of the study. Marcelline Burns first tried to attribute the high error rate to the inexperience of the officers used in the study.
If this were true, it would seem inexplicable that Burns would again use inexperienced officers in the 1981 NHTSA study. It is significant that approximately 80% of the subjects used in the 1977 study were in their twenties, and about two-thirds of the test subjects were male. (Source: 1977 report, page 18, Figure 4) Physical dexterity clearly wanes as people age, with rare exceptions for those who are fanatics about staying fit.
So, the stated task of NHTSA was to find suspected impaired drivers who put public safety at risk. The target group was drivers impaired by drinking too much alcohol. The study did not test for, or even consider estimating, the percentage of drugged drivers who “failed” the field sobriety tests. NHTSA focused only on test subjects who were dosed with alcohol.
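Before moving on, a quick back-of-the-envelope check, using only the 1977 figures quoted above (101 “arrest” decisions, 47 percent of them under 0.10 BAC), shows what that error rate means in raw numbers:

```python
# Quick arithmetic on the 1977 figures quoted above: 101 "arrest" decisions,
# 47 percent of which involved people under the 0.10 BAC limit.
arrests = 101
false_positive_share = 0.47   # share of arrest decisions below 0.10 BAC

wrongly_arrested = round(arrests * false_positive_share)
print(f"Roughly {wrongly_arrested} of {arrests} 'arrest' decisions involved "
      f"people under the 0.10 limit")
# Put differently: when an officer decided "arrest," that decision was correct
# only about 53% of the time.
```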
NHTSA, a division of the United States Department of Transportation, eventually awarded the research contract to a group of scientists at the Southern California Research Institute (SCRI), based in Los Angeles, California. Not surprisingly, the winning bid went to Dr. Burns. The primary authors of the final field sobriety test study and report were Marcelline Burns, Ph.D., and Herbert Moskowitz, Ph.D., both of whom operated the Southern California Research Institute.
The Standardized Field Sobriety Tests are no Longer Standardized
Ironically, the creators of the standardized field sobriety tests similarly posted a disclaimer (like the AAA disclaimer above) within their training manuals, in every version released for police training from 1984 through 2006. In 2013, NHTSA abandoned this all-important “standardization” disclaimer entirely. As can be seen below, backlash from prosecutors, DRE officers and standardized field sobriety test instructors has since caused NHTSA to put it back in the manual.
The 2013 version of the NHTSA SFST manual TOTALLY deleted this critically important warning to officers that if they DON’T follow every field sobriety testing protocol and SFST screening procedure meticulously, then the validity of all field sobriety test exercises is COMPROMISED. So, thousands of police officers were defectively trained on this manual for two years. It is of the utmost importance that the reader understand that this paragraph was the ONLY “standardization” language in the entire NHTSA SFST manual. After criminal defense attorneys’ cross-examination of arresting officers at trial pointed out the deception and duplicity involved in removing the BOLD TYPEFACE admonition from the field sobriety test manual, NHTSA recognized its flagrant and inexcusable error and issued a new sobriety test manual in October of 2015. Significantly, the “admonition” has been taken out of BOLD print and buried in Section VIII, on page 13, of the newest field sobriety test manual.
Thus, these official, so-called “tests” are anything BUT tests when you compare these evaluations to highly reliable, important testing such as the SAT, ACT, IQ and other truly standardized tests. Imagine that you are going to take the SAT, the MCAT or the LSAT, and the time-keeper shortens your group’s time to finish, decides to skip the lunch break, and does not control loud and disruptive students who are throwing spit balls across the rows of seats, while one test subject has a boom box belting out rap music at 80 decibels.
REAL “scientific” education competency tests are standardized against a “norm,” and are repeatable and reliable, to better than 90% repeatability. When you take an IQ test, thousands of sample tests and controlled test monitoring and administration go into the effort, because we all know that a score of 100 is the benchmark for an “average” IQ. Without that crucial establishment of norms, how would a person ever trust that an IQ test was yielding accurate information? Yet NOT ONE of the field sobriety tests was ever compared against any “norms,” and none of the sobriety tests comes close to achieving these lofty percentages when predicting who is above the legal limit and who is not. In fact, due to the very nature of what the 3-test battery of sobriety tests seeks to score, none of these police sobriety tests will EVER approach 90% or greater repeatability.
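For readers who want to see what “standardizing against a norm” actually involves, here is a minimal Python sketch of the z-score transformation behind IQ-style scoring. The raw scores in the norming sample are invented purely for illustration; no comparable normative data exists for the field sobriety tests.

```python
# A minimal sketch of what "standardizing against a norm" means, using an
# IQ-style scale (mean 100, standard deviation 15). The raw scores below are
# invented for illustration; real norming uses thousands of carefully
# sampled test-takers and controlled administration.
from statistics import mean, pstdev

norming_sample = [12, 15, 18, 20, 21, 23, 25, 27, 30, 34]  # hypothetical raw scores
mu, sigma = mean(norming_sample), pstdev(norming_sample)

def normed_score(raw, target_mean=100, target_sd=15):
    """Convert a raw score to a normed scale via a z-score transformation."""
    z = (raw - mu) / sigma
    return target_mean + target_sd * z

print(f"A raw score of {mu:.1f} maps to {normed_score(mu):.0f} (the 'average')")
print(f"A raw score of 30 maps to {normed_score(30):.0f}")
# The SFSTs skipped this step entirely: there is no normative sample telling
# us what a sober 60-year-old's expected Walk-and-Turn score would be.
```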
The Standardization Paragraph that was REMOVED in 2013
Below is the now-omitted “standardization” language. It was always printed in BOLD print (the only bold print in the entire manual) and in ALL CAPS until the release of the 2013 Standardized Field Sobriety Test participant guide:
IT IS NECESSARY TO EMPHASIZE THIS VALIDATION APPLIES ONLY WHEN:
It took Research from OUTSIDE the USA to Point out the Flaws in NHTSA’s SFSTs
A British study reported in 2009 highlighted the folly of Burns’ oversights and her failure to establish “norms” for various age groups. In an article published in Accident Analysis and Prevention, vol. 41, pp. 412-418 (2009), three medical researchers (Dixon, Clark and Tiplady) reported the following statistics when comparing the reliability of FIT (field impairment tests like the WAT and OLS) and RITA (a roadside impairment testing apparatus, an analytical device like our portable breath testers):
One hundred and twenty two healthy volunteers aged 18–70 years took part in this two-period crossover evaluation. The volunteers received a dose of alcohol and placebo, in the form of a drink, on separate days. Doses were calculated to produce blood alcohol concentrations of 90 mg/100 ml and RITA and FIT testing was carried out between 30 and 75 min post-drink. FIT was found to have a diagnostic accuracy of 62.7%. However, there was a substantial age effect for FIT scores, with volunteers aged over 40 showing failure rates on placebo similar to the failure rates on alcohol of younger volunteers. The accuracy of RITA was between 66 and 70%, not significantly higher than that of FIT. However, RITA did not show a marked age effect. Advantageously, this could result in fewer false positives being recorded if RITA were deployed at the roadside. Horizontal gaze nystagmus (HGN) was also investigated and posted an accuracy of 74%. The inclusion of HGN as one component of a UK roadside impairment test battery warrants further exploration with other drugs.
Title: Evaluation of Roadside Impairment Test device using Alcohol
Source: http://www.sciencedirect.com/science/article/pii/S0001457509000049
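The age effect that Dixon, Clark and Tiplady describe can be made concrete with a small, purely hypothetical sketch; none of the percentages below come from their paper, and they are chosen only to illustrate why a sober-but-older volunteer can look like a drinking-but-younger one on a pass/fail roadside test.

```python
# Illustrative-only sketch of an age effect on a pass/fail impairment test.
# If sober (placebo) failure rates for older volunteers approach the failure
# rates of dosed younger volunteers, a bare "fail" cannot separate the two.
# Every percentage here is hypothetical.
failure_rates = {
    # (condition, age group): share of volunteers scored as "impaired"
    ("placebo", "18-39"): 0.15,
    ("placebo", "40-70"): 0.45,   # sober older volunteers still "fail" often
    ("alcohol", "18-39"): 0.50,
    ("alcohol", "40-70"): 0.70,
}

sober_older = failure_rates[("placebo", "40-70")]
dosed_younger = failure_rates[("alcohol", "18-39")]
print(f"Hypothetical sober over-40 failure rate:  {sober_older:.0%}")
print(f"Hypothetical dosed under-40 failure rate: {dosed_younger:.0%}")
print("Without age-specific norms, a 'fail' cannot distinguish the two groups.")
```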
Like being at a Carnival, you can now Understand the “Shell Game” Played by NHTSA
A carnival barker will try to lure you over to his table and let you wager a few dollars and try to guess which walnut shell of the three in front of you is covering the pea underneath it.
It looks like an easy bet to win. But you don’t know that the carnival barker has rigged the game. The same principle applies to the NHTSA “standardized” field sobriety tests, except instead of losing a small bet at a carnival, you go to jail, arrested for DUI. After the “standardization” paragraph was “lifted” in 2013, NHTSA had to correct this planned deception, because someone told the folks in Washington that this crucial paragraph was the ONLY provision in the manual providing the AURA that these field sobriety tests were in any way “scientific.”
Knowing the foregoing information, you can clearly see the “shell game” of the now “RE-Standardized” field sobriety tests. Over the course of three decades, the seven steps by which NHTSA abandoned the myth of standardization, culminating in 2013, followed this path:
This “shell game” had to be changed because scores of DUI lawyer specialists around the country have taken the full police training courses from NHTSA-trained field sobriety instructors, starting in July of 1994 in Atlanta, GA. Now, dedicated DUI attorneys like the author have taken the police training courses multiple times (both the practitioner course and the instructor course) to be able to run circles around the police-trained officers who are regurgitating information from the course that will not withstand cross-examination.
The best DUI lawyers know when the officer is performing the evaluations incorrectly, and these attorneys have been using cross-examination to neutralize the bogus tests in the eyes of the jury. So, when you follow the federal money, the proponents of the tests made a bundle off foisting the tests on the American public. American taxpayers paid out hundreds of millions of dollars to train officers on the pseudo-tests. Courts and local governments racked up billions of dollars in fines, and poor old John Q. Public still THINKS that he has to attempt to perform the optional, voluntary “fixed” tests. Until word spreads to NEVER do the field sobriety tests, the cycle of dishonesty and deception surrounding the so-called roadside sobriety tests will continue.
How Court Decisions and Appeals have helped Debunk the False Claims of the Field Sobriety Test Proponents
One of the most important cases in the last two decades in analyzing and debunking the so-called “field sobriety test” battery is United States v. Horn, 185 F. Supp.2d 530 (D. Md., 2002), which dealt with a case that occurred on a military base (and was therefore handled by a federal judge).
The exhaustive review of the reliability of field sobriety test evidence led the federal court to rule as follows:
Horn has filed a motion in limine to exclude the evidence of his performance on the field sobriety tests, asserting that it is inadmissible under newly revised Fed.R.Evid. 702 and the Daubert/Kumho Tire decisions. The Government has filed an opposition, and Horn has filed a reply. In addition, a two day evidentiary hearing was held, pursuant to Fed.R.Evid. 104(a), on November 19 and 20, 2001, and additional testimonial and documentary evidence was received, which is discussed in detail below. At the conclusion of this hearing, the following ruling was made from the bench, the Court also announcing its intention subsequently to issue a written opinion on this case of first impression:
This federal judge in Horn took two days of scientific evidence from DUI expert witnesses in the Maryland cases before ruling on the scientific reliability of the standardized field sobriety tests. Multiple REAL experts in the fields of testing and measurement and medicine were called upon to testify about the statistics yielded in the NHTSA studies.
One of those experts was an expert in the field of “testing and measurement,” Dr. Spurgeon Cole. In his testimony and published writings, the former Clemson University clinical psychology professor was highly critical of the claimed reliability of the SFSTs when used to prove the precise level of a suspect’s alcohol intoxication or impairment. His 1994 article, “Field Sobriety Tests: Are They Designed for Failure?” (co-authored with Professor Ron Nowaczyk) and published in the journal Perceptual and Motor Skills, analyzed the 1977 Report, the 1981 Final Report, and the 1983 Field Evaluation report published by NHTSA regarding the SFSTs.
The Cole and Nowaczyk study observed the following:
In most states, like Georgia, no such scrutiny has ever been given to the field sobriety test “battery.” To the contrary, in Georgia, even where the arresting officer ADMITS doing the field sobriety tests incorrectly, the Georgia Court of Appeals will not uphold a trial court’s exclusion of these “voodoo” evaluations. State v. Pierce, 266 Ga. App. 233, 596 S.E.2d 725 (2004). Even the shoddiest field sobriety test, including the pseudo-scientific HGN evaluation, is routinely allowed to be heard by a jury for whatever weight and credibility the jury wants to give it. This not only abrogates prior rulings of the Georgia Court of Appeals (e.g., State v. Pastorini, 222 Ga. App. 316, 474 S.E.2d 122 (1996)), but completely ignores the concept of proper “scientific evidence.”
Legal decisions from other states are instructive: see State v. Murphy, 953 S.W.2d 200 (Tenn. 1997); State v. Homan, 89 Ohio St. 3d 421, 732 N.E.2d 952 (2000); and State v. Lasworth, 131 N.M. 739, 42 P.3d 844 (N.M. Ct. App. 2001), three opinions written by appellate courts that have no vested interest in seeing that every DUI charge is prosecuted to the maximum degree, regardless of the unscientific, bogus procedures behind the tests.
Numerous other states, including Texas, Alabama and Mississippi, do not permit HGN evidence to be admitted at trial. Other states admit it ONLY if an expert lays a proper foundation showing that this psycho-physical field sobriety test was done correctly and in accordance with good scientific procedures. State v. Murphy, 953 S.W.2d 200 (Tenn. 1997). Skilled attorneys who specialize in DUI defense call these the “NHTSA SFSTs.”
How should people faced with a DUI investigation cope with the inexplicable differences in judicial interpretation of the fairness of field sobriety testing? Simple solution: NEVER take any roadside evaluations, since these are 100% optional and voluntary. JUST SAY NO. Any knowledgeable DUI lawyer will tell anyone who listens that taking bogus field sobriety tests that can lead to both your arrest and possible conviction is insane. If you submitted to sobriety tests and are facing trial, seek out the best DUI lawyer you can, and let him or her bring in a DUI expert witness to educate your jury.
Important Scientific Analysis and Articles Reviewing the Reliability of Standardized Field Sobriety Tests
Statistical Evaluation of Standardized Field Sobriety Tests, by Hlastala, Polissar & Oberman, Journal of Forensic Sciences, vol. 50, issue 3, 2005
Standardized Field Sobriety Tests (SFSTs) are used as qualitative indicators of impairment by alcohol in individuals suspected of DUI. Stuster and Burns authored a report on this testing and presented the SFSTs as being 91% accurate in predicting Blood Alcohol Concentration (BAC) as lying at or above 0.08%. Their conclusions regarding accuracy are heavily weighted by the large number of subjects with very high BAC levels. This present study re-analyzes the original data with a more complete statistical evaluation. Our evaluation indicates that the accuracy of the SFSTs depends on the BAC level and is much poorer than that indicated by Stuster and Burns. While the SFSTs may be usable for evaluating suspects for BAC, the means of evaluation must be significantly modified to represent the large degree of variability of BAC in relation to SFST test scores. The tests are likely to be mainly useful in identifying subjects with a BAC substantially greater than 0.08%. Given the moderate to high correlation of the tests with BAC, there is potential for improved application of the test after further development, including a more diverse sample of BAC levels, adjustment of the scoring system and a statistically-based method for using the SFST to predict a BAC greater than 0.08%.
Source: http://www.astm.org/DIGITAL_LIBRARY/JOURNALS/FORENSIC/PAGES/JFS2003386.htm
Steven J. Rubenzer, The Standardized Field Sobriety Tests: A Review of Scientific and Legal Issues, Law and Human Behavior, vol. 32, issue 4, August 2008
Source: http://link.springer.com/article/10.1007%2Fs10979-007-9111-y#/page-1
“[T]he research that supports their (field sobriety test) use is limited, important confounding variables have not been thoroughly studied, reliability is mediocre, and their developers and prosecution-oriented publications have oversold the tests.”
“The theory that alcohol affects SFST performance is clearly subject to falsification if BAC is the primary criterion, and there are numerous studies that correlate SFST performance and BAC level. The proposition that SFSTs are related to driving impairment is also falsifiable but more difficult to test. Whereas impairment on a closed driving course might readily be correlated with SFST performance, some significant performance deficits occur only in response to rare events or in interaction with other vehicles or drivers (e.g., road rage). The theory that SFST performance is related to driving performance is falsifiable, but as yet untested.”
Cole and Nowaczyk, Field Sobriety Tests: Are They Designed for Failure? Perceptual and Motor Skills, vol. 79, August 1994
Field sobriety tests have been used by law enforcement officers to identify alcohol-impaired drivers. Yet in 1981 Tharp, Burns, and Moskowitz found that 32% of individuals in a laboratory setting who were judged to have an alcohol level above the legal limit actually were below the level. In the 1977 study, the number was 46% improperly arrested. In the Cole and Nowaczyk study, two groups of seven law enforcement officers averaging over 12 years of experience in DUI arrests each viewed videotapes of 21 sober individuals attempting to perform a variety of field sobriety tests or normal-abilities tests, e.g., reciting one’s address and phone number or walking in a normal manner. Officers judged a significantly larger number of the individuals as impaired when they performed the field sobriety tests than when they performed the normal-abilities tests. The need to reevaluate the predictive validity of field sobriety tests is discussed.
Source: http://www.ncbi.nlm.nih.gov/pubmed/7991338
The Horizontal Gaze Nystagmus Test: Fraudulent Science in the American Courts, J.L. Booker, Science and Justice: Journal of the Forensic Science Society, vol. 44, no. 3, pp. 133-139 (2004)
Bypassing the usual scientific review process and touted through the good offices of the federal agency responsible for traffic safety, it was rushed into use as a law enforcement procedure, and was soon adopted and protected from scientific criticism by courts throughout the United States. In fact, research findings, training manuals and other relevant documents were often held as secrets by the state. Still, the protective certification of its practitioners and the immunity afforded by judicial notice failed to silence all the critics of this deeply flawed procedure. Responding to criticism, the sponsors of the test traveled the path documented in this paper that led from mere (if that word can ever truly apply to a matter of such gravity) carelessness in research through self-serving puffery and finally into deliberate fraud – always at the expense of the citizen accused.
In 1998 the integrity of the statistical evaluation of the original research upon which the validity of the tests rested was unfavorably reviewed. In 2001 new research indicated that the Horizontal Gaze Nystagmus (HGN), the cornerstone of the test battery was fundamentally flawed and that the HGN test was improperly conducted by more than 95% of the police officers who used it to examine drivers suspected of driving while intoxicated (DWI). This summary critique demonstrates that it is scientifically meretricious and that the United States Department of Transportation indulged in deliberate fraud in order to mislead the law enforcement and legal communities into believing the test was scientifically meritorious and overvaluing its worth in the context of criminal evidence…
Source: http://www.ncbi.nlm.nih.gov/pubmed/15270451
Other Booker publications:
Booker, J.L., “The Field Test Paradox,” Voice for the Defense, 1996, vol. 25, pages 8-10.
Booker, J.L., “The Application of the ‘Known and Potential Rate of Error’ Criterion to the Standardized Battery of Field Sobriety Tests,” Voice for the Defense, 1998, vol. 27, pages 24-27.
Booker, J.L., “End Position Nystagmus as an Indicator of Ethanol Intoxication,” Science and Justice, 2001, vol. 41, pages 113-116.
Rubenzer, Steven J. & Stevenson, Scott. (2010). Horizontal Gaze Nystagmus: A Review of Vision Science and Application Issues. Journal of Forensic Sciences, 55(2), 394-409.
Summary of the findings:
The Horizontal Gaze Nystagmus (HGN) test is one component of the Standardized Field Sobriety Test battery. This article reviews the literature on smooth pursuit eye movement and gaze nystagmus with a focus on normative responses, the influence of alcohol on these behaviors, and stimulus conditions similar to those used in the HGN sobriety test. Factors such as age, stimulus and background conditions, medical conditions, prescription medications, and psychiatric disorders were found to affect the smooth pursuit phase of HGN. Much less literature is available for gaze nystagmus, but onset of nystagmus may occur in some sober subjects at 45 degrees or less. We conclude that the HGN test used by police is limited by large variability in the underlying normative behavior. The screening methods used and the inconsistent testing environments encountered at the roadside often affect the reliability of reported results. In addition, the original NHTSA-funded studies in 1977 and 1981 suffer from a lack of rigorous validation in their laboratory settings.
Source: http://www.ncbi.nlm.nih.gov/pubmed/20102467
From Dr. Greg Kane, MD, at his website: www.sfst.us
Simple, but it doesn’t work. This is one of those times the simple and obvious answer is wrong. It turns out:
There are two different kinds of medical tests: direct and indirect.
Each kind of test has its own formula for accuracy. If you mix formulas, if you use the wrong formula for your kind of test, the answer you get will be wrong.
For the SFST, NHTSA uses the wrong formula.
Using the wrong formula, the accuracy NHTSA calculates for the SFST is spectacularly wrong.
Dr. Kane, on his website, explains NHTSA’s “funny math” with a highly understandable grid and breakdown of what “accuracies” really mean:
Accuracy is more complicated than you think
Do the SFST 100 times and you’ll get the correct answer ACCURACY percent of the time. That’s what NHTSA teaches DUI officers. But it doesn’t work. Accuracy is more complicated than common sense makes you think.
Here’s a Field Sobriety Test accuracy table from NHTSA’s original “scientific” FST validation project, Psychophysical Tests for DWI Arrest 1977. I’ve changed the labels to make the thing easier to read; you can get the original at PDFs. In this project each person tested had two measurements made: blood alcohol and a Field Sobriety Test. The question is, “what percent of the time did the FST measurements correctly predict the alcohol measurements?” The answer to that question will be the accuracy of the SFST.
The table sorts people by test result. Look under the pink label FST coordination test. People who failed the FST were counted in the Fail ↓ column. People who passed were counted in the Pass ↓ column. Over on the side, people whose measured alcohol was high went in the ALCOHOL high → row. People whose alcohol was low went in the ALCOHOL low → row. Tables like this set up True Positive, True Negative, False Positive and False Negative results in a way that makes it easy to answer important questions about this FST.
When people were guilty, how accurate was this FST? Look at the row ALCOHOL high→. Follow the red arrow across to the “% Correct Decisions” column. See the red circle around 84? In this study when people had a high alcohol level, the FST gave the correct answer 84% of the time. When people were guilty, the accuracy of this FST was 84%.
When this FST said people were guilty, how accurate was that prediction? Look at the column Fail ↓. Follow the orange arrow down to the “% Correct Decisions” row. See the orange circle around 53? When people failed this FST, the test was correct only 53% of the time. When this test said people were guilty, the accuracy of the test was 53%—a coin toss.
Wait, wait, wait! Those two accuracies are both about people who were guilty. How come they’re different— 84%, 53%? The answer is, the two accuracies answer questions that are subtly different. One is about people who are guilty. The other is about people the test says are guilty. Those groups are subtly different. They count different groups of people. So the accuracies are different. And notice that although the difference in what groups count is subtle, the difference in accuracy—84% vs. a coin toss—is dramatic.
When people were innocent, how accurate was this FST? Blue circle, 73%.
When this FST said people were innocent, how accurate was that prediction? Green circle, 93%.
How often did this FST give the correct answer? Pink circle, 76%.
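Dr. Kane’s point can be reproduced in a few lines of Python. The cell counts below are illustrative only, chosen to roughly match the percentages he circles (84%, 53%, 73%, 93% and 76%); they are not the actual cell counts from the 1977 report.

```python
# Hypothetical 2x2 table for a pass/fail field sobriety test. Counts are
# illustrative, picked to roughly reproduce the percentages discussed above.
true_pos  = 42   # ALCOHOL high, FST fail  (correct "arrest")
false_neg = 8    # ALCOHOL high, FST pass  (missed)
false_pos = 37   # ALCOHOL low,  FST fail  (wrongly flagged)
true_neg  = 100  # ALCOHOL low,  FST pass  (correct "release")

sensitivity = true_pos / (true_pos + false_neg)   # accuracy when people were over the limit
ppv         = true_pos / (true_pos + false_pos)   # accuracy when the FST said "over the limit"
specificity = true_neg / (true_neg + false_pos)   # accuracy when people were under the limit
npv         = true_neg / (true_neg + false_neg)   # accuracy when the FST said "under the limit"
overall     = (true_pos + true_neg) / (true_pos + true_neg + false_pos + false_neg)

print(f"Right about people who were over the limit:    {sensitivity:.0%}")
print(f"Right when it said someone was over the limit: {ppv:.0%}")
print(f"Right about people who were under the limit:   {specificity:.0%}")
print(f"Right when it said someone was under the limit: {npv:.0%}")
print(f"Right overall:                                 {overall:.0%}")
```

The first figure answers “how often was the test right about people who really were over the limit,” while the second answers “how often was the test right when it said someone was over the limit.” They count different groups of people, and quoting only the more flattering one is precisely the “funny math” Kane describes.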