C

Standardized tests have long played a major role in allocating educational opportunities to our nation's students, opportunities that, in turn, are the gateway to success in competitive job markets and the key to economic security. But for female students, these tests frequently have been a gatekeeper, barring access to progress.

Before Title IX's enactment, many schools not only administered tests in a gender-biased manner, but also interpreted test results in a way that reflected stereotypes rather than providing real insight into students' interests and capabilities. For example, in the 1960s and early 1970s, there were two versions of the Strong Vocational Interest Blank, a commonly used vocational test: pink for young women and blue for young men. On this test, young men were asked whether they'd like to be President; in contrast, young women were asked whether they'd like to be the wife of the President.

Other, less blatantly biased tests have been shown over the past 25 years to be flawed assessment tools that unfairly disadvantage girls. Title IX has provided a means for ensuring tests are designed and used in a manner that is free from gender bias. While a number of constructive steps have been taken since the law's enactment to eliminate these biases, it is imperative that such tests continue to be scrutinized closely for fairness, particularly since increased emphasis is now being placed on standardized testing in the context of national education reform.

Gender Gaps. There is a substantial record of disparities in scoring between male and female students on many standardized tests dating from before Title IX's enactment and continuing over the last 25 years, gaps that have had a harmful impact on educational and economic opportunities available to women and girls, as well as students of color. Under Title IX, tests must be valid predictors of success in the areas being tested. In other words, the test must measure what it purports to measure. If the test does not, and if it produces a scoring deficit for one sex, it has a discriminatory impact on the members of that sex and is unlawful.

Gaps in scoring have appeared on the most frequently used vocational aptitude tests in secondary schools, the Armed Services Vocational Aptitude Battery (ASVAB) and the Differential Aptitude Test (DAT), and on career interest inventories. Secondary schools have long relied on these tests for career counseling and vocational education placement, even without evidence showing that they are valid measures of future performance. Schools that rely on such tests frequently use the results to steer young women into careers that are traditional for their sex, with lower earning power and fewer opportunities for upward mobility.

The past 25 years also have seen gender gaps in college admissions tests. Since 1972, females consistently have scored lower than males on the SAT, in both the verbal and math sections of the test, with girls falling behind boys in math by as many as 61 points. In 1996, the average combined SAT score of boys was still 39 points higher than that of girls, a pattern that persisted within every racial and ethnic group. There also are disparities in the PSAT, used for college scholarships, and the ACT, used for college admissions, as well as most examinations for admission to professional and graduate school. As with the tests used in the vocational setting, there are questions regarding whether these tests accurately predict students' achievements. For example, research has shown that the SAT, which is designed to be an indicator of first-year college performance, underpredicts females' performance: while young women score lower than young men on the SAT, they earn higher grades when matched for the same courses in all subjects in their first year in college.

The Educational Testing Service (ETS) issued a report in 1997 concluding that while there are some important differences in the performance of boys and girls on standardized tests, the average differences are small. The ETS study, however, confirms that large gender disparities persist on the high-stakes tests such as the SAT and PSAT. The report does not refute ETS's earlier acknowledgment that the SAT underpredicts women's college performance while overpredicting that of male students. The ETS contends that the gaps that do exist on high-stakes tests are in part the result of differences in interests and experiences, rather than biases in testing. The fact that women earn higher grades in the same subjects appears to belie this justification.

Whatever its causes, the gender gap on the PSAT and the SAT has a demonstrable impact on girls and women in several ways. Results on these tests directly affect a student's chances of gaining admission to the college of her choice. They frequently are the basis for selecting students for participation in programs for 'gifted and talented' youth. In addition, they are a major factor in determining eligibility for valuable college scholarships. For example, each year more than one million high school juniors compete for a share of the $27 million awarded through the prestigious National Merit Scholarships, which are based solely on PSAT scores. Because girls, on average, score significantly lower than boys on the PSAT, they receive only 40 percent of the Merit Scholarship awards even though they are 56 percent of the test-takers.

Mean Combined SAT Scores
Year	Male	Female	Gender Gap
1972	959	913	46
1996	1034	995	39
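
The gap column in the table above is simply the male mean minus the female mean. As a minimal sketch (the variable and function names here are illustrative and do not come from any cited source), the arithmetic and the change in the gap between the two years can be checked as follows:

```python
# Mean combined SAT scores as reported in the table above: year -> (male, female).
scores = {1972: (959, 913), 1996: (1034, 995)}


def gender_gap(male_mean, female_mean):
    """Return the scoring gap in points (male mean minus female mean)."""
    return male_mean - female_mean


# Gap for each year in the table.
gaps = {year: gender_gap(male, female) for year, (male, female) in scores.items()}

# Change in the gap from 1972 to 1996; a negative value means the gap narrowed.
change = gaps[1996] - gaps[1972]

for year in sorted(gaps):
    print(f"{year}: gap of {gaps[year]} points")
print(f"Change in gap, 1972 to 1996: {change} points")
```

On these figures the gap narrowed by 7 points over 24 years while remaining at 39 points.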

Closing the Gaps. In 1997, the College Board and ETS, which administer and design the PSAT (along with the SAT), agreed to revise the PSAT to include a test of written English to better reflect important educational priorities, as part of a settlement of a complaint filed with the Education Department's Office for Civil Rights (OCR). It remains an open question whether this revision will, in fact, close or reduce the gender gap. The complaint alleged that the PSAT was gender biased in violation of Title IX and that it hurt young women because National Merit Scholarships, the eligibility for which is based on PSAT scores, were awarded disproportionately to male candidates. In addition to settling this complaint, the College Board has stated that it already eliminates questions that are determined to favor one gender unfairly over the other, in an effort to make all of its tests as fair as possible.

Room for Improvement

Scoring gaps have appeared in a wide variety of tests: the Armed Services Vocational Aptitude Battery, the Differential Aptitude Test, the SAT, the PSAT, and tests for admission to professional and graduate school.
Reliance on tests persists despite questions about their predictive validity. For example, research shows the SAT underpredicts young women's performance in college.
The gaps affect educational benefits available to girls and women. For example, girls receive only 40 percent of National Merit Scholarships, even though they are 56 percent of test-takers for the PSAT, the sole criterion for these awards.

Other efforts have been made to reduce unfair uses of standardized tests, beyond the agreement on the PSAT. Many colleges no longer require applicants for admission to submit SAT or ACT scores. And some scholarships no longer are based solely on test scores. For example, in 1989 a federal court held in Sharif v. New York State Education Department that the State of New York may no longer rely exclusively on SAT scores to determine the award of state Regents and Empire State college scholarships, because such reliance had a discriminatory impact on female students in violation of Title IX. The record showed that while boys were 47 percent of the scholarship competitors, they received 72 percent of the Empire Scholarships and 57 percent of the Regents Scholarships. The court ordered the state to award these scholarships in a manner that more accurately measures students' high school achievement. As soon as the state began to take grades into consideration, the scholarship awards became more equitably distributed among male and female students.
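
One way to read the Sharif figures is to compare each group's share of awards with its share of competitors. The sketch below is illustrative only: the function name is hypothetical, the ratio is a simple descriptive measure rather than the legal standard the court applied, and the girls' shares are an assumption derived by subtracting the boys' percentages from 100.

```python
def representation_ratio(share_of_awards, share_of_competitors):
    """Ratio of a group's share of awards to its share of competitors.

    A value of 1.0 means awards are proportional to participation;
    above 1.0 means overrepresented, below 1.0 means underrepresented.
    """
    return share_of_awards / share_of_competitors


# Percentages reported in the court record, as quoted in the text above.
boys_competitors = 47
boys_awards = {"Empire": 72, "Regents": 57}

# Assumption: girls account for the remainder of competitors and awards.
girls_competitors = 100 - boys_competitors

ratios = {}
for scholarship, boys_share in boys_awards.items():
    girls_share = 100 - boys_share
    ratios[scholarship] = {
        "boys": representation_ratio(boys_share, boys_competitors),
        "girls": representation_ratio(girls_share, girls_competitors),
    }

for scholarship, by_sex in ratios.items():
    print(f"{scholarship}: boys {by_sex['boys']:.2f}, girls {by_sex['girls']:.2f}")
```

Under these assumptions, boys received Empire Scholarships at roughly one and a half times their share of the competitor pool, while girls received them at roughly half of theirs.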

Persistent Scoring Differentials. While these are laudable steps forward, and gender differences on many standardized tests are in fact declining, significant differences remain in many areas. For example, while the gender gap in math appears to be diminishing, there is evidence that gender differences on science tests for students aged 9, 13, and 17, as tracked by the National Assessment of Educational Progress (NAEP), have not declined and may be increasing, even though girls receive grades in science that are as high as or higher than those of boys. It is therefore critical that standardized tests continue to receive close scrutiny to ensure that their design is not biased and that they are used only for purposes for which their predictive validity has been demonstrated. The need for vigilance is particularly acute since attacks on affirmative action have prompted some colleges to rely more heavily on standardized tests in their admissions decisions, and current proposals by the Clinton Administration would make nationwide, standardized fourth-grade reading and eighth-grade math tests the centerpiece of an effort to improve this country's educational performance. Holding schools accountable for their effectiveness in educating our nation's students is a worthy objective, but the drive for education reform must not be allowed to run roughshod over our commitment to testing that is fair to all students.

Grade: C

Recommendations:

National efforts to test students' proficiency in math and reading should include rigorous examination of the proposed test instruments to ensure they are valid for their stated purposes.
OCR should monitor closely the ETS/PSAT settlement to ensure that the revised test is fair and does not perpetuate disparities in eligibility for National Merit Scholarships. OCR also should evaluate other tests, such as the armed forces vocational tests, to ensure that they are valid for their stated purposes.
Educational institutions should not rely solely on standardized tests as measures of students' achievement or academic potential; they should examine other forms of assessment that better reflect students' level of accomplishment and learning style.

References:

American Association of University Women. How Schools Shortchange Girls: The AAUW Report (researched by Wellesley College Center for Research on Women) (AAUW Educational Foundation, 1992).
Katherine Connor and Ellen Vargyas, 'The Legal Implications of Gender Bias in Standardized Testing,' 7 Berkeley Women's Law Journal (1992).
N. Medina and D. Neill, Fallout From the Testing Explosion (National Center for Fair and Open Testing, 1990).
P. Rosser, The SAT Gender Gap: Identifying the Causes (Center for Women Policy Studies, 1989).
Sharif v. New York State Education Dept., 709 F. Supp. 345 (S.D.N.Y. 1989).