1. Why is it important to understand the historical roots of psychological testing? How do changes in psychological testing reflect changes in society? Explain. 350 words, 2 references (peer-reviewed journal articles only)

2. The text asserts that any test can be an instrument for good or harm. How are tests used for good? How are they used for harm? Should testing be eliminated altogether? Explain. 350 words, 2 references (peer-reviewed journal articles only)

3. Why is it important for achievement tests to be aligned with learning outcomes? What happens if an achievement test is not aligned with learning outcomes? Explain. 350 words, 2 references (peer-reviewed journal articles only)

4. Why is the reliability of an instrument important? Could you write a test of your own one evening, administer it to a sample the next day, and publish the results of your study in a peer-reviewed journal? Why or why not? Explain. 350 words, 2 references (peer-reviewed journal articles only)

5. The reliability of the Rorschach test has often been disputed. What would be the best way to evaluate the reliability of this instrument? Why? Explain. 350 words, 2 references (peer-reviewed journal articles only)

6. Distinguish between reliability and validity. Is it more important in psychological testing to have a test with high reliability, or one with high validity? Why? Explain. 350 words, 2 references (peer-reviewed journal articles only)

7. Consider these three types of validity: construct, content, and criterion. Which is most valuable to a psychologist in evaluating each of the following types of tests: an achievement test, a projective diagnostic test, and a job-proficiency test? Defend your answer. Explain. 350 words, 2 references (peer-reviewed journal articles only)

8. What are the most important considerations when interpreting achievement test scores? Why? What are the most important considerations when making decisions based on achievement test scores? Why? Explain. 350 words, 2 references (peer-reviewed journal articles only)

9. What is the difference between item discrimination and item difficulty? In what ways are these measures valuable in the evaluation of a test? Why? Explain. 350 words, 2 references (peer-reviewed journal articles only)

10. Distinguish between the constructs of intelligence and achievement in terms of how they are defined and how they are measured. In what ways could these constructs be used to create a theoretical framework for research? Support your answer. Explain. 350 words, 2 references (peer-reviewed journal articles only)

Professional Psychology: Research and Practice
2000, Vol. 31, No. 2, 117-118
Copyright 2000 by the American Psychological Association, Inc. 0735-7028/00/$5.00
DOI: 10.1037//0735-7028.31.2.117

Psychological Testing at the End of the Millennium: A Brief Historical Review

Kurt F. Geisinger
Le Moyne College

The author presents a brief historical overview of the origins of psychological testing 100 years ago and its continuing development. Special emphasis is placed on test administration and test standardization procedures. The author also illustrates how short-term immediate social needs have stimulated innovation and long-term development. Accommodation to test-taker differences has been a long-standing technical detail, and the development and refinement of group testing procedures has been critical to large-scale use.

The history of psychological and educational testing is a relatively short one, extending just more than 100 years. The term mental test was first used in print in 1890 by James McKeen Cattell (1890). This history is one in which necessity repeatedly begets innovation. Problems in the French schools, during a period that could be described as the advent of public education, where teachers first had to deal with larger class sizes comprising students with diverse backgrounds, encouraged Alfred Binet to construct what most individuals consider to be the first modern intelligence test. I use the term modern because it is reported that Chinese society around 2200 B.C.E. was a “test-dominated society” (Thorndike & Lohman, 1990, p. 1). At that time in China, various civil service positions were distributed by means of formal assessments of the skills of various, privileged applicants (see DuBois, 1970).
KURT F. GEISINGER received his PhD in psychometrics in 1977 from Pennsylvania State University, University Park Campus. He is academic vice president and professor of psychology at Le Moyne College. His research interests include the testing of language minorities and those with disabilities, admissions testing in higher education, test validity, and test fairness.
THIS ARTICLE REPRESENTS the introduction of a paper that was presented at the Quest Meeting in Oswego, NY, in April 1996.
CORRESPONDENCE CONCERNING THIS ARTICLE should be addressed to Kurt F. Geisinger, 322 Grewen Hall, Le Moyne College, 1419 Salt Springs Road, Syracuse, New York 13214-1399. Electronic mail may be sent to [email protected]

One of the most potent influences on those involved in the development and use of tests was the early German psychologist, Wilhelm Wundt, who founded the first psychology laboratory and stressed the precise control of independent variables under investigation. Wundt was an experimental psychologist who eschewed differences among individuals and wanted to show how individuals were the same, not different (Cohen, Swerdlik, & Smith, 1992, p. 46). If different individuals behaved differently when exposed to the same independent variable, he considered this difference to be an error. That is the opposite perspective from those involved in testing, who consider differences among people to be “truth” as we know it. Nevertheless, many, if not most, of the early leaders in testing were students in his laboratory. Among them were Charles Spearman, who developed several early theories of intelligence and the statistics to research them; Victor Henri, a coauthor with Binet of the early French test of intelligence; Emil Kraepelin, who founded the technique of word association; Edward B. Titchener, who founded the first psychology laboratory at Cornell University; G. Stanley Hall, who founded the Psychology Department at Johns Hopkins University and the first American Journal of Psychology (which went by that very name); and James McKeen Cattell, who, as mentioned previously, first used the term mental test. Cattell was an American who earned his doctorate in Wundt’s laboratory at the University of Leipzig. Wundt’s influence was so strong that perhaps the most common theme among the early leaders in testing was that the administration of measures needed to be strictly controlled so that they were interchangeable across individuals. With such strict controls, all differences in performance were the result of individual differences rather than differences in test administrations or “error” as had been believed previously.

Between 1904 and 1915, numerous early leaders in testing, including Binet, Lewis S. Terman, and E. L. Thorndike, all called for exacting standards in the control of testing procedures and stimuli. Binet (e.g., Binet & Simon, 1905) had begun his career as an experimental psychologist who was harshly criticized for his lack of experimental rigor. His later work, however, was characterized by extreme rigor, according to some historians of psychological science (see Thorndike & Lohman, 1990). Similarly, Thorndike (1904) wrote, “Every extrinsic condition influencing that ability should be alike for all” (p. 160). Terman adapted Binet’s tests for use in the United States. After testing his first 400 individuals with the instrument he adapted or developed, Terman had three general conclusions regarding testing, and one of these was that “it was necessary to standardize directions and administrative procedures if test results were to be comparable” (Thorndike & Lohman, 1990, p. 31). It may be noted that until the second decade of the 20th century, all testing was performed individually, by psychologists and other trained professionals assessing individuals, such as schoolchildren, one at a time.
The logic of testing follows the paradigm of the experimental method so that the variance in the test scores is reflective of intraindividual differences rather than any differences in test administration (Geisinger, 1994b).

After the schools, the military was the next place where measures developed by psychologists were used. “As early as 1908, Binet had advocated the use of intelligence tests by the French army” (Thorndike & Lohman, 1990, p. 41). Of course, the event of greatest historical significance during this period was the First World War. With the outbreak of this conflict, American psychologists were formed into 13 committees by then-president of the American Psychological Association, Robert Yerkes, an experimental psychologist. Yerkes appointed himself to chair one of these committees, the Committee on the Psychological Examination of Recruits, and placed Terman, among others, on the committee. The goals of the testing, briefly noted, were to select those most able to move into positions of responsibility, classify recruits to positions in which they would succeed, and aid in discharging those not able to succeed.

The committee borrowed heavily from the work of one psychologist of the day, Arthur S. Otis, to meet their inconceivable time line. “They met for the first time in May of 1917, and by August of that year the tests were ready for a large scale tryout” (Thorndike & Lohman, 1990, p. 44). Two test forms were developed by the Committee on the Psychological Examination of Recruits: the Army Alpha and the Army Beta. The Army Alpha was based on the work of Terman’s Stanford University colleague, Otis, who has often been credited with developing the first group-administered test of intelligence. This test required the reading of English and was hence administered only to literate recruits.
The second test was called Form Beta or Army Beta:

[It] was modeled on a test developed by Pintner and Paterson (1915, 1917) for use with deaf subjects. It employed a variety of form boards and mazes…. Administration of the test required no use of language. Instructions were given in pantomime. (Thorndike & Lohman, 1990, p. 45)

Haney (1981) wrote, “In less than two years, the group-administered Army Alpha test was given to more than 1.7 million recruits” (p. 1022).

Even this brief history demonstrates several concepts still present in psychological testing. First, from the beginning of psychological testing, the control of test administration has been of major consequence. Second, the development of the Army Beta test form makes evident that from this same early period, accommodations were being made for individuals with special needs, be they hearing impaired or unable to communicate in English. Third, it may also demonstrate that some common accommodations are able to be used by individuals with varying needs with regard to test administration. That the same or a similar test could be used for individuals who had difficulty hearing or were not proficient in English demonstrates the possibility of accommodating different individuals in this way. Fourth, the advent of group-administered tests radically changed the testing movement in the United States. Prior to the First World War, tests were only individually administered. With the advent of group tests, however, large numbers of individuals could be economically and quickly assessed. With such widespread use, though, it is also possible that the use of tests and the information learned from tests could have a greater impact, often beneficial, but certainly sometimes detrimental. Fifth, examinations could be adapted across languages, cultures, and national borders, albeit with significant cautions (Geisinger, 1994a).
Psychological testing continues to evolve, with both the nature of the assessments that we use and the criteria against which we evaluate tests changing, at times rapidly (Geisinger, 1992). The psychological profession will need to adapt as well by developing measures to meet new needs or new conceptions of human characteristics, continually learning to use new measurements, and changing existing measures (e.g., into different languages or with accommodated test administrations). Such changes necessitate continued excellence in professional training in graduate school and beyond.

References

Binet, A., & Simon, T. (1905). Upon the necessity of establishing a scientific diagnosis of inferior states of intelligence. L’Année Psychologique, 11, 163-191.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373-381.
Cohen, R. J., Swerdlik, M. E., & Smith, D. K. (1992). Psychological testing and assessment: An introduction to tests and measurement (2nd ed.). Mountain View, CA: Mayfield.
DuBois, P. H. (1970). A history of psychological testing. Boston: Allyn & Bacon.
Geisinger, K. F. (1992). The metamorphosis in test validation. Educational Psychologist, 27, 197-222.
Geisinger, K. F. (1994a). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304-312.
Geisinger, K. F. (1994b). Psychometric issues in testing students with disabilities. Applied Measurement in Education, 7, 121-140.
Haney, W. (1981). Validity, vaudeville, and values: A short history of social concerns over standardized testing. American Psychologist, 36, 1021-1034.
Pintner, R., & Paterson, D. G. (1915). The Binet scale and the deaf child. Journal of Educational Psychology, 6, 201-210.
Pintner, R., & Paterson, D. G. (1917). A scale of performance tests. New York: Appleton.
Thorndike, E. L. (1904). An introduction to the theory of mental and social measurements. New York: Science Press.
Thorndike, R. M., & Lohman, D. F. (1990). A century of ability testing. Chicago: Riverside.

Received April 6, 1999
Revision received December 2, 1999
Accepted December 2, 1999


