

Reliability and Validity

As part of the test development process, researchers strive to create psychometrically sound instruments with acceptable levels of reliability and validity. When test items fit well with the construct under investigation, tests are much more likely to render results that lead to sound decision making than when they do not.

In a minimum of 200 words, respond to the following:

  • Identify the psychometric properties of a test, including a description of the various types of reliability and validity.
  • Is it possible for a test to be reliable but not valid? Valid but not reliable?
  • Describe ways in which a test constructor can boost the levels of reliability of a test.
  • Identify a psychological measuring tool (such as a standardized test of intelligence) that has high predictive validity.
  • Explain the pros and cons of using test scores from this measure to make predictions in real-world settings.

1. Cite all sources in APA format.
2. Use attached sources and additional credible sources when needed.

Types of Reliability

If a test administrator is concerned about error due to testing candidates at different times, the test-retest method can be used. In this method, the same test is administered twice and scores across test takers are compared on both administrations to obtain the correlation between the two. High correlations are favored, as they show consistency across two or more administrations. However, if the concern is over error due to having a small sample of items, the split-half method is preferred. In this method, the test is split into halves and each half is administered separately. Scores are correlated using the Kuder-Richardson 20 (KR20) in some cases or Cronbach's coefficient alpha in others. These measures of internal consistency are often used as an estimate of a test's reliability (Kaplan & Saccuzzo, 2013).

Item sampling may be used rather than test-retest. This method addresses the error variance due to selection of a subset of test items within the domain under investigation. Choosing a new subset of items to compile a test and comparing the scores to the old constitutes the alternate-form, or parallel-form, method of estimation. This estimate can be calculated using Pearson's r. For tests that include behavioral observations, such as those used in criterion-referenced tests, agreement among the raters or scorers can be estimated through the kappa statistic (Kaplan & Saccuzzo, 2013).

The Standard Error of Measurement (SEM) can be computed indirectly from the reliability coefficient. Taken from the normative sample, the standard deviation of test scores and the reliability coefficient yield the SEM: SEM = SD × √(1 − r) (Gregory, 2013). The larger the SEM, the less certain we can be that the test is accurate.

Different types of tests render acceptable reliability indices. For projective tests, such as the Rorschach Inkblot Test, reliability indices may be as low as r = 0.2, whereas for objective measures, such as intelligence quotient (IQ) or personality tests, they are much higher (roughly 0.9 and 0.7, respectively) (Rust & Golombok, 2009). In our example of testing the components of creativity, it would be difficult to expect a reliability index of 0.7 or 0.9 due to the subjective nature of interpretations by raters on a test of creativity. Here, the reliability of the test would be expected to fall somewhere between that of projective and objective tests (Rust & Golombok, 2009).
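To make the internal-consistency and SEM calculations above concrete, here is a minimal Python sketch. The item scores are invented purely for illustration, and the function name cronbach_alpha is our own; it simply implements the standard alpha formula, k/(k − 1) × (1 − Σ item variances / total-score variance), followed by the SEM formula described by Gregory (2013).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's coefficient alpha for an (examinees x items) score matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores: 6 examinees answering 4 items (0-5 scale).
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
])

alpha = cronbach_alpha(scores)

# SEM = SD * sqrt(1 - reliability), using the SD of total scores.
sd_total = scores.sum(axis=1).std(ddof=1)
sem = sd_total * np.sqrt(1 - alpha)

print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")
```

For these invented scores the items hang together tightly, so alpha comes out high and the SEM small; a larger SEM would mean wider uncertainty bands around any observed score.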
Bias in Testing and Assessment

Test bias may occur due to errors in test items and in the scoring of tests, alterations in standardized conditions of test administration, and test scores leading to unsound decisions. In particular, bias can be introduced when the norms of a test are used on a sample of individuals who do not match the population for whom the test was intended. In this case, there is a lack of measurement equivalence between the two groups (Fisher, 2013). A test is biased when its validity is different for various subgroups within a population (Gregory, 2013).

Test bias is a statistical concept, whereas test fairness is based on values that uphold the rights of test takers regardless of group membership. In the assessment of test fairness, subjective appraisals take precedence over objective ones (Gregory, 2013).

Statistical tools to identify bias in tests include procedures such as factor analysis, multiple regression equations, intergroup comparisons of the difficulty levels for items previously labeled as "biased" or "unbiased," and rank ordering of item difficulties. Because there can be biases in the evidence obtained for content, construct, and criterion validity, this is an important issue for test developers and administrators (Gregory, 2013). We will revisit these topics in Week 8.
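One simple way to probe the kind of bias Gregory (2013) describes, differential validity across subgroups, is to compute a validity coefficient separately for each group and compare them. The sketch below is illustrative only: the group labels, scores, and the helper validity_by_group are invented for demonstration, not drawn from any real dataset or library.

```python
import numpy as np

def validity_by_group(test, criterion, groups):
    """Pearson validity coefficient of test vs. criterion, per subgroup."""
    labels = np.array(groups)
    result = {}
    for g in sorted(set(groups)):
        mask = labels == g
        result[g] = np.corrcoef(test[mask], criterion[mask])[0, 1]
    return result

# Invented data: test scores, a criterion (e.g., job performance), labels.
test = np.array([50, 62, 47, 71, 55, 66, 49, 73, 58, 64])
criterion = np.array([3.1, 3.8, 2.9, 4.2, 3.0, 4.0, 3.2, 4.4, 3.3, 3.9])
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

for g, r in validity_by_group(test, criterion, groups).items():
    print(f"group {g}: validity r = {r:.2f}")
# Markedly different coefficients across groups would flag possible bias.
```

In practice a formal analysis would also compare regression slopes and intercepts across groups, but this group-by-group correlation is the intuition behind the "differential validity" definition of bias.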
Types of Validity

Validity refers to the evidence used to support inferences that are made about a test score and its relationships to other variables. The main types of evidence for validity include face validity, content validity, construct validity, and criterion validity.

Although the Standards for Educational and Psychological Testing (NCME, 2012, as reported in Kaplan & Saccuzzo, 2013) did not recognize face validity as a component of validity, historically it has been identified as a type of validity that referred to the "face value" of the test. Accordingly, the appearance of the test and the acceptance of its items as being appropriate for the test at hand were factors contributing to face validity. Nevertheless, because it is based on appearance and does not lend empirical support for the conclusions drawn from the data of the test, it has lost importance in comparison to other types of validity (Kaplan & Saccuzzo, 2013).

Content validity refers to the degree to which a test's items effectively map onto the overall domain of items that attempt to measure the construct or trait under investigation. If they do, the items are believed to be valid. Content validity is difficult to quantify and does not depend on statistical evidence to support its existence; rather, in searching for evidence of content validity, a researcher is more likely to use logic, intuition, and hard work. Further, items can be presented to a panel of expert judges for evaluation of their content. High interrater agreement on items can be used to justify the inclusion of the items in the test (Gregory, 2013).

Evidence for construct validity is secured when a hypothetical entity, or construct, rather than a specific criterion, is the topic of investigation. In this case, constructs are nebulous and ill-defined; hence, investigators must seek evidence in terms of the associations of the test with other measures of the same construct. Construct validity is subdivided into two areas, convergent and discriminant evidence. The first looks at the convergence of the test with other measures of the construct, whereas the second looks at the uniqueness of the test when compared to measures of unrelated constructs. For a test to have incremental utility, it must offer something new to the array of pre-existing tests; otherwise, it is redundant. Thus, both similarities and differences between measures are important parts of construct validity, and calculating correlations to assess degrees of similarity and difference is part of the evidence-gathering process of construct validity (Gregory, 2013).

Criterion validity represents how well a test correlates with a specific criterion, the criterion being the standard against which the test is compared (Kaplan & Saccuzzo, 2013). An example of criterion validity is the correlation between a student's score on the quantitative part of the Graduate Record Examinations (GRE) and the student's grade point average (GPA) in a psychology graduate program. Here, the criterion would be defined as academic success in graduate school as manifested through GPA. The GRE math score becomes a "stand-in" for predicting the success of the student at a future point in time.
The predictive validity of the test is evidenced by the correlation coefficient between the predictor and criterion variables and is an example of the use of test scores to make high-stakes decisions. That is, if graduate schools use GRE scores to determine who gains entrance into their programs, they must also justify their decisions based on the predictive validity of the GRE. In contrast to predictive validity, concurrent validity estimates the extent to which a test score can effectively reflect an individual's current position on a criterion (Gregory, 2013). As noted, both the concurrent and predictive validity of a test can be used to justify high-stakes decisions that may positively or negatively affect the lives of many individuals.
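To make the GRE example concrete, the predictive validity coefficient is simply the Pearson correlation between the predictor (here, GRE quantitative scores) and the later criterion (graduate GPA). The numbers in this short sketch are fabricated for illustration only, not real admissions data.

```python
import numpy as np

# Hypothetical predictor (GRE quantitative) and criterion (graduate GPA)
# for ten students; all values invented purely for illustration.
gre_quant = np.array([152, 158, 149, 165, 155, 161, 147, 168, 154, 160])
grad_gpa  = np.array([3.2, 3.5, 3.0, 3.8, 3.3, 3.6, 2.9, 3.9, 3.1, 3.7])

# Predictive validity coefficient: Pearson r between predictor and criterion.
r = np.corrcoef(gre_quant, grad_gpa)[0, 1]

# r squared estimates the share of criterion variance the predictor explains,
# one way to weigh whether a high-stakes use of the score is defensible.
print(f"predictive validity r = {r:.2f}, variance explained = {r**2:.2f}")
```

Even a respectable r leaves much of the criterion variance unexplained, which is why the pros and cons of high-stakes uses of such scores deserve scrutiny.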

References

Fisher, C. (2013). Decoding the ethics code: A practical guide for psychologists (3rd ed.). Thousand Oaks, CA: Sage.

Gregory, R. (2013). Psychological testing: History, principles, and applications (7th ed.). Boston, MA: Pearson.

Kaplan, R., & Saccuzzo, D. (2013). Psychological testing: Principles, applications, & issues (8th ed.). Belmont, CA: Wadsworth.

Rust, J., & Golombok, S. (2009). Modern psychometrics: The science of psychological assessment (3rd ed.). New York, NY: Taylor & Francis.