Deck 17: Tests and Measurements
1
An example of an internal consistency reliability method would be the
A) test retest
B) split half
C) alpha omega
D) alpha beta
Answer: B) split half
2
When test scores have been shown to accurately predict later performance, then the test is considered to have
A) predictive reliability
B) predictive validity
C) internal consistency
D) both a and b, but not c
Answer: B) predictive validity
3
If the proportion of persons answering a given item correctly were to reach 95%, then the item would be assigned
A) a .95 confidence level
B) a .95 alpha error
C) a .95 discrimination index
D) a .95 difficulty index
Answer: D) a .95 difficulty index
4
The test-retest method can be used to assess a test's
A) predictive validity
B) reliability
C) construct validity
D) all of these, depending on what statistical tests are used
5
Unless the Spearman-Brown is used, split-half reliability values tend to be
A) lower than test-retest values
B) higher than test-retest values
C) identical with test-retest values (for the same test)
D) sometimes higher and sometimes lower, depending on which trait is being
6
The Spearman-Brown prophecy formula is used to correct for
A) an over-estimate of the original correlation
B) an under-estimate of the original correlation
C) the lack of any degrees of freedom
D) having previously eliminated the alpha error
7
For the majority of the items in any given test, the difficulty index should be kept at approximately
A) .99
B) .50
C) .01
D) none of these, since item difficulty is not a quantifiable value.
8
The Spearman-Brown prophecy formula should not be used unless the correlation has been obtained on the basis of the
A) test-retest technique
B) alternate forms technique
C) predictive validity technique
D) split-half technique
9
The split-half method can be used to assess a test's
A) predictive validity
B) construct validity
C) reliability
D) all of these, depending on what statistical tests are used
10
When a correlation is established between the odd and even-numbered items, then the procedure was used to assess
A) predictive validity
B) construct validity
C) concurrent validity
D) none of these, since the procedure outlined above is used to assess
11
An attempt to determine whether a given item separates the high from the low performers on the overall test is called the
A) discrimination index (DI).
B) item difficulty level (IDL)
C) item response test (IRT)
D) discrimination of performance evaluation (DOPE)
12
An item would be considered extremely difficult if it had a difficulty index of
A) .99
B) .50
C) .01
D) none of these, since item difficulty is not a quantified value.
13
When engaging in item analysis techniques, the researcher treats each item as
A) a criterion based score
B) a criterion against which all other items must be compared
C) a subtest in a larger composite test.
D) a robust correlation
14
One popular statistical technique for evaluating the level of item discrimination is the
A) standard error of measurement
B) standard error of the mean
C) standard error of estimate
D) point biserial correlation
15
The Kuder-Richardson formula (KR21) presented in the chapter is used for establishing
A) test validity
B) internal consistency reliability
C) test retest reliability
D) predictive validity
16
When a test has been shown to be measuring what it purports to measure, then we may assume that the test is
A) reliable
B) valid
C) containing only homogeneous items
D) containing only heterogeneous items
17
For a given item, when the performances of those whose overall scores put them among the highest third of the group are compared with those who scored among the lowest third of the group, the resulting difference is called the
A) discrimination index (DI).
B) item difficulty level (IDL)
C) item response test (IRT)
D) discrimination of performance evaluation (DOPE)
18
When a test has been shown to be reliable, it means that its results
A) are stable and consistent
B) are measuring what the test was designed to measure
C) are automatically assumed to be valid
D) both a and b, but not c
19
When test scores are correlated with an already established and accepted measure of the construct under study, then the procedure was used to assess
A) predictive validity
B) construct validity
C) concurrent validity
D) none of these, since the procedure outlined above is used to assess
20
All tests that have been shown to be statistically reliable must also be
A) valid
B) norm-referenced
C) criterion-referenced
D) none of these
21
If a certain test had a reliability value of 2.95, then we know that
A) its validity must also be high
B) its validity must be low
C) the test is measuring something very consistently, even though we don't necessarily know what it is that's being measured
D) none of these, since a test could never have a reliability of 2.95
22
When predicting all possible scores a given individual could have obtained around the actual score that was obtained, we can use the
A) standard error of measurement
B) standard error of the mean
C) standard error of estimate
D) point biserial correlation
23
A researcher wishes to estimate the reliability of the new test of a person's sense of humor, the LOL test (Lambda Orientation Level). The test is composed of a total of 40 items, and the researcher splits the scoring in order to compare each subject's score on the odd numbered items with those on the even numbered items. A total of 10 adult males was randomly selected and their scores were as follows:
Estimate the reliability of the whole test.

24
Was the test statistically valid?
25
In order to predict how much an individual's score might change on a retest, we can use the
A) standard error of measurement
B) standard error of the mean
C) standard error of estimate
D) point biserial correlation
26
The correlation between the odd and even numbered items for a given test provides an estimate of the test's reliability.
27
Correlating test scores with an independent measure of the trait being measured provides an estimate of the test's validity.
28
The test-retest method is used to assess a test's validity.
29
Test validity provides an estimate of whether or not the test is measuring what it has been designed to measure.
30
The split-half method is used to assess a test's construct validity.
31
Correlations that are used to assess reliability typically result in lower values than those used to assess validity.
32
What type of validity procedure was used in this example?
33
When the internal consistency of a test is being determined on the basis of the Kuder-Richardson formula (KR21), then it should NOT be followed by
A) any attempts to determine predictive validity
B) any attempts to determine concurrent validity
C) the Spearman-Brown prophecy formula
D) none of these, since the KR21 is NOT an internal consistency measure
34
A researcher in the area of cognitive functioning has designed a new test of creativity, called the CIA test (Cognitive Information Assessment). A random sample of ten five-year-old children is selected and given the test. A week later the same test is administered a second time. Estimate the reliability of this test.

35
If a certain test had a reliability value of .95, then its validity must be at least
A) .95
B) .05
C) .50
D) none of these, since the reliability of a test tells us nothing about its
36
If a certain test had a reliability value of .95, then we know that
A) its validity must also be high
B) its validity must be low
C) the test is measuring something very consistently, even though we don't necessarily know what it is that's being measured
D) none of these, since a test could never have a reliability of .95
37
Test reliability provides an estimate of the stability and consistency of the measures.
38
A given test may be reliable, and yet still not be valid.
39
The correlation between the odd and even numbered items for a given test underestimates the test's reliability unless corrected for by a technique such as the Spearman-Brown prophecy formula.
40
A method for assessing the overall reliability of a test by splitting it into all possible halves is called the
A) point biserial
B) discrimination index (DI)
C) vix formula 44
D) Kuder-Richardson formula (KR21)
41
A test constructor is interested in discovering how long a set of sub-tests must be to reach a reliability value of .85. For each sub-test, the following significant correlations were found:
a. .739
b. .653
c. .585
d. .790
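One way to approach this item is the general Spearman-Brown prophecy formula solved for the lengthening factor n, that is, n = r_target(1 - r_obs) / (r_obs(1 - r_target)). The sketch below assumes that form applies to each sub-test correlation; the function name `lengthening_factor` is illustrative and does not come from the chapter.

```python
def lengthening_factor(r_obs: float, r_target: float) -> float:
    """Factor by which a test must be lengthened to move from r_obs to r_target.

    Assumes the general Spearman-Brown prophecy formula solved for n.
    """
    if not (0 < r_obs < 1 and 0 < r_target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return (r_target * (1 - r_obs)) / (r_obs * (1 - r_target))


# Sub-test correlations from the item above; .85 is the desired reliability.
for label, r in zip("abcd", (0.739, 0.653, 0.585, 0.790)):
    n = lengthening_factor(r, 0.85)
    print(f"{label}. r = {r:.3f}: lengthen by a factor of about {n:.2f}")
```

A factor near 2 means the sub-test would need roughly twice as many comparable items.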
42
Is the test proving to be reliable?
43
A researcher wishes to estimate the reliability of the FYI (Functional Youth Indicator) - a new test of adolescent shyness. The test is composed of a total of 30 items, and the researcher splits the scoring in order to compare each subject's score on the odd numbered items with those on the even numbered items. A total of 10 teen-age males was randomly selected and their scores were as follows:
Estimate the reliability of the whole test.

44
The standard error of measurement on a certain test has been found to be 10 with an SD of 15.00. Using the Rulon formula, find the rtt (reliability value) for the test.
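A minimal sketch of the calculation, assuming the Rulon formula referred to here takes the form r_tt = 1 - SEM^2 / SD^2. The function name `reliability_from_sem` is an illustrative choice, and the guard against SEM exceeding SD is an added assumption (such inputs would imply a negative reliability).

```python
def reliability_from_sem(sem: float, sd: float) -> float:
    """Reliability from the standard error of measurement and the SD.

    Assumes r_tt = 1 - SEM**2 / SD**2.
    """
    if sd <= 0 or sem < 0:
        raise ValueError("SD must be positive and SEM non-negative")
    if sem > sd:
        # Assumption: treat SEM > SD as inconsistent input rather than
        # returning a negative reliability.
        raise ValueError("SEM cannot exceed SD for a meaningful reliability")
    return 1 - (sem ** 2) / (sd ** 2)


# Values from the item above: SEM = 10, SD = 15.
print(round(reliability_from_sem(10, 15), 3))
```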
45
The reliability of a certain test is .90, whereas the reliability of the criterion is .85. What is the highest possible validity value for this test?
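A short sketch of the ceiling on validity, assuming the usual upper bound in which the validity coefficient cannot exceed the square root of the product of the two reliabilities. The function name `max_validity` is illustrative only.

```python
import math


def max_validity(r_test: float, r_criterion: float) -> float:
    """Upper bound on validity, assuming sqrt(r_test * r_criterion)."""
    if not (0 <= r_test <= 1 and 0 <= r_criterion <= 1):
        raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(r_test * r_criterion)


# Values from the item above: test reliability .90, criterion reliability .85.
print(round(max_validity(0.90, 0.85), 3))
```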
46
A forensic psychologist is attempting to assess the reliability of a new test of psychopathy. A random sample of inmates from a large state prison, all of whom had been convicted of violent offenses, was selected and given the test. Assume that the test is composed of five items. The test items were scored from 1 to 10, with a 10 assumed to define the highest level of psychopathy.

47
For a given administration of the KISS (Kelly Impulsivity Standards Scale), the mean was 50 with an SD of 10. The test-retest reliability was .85. Find the standard error of measurement.
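The calculation can be sketched as follows, assuming the standard error of measurement is defined as SEM = SD * sqrt(1 - r), where r is the reliability. The function name `standard_error_of_measurement` is illustrative.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM under the assumed definition SD * sqrt(1 - reliability)."""
    if sd < 0 or not (0 <= reliability <= 1):
        raise ValueError("SD must be non-negative and reliability in [0, 1]")
    return sd * math.sqrt(1 - reliability)


# Values from the item above: SD = 10, test-retest reliability = .85.
print(round(standard_error_of_measurement(10, 0.85), 3))
```

Note that the mean of 50 does not enter the calculation; only the SD and the reliability do.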
48
A certain test has been found to have an SD of 50 and a standard error of measurement of 32. Using the Rulon formula, find the rtt (reliability value) for the test.
49
For a given administration of the welcome-back-to-school test, the KABC (Kotter Assessment Battery for Children), the mean was 100 with an SD of 15. The test-retest reliability was .90. Find the standard error of measurement.
50
Find the whole test reliabilities for the following split-half correlations, all of which were calculated on the basis of a sample of 30 persons.
a. .63
b. .93
c. .33
d. .78
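A minimal sketch of the step-up, assuming the split-half form of the Spearman-Brown prophecy formula, r_whole = 2r / (1 + r). The function name `spearman_brown` is illustrative. Under this formula the sample size of 30 does not affect the corrected value.

```python
def spearman_brown(r_half: float) -> float:
    """Whole-test reliability from a split-half correlation: 2r / (1 + r)."""
    if not (-1 < r_half <= 1):
        raise ValueError("correlation must be greater than -1 and at most 1")
    return (2 * r_half) / (1 + r_half)


# Split-half correlations from the item above.
for label, r in zip("abcd", (0.63, 0.93, 0.33, 0.78)):
    print(f"{label}. split-half r = {r:.2f}, whole-test estimate = {spearman_brown(r):.3f}")
```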
51
A new test, the NCAA (Number-Concept-Awareness-Ability) is being standardized for college athletes majoring in math, and it has been found to have a mean of 85 and an SD of 15. The test is composed of 125 items. Find its KR21 reliability.
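A sketch of the computation, assuming the common form of the Kuder-Richardson formula 21: KR21 = (k / (k - 1)) * (1 - M(k - M) / (k * SD^2)), where k is the number of items, M the mean, and SD the standard deviation. The chapter's exact presentation may differ, and the function name `kr21` is illustrative.

```python
def kr21(k: int, mean: float, sd: float) -> float:
    """KR21 internal consistency estimate under the assumed common form."""
    if k < 2 or sd <= 0:
        raise ValueError("need at least 2 items and a positive SD")
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * sd ** 2))


# Values from the item above: 125 items, mean 85, SD 15.
print(round(kr21(125, 85, 15), 3))
```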
52
A researcher wishes to establish the DI (Discrimination Index) for a series of five items contained in a new test being developed to assess Whole Language Readiness (WLR). The data are as follows:
Item   Proportion Correct For Top Third   Proportion Correct For Bottom Third
a.     .70                                .60
b.     .70                                .50
c.     .50                                .20
d.     .95                                .70
e.     .45                                .50
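The discrimination index can be sketched as the proportion correct in the top third minus the proportion correct in the bottom third. The code below assumes the first value listed for each item is the top-third proportion and the second is the bottom-third proportion; the function name `discrimination_index` is illustrative.

```python
def discrimination_index(p_top: float, p_bottom: float) -> float:
    """DI as the difference between top-third and bottom-third proportions."""
    return p_top - p_bottom


# (top third, bottom third) proportions correct, as listed in the item above.
items = {
    "a": (0.70, 0.60),
    "b": (0.70, 0.50),
    "c": (0.50, 0.20),
    "d": (0.95, 0.70),
    "e": (0.45, 0.50),
}

for label, (p_top, p_bottom) in items.items():
    print(f"{label}. DI = {discrimination_index(p_top, p_bottom):+.2f}")
```

A negative DI, as for item e, indicates that the lower-scoring group outperformed the higher-scoring group on that item.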
53
A test constructor is interested in determining the reliability of a certain item on a new test of aggressiveness, the ASAP (Aggressive Syndrome Assessment Profile). The items were scored 1 for showing aggressiveness and 0 for passivity. For each subject, the total score on the test is reported, high scores indicating higher levels of aggressiveness. Subj Item Score Total Score
Find the point-biserial reliability value for the selected item.

54
Using the standard error of measurement found above, find the confidence interval at the .95 level for the following KABC scores:
a. 80
b. 90
c. 100
d. 110
e. 120
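A sketch of the interval calculation, assuming the .95 confidence interval is the obtained score plus or minus 1.96 standard errors of measurement, with SEM = SD * sqrt(1 - r). The SD of 15 and reliability of .90 are the values given for the KABC in this deck; the function name `confidence_interval` is illustrative.

```python
import math


def confidence_interval(score: float, sd: float, reliability: float, z: float = 1.96):
    """Return (lower, upper) limits: score +/- z * SD * sqrt(1 - reliability)."""
    sem = sd * math.sqrt(1 - reliability)
    margin = z * sem
    return score - margin, score + margin


# KABC scores from the item above, with SD = 15 and reliability = .90.
for label, score in zip("abcde", (80, 90, 100, 110, 120)):
    lower, upper = confidence_interval(score, 15, 0.90)
    print(f"{label}. {score}: {lower:.2f} to {upper:.2f}")
```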
55
Suppose that you as a statistical consultant were given the following test data from a researcher: M = 400, SD = 32 and the standard error of measurement = 100. You are asked to use the Rulon formula for finding the test's reliability.
56
A new test, designed to test parental permissiveness among parents of K through 4th grade children, was developed and you as the statistical consultant have been asked to assess its reliability. The test is composed of 4 items and was given to a random sample of 5 parents. The items on the test were scored from 1 to 10, with 10 showing the most permissiveness. Using the Cyril Hoyt ANOVA, find the internal consistency reliability of the test. Items 

57
Another researcher attempted to shorten the NCAA test shown above and reduced the number of items to only 20. The mean on this revision was 10 with an SD of 3. Find its KR21 reliability.
58
Using the standard error of measurement found above, find the confidence interval at the .95 level for the following KISS scores:
a. 40
b. 45
c. 50
d. 55
e. 57
59
The reliability of a certain test is .80, whereas the reliability of the criterion is only .30. What is the highest possible validity value for this test?
60
A test specialist is attempting to find the reliability of a test designed to assess drug dependence. Ten subjects were randomly selected from a group of individuals, all of whom were on a waiting list for a treatment program for drug abuse. The subjects were all given the test and then a week later were retested. The scores were as follows:
Find the test-retest reliability value.

61
A psychologist wants to assess the reliability of a new test of Self Control, on which higher scores indicate a higher level of control. A random sample of college students is selected and given the test. In order to tap the reliability, the test is scored on the basis of comparing the results of the odd items versus the even items. The data follow:
-What is the correlation when corrected by the Spearman-Brown?

62
A researcher is interested in whether a correlation exists between Alcohol scores and Drug Abuse scores on the MAQ (Maryland Acquisition Questionnaire). A group of inmates, all of whom had at least two previous incarcerations, were given the MAQ and their Drug and Alcohol scores were compared.

-Is it significant?

63
A forensic psychologist is attempting to assess the validity of a new test of police aptitude. The test is given to a random sample of first-day trainees during their orientation session at the police academy. At the end of the 2 month training program each subject is evaluated by the academy's director and rank-ordered on the basis of overall proficiency. The data follow. 
-Is this a valid test? Point out to your students that this is actually a positive correlation as far as validity is concerned. The negative sign appears only because, in the ranking process, the highest score gets a 1 (a metrically low number) and the lowest score gets a 10 (a metrically higher number). Ask your students to rank-order the test scores and then compute the Spearman again, using only the two sets of ranks. At that point, the correlation of -.733 takes a positive sign.

64
A psychologist wants to assess the reliability of a new test of Self Control, on which higher scores indicate a higher level of control. A random sample of college students is selected and given the test. In order to tap the reliability, the test is scored on the basis of comparing the results of the odd items versus the even items. The data follow:
-Is it significant?

65
Is the test proving to be reliable?
66
A forensic psychologist is attempting to assess the validity of a new test of police aptitude. The test is given to a random sample of first-day trainees during their orientation session at the police academy. At the end of the 2 month training program each subject is evaluated by the academy's director and rank-ordered on the basis of overall proficiency. The data follow. 
-Is it significant?

67
A forensic psychologist is attempting to assess the validity of a new test of police aptitude. The test is given to a random sample of first-day trainees during their orientation session at the police academy. At the end of the 2 month training program each subject is evaluated by the academy's director and rank-ordered on the basis of overall proficiency. The data follow. 
-Find the Spearman correlation.

68
A psychologist wants to assess the reliability of a new test of Self Control, on which higher scores indicate a higher level of control. A random sample of college students is selected and given the test. In order to tap the reliability, the test is scored on the basis of comparing the results of the odd items versus the even items. The data follow:
-What is the correlation between the odd and even items?

69
A psychologist wants to assess the reliability of a new test of Self Control, on which higher scores indicate a higher level of control. A random sample of college students is selected and given the test. In order to tap the reliability, the test is scored on the basis of comparing the results of the odd items versus the even items. The data follow:
-Is it significant?
