Deck 5: Reliability

Question
The multiple-choice test items on this examination are all examples of

A) dichotomous test items.
B) latent trait test items.
C) polytomous test items.
D) None of these
Question
Cronbach's alpha is to similarity of scores on test items as average proportional distance is to

A) difference in scores on test items
B) inter-item consistency
C) test-retest reliability
D) parallel forms reliability
Question
This variety of error has also been referred to as "noise." It is

A) systematic error.
B) random error.
C) measurement error.
D) background error.
Question
Stanley (1971) wrote that in classical test theory, a so-called "true score" is "not the ultimate fact in the book of the recording angel." By this, Stanley meant that

A) it would be imprudent to trust in Divine influence when estimating variance.
B) the amount of test variance that is true relative to error may never be known.
C) it is near impossible to separate fact from fiction with regard to "true scores."
D) All of these
Question
A confidence interval is a range or band of test scores that

A) has proven test-retest reliability.
B) is calculated using the standard error of the difference.
C) is likely to contain the true score.
D) None of these
Question
Item response theory is to latent trait theory as observer reliability is to

A) generalizability theory.
B) domain sampling theory.
C) odd-even reliability.
D) inter-scorer reliability.
Question
A test entails behavioral observation and rating of front desk clerks to determine whether or not they greet guests with a smile. Which type of error is this test most susceptible to?

A) test administration error
B) test construction error
C) examiner-related error
D) polling error
Question
A Wall Street securities firm that is actually located on Wall Street is testing a group of candidates for their aptitude in finance and business. As the testing begins, an unexpected "Occupy Wall Street" sit-in takes place. From a psychometric perspective in the context of this testing, the sit-in is viewed as

A) systematic error.
B) random error.
C) test administration error.
D) background error.
Question
The term test heterogeneity BEST refers to the extent to which test items measure

A) different factors.
B) the same factor.
C) a unifactorial trait.
D) a nonhomogeneous trait.
Question
Which would NOT be useful in estimating a test's inter-item consistency?

A) Cronbach's alpha
B) the Kuder-Richardson formulas
C) the average proportional distance
D) a coefficient of equivalence
Question
In an illustrative scenario described in Chapter 5 of your text, a group of 12th grade "whiz kids" in math, newly arrived to the United States from China, perform poorly on a test of 12th grade math. According to the text, what probably accounted for this?

A) lower standards in China as compared to the US for measuring math ability
B) higher standards in the US as compared to China for earning high grades
C) the ability of the Chinese students to read what was required in English
D) the reliability of the instrument used to test 12th grade math skills
Question
In classical test theory, an observed score on an ability test is presumed to represent the testtaker's

A) true score.
B) true score less the variance.
C) true score combined with extraneous factors.
D) true score and error.
Question
The meaning of reliability in the psychometric sense differs from the meaning of reliability in the "everyday" use of that word in that

A) reliability in the "everyday" sense is usually "a good thing."
B) reliability in the psychometric sense is usually "a good thing."
C) reliability in the psychometric sense has greater implications.
D) None of these
Question
The more homogeneous a test is, the

A) less inter-item consistency it can be expected to have.
B) more utility the test has for measuring multifaceted variables.
C) more inter-item consistency it can be expected to have.
D) None of these
Question
One of the problems associated with classical test theory has to do with

A) the notion that there is a "true score" on a test has great intuitive appeal.
B) the fact that CTT assumptions are often characterized as "weak."
C) its assumptions concerning the equivalence of all items on a test.
D) its assumptions allow for its application in most situations.
Question
Error in the reporting of spousal abuse may result from

A) one partner simply forgets all of the details of the abuse.
B) one partner misunderstands the instructions for reporting.
C) one partner is ashamed to report the abuse.
D) All of these
Question
Which is TRUE about reliability in the psychometric sense?

A) reliability is an all-or-none measurement
B) a test may be reliable in one context and unreliable in another
C) a reliability coefficient may not be derived for personality tests
D) alternate forms reliability may not be derived for personality tests
Question
Which of the following is NOT an alternative to classical test theory cited in your text?

A) generalizability theory
B) representational theory
C) domain sampling theory
D) latent trait theory
Question
The standard error of measurement is

A) used to infer how far an observed score is from the true score.
B) also known as the standard error of a score.
C) used in the context of classical test theory.
D) All of these
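
For reference, the standard error of measurement in classical test theory is commonly computed as SEM = SD * sqrt(1 - r), where SD is the standard deviation of the test scores and r is the test's reliability coefficient. The following is a minimal Python sketch of that formula and of the score band built from it. The function names and the example values (SD = 15, r = .91, observed score = 100) are hypothetical and chosen only for illustration.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score; z = 1.96 gives roughly a 95% band."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Illustrative values only (assumed, not taken from the deck).
low, high = confidence_interval(observed=100, sd=15, reliability=0.91)
print(round(sem(15, 0.91), 2), round(low, 2), round(high, 2))
```

A higher reliability coefficient shrinks the SEM, so the band around the observed score narrows.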
Question
Which is TRUE of measurement error?

A) Like error in general, measurement error may be random or systematic.
B) Unlike error in general, measurement error may be random or systematic.
C) Measurement error is always random.
D) Measurement error is always systematic.
Question
A reliability coefficient is

A) an index.
B) a proportion of the total variance attributed to true variance.
C) unaffected by a systematic source of error.
D) All of these
Question
A source of error variance may take the form of

A) item sampling.
B) testtakers' reactions to environment-related variables such as room temperature and lighting.
C) testtaker variables such as amount of sleep the night before a test, amount of anxiety, or drug effects.
D) All of the above
Question
Reliability, in a broad statistical sense, is synonymous with

A) consistently good.
B) consistently bad.
C) consistency.
D) validity.
Question
What term refers to the degree of correlation between all the items on a scale?

A) inter-item homogeneity
B) inter-item consistency
C) inter-item heterogeneity
D) parallel-form reliability
Question
Which of the following types of reliability estimates is the most expensive due to the costs involved in test development?

A) test-retest
B) parallel-form
C) internal-consistency
D) Spearman's rho
Question
Which of the following might lead to a decrease in test-retest reliability?

A) the passage of time between the two administrations of the test.
B) coaching designed to increase test scores between the two administrations of the test.
C) practice with similar test materials between the two administrations of the test.
D) All of these
Question
As the degree of reliability increases, the proportion of

A) total variance attributed to true variance decreases.
B) total variance attributed to true variance increases.
C) total variance attributed to error variance increases.
D) None of these
Question
Which of the following factors may influence a split-half reliability estimate?

A) fatigue
B) anxiety
C) item difficulty
D) All of these
Question
Why might ability test scores among testtakers most typically vary?

A) because of the true ability of the testtaker
B) because of irrelevant, unwanted influences
C) All of the above
D) None of the above
Question
Internal-consistency estimates of reliability are inappropriate for

A) reading achievement tests.
B) scholastic aptitude/intelligence tests.
C) word processing tests based on speed.
D) tests purporting to measure a single personality trait.
Question
Test-retest estimates of reliability are referred to as measures of ________, and split-half reliability estimates are referred to as measures of ________.

A) true scores; error scores
B) internal consistency; stability
C) interscorer reliability; consistency
D) stability; internal consistency
Question
Computer-scorable items have tended to eliminate error variance due to

A) item sampling.
B) scorer differences.
C) content sampling.
D) testtakers' reactions to environmental variables.
Question
Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or people) on two different administrations of the same test?

A) a parallel-forms estimate
B) a split-half estimate
C) a test-retest estimate
D) an au-pair estimate
Question
Which of the following is TRUE for parallel forms of a test?

A) The means of the observed scores are equal for the two forms.
B) The variances of the estimated scores are equal for the two forms.
C) The means and variances of the observed scores are equal for the two forms.
D) The means and variances of the estimated scores are equal for the two forms.
Question
Which of the following is TRUE for estimates of alternate- and parallel-forms reliability?

A) Two test administrations with the same group are required.
B) Test scores may be affected by factors such as motivation, fatigue, or intervening events like practice, learning, or therapy.
C) Item sampling is a source of error variance.
D) All of these
Question
Which of the following is usually minimized when using split-half estimates of reliability as compared with test-retest or parallel/alternate-form estimates of reliability?

A) time and expense
B) reliability and validity
C) reliability only
D) time spent in scoring and interpretation
Question
Which source of error variance affects parallel- or alternate-form reliability estimates but does not affect test-retest estimates?

A) fatigue
B) learning
C) practice
D) item sampling
Question
Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that measures a trait that is presumed to be relatively stable over time?

A) parallel-forms
B) alternate-forms
C) test-retest
D) split-half
Question
Which of the following is true of systematic error?

A) It significantly lowers the reliability of a measure.
B) It insignificantly lowers the reliability of a measure.
C) It increases the reliability of a measure.
D) It has no effect on the reliability of a measure.
Question
An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval between the test and retest is more than

A) 30 days.
B) 60 days.
C) 3 months.
D) 6 months.
Question
Which BEST conveys the meaning of an inter-scorer reliability estimate of .90?

A) Ninety percent of the scores obtained are reliable.
B) Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error.
C) Ten percent of the variance in the scores assigned by the scorers was attributed to true differences and 90% to error.
D) Ten percent of the test's items are in need of revision according to the majority of the test's users.
Question
If items on a test are measuring very different traits, estimates of reliability yielded from split-half methods will typically be ________ as compared with estimates from KR-20.

A) higher
B) lower
C) similar
D) approximately the same
Question
Which of the following is generally the preferred statistic for obtaining a measure of internal-consistency reliability?

A) KR-20
B) KR-21
C) Kendall's Tau
D) coefficient alpha
Question
Which of the following is TRUE about coefficient alpha?

A) Kuder thought it to be the single best measure of reliability.
B) It was first conceived by Alfalfa Alpha.
C) It is a characteristic of a particular set of scores, not of the test itself.
D) None of these
Question
Typically, adding items to a test will have what effect on the test's reliability?

A) Reliability will decrease.
B) Reliability will increase.
C) Reliability will stay the same.
D) Reliability will first increase and then decrease.
Question
Coefficient alpha is appropriate to use with all of the following test formats EXCEPT

A) multiple-choice.
B) true-false.
C) short-answer for which partial credit is awarded.
D) essay exam with no partial credit awarded.
Question
If items from a test are measuring the same trait, estimates of reliability yielded from split-half methods will typically be ________ as compared to estimates from KR-20.

A) higher
B) lower
C) similar
D) approximately the same
Question
Which is NOT an assumption that should be met in order to use KR-21?

A) Items should be dichotomous.
B) Items should be of equal difficulty.
C) Items should be homogeneous.
D) Items should be scorable by computer.
Question
Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method?

A) Randomly assign items to each half of the test.
B) Assign odd-numbered items to one half and even-numbered items to the other half of the test.
C) Assign the first half of the items to one half of the test and the second half of the items to the other half of the test.
D) Assign easy items to one half of the test and difficult items to the other half of the test.
Question
KR-20 is the statistic of choice for tests with which types of items?

A) multiple-choice
B) true-false
C) All of these
D) None of these
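
For reference, KR-20 is defined for items scored dichotomously (1 = correct, 0 = incorrect) and is usually written as KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores), where k is the number of items, p is the proportion passing an item, and q = 1 - p. Below is a minimal Python sketch of that formula. The function name and the tiny response matrix are hypothetical, and the population variance is used here as an assumption (some texts use the sample variance instead).

```python
from statistics import pvariance

def kr20(responses):
    """responses: one list per testtaker; each item scored 1 (correct) or 0."""
    n = len(responses)          # number of testtakers
    k = len(responses[0])       # number of items
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n   # proportion passing item i
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - pq_sum / total_var)

# Made-up data: four testtakers, two items that always agree.
print(kr20([[1, 1], [0, 0], [1, 1], [0, 0]]))
```

When the two items always agree, as in this toy data, the estimate reaches its maximum. When they are unrelated, it falls toward zero.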
Question
For determining the reliability of tests scored using nominal scales of measurement, the statistic of choice is

A) Kendall's Tau.
B) the Kappa statistic.
C) KR-20.
D) coefficient alpha.
Question
The KR-21 reliability estimate was developed

A) to yield greater consistency in reliability coefficients.
B) to facilitate computation by hand.
C) for use with less homogeneous items.
D) because Kuder wanted to "one-up" Richardson's 20.
Question
The "20" and "21" in KR-20 and KR-21 represent

A) numbers held constant in the denominator.
B) numbers held constant in the numerator.
C) the order in which the formulas were created.
D) the age of Fred Kuder's son and nephew at the time the formulas were developed.
Question
The Spearman-Brown formula is used for:

A) correcting for one half of the test by estimating the reliability of the whole test.
B) determining how many additional items are needed to increase reliability up to a certain level.
C) determining how many items can be eliminated without reducing reliability below a predetermined level.
D) All of these
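
For reference, the general Spearman-Brown formula is r_SB = n * r / (1 + (n - 1) * r), where r is the reliability of the existing test (or half-test) and n is the factor by which its length is changed. The following minimal Python sketch shows the formula and its rearrangement for n. The function names and example values are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability when test length is multiplied by n.

    n = 2 steps a half-test correlation up to a full-length estimate;
    n < 1 estimates the effect of shortening the test.
    """
    return n * r / (1 + (n - 1) * r)

def length_factor(r, target):
    """Factor by which test length must change to reach a target reliability."""
    return target * (1 - r) / (r * (1 - target))

# Illustrative values only: a half-test correlation of .60.
full = spearman_brown(0.60, 2)       # estimated whole-test reliability
factor = length_factor(0.60, 0.75)   # how much longer to reach .75
print(round(full, 2), round(factor, 2))
```

Multiplying the current number of items by the returned factor gives the approximate number of items needed, assuming the added items are comparable to the existing ones.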
Question
For a heterogeneous test, measures of internal-consistency reliability will tend to be ________ compared with other methods of estimating reliability.

A) higher
B) lower
C) very similar or higher
D) more robust
Question
Error variance for measures of inter-item consistency comes from

A) fatigue.
B) motivation.
C) a testtaker practice effect.
D) heterogeneity of the content.
Question
A coefficient alpha over .9 may indicate that

A) the items in the test are too dissimilar.
B) the test is not reliable.
C) the items in the test are redundant.
D) the test is biased against low-ability individuals.
Question
Coefficient alpha is an expression of

A) the mean of split-half correlations between odd- and even-numbered items.
B) the mean of split-half correlations between first- and second-half items.
C) the mean of all possible split-half correlations.
D) the mean of the best or "alpha" level split-half correlations.
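
For reference, coefficient alpha is usually computed from item and total-score variances as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. Below is a minimal Python sketch of that computational formula. The function name and the small score matrix are hypothetical, and sample variances are used throughout as an assumption.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one list per testtaker, one numeric score per item."""
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up data: three testtakers, three items that rank them identically.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

Unlike KR-20, this formula does not require dichotomous items, so it can also be applied to items scored on a wider range of values.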
Question
When more than two scorers are used to determine inter-scorer reliability, the statistic of choice is

A) Pearson r.
B) Spearman's rho.
C) KR-20.
D) coefficient alpha.
Question
A synonym for interscorer reliability is

A) interjudge reliability
B) observer reliability
C) interrater reliability
D) All of these
Question
If traditional measures of reliability are applied to a criterion-referenced test, the reliability estimate will likely be

A) spuriously low.
B) spuriously high.
C) exactly zero.
D) None of these
Question
Which of the following would result in the LEAST appropriate estimate of reliability for a speed test?

A) test-retest
B) alternate-form
C) split-half from a single administration of the test
D) split-half from two independent testing sessions
Question
Which type(s) of reliability estimates would be most appropriate for a measure of heart rate?

A) test-retest
B) alternate-form
C) parallel form
D) internist consistency
Question
Classical reliability theory estimates the portion of a test score that is attributed to ________, and domain sampling theory estimates ________.

A) specific sources of variation; error
B) error; specific sources of variation
C) the skills being measured; variation
D) the skills being measured; content knowledge
Question
Interpretations of criterion-referenced tests are typically made with respect to

A) the total number of items the examinee responded to.
B) the material that the examinee evidenced mastery of.
C) a comparison of the examinee's performance with that of others who took the test.
D) a formula that takes into account the total number of items for which no response was scorable.
Question
Which type(s) of reliability estimates would be appropriate for a speed test?

A) test-retest
B) alternate-form
C) split-half from two independent testing sessions
D) All of these
Question
Traditional measures of reliability are inappropriate for criterion-referenced tests because variability

A) is maximized with criterion-referenced tests.
B) is minimized with criterion-referenced tests.
C) is variable with criterion-referenced tests.
D) cannot be determined with criterion-referenced tests.
Question
A measure of clerical speed is obtained by a test that has respondents alphabetize index cards. The manual for this test cites a split-half reliability coefficient for a single administration of the test of .95. What might you conclude?

A) The test is highly reliable.
B) The published reliability estimate is spuriously low and would have been higher had another estimate been used.
C) The split-half estimate should not have been used in this instance.
D) Clerical speed is too vague a construct to measure.
Question
The standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests is

A) the standard error of the difference between means.
B) the standard error of measurement.
C) the standard deviation of the reliability coefficient.
D) the variance.
Question
Item response theory (IRT) focuses on the

A) circumstances that inspired the development of the test.
B) test administration variables.
C) individual items of a test.
D) "how and why" of the Interborough Rapid Transit line
Question
Generalizability theory focuses on which of the following?

A) the circumstances under which a test was developed
B) the circumstances under which a test is administered
C) the circumstances under which a test is interpreted
D) All of these
Question
Which estimate of reliability is most consistent with the domain sampling theory?

A) test-retest
B) alternate-form
C) internal-consistency
D) interscorer
Question
Typically, speed tests

A) contain items of a uniform difficulty level.
B) are completed by fewer than 1% of all test-takers.
C) have low validity coefficients.
D) yield high rates of false positives.
Question
If a test is homogeneous

A) it is functionally uniform throughout.
B) it will likely yield a high internal-consistency reliability estimate compared with a test-retest reliability estimate.
C) it would be reasonable to expect a high degree of internal consistency.
D) All of these
Question
The fact that the length of a test influences the size of the reliability coefficient is based on which theory of measurement?

A) classical test theory (CTT)
B) generalizability theory
C) domain sampling theory
D) item response theory (IRT)
Question
An estimate of the reliability of a speed test is a measure of

A) the stability of the test.
B) the consistency of the response speed.
C) the homogeneity of the test items.
D) All of these
Question
The Spearman-Brown formula can be used for which types of tests?

A) speed and multiple-choice
B) true-false and multiple-choice
C) speed, true-false, and multiple-choice
D) trade school and driving tests
Question
Use of the Spearman-Brown formula would be INAPPROPRIATE to

A) estimate the effect on reliability of shortening a test.
B) determine the number of items needed in a test to obtain the desired level of reliability.
C) estimate the internal consistency of a speed test.
D) All of these
Question
If a time limit is long enough to allow test-takers to attempt all items, and if some items are so difficult that no test-taker is able to obtain a perfect score, then the test is referred to as a ________ test.

A) speed
B) power
C) reliable
D) valid
Question
A Kuder-Richardson (KR) or split-half estimate of reliability for a speed test would provide an estimate that is

A) spuriously low.
B) spuriously high.
C) insignificant.
D) equal to a test-retest method.
Deck 5: Reliability
1
The multiple-choice test items on this examination are all examples of

A) dichotomous test items.
B) latent trait test items.
C) polytomous test items.
D) None of these
Answer: C
2
Cronbach's alpha is to similarity of scores on test items as average proportional distance is to

A) difference in scores on test items
B) inter-item consistency
C) test-retest reliability
D) parallel forms reliability
Answer: A
3
This variety of error has also been referred to as "noise." It is

A) systematic error.
B) random error.
C) measurement error.
D) background error.
Answer: B
4
Stanley (1971)wrote that in classical test theory,a so-called "true score" is "not the ultimate fact in the book of the recording angel." By this,Stanley meant that

A) it would be imprudent to trust in Divine influence when estimating variance.
B) the amount of test variance that is true relative to error may never be known.
C) it is near impossible to separate fact from fiction with regard to "true scores."
D) All of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
5
A confidence interval is a range or band of test scores that

A) has proven test-retest reliability.
B) is calculated using the standard error of the difference.
C) is likely to contain the true score.
D) None of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
6
Item response theory is to latenttrait theory as observer reliability is to

A) generalizability theory.
B) domain sampling theory.
C) odd-even reliability.
D) inter-scorer reliability.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
7
A test entails behavioral observation and rating of front desk clerks to determine whether or not they greet guests with a smile.Which type of error is this test most susceptible to?

A) test administration error
B) test construction error
C) examiner-related error
D) polling error
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
8
A Wall Street Securities firm that is actually located on Wall Street is testing a group of candidates for their aptitude in finance and business.As the testing begins,an unexpected "Occupy Wall Street" sit-in takes place.From a psychometric perspective in the context of this testing,the sit-in is viewed as

A) systematic error.
B) random error.
C) test administration error.
D) background error.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
9
The term test heterogeneity BEST refers to the extent to which test items measure

A) different factors.
B) the same factor.
C) a unifactorial trait.
D) a nonhomogeneous trait.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
10
Which would NOT be useful in estimating a test's inter-item consistency?

A) Cronbach's alpha
B) the Kuder-Richardson formulas
C) the average proportional distance
D) a coefficient of equivalence
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
11
In an illustrative scenario described in Chapter 5 of your text,a group of 12th grade "whiz kids" in math,newly arrived to the United States from China,perform poorly on a test of 12th grade math.According to the text,what probably accounted for this?

A) lower standards in China as compared to the US for measuring math ability
B) higher standards in the US as compared to China for earning high grades
C) the ability of the Chinese students to read what was required in English
D) the reliability of the instrument used to test 12th grade math skills
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
12
In classical test theory,an observed score on an ability test is presumed to represent the testtaker's

A) true score.
B) true score less the variance.
C) true score combined with extraneous factors.
D) the testtaker's true score and error.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
13
The meaning of reliability in the psychometric sense differs from the meaning of reliability in the "every day" use of that word in that

A) reliability in the "every day sense" is usually "a good thing."
B) reliability in the psychometric sense is usually "a good thing."
C) reliability in the psychometric sense has greater implications.
D) None of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
14
The more homogeneous a test is,the

A) less inter-item consistency it can be expected to have.
B) more utility the test has for measuring multifaceted variables.
C) more inter-item consistency it can be expected to have.
D) None of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
15
One of the problems associated with classical test theory has to do with

A) the notion that there is a "true score" on a test has great intuitive appeal.
B) the fact that CTT assumptions are often characterized as "weak."
C) its assumptions concerning the equivalence of all items on a test.
D) its assumptions allow for its application in most situations.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
16
Error in the reporting of spousal abuse may result from

A) one partner simply forgets all of the details of the abuse.
B) one partner misunderstands the instructions for reporting.
C) one partner is ashamed to report the abuse.
D) All of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
17
Which is TRUE about reliability in the psychometric sense?

A) reliability is an all-or-none measurement
B) a test may be reliable in one context and unreliable in another
C) a reliability coefficient may not be derived for personality tests
D) alternate forms reliability may not be derived for personality tests
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
18
Which of the following is NOT an alternative to classical test theory cited in your text?

A) generalizability theory
B) representational theory
C) domain sampling theory
D) latent trait theory
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
19
The standard error of measurement is

A) used to infer how far an observed score is from the true score.
B) also known as the standard error of a score.
C) is used in the context of classical test theory.
D) All of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
20
Which is TRUE of measurement error?

A) Like error in general, measurement error may be random or systematic.
B) Unlike error in general, measurement error may be random or systematic.
C) Measurement error is always random.
D) Measurement error is always systematic.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
21
A reliability coefficient is

A) an index.
B) a proportion of the total variance attributed to true variance.
C) unaffected by a systematic source of error.
D) All of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
22
A source of error variance may take the form of

A) item sampling.
B) testtakers' reactions to environment-related variables such as room temperature and lighting.
C) testtaker variables such as amount of sleep the night before a test, amount of anxiety, or drug effects.
D) All of the above
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
23
Reliability,in a broad statistical sense,is synonymous with

A) consistently good.
B) consistently bad.
C) consistency.
D) validity.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
24
What term refers to the degree of correlation between all the items on a scale?

A) inter-item homogeneity
B) inter-item consistency
C) inter-item heterogeneity
D) parallel-form reliability
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
25
Which of the following types of reliability estimates is the most expensive due to the costs involved in test development?

A) test-retest
B) parallel-form
C) internal-consistency
D) Spearman's rho
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
26
Which of the following might lead to a decrease in test-retest reliability?

A) the passage of time between the two administrations of the test.
B) coaching designed to increase test scores between the two administrations of the test.
C) practice with similar test materials between the two administrations of the test.
D) All of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
27
As the degree of reliability increases,the proportion of

A) total variance attributed to true variance decreases.
B) total variance attributed to true variance increases.
C) total variance attributed to error variance increases.
D) None of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
28
Which of the following factors may influence a split-half reliability estimate?

A) fatigue
B) anxiety
C) item difficulty
D) All of these
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
29
Why might ability test scores among testtakers most typically vary?

A) because of the true ability of the testtaker
B) because of irrelevant, unwanted influences
C) All of the above
D) None of the above
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
30
Internal-consistency estimates of reliability are inappropriate for

A) reading achievement tests.
B) scholastic aptitude/intelligence tests.
C) word processing tests based on speed.
D) tests purporting to measure a single personality trait.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
31
Test-retest estimates of reliability are referred to as measures of ________,and split-half reliability estimates are referred to as measures of ________.

A) true scores; error scores
B) internal consistency; stability
C) interscorer reliability; consistency
D) stability; internal consistency
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
32
Computer-scorable items have tended to eliminate error variance due to

A) item sampling.
B) scorer differences.
C) content sampling.
D) testtakers' reactions to environmental variables.
Unlock Deck
Unlock for access to all 169 flashcards in this deck.
Unlock Deck
k this deck
33
Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or people)on two different administrations of the same test?

A) a parallel-forms estimate
B) a split-half estimate
C) a test-retest estimate
D) an au-pair estimate
34
Which of the following is TRUE for parallel forms of a test?

A) The means of the observed scores are equal for the two forms.
B) The variances of the estimated scores are equal for the two forms.
C) The means and variances of the observed scores are equal for the two forms.
D) The means and variances of the estimated scores are equal for the two forms.
35
Which of the following is TRUE for estimates of alternate- and parallel-forms reliability?

A) Two test administrations with the same group are required.
B) Test scores may be affected by factors such as motivation, fatigue, or intervening events like practice, learning, or therapy.
C) Item sampling is a source of error variance.
D) All of these
36
Which of the following is usually minimized when using split-half estimates of reliability as compared with test-retest or parallel/alternate-form estimates of reliability?

A) time and expense
B) reliability and validity
C) reliability only
D) time spent in scoring and interpretation
37
Which source of error variance affects parallel- or alternate-form reliability estimates but does not affect test-retest estimates?

A) fatigue
B) learning
C) practice
D) item sampling
38
Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that measures a trait that is presumed to be relatively stable over time?

A) parallel-forms
B) alternate-forms
C) test-retest
D) split-half
39
Which of the following is true of systematic error?

A) It significantly lowers the reliability of a measure.
B) It insignificantly lowers the reliability of a measure.
C) It increases the reliability of a measure.
D) It has no effect on the reliability of a measure.
40
An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval between the test and retest is more than

A) 30 days.
B) 60 days.
C) 3 months.
D) 6 months.
41
Which BEST conveys the meaning of an inter-scorer reliability estimate of .90?

A) Ninety percent of the scores obtained are reliable.
B) Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error.
C) Ten percent of the variance in the scores assigned by the scorers was attributed to true differences and 90% to error.
D) Ten percent of the test's items are in need of revision according to the majority of the test's users.
42
If items on a test are measuring very different traits, estimates of reliability yielded from split-half methods will typically be ________ as compared with estimates from KR-20.

A) higher
B) lower
C) similar
D) approximately the same
43
Which of the following is generally the preferred statistic for obtaining a measure of internal-consistency reliability?

A) KR-20
B) KR-21
C) Kendall's Tau
D) coefficient alpha
44
Which of the following is TRUE about coefficient alpha?

A) Kuder thought it to be the single best measure of reliability.
B) It was first conceived by Alfalfa Alpha.
C) It is a characteristic of a particular set of scores, not of the test itself.
D) None of these
45
Typically, adding items to a test will have what effect on the test's reliability?

A) Reliability will decrease.
B) Reliability will increase.
C) Reliability will stay the same.
D) Reliability will first increase and then decrease.
46
Coefficient alpha is appropriate to use with all of the following test formats EXCEPT

A) multiple-choice.
B) true-false.
C) short-answer for which partial credit is awarded.
D) essay exam with no partial credit awarded.
47
If items from a test are measuring the same trait, estimates of reliability yielded from split-half methods will typically be ________ as compared with estimates from KR-20.

A) higher
B) lower
C) similar
D) approximately the same
48
Which is NOT an assumption that should be met in order to use KR-21?

A) Items should be dichotomous.
B) Items should be of equal difficulty.
C) Items should be homogeneous.
D) Items should be scorable by computer.
49
Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method?

A) Randomly assign items to each half of the test.
B) Assign odd-numbered items to one half and even-numbered items to the other half of the test.
C) Assign the first-half of the items to one half of the test and the second half of the items to the other half of the test.
D) Assign easy items to one half of the test and difficult items to the other half of the test.
50
KR-20 is the statistic of choice for tests with which types of items?

A) multiple-choice
B) true-false
C) All of these
D) None of these
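Study note: several cards in this deck refer to KR-20 and coefficient alpha. The following is a minimal sketch of the conventional textbook formulas, not taken from the deck itself. The function names and the sample responses are hypothetical, and population variances are assumed throughout so that the two statistics agree on 0/1 data.

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """scores: one row per testtaker, each row a list of item scores."""
    k = len(scores[0])
    # Variance of each item across testtakers
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total test scores
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def kr20(scores):
    """Same layout as above, but every item must be scored 0 or 1."""
    n = len(scores)
    k = len(scores[0])
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in scores) / n  # proportion passing item i
        pq_sum += p * (1 - p)                  # p * q for item i
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical responses: 5 testtakers, 4 dichotomous items
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(kr20(responses), coefficient_alpha(responses))
```

On dichotomous items the item variance equals p times q, so the two functions return the same value; coefficient alpha also accepts items with more than two score levels.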
51
For determining the reliability of tests scored using nominal scales of measurement, the statistic of choice is

A) Kendall's Tau.
B) the Kappa statistic.
C) KR-20.
D) coefficient alpha.
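Study note: one of the statistics listed above, the kappa statistic, corrects the observed agreement between two raters for the agreement expected by chance. The sketch below shows the two-rater form (Cohen's kappa) under the usual textbook definition. The function name and the ratings are hypothetical illustrations, not data from the deck.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on nominal categories."""
    n = len(ratings_a)
    # Proportion of cases on which the two raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten observations by two raters
rater_1 = ["yes", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no"]
rater_2 = ["yes", "yes", "no", "no", "no", "no", "yes", "yes", "yes", "no"]
print(cohens_kappa(rater_1, rater_2))
```

Here the raters agree on 8 of 10 cases and chance agreement is 0.5, so kappa is about 0.6.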
52
The KR-21 reliability estimate was developed

A) to yield greater consistency in reliability coefficients.
B) to facilitate computation by hand.
C) for use with less homogeneous items.
D) because Kuder wanted to "one-up" Richardson's 20.
53
The "20" and "21" in KR-20 and KR-21 represent

A) numbers held constant in the denominator.
B) numbers held constant in the numerator.
C) the order in which the formulas were created.
D) the age of Fred Kuder's son and nephew at the time the formulas were developed.
54
The Spearman-Brown formula is used for:

A) correcting for one half of the test by estimating the reliability of the whole test.
B) determining how many additional items are needed to increase reliability up to a certain level.
C) determining how many items can be eliminated without reducing reliability below a predetermined level.
D) All of these
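Study note: the Spearman-Brown formula named in this card can be sketched as follows. This is the conventional textbook form, with hypothetical function names and an illustrative half-test correlation of .70; it is not taken from the deck.

```python
def spearman_brown(r, n):
    """Predicted reliability when test length is multiplied by the factor n.

    r is the reliability of the existing test (or half-test);
    n = 2 steps a half-test correlation up to full length,
    n < 1 estimates the effect of shortening the test.
    """
    return (n * r) / (1 + (n - 1) * r)

def length_factor_needed(r, r_target):
    """Factor by which test length must change to reach r_target."""
    return (r_target * (1 - r)) / (r * (1 - r_target))

# A half-test correlation of .70 stepped up to the full-length test
full_length = spearman_brown(0.70, 2)
print(round(full_length, 3))

# How much longer must a test with reliability .70 be to reach .90?
print(round(length_factor_needed(0.70, 0.90), 2))
```

The second function is the first one solved for n, which is why the same formula can be used to plan how many items to add or remove.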
55
For a heterogeneous test, measures of internal-consistency reliability will tend to be ________ compared with other methods of estimating reliability.

A) higher
B) lower
C) very similar or higher
D) more robust
56
Error variance for measures of inter-item consistency comes from

A) fatigue.
B) motivation.
C) a testtaker practice effect.
D) heterogeneity of the content.
57
A coefficient alpha over .9 may indicate that

A) the items in the test are too dissimilar.
B) the test is not reliable.
C) the items in the test are redundant.
D) the test is biased against low-ability individuals.
58
Coefficient alpha is an expression of

A) the mean of split-half correlations between odd- and even-numbered items.
B) the mean of split-half correlations between first- and second-half items.
C) the mean of all possible split-half correlations.
D) the mean of the best or "alpha" level split-half correlations.
59
When more than two scorers are used to determine inter-scorer reliability, the statistic of choice is

A) Pearson r.
B) Spearman's rho.
C) KR-20.
D) coefficient alpha.
60
A synonym for interscorer reliability is

A) interjudge reliability
B) observer reliability
C) interrater reliability
D) All of these
61
If traditional measures of reliability are applied to a criterion-referenced test, the reliability estimate will likely be

A) spuriously low.
B) spuriously high.
C) exactly zero.
D) None of these
62
Which of the following would result in the LEAST appropriate estimate of reliability for a speed test?

A) test-retest
B) alternate-form
C) split-half from a single administration of the test
D) split-half from two independent testing sessions
63
Which type(s) of reliability estimates would be most appropriate for a measure of heart rate?

A) test-retest
B) alternate-form
C) parallel form
D) internist consistency
64
Classical reliability theory estimates the portion of a test score that is attributed to ________, and domain sampling theory estimates ________.

A) specific sources of variation; error
B) error; specific sources of variation
C) the skills being measured; variation
D) the skills being measured; content knowledge
65
Interpretations of criterion-referenced tests are typically made with respect to

A) the total number of items the examinee responded to.
B) the material that the examinee evidenced mastery of.
C) a comparison of the examinee's performance with that of others who took the test.
D) a formula that takes into account the total number of items for which no response was scorable.
66
Which type(s) of reliability estimates would be appropriate for a speed test?

A) test-retest
B) alternate-form
C) split-half from two independent testing sessions
D) All of these
67
Traditional measures of reliability are inappropriate for criterion-referenced tests because variability

A) is maximized with criterion-referenced tests.
B) is minimized with criterion-referenced tests.
C) is variable with criterion-referenced tests.
D) cannot be determined with criterion-referenced tests.
68
A measure of clerical speed is obtained by a test that has respondents alphabetize index cards. The manual for this test cites a split-half reliability coefficient for a single administration of the test of .95. What might you conclude?

A) The test is highly reliable.
B) The published reliability estimate is spuriously low and would have been higher had another estimate been used.
C) The split-half estimate should not have been used in this instance.
D) Clerical speed is too vague a construct to measure.
69
The standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests is

A) the standard error of the difference between means.
B) the standard error of measurement.
C) the standard deviation of the reliability coefficient.
D) the variance.
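Study note: the standard error of measurement listed among the options above is conventionally computed from a test's standard deviation and its reliability coefficient, and it is the usual basis for a confidence interval around an observed score. The sketch below uses that textbook formula. The function names and the example values (standard deviation 15, reliability .91, observed score 100) are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score; z = 1.96 gives roughly 95% confidence."""
    sem = standard_error_of_measurement(sd, reliability)
    return (observed - z * sem, observed + z * sem)

# Hypothetical test: SD of 15 and a reliability coefficient of .91
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(100, 15, 0.91)
print(round(sem, 2), round(low, 2), round(high, 2))
```

With these values the SEM is about 4.5, so the 95% band around an observed score of 100 runs from roughly 91.2 to 108.8. Higher reliability shrinks the SEM and narrows the band.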
70
Item response theory (IRT) focuses on the

A) circumstances that inspired the development of the test.
B) test administration variables.
C) individual items of a test.
D) "how and why" of the Interborough Rapid Transit line.
71
Generalizability theory focuses on which of the following?

A) the circumstances under which a test was developed
B) the circumstances under which a test is administered
C) the circumstances under which a test is interpreted
D) All of these
72
Which estimate of reliability is most consistent with the domain sampling theory?

A) test-retest
B) alternate-form
C) internal-consistency
D) interscorer
73
Typically, speed tests

A) contain items of a uniform difficulty level.
B) are completed by fewer than 1% of all test-takers.
C) have low validity coefficients.
D) yield high rates of false positives.
74
If a test is homogeneous

A) it is functionally uniform throughout.
B) it will likely yield a high internal-consistency reliability estimate compared with a test-retest reliability estimate.
C) it would be reasonable to expect a high degree of internal consistency.
D) All of these
75
The fact that the length of a test influences the size of the reliability coefficient is based on which theory of measurement?

A) classical test theory (CTT)
B) generalizability theory
C) domain sampling theory
D) item response theory (IRT)
76
An estimate of the reliability of a speed test is a measure of

A) the stability of the test.
B) the consistency of the response speed.
C) the homogeneity of the test items.
D) All of these
77
The Spearman-Brown formula can be used for which types of tests?

A) speed and multiple-choice
B) true-false and multiple-choice
C) speed, true-false, and multiple-choice
D) trade school and driving tests
78
Use of the Spearman-Brown formula would be INAPPROPRIATE to

A) estimate the effect on reliability of shortening a test.
B) determine the number of items needed in a test to obtain the desired level of reliability.
C) estimate the internal consistency of a speed test.
D) All of these
79
If a time limit is long enough to allow test-takers to attempt all items, and if some items are so difficult that no test-taker is able to obtain a perfect score, then the test is referred to as a ________ test.

A) speed
B) power
C) reliable
D) valid
80
A Kuder-Richardson (KR) or split-half estimate of reliability for a speed test would provide an estimate that is

A) spuriously low.
B) spuriously high.
C) insignificant.
D) equal to a test-retest method.