Deck 4: Reliability

Question
Because classical test theory assumes a person's true score is the same over time, repeating the same test over and over gives a distribution of scores that reflects what?

A) systematic error
B) random error
C) reliability
D) internal consistency
Question
Which of the following is an important distinction between systematic errors and random errors?

A) Random errors are more likely than systematic errors to cause errors in conclusions.
B) Systematic errors occur only in objective measures and random errors occur only in subjective measures.
C) Random errors can be eliminated by careful wording of test items.
D) Systematic errors are extremely rare among psychological tests.
Question
Who developed methods for evaluating sources of error in behavioral research?

A) Edward Thorndike
B) Kuder and Richardson
C) Charles Spearman
D) Cronbach
Question
If we repeatedly administered the same test to the same individual, the standard deviation of the person's scores would be the

A) standard error of the mean.
B) variance.
C) reliability of the test.
D) standard error of measurement.
Question
Assuming the "rubber yardstick" shrinks and expands at random, what can be said about the distribution of scores from the rubber yardstick?

A) It will have a mean of zero (0).
B) It will be normal.
C) It will have a standard error of zero (0).
D) It will be skewed.
Question
What is Spearman known for?

A) Working out the basics of reliability theory
B) Developing the notion of sampling error
C) Creating methods for measuring error
D) Developing multivariate analysis
Question
Theoretically, reliability is

A) the correlation of the observed test score with the true score.
B) the square root of the ratio of true to the observed score.
C) the ratio of true to the observed score squared.
D) not possible to define.
Question
Repeated use of the same test typically results in different scores. How does classical test theory account for this?

A) poor test validity
B) systematic variability
C) random error
D) inattention
Question
According to classical test theory, errors of measurement are

A) always overestimates of true score.
B) always underestimates of true score.
C) random.
D) constant.
Question
When talking about errors in terms of psychological testing, we are referring to the fact that:

A) someone got an answer incorrect.
B) there is always some inaccuracy in the measurement.
C) the test was inappropriate for that particular group.
D) the score is too subjective to be accurate.
Question
What is Cronbach known for?

A) Developing measures to evaluate sources of error
B) Creating the basics of multivariate analysis
C) Developing the basics of contemporary measurement theory
D) Distinguishing between objective and subjective measures
Question
We can get an idea of how much measurement error is present in a score through the

A) true score.
B) observed score.
C) standard error of the mean.
D) standard error of measurement.
Question
Theoretically, if Susie repeatedly took the 6th grade achievement test, you would be able to find her true score by finding the ____ of the distribution of her scores.

A) mean
B) standard deviation
C) variance
D) standard error of measurement
Question
The work of Charles Spearman combined what two measurement concepts?

A) mean and variance
B) sample statistics and population parameters
C) sampling error and correlation
D) reliability and validity
Question
When creating a test, one generally uses a subset of items to represent a larger construct. This is known as

A) a population parameter.
B) domain sampling.
C) a sampling error.
D) descriptive statistics.
Question
Classical Test Theory assumes

A) the length of a test has no bearing on its reliability.
B) measurement errors occur systematically.
C) it is not possible to estimate true scores.
D) the distribution of random errors is the same for every respondent.
Question
The basic theory of reliability was first worked out by

A) Karl Pearson.
B) Charles Spearman.
C) Julian Stanley.
D) Lee Cronbach.
Question
Classical Test Theory assumes that

A) errors are systematic.
B) errors are random.
C) true scores cannot be estimated.
D) the length of a test has no bearing on its reliability.
Question
If you have three clocks in your house, and every clock is 10 minutes fast, this is an example of

A) systematic error.
B) random error.
C) measurement error.
D) a rubber yardstick.
Question
An observed score is composed of

A) the residual and the true score.
B) the criterion and the predictor.
C) the measurement error and the predictor.
D) the true score and the measurement error.
Question
Professor Pine constructed five different short history tests by randomly drawing questions from the huge pool of all possible questions about the current material. He has created

A) randomly parallel tests.
B) a large sample size.
C) systematic errors.
D) attenuation effects.
Question
Sources of error associated with time sampling are measured using

A) the test-retest method.
B) the split-half method.
C) KR 20.
D) the alpha method.
Question
Tests designed according to item response theory

A) are no longer considered useful.
B) can only be used with non-objective material.
C) yield more reliable results with fewer items.
D) provide low-tech methods for field use.
Question
In the domain sampling model, the error that is being considered is the error caused by

A) choosing the wrong domain.
B) systematic error.
C) using a limited sample of items.
D) random error.
Question
A reliability coefficient of .60 suggests that

A) 64% of the variance on the test is error.
B) 40% of the variance on the test is error.
C) 78% of the variance on the test is error.
D) the test can be used for clinical purposes but not for research.
Question
Dr. Smith is trying to determine the reliability of a new personality test. Two randomly parallel tests, A and B, have a correlation of .81. What is the estimated reliability of the new personality test?

A) .81
B) -.9
C) .9
D) .81/t
Question
Dr. Janine developed two equivalent forms of a test and administered them both, in counter-balanced order, to a group of people on the same day in order to assess reliability. What is this called?

A) test-retest
B) parallel forms
C) split-half
D) KR 20
Question
Federal government guidelines require that a test be

A) standardized for use among all U.S. sub-populations.
B) factor analyzed before it can be used to make employment decisions.
C) reliable before it can be used to make employment decisions.
D) reliable above the .90 level.
Question
How does the domain sampling model conceptualize reliability?

A) The absolute value of the difference between the standard error of measurement and the variance
B) The ratio of variance of the observed scores on the short version of a test and the variance of the long-run true scores
C) The sum of squares of the difference between the observed and true scores
D) The ratio of the number of sample items to the number of domain items, multiplied by the mean of the sample distribution
Question
The method for estimating the internal consistency of a test that simultaneously considers all possible ways of splitting the items is the

A) Spearman-Brown formula.
B) Kuder-Richardson formula.
C) Cronbach's alpha.
D) odd-even method.
Question
Why might different random samples of domain items yield different estimates of the true score?

A) sampling error
B) poor reliability
C) respondent error
D) item bias
Question
Which of the following would tend to provide the most conservative estimate of split-half reliability?

A) the Phillips method
B) the Spearman-Brown formula
C) coefficient alpha
D) the odd-even reliability coefficient
Question
Upon repeated applications of the same test, performance on the second application may be affected by previous experience on the test. This is known as

A) attenuation.
B) a carryover effect.
C) shrinkage.
D) selected recall.
Question
The difference between David's two typing tests, one at the beginning of the semester and one at the end, reflects the fact that he typed quite a few term papers during the semester. This reflects

A) attenuation.
B) random error.
C) practice effects.
D) domain sampling.
Question
As opposed to reliability based on classical test theory, ____ focuses on the range of item difficulty that is useful in assessing an individual's ability.

A) domain sampling
B) internal consistency
C) coefficient alpha
D) item response theory
Question
If a researcher is attempting to assess the reliability of a measure of depression, the method of choice would be

A) internal consistency.
B) time sampling.
C) the test-retest method.
D) more than one of these.
Question
A split-half correlation, KR 20, and coefficient alpha are all used to evaluate

A) standard errors of measurement.
B) internal consistency.
C) variance.
D) validity.
Question
The problems created by using a limited number of items to represent a larger and more complicated construct are explicitly considered in the ____ model.

A) multivariate
B) random sampling
C) domain sampling
D) standard error of measurement
Question
Suppose you were trying to estimate the reliability of a whole test on the basis of the correlation between scores on the two halves of the test. In order to correct for using scores based on the halves, you might use the

A) KR 20.
B) alpha method.
C) Spearman-Brown formula.
D) split-half method.
Question
The Spearman-Brown formula corrects for deflated reliability due to

A) half-length tests.
B) small sample size.
C) systematic error.
D) poor test item construction.
Question
The reliability of a difference score is

A) equal to the reliability of the most reliable of the two measures.
B) equal to the reliability of the least reliable of the two measures.
C) the average reliability of the two measures.
D) expected to be lower than the reliability of either of the two measures.
Question
Test constructors can improve test reliability by

A) increasing the number of items.
B) decreasing the number of items.
C) retaining items that have the most face validity.
D) reducing the item to total correlation.
Question
Which of the following is used to estimate the number of items that should be added to a test to achieve a specified reliability?

A) KR 20
B) coefficient alpha
C) Spearman-Brown prophecy formula
D) split-half technique
Question
The difference between KR 20 and coefficient alpha is

A) KR 20 can be used to evaluate time sampling problems while alpha cannot.
B) Alpha can be used to evaluate time sampling problems while KR 20 cannot.
C) KR 20 can only be used for items scored right or wrong but Alpha can be used for items in any format.
D) Alpha can only be used for items scored right or wrong but KR 20 can be used for items in any format.
Question
Approximately what value must a reliability coefficient have for most purposes in basic research?

A) .90
B) .50
C) .70
D) .30
Question
The standard error of measurement allows us to

A) estimate the degree to which a test provides inaccurate readings.
B) have an acceptable margin of error.
C) determine the source of error.
D) avoid any measurement error.
Question
The preferred method for assessing the level of agreement between observers is the

A) kappa statistic
B) Spearman coefficient
C) coefficient alpha
D) rank-order statistic
Question
Difference scores are created by

A) subtracting one test score from another.
B) subtracting the true score from a predicted score.
C) eliminating error from true scores.
D) giving a test to two different individuals.
Question
Correction for attenuation is used

A) to estimate the validity of a test.
B) to correct for tests that are short.
C) to correct for tests that are long.
D) to estimate the true correlation between variables that have been measured with error.
Question
The kappa statistic is used to

A) assess the level of agreement among several observers.
B) estimate the correlation between a continuous variable and an artificially dichotomous variable.
C) estimate the percentage of disagreement between observers.
D) estimate the validity of behavioral observation.
Question
Items are probably measuring the same thing when the correlation between an item and the total score

A) is high.
B) is low.
C) approaches 0.
D) is negative.
Question
What is the impact of carryover effects on test-retest reliability?

A) Test-retest reliability is not influenced by carryover effects.
B) Carryover effects result in an overestimation of reliability.
C) Carryover effects result in an underestimation of reliability.
D) Test-retest reliability increases carryover effects.
Question
Jennifer read a report in which the agreement between raters of children's aggressive behavior was .50, indicating

A) the raters agreed at chance levels.
B) agreement was poor.
C) agreement was excellent.
D) agreement was moderate.
Question
In order to determine the unidimensionality of a test, you can use

A) factor analysis.
B) split-half reliability.
C) parallel forms assessment.
D) the Spearman-Brown prophecy formula.
Question
Which of the following is a source of measurement error?

A) respondent sampling
B) scorer sampling
C) internal consistency
D) external consistency
Question
Standard errors of measurement are used to

A) determine whether an observed score is the "true" score.
B) determine the standard deviation of the scores.
C) calculate the exact true score.
D) create confidence intervals around specific observed test scores.
Question
Which of the following is true of the parallel forms method?

A) It is the most often used method for estimating reliability.
B) It provides one of the most rigorous methods for estimating reliability.
C) It is largely ineffective with psychological tests.
D) Sophisticated computer programs have made it unnecessary.
Question
If the same test, given at different points in time to the same test takers, yields different scores, then the method typically used to assess this source of error is

A) test-retest.
B) alternate forms/parallel forms.
C) split-half.
D) KR 20.
Question
Which of the following is a problem in evaluating the agreement between observers in behavioral studies?

A) The observers are usually not trained.
B) The behaviors being studied are usually not directly observable.
C) There will always be some agreement by chance.
D) There is no method for evaluating the agreement between observers.
Question
Measures of test-retest reliability are sometimes considered inappropriate for the evaluation of health status because

A) health status tests should not be given at multiple points in time.
B) variations in health status may be related to true changes over time rather than measurement error.
C) there is no domain of health status.
D) health status is too complicated to measure.
Question
The prophecy formula is used to

A) predict expected values.
B) estimate how long a test must be to achieve a desired level of reliability.
C) estimate how long a test must be to achieve a desired level of validity.
D) calculate test reliability.
Question
The formula used to estimate how long a test must be to achieve a desired level of reliability is

A) kappa
B) prophecy
C) Spearman
D) Thorndike
Question
What is the most useful indicator of reliability for the interpretation of individual scores?

A) split-half variance
B) test-retest
C) item sampling
D) standard error of measurement
Question
Tests will be most reliable if they are

A) multidimensional.
B) unidimensional.
C) brief.
D) criterion-referenced.
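Study note
Several cards in this deck refer to formulas without stating them: the Spearman-Brown (prophecy) formula, the standard error of measurement, and the correction for attenuation. As a study aid, the sketch below writes out the standard textbook forms of these formulas in Python. The function names and the numbers in the usage note are illustrative assumptions and do not come from the deck or from any particular test.

```python
import math


def spearman_brown(r: float, n: float = 2.0) -> float:
    """Projected reliability when a test with reliability r is made n times as long.

    With n = 2 this is the usual correction applied to a split-half correlation.
    """
    return n * r / (1 + (n - 1) * r)


def prophecy_length_factor(r_current: float, r_desired: float) -> float:
    """How many times longer the test must be to reach r_desired.

    This is the Spearman-Brown formula solved for n.
    """
    return r_desired * (1 - r_current) / (r_current * (1 - r_desired))


def error_variance_proportion(reliability: float) -> float:
    """Share of observed-score variance attributed to random error."""
    return 1 - reliability


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed: float, sd: float, reliability: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Interval of +/- z SEMs around an observed score (z = 1.96 for about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated correlation between two variables if both were measured without error."""
    return r_xy / math.sqrt(r_xx * r_yy)
```

For example, under these formulas a made-up split-half correlation of .50 corrects to about .67 with `spearman_brown(0.5)`, and `prophecy_length_factor(0.5, 0.8)` indicates the test would need to be about four times as long to reach a reliability of .80.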
Deck 4: Reliability
1
Because classical test theory assumes a person's true score is the same over time, repeating the same test over and over gives a distribution of scores that reflects what?

A) systematic error
B) random error
C) reliability
D) internal consistency
B
2
Which of the following is an important distinction between systematic errors and random errors?

A) Random errors are more likely than systematic errors to cause errors in conclusions.
B) Systematic errors occur only in objective measures and random errors occur only in subjective measures.
C) Random errors can be eliminated by careful wording of test items.
D) Systematic errors are extremely rare among psychological tests.
A
3
Who developed methods for evaluating sources of error in behavioral research?

A) Edward Thorndike
B) Kuder and Richardson
C) Charles Spearman
D) Cronbach
D
4
If we repeatedly administered the same test to the same individual,the standard deviation of the person's score would be the

A) standard error of the mean.
B) variance.
C) reliability of the test.
D) standard error of measurement.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
5
Assuming the "rubber yardstick" shrinks and expands at random,what can be said about the distribution of scores from the rubber yardstick?

A) It will have a mean of zero (0).
B) It will be normal.
C) It will have a standard error of zero (0).
D) It will be skewed.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
6
What is Spearman known for?

A) Working out the basics of reliability theory
B) Developing the notion of sampling error
C) Creating methods for measuring error
D) Developing multivariate analysis
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
7
Theoretically,reliability is

A) the correlation of the observed test score with the true score.
B) the square root of the ratio of true to the observed score.
C) the ratio of true to the observed score squared.
D) not possible to define.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
8
Repeated use of the same test typically results in different scores.How does classical test theory account for this?

A) poor test validity
B) systematic variability
C) random error
D) inattention
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
9
According to classical test theory,errors of measurement are

A) always overestimates of true score.
B) always underestimates of true score.
C) random.
D) constant.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
10
When talking about errors in terms of psychological testing,we are referring to the fact that:

A) someone got an answer incorrect.
B) there is always some inaccuracy in the measurement.
C) the test was inappropriate for that particular group.
D) the score is too subjective to be accurate.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
11
What is Cronbach known for?

A) Developing measures to evaluate sources of error
B) Creating the basics of multivariate analysis
C) Developed the basics of contemporary measurement theory
D) Distinguished between objective and subjective measures
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
12
We can get an idea of how much measurement error is present in a score through the

A) true score.
B) observed score.
C) standard error of the mean.
D) standard error of measurement.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
13
Theoretically,if Susie repeatedly took the 6th grade achievement test,you would be able to find her true score by finding the ____ of the distribution of her scores.

A) mean
B) standard deviation
C) variance
D) standard error of measurement
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
14
The work of Charles Spearman combined what two measurement concepts?

A) mean and variance
B) sample statistics and population parameters
C) sampling error and correlation
D) reliability and validity
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
15
When creating a test,one generally uses a subset of items to represent a larger construct.This is known as

A) a population parameter.
B) a domain sampling.
C) a sampling error.
D) descriptive statistics.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
16
Classical Test Theory assumes

A) the length of a test has no bearing on its reliability.
B) measurement errors occur systematically.
C) it is not possible to estimate true scores.
D) the distribution of random errors is the same for every respondent.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
17
The basic theory of reliability was first worked out by

A) Karl Pearson.
B) Charles Spearman.
C) Julian Stanley.
D) Lee Cronbach.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
18
Classical Test Theory assumes that

A) errors are systematic.
B) errors are random.
C) true scores cannot be estimated.
D) the length of a test has no bearing on its reliability.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
19
If you have three clocks in your house,and every clock is 10 minutes fast,this is an example of

A) systematic error.
B) random error.
C) measurement error.
D) a rubber yardstick.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
20
An observed score is composed of

A) the residual and the true score.
B) the criterion and the predictor.
C) the measurement error and the predictor.
D) the true score and the measurement error.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
21
Professor Pine constructed five different short history tests by randomly drawing questions from the huge pool of all possible questions about the current material.He has created

A) randomly parallel tests.
B) a large sample size.
C) systematic errors.
D) attenuation effects.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
22
Sources of error associated with time sampling are measured using

A) the test-retest method.
B) the split half method.
C) KR 20.
D) the alpha method.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
23
Tests designed according to item response theory

A) are no longer considered useful.
B) can only be used with non-objective material
C) yield more reliable results with fewer items
D) provide low-tech methods for field use.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
24
In the domain sampling model,the error that is being considered is the error caused by

A) choosing the wrong domain.
B) systematic error.
C) using a limited sample of items.
D) random error.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
25
A reliability coefficient of .60 suggests that

A) 64% of the variance on the test is error.
B) 40% of the variance on the test is error.
C) 78% of the variance on the test is error.
D) the test can be used for clinical purposes but not for research.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
26
Dr.Smith is trying to determine the reliability of a new personality test.Two randomly parallel tests,A and B,have a correlation of .81.What is the estimated reliability of the new personality test?

A) .81
B) -.9
C) .9
D) .81/t
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
27
Dr.Janine developed two equivalent forms of a test and administered them both,in counter-balanced order,to a group of people on the same day in order to assess reliability.What is this called?

A) test- retest
B) parallel forms
C) split-half
D) KR 20
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
28
Federal government guidelines require that a test be

A) standardized for use among all U.S.sub-populations.
B) factor analyzed before it can be used to make employment decisions.
C) reliable before it can be used to make employment decisions.
D) reliable above the .90 level.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
29
How does the domain sampling model conceptualize reliability?

A) The absolute value of the difference between the standard error of measurement and the variance
B) The ratio of variance of the observed scores on the short version of a test and the variance of the long-run true scores
C) The sum of squares of the difference between the observed and true scores
D) The ratio of the number of sample items to the number of domain items,multiplied by the mean of the sample distribution
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
30
The method for estimating the internal consistency of a test that simultaneously considers all possible ways of splitting the items is the

A) Spearman Brown formula.
B) Kuder-Richardson formula.
C) Cronbach's alpha.
D) the odd-even method.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
31
Why might different random samples of domain items yield different estimates of the true score?

A) sampling error
B) poor reliability
C) respondent error
D) item bias
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
32
Which of the following would tend to provide the most conservative estimate of split-half reliability?

A) the Phillips method
B) the Spearman-Brown formula
C) coefficient alpha
D) the odd-even reliability coefficient
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
33
Upon repeated applications of the same test,performance on the second application may be affected by previous experience on the test.This is known as

A) attenuation.
B) a carryover effect.
C) shrinkage.
D) selected recall.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
34
The difference between David's two typing tests,one at the beginning of the semester and one at the end,reflects the fact that he typed quite a few term papers during the semester.This reflects

A) attenuation.
B) random error.
C) practice effects.
D) domain sampling.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
35
As opposed to reliability based on the classical test theory,____ focuses on the range of item difficulty that is useful in assessing an individual's ability.

A) domain sampling
B) internal consistency
C) coefficient alpha
D) item response theory
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
36
If a researcher is attempting to assess the reliability of a measure of depression,the method of choice would be

A) internal consistency.
B) time sampling.
C) the test-retest method.
D) more than one of these.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
37
A split-half correlation,KR 20,and coefficient alpha are all used to evaluate

A) standard errors of measurement.
B) internal consistency.
C) variance.
D) validity.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
38
The problems created by using a limited number of items to represent a larger and more complicated construct are explicitly considered in the ____ model.

A) multivariate
B) random sampling
C) domain sampling
D) standard error of measurement
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
39
Suppose you were trying to estimate the reliability of a whole test on the basis of the correlation between scores on the two halves of the test.In order to correct for using scores based on the halves,you might use the

A) KR 20.
B) alpha method.
C) Spearman-Brown formula.
D) split half method.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
40
The Spearman Brown formula corrects for deflated reliability due to

A) half-length tests.
B) small sample size.
C) systematic error.
D) poor test item construction.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
41
The reliability of a difference score is

A) equal to the reliability of the most reliable of the two measures.
B) equal to the reliability of the least reliable of the two measures.
C) the average reliability of the two measures.
D) expected to be lower than the reliability of either of the two measures.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
42
Test constructors can improve test reliability by

A) increasing the number of items.
B) decreasing the number of items.
C) retaining items that have the most face validity.
D) reducing the item to total correlation.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
43
Which of the following is used to estimate the number of items that should be added to a test to achieve a specified reliability?

A) KR 20
B) coefficient alpha
C) Spearman-Brown prophecy formula
D) split-half technique
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
44
The difference between KR 20 and coefficient alpha is

A) KR 20 can be used to evaluate time sampling problems while alpha cannot.
B) Alpha can be used to evaluate time sampling problems while KR 20 cannot.
C) KR 20 can only be used for items scored right or wrong but Alpha can be used for items in any format.
D) Alpha can only be used for items scored right or wrong but KR 20 can be used for items in any format.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
45
Approximately what value must a reliability coefficient have for most purposes in basic research?

A) .90
B) .50
C) .70
D) .30
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
46
The standard error of measurement allows us to

A) estimate the degree to which a test provides inaccurate readings.
B) have an acceptable margin of error.
C) determine the source of error.
D) avoid any measurement error.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
47
The preferred method for assessing the level of agreement between observers is the

A) kappa statistic
B) Spearman coefficient
C) coefficient alpha
D) rank-order statistic
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
48
Difference scores are created by

A) subtracting one test score from another.
B) subtracting the true score from a predicted score.
C) eliminating error from true scores.
D) giving a test to two different individuals.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
49
Correction for attenuation is used

A) to estimate the validity of a test.
B) to correct for tests that are short.
C) to correct for tests that are long.
D) to estimate the true correlation between variables that have been measured with error.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
50
The kappa statistic is used to

A) assess the level of agreement among several observers.
B) estimate the correlation between a continuous variable and an artificially dichotomous variable.
C) estimate the percentage of disagreement between observers.
D) estimate the validity of behavioral observation.
Unlock Deck
Unlock for access to all 64 flashcards in this deck.
Unlock Deck
k this deck
51
Items are probably measuring the same thing when the correlation between an item and the total score

A) is high.
B) is low.
C) approaches 0.
D) is negative.
52
What is the impact of carryover effects on test-retest reliability?

A) Test-retest reliability is not influenced by carryover effects.
B) Carryover effects result in an overestimation of reliability.
C) Carryover effects result in an underestimation of reliability.
D) Test-retest reliability increases carryover effects.
53
Jennifer read a report in which the agreement between raters of children's aggressive behavior was .50, indicating

A) the raters agreed at chance levels.
B) agreement was poor.
C) agreement was excellent.
D) agreement was moderate.
54
To determine the unidimensionality of a test, you can use

A) factor analysis.
B) split-half reliability.
C) parallel forms assessment.
D) the Spearman-Brown prophecy formula.
55
Which of the following is a source of measurement error?

A) respondent sampling
B) scorer sampling
C) internal consistency
D) external consistency
56
Standard errors of measurement are used to

A) determine whether an observed score is the "true" score.
B) determine the standard deviation of the scores.
C) calculate the exact true score.
D) create confidence intervals around specific observed test scores.
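The arithmetic behind a confidence interval built from the standard error of measurement can be sketched as follows, using the usual classical test theory formula SEM = SD * sqrt(1 - r). The function names, the default z of 1.96 (about 95%), and the sample values are assumptions for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    if sd < 0 or not (0 <= reliability <= 1):
        raise ValueError("sd must be non-negative and reliability in [0, 1]")
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Interval of observed score plus or minus z standard errors of measurement."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test: SD = 15, reliability = .91, observed score = 100
low, high = confidence_interval(100, 15, 0.91)
print(round(low, 2), round(high, 2))
```

With these values the SEM is 4.5, so the interval runs roughly from 91.18 to 108.82. A less reliable test with the same SD gives a wider interval.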
57
Which of the following is true of the parallel forms method?

A) It is the most often used method for estimating reliability.
B) It provides one of the most rigorous methods for estimating reliability.
C) It is largely ineffective with psychological tests.
D) Sophisticated computer programs have made it unnecessary.
58
If the same test, given at different points in time to the same test takers, yields different scores, then the method typically used to assess this source of error is

A) test-retest.
B) alternate forms/parallel forms.
C) split-half.
D) KR 20.
59
Which of the following is a problem in evaluating the agreement between observers in behavioral studies?

A) The observers are usually not trained.
B) The behaviors being studied are usually not directly observable.
C) There will always be some agreement by chance.
D) There is no method for evaluating the agreement between observers.
60
Measures of test-retest reliability are sometimes considered inappropriate for the evaluation of health status because

A) health status tests should not be given at multiple points in time.
B) variations in health status may be related to true changes over time rather than measurement error.
C) there is no domain of health status.
D) health status is too complicated to measure.
61
The prophecy formula is used to

A) predict expected values.
B) estimate how long a test must be to achieve a desired level of reliability.
C) estimate how long a test must be to achieve a desired level of validity.
D) calculate test reliability.
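For reference, the Spearman-Brown prophecy formula can be worked in both directions: projecting reliability for a lengthened test, and solving for the lengthening factor needed to reach a target. The sketch below assumes added items are comparable to the existing ones; the function names and sample values are hypothetical.

```python
def projected_reliability(r_observed: float, n: float) -> float:
    """Reliability expected when the test is made n times as long."""
    if not (0 < r_observed < 1) or n <= 0:
        raise ValueError("need 0 < r_observed < 1 and n > 0")
    return n * r_observed / (1 + (n - 1) * r_observed)


def lengthening_factor(r_observed: float, r_desired: float) -> float:
    """How many times longer the test must be to reach r_desired."""
    if not (0 < r_observed < 1 and 0 < r_desired < 1):
        raise ValueError("reliabilities must be strictly between 0 and 1")
    return r_desired * (1 - r_observed) / (r_observed * (1 - r_desired))


# Hypothetical: current reliability .70, target .90
n = lengthening_factor(0.70, 0.90)
print(round(n, 2))  # 3.86
```

In this example a 20-item test would need about 77 comparable items (20 * 3.86) to reach a reliability of .90.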
62
The formula used to estimate how long a test must be to achieve a desired level of reliability is

A) kappa
B) prophecy
C) Spearman
D) Thorndike
63
What is the most useful indicator of reliability for the interpretation of individual scores?

A) split-half variance
B) test-retest
C) item sampling
D) standard error of measurement
64
Tests will be most reliable if they are

A) multidimensional.
B) unidimensional.
C) brief.
D) criterion-referenced.