Deck 6: Writing and Evaluating Test Items

Question
This test item is an example of a

A)polytomous format.
B)dichotomous format.
C)Likert format.
D)category format.
Question
Describing the chances that low-ability test takers will obtain each score is called the

A)dichotomous format.
B)polytomous format.
C)guessing threshold.
D)50% threshold.
Question
The expected level of chance performance for a 200-item multiple-choice exam with four alternatives is

A)25 correct.
B)50 correct.
C)75 correct.
D)100 correct.
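Chance performance is simple arithmetic. With n equally attractive alternatives, a blind guess is right with probability 1/n, so the expected chance score on N items is N/n. A minimal Python sketch, assuming every alternative is equally likely to be picked (the function names are illustrative, not from any testing package):

```python
from fractions import Fraction


def chance_proportion(n_alternatives: int) -> Fraction:
    """Probability of answering one item correctly by blind guessing,
    assuming each alternative is equally likely to be chosen."""
    return Fraction(1, n_alternatives)


def expected_chance_score(n_items: int, n_alternatives: int) -> Fraction:
    """Expected number correct if every item is answered by blind guessing."""
    return n_items * chance_proportion(n_alternatives)


print(expected_chance_score(200, 4))
print(float(chance_proportion(5)))
```

For the 200-item, four-alternative exam this gives 50 correct; for a five-option item the chance level is .20.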
Question
Which item format is best suited to factor analysis to find which items group together?

A)multiple-choice
B)Likert
C)dichotomous
D)forced-choice
Question
What describes the chances that a low-ability test taker will obtain each score?

A)acquiescence response set
B)the miss rate
C)guessing threshold
D)the moments method
Question
The following is an item from an attitude scale:
     Physical punishment is essential in order to control children.
               Strongly disagree
               Disagree
               Neither agree nor disagree
               Agree
               Strongly agree

This item is in the

A)category format.
B)Likert format.
C)dichotomous format.
D)polytomous format.
Question
One problem with the use of category rating scales is that

A)many respondents are confused by dichotomous formats.
B)responses are sometimes influenced by the context in which objects are rated.
C)rating scales must be at least 100 points in order to be meaningfully interpreted.
D)category rating scale data do not have ordinal scale properties.
Question
A test format that is typically used for attitude measurement is the

A)checklist format.
B)dichotomous format.
C)category format.
D)Likert format.
Question
The difference between Likert scales and category formats is that

A)category formats are used only in health settings.
B)category formats tend to be dichotomous while Likert scales tend to be polytomous.
C)category formats tend to have a smaller number of choices.
D)Likert scales tend to have a smaller number of choices.
Question
When distractors are likely to be selected as alternative responses on multiple-choice tests,

A)validity is increased.
B)item reliability is increased.
C)item reliability is decreased.
D)guessing is reduced.
Question
One method for measuring chronic pain asks the respondent to group statements according to how accurately they describe his/her discomfort. This would be an example of the

A)Q-sort format.
B)checklist format.
C)Likert format.
D)category format.
Question
The tendency for test takers to agree on most of the items is called a(n)

A)guessing threshold.
B)acquiescence response set.
C)item difficulty.
D)miss rate.
Question
Suppose you got 75 items correct on a 100-item, six-alternative multiple-choice exam. What would your score be after correcting for guessing?

A)50
B)57
C)63
D)70
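The standard correction for guessing subtracts a fraction of the wrong answers: corrected score = R - W/(n - 1), where R is the number right, W the number wrong (omitted items are not counted), and n the number of alternatives. A short sketch, assuming all 25 items not answered correctly were attempted and missed:

```python
def corrected_score(right: int, wrong: int, n_alternatives: int) -> float:
    """Correction for guessing: R - W / (n - 1).

    Only answered-but-incorrect items go in `wrong`; omitted items are not penalized.
    """
    return right - wrong / (n_alternatives - 1)


# 75 right on a 100-item, six-alternative exam, assuming the other 25 were answered wrong
print(corrected_score(75, 25, 6))
```

Here 75 - 25/5 = 70. A test taker who guesses blindly on every item of a 100-item, four-alternative test (about 25 right, 75 wrong) ends up with a corrected score of 0, which is what the formula is designed to do.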
Question
Suppose that you are taking a multiple-choice test with no correction for guessing. If you aren't sure of the answer,

A)only guess if you have some confidence you are correct.
B)you should always guess on a speed test.
C)you should always guess.
D)you should never guess.
Question
Under what circumstance is it NOT to your advantage to guess on a multiple-choice exam?

A)when you are making a "wild guess" and a correction formula is being used
B)in any test situation where you are making a "wild guess"
C)when you can rule out one or more of the alternatives as being incorrect
D)when the guessing threshold is low
Question
Distractors that are obviously incorrect

A)lower the reliability of the test.
B)increase the reliability of the test.
C)have no impact on the reliability of the test.
D)reduce the likelihood of correct guessing.
Question
True-false examinations use

A)a dichotomous format.
B)a polytomous format.
C)a Likert format.
D)a category format.
Question
In multiple-choice examinations, incorrect alternatives are called

A)flags.
B)non-categories.
C)distractors.
D)miss rates.
Question
What format do some personality tests use because it requires an absolute judgment?

A)multiple-choice
B)Likert
C)dichotomous
D)category
Question
To correct for guessing,

A)a correction formula can be used.
B)distractors should be eliminated.
C)the number of items should be increased.
D)distractors should be increased.
Question
When teachers are initially told that the students they will be teaching are either not very imaginative or very imaginative, their ratings on an adjective checklist tend to reflect this initial description. This is an example of

A)the effect of context.
B)visual analogue.
C)low sample size.
D)forced choice effect.
Question
A multiple-choice test with five options has a chance performance level of

A).50.
B).25.
C).20.
D).10.
Question
Which type of item tends to lose reliability and become obsolete over time?

A)factual items
B)skill-based items
C)items based on abstract concepts
D)simple items
Question
Which method involves scoring that is very time-consuming?

A)dichotomous format.
B)visual analogue scale.
C)Likert scale.
D)multiple-choice format.
Question
Why have checklists fallen out of favor?

A)They are simplistic.
B)They are prone to error.
C)They are difficult to write well.
D)They cannot be validated.
Question
Which of the following item writing recommendations has research support?

A)All answer options should be plausible.
B)Items should cover important concepts and objectives.
C)All parts of an item or exercise should appear on the same page.
D)There should be an equal number of true and false statements.
Question
The optimum level of item difficulty for a five-alternative multiple-choice item is

A).50.
B).60.
C).70.
D).80.
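Optimal item difficulty is usually taken to lie halfway between chance performance (1/n) and 100% correct, that is, (1 + 1/n)/2. A small sketch using exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def optimal_difficulty(n_alternatives: int) -> Fraction:
    """Proportion correct halfway between chance (1/n) and 100% correct."""
    chance = Fraction(1, n_alternatives)
    return (1 + chance) / 2


# Two alternatives (true-false) through six alternatives
for n in (2, 4, 5, 6):
    print(n, round(float(optimal_difficulty(n)), 3))
```

This gives .60 for five alternatives, .625 for four, .75 for true-false items, and 7/12 (about .583) for six. The .585 option in the six-alternative question appears to come from rounding chance to .17 before averaging: (1 + .17)/2 = .585.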
Question
The method of item analysis that looks at the correlation between performance on an item (correct or incorrect) and total test score is

A)the extreme group method.
B)the tetrachoric method.
C)the point-biserial method.
D)the item characteristic curve method.
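The point-biserial correlation is the Pearson correlation between a 0/1 item score and the total test score, and it can be written as r_pb = (M1 - M0)/s * sqrt(p q). A minimal sketch with hypothetical data (the function name and the four-person example are illustrative):

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def point_biserial(item: Sequence[int], total: Sequence[float]) -> float:
    """Point-biserial r between a 0/1 item score and the total test score.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals of those
    who passed and failed the item, s is the population SD of all totals,
    p is the proportion passing, and q = 1 - p.
    """
    passed = [t for i, t in zip(item, total) if i == 1]
    failed = [t for i, t in zip(item, total) if i == 0]
    if not passed or not failed:
        raise ValueError("r_pb is undefined when everyone passes or everyone fails")
    p = len(passed) / len(item)
    q = 1 - p
    s = pstdev(total)
    return (fmean(passed) - fmean(failed)) / s * math.sqrt(p * q)


# Hypothetical data: four test takers; the two highest scorers passed the item
print(round(point_biserial([1, 1, 0, 0], [10, 8, 6, 4]), 3))
```

Because the item being analyzed is itself part of the total score, this correlation is somewhat inflated, especially on short tests; a common alternative is to correlate the item with the total of the remaining items.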
Question
In the extreme group method of item analysis,

A)point-biserial correlations are used.
B)data from some test-takers are not used in the analysis.
C)only the performance of those who scored extremely well is studied.
D)distractors are eliminated.
Question
Which testing method is popular for measuring self-rated health?

A)Q-sort technique
B)visual analogue scale
C)checklists
D)category formats
Question
What is the impact of adding distractors on polytomous item reliability?

A)The number of distractors is inversely related to item reliability.
B)Large numbers of distractors can greatly increase reliability.
C)Adding distractors may not increase reliability if the distractors are implausible.
D)Reliability is optimized when there are 8 to 10 distractors.
Question
How do Likert format tests differ from tests made of dichotomous and polytomous items?

A)Likert format tests require far fewer items to achieve reliability and validity.
B)Likert format items quantify characteristics rather than classifying responses as correct or incorrect.
C)Likert format tests cannot be validated whereas dichotomous and polytomous item tests can be validated.
D)Likert format items require higher literacy levels than do dichotomous and polytomous items.
Question
If the five applicants for the chief financial officer position of ABC Company are highly qualified, the company should use a test that

A)has easier items.
B)discriminates 20% of the time.
C)contains mostly difficult items.
D)contains items ranging in difficulty from .30 to .70.
Question
If 50% of the individuals taking a particular test get a certain item correct, the difficulty (or easiness) level of that item would be

A).05.
B).25.
C).50.
D).10.
Question
For most tests, the maximum amount of information about differences between individuals can be obtained from items in the difficulty range of

A).30 to .70.
B).40 to .80.
C)between .55 and .85.
D)above .90.
Question
The optimal item difficulty of a six-alternative test is

A).50.
B).585.
C).60.
D).625.
Question
Which of the following is a disadvantage of true-false tests?

A)They are typically only useful with simple information.
B)They encourage memorization without understanding.
C)They are difficult to administer.
D)They encourage rapid responding.
Question
When Lupe argued that one of the questions on the five-alternative test was unfairly difficult, the teacher simply replied that the item's difficulty was optimal at

A).50.
B).60.
C).625.
D).70.
Question
Which of the following increases the likelihood that students will guess when they are not sure of the correct response on a multiple-choice item?

A)when they expect a low grade
B)when the items are easy
C)when the course is a required course
D)when they dislike the subject
Question
As the proportion of people who get an item on a test correct increases, the measure of item difficulty

A)decreases.
B)remains the same.
C)increases.
D)approaches chance.
Question
Professor Plum created class intervals from the test scores for his class. He made a line graph using these intervals on the X-axis and the proportion of students who answered a particular question correctly on the Y-axis. The result is

A)a discrimination index.
B)a correlation index.
C)an item characteristic curve.
D)a histogram.
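The procedure in the Professor Plum question can be sketched directly: group test takers into total-score class intervals and compute the proportion passing the item within each interval. The interval width and data below are hypothetical; for a "good" item the proportions should rise across intervals.

```python
from collections import defaultdict
from typing import Dict, Sequence


def empirical_icc(item: Sequence[int], total: Sequence[int], width: int) -> Dict[int, float]:
    """Proportion passing an item within each total-score class interval.

    Each interval is [lower, lower + width); the dictionary key is `lower`.
    """
    tallies = defaultdict(lambda: [0, 0])  # lower bound -> [number correct, number of people]
    for score, t in zip(item, total):
        lower = (t // width) * width
        tallies[lower][0] += score
        tallies[lower][1] += 1
    return {lower: right / n for lower, (right, n) in sorted(tallies.items())}


# Hypothetical class of eight: item scores (0/1) and total test scores
print(empirical_icc([0, 0, 1, 0, 1, 1, 1, 1], [10, 12, 15, 18, 22, 25, 28, 30], 10))
```

Plotting these proportions (Y-axis) against the class intervals (X-axis) gives the item characteristic curve.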
Question
The proportion of test takers that get a "good" item correct increases as a function of the

A)item characteristic curve.
B)total test score.
C)validity of the test.
D)item difficulty.
Question
To evaluate a criterion-referenced test, the test was administered to a group of students who had studied a learning unit and to another group who had not. For each item on the test, the criterion for mastery would be

A)the point-biserial correlation.
B)below the antimode.
C)above the antimode.
D)the validity coefficient.
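One way to locate the cutoff described in this question is to pool the two groups' scores and find the least frequent score (the antimode) between the peak of the unstudied group and the peak of the studied group. The sketch below assumes integer scores and breaks ties toward the lower score; both choices, the function name, and the data are illustrative assumptions rather than a prescribed procedure.

```python
from collections import Counter
from typing import Sequence


def antimode(unstudied: Sequence[int], studied: Sequence[int]) -> int:
    """Least frequent score between the two groups' modes in the pooled distribution.

    Assumes integer scores; a score nobody obtained counts as frequency 0.
    Ties go to the lowest score because min() returns the first minimum.
    """
    low_mode = Counter(unstudied).most_common(1)[0][0]
    high_mode = Counter(studied).most_common(1)[0][0]
    pooled = Counter(unstudied) + Counter(studied)
    between = range(low_mode + 1, high_mode)
    if not between:
        raise ValueError("the two modes must be separated by at least one score")
    return min(between, key=lambda score: pooled[score])


# Hypothetical scores for students who did not and did study the unit
print(antimode([1, 2, 2, 2, 3, 3, 5], [4, 5, 5, 6, 6, 6, 7]))
```

With these hypothetical scores the antimode is 4, so scores above 4 would be treated as evidence of mastery.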
Question
The approach to test construction in which the item characteristic curve for each individual item is analyzed is called

A)prophecy theory.
B)classical test theory.
C)item response theory.
D)item analysis theory.
Question
In item analysis, the internal criterion against which items are evaluated is the

A)discrimination index.
B)total test score.
C)criterion.
D)predictor.
Question
When 100% of the test-takers get an item correct, the item will have a

A)low difficulty index (0%).
B)high discriminability index.
C)discriminability index of approximately .5.
D)very low discriminability index.
Question
Exhibit 6-1
Refer to Exhibit 6-1. Which item is inversely related to performance on the test?

A)item a
B)item b
C)item c
D)item d
E)item e
Question
The average of a series of item characteristic curves is known as

A)the average characteristic curve.
B)the standard error of the item characteristic.
C)a test characteristic curve.
D)the variance ratio curve.
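Item response theory usually models each item characteristic curve as a logistic function of ability (theta). The sketch below assumes the two-parameter logistic (2PL) model, which the deck itself does not specify, and averages the item curves to form a test characteristic curve. The item parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple


def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability of a correct answer at ability theta,
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta: float, items: Sequence[Tuple[float, float]]) -> float:
    """Test characteristic curve: average of the item curves at ability theta."""
    return sum(icc_2pl(theta, a, b) for a, b in items) / len(items)


# Hypothetical (a, b) parameters for three items of increasing difficulty
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(tcc(theta, items), 3))
```

At any ability level the average is the expected proportion correct on the whole test; summing the curves instead of averaging gives the expected number correct.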
Question
One of the major advantages of tests developed using item response theory is that they

A)can be easily adapted for computer administration.
B)are longer.
C)are easier to administer.
D)can be developed with little effort.
Question
The least frequent score in a frequency polygon is the

A)negative discriminator.
B)discrimination point.
C)antimode.
D)criterion.
Question
When test items are evaluated against total test score, we use a(n)

A)internal criterion.
B)external criterion.
C)multivariate analysis.
D)criterion referenced test.
Question
Exhibit 6-1
Refer to Exhibit 6-1. Which item discriminates at various levels of performance?

A)item a
B)item b
C)item c
D)item d
E)item e
Question
Proponents of criterion-referenced tests have criticized item analysis procedures because they

A)cannot be used for criterion-referenced tests.
B)have statistical flaws.
C)do not provide information about the type of errors that students make.
D)have no relevance for educational tests.
Question
Exhibit 6-1
Refer to Exhibit 6-1. Which item is unrelated to total test score performance?

A)item a
B)item b
C)item c
D)item d
E)item e
Question
Exhibit 6-1
Refer to Exhibit 6-1. Which item discriminates well at low levels of performance but not at high levels?

A)item a
B)item b
C)item c
D)item d
E)item e
Question
Exhibit 6-1
Refer to Exhibit 6-1. In constructing a test, we would most likely eliminate

A)item a
B)item b
C)item c
D)item d
E)item e
Question
In an experimental psychology course, the proportion of the top third of the class that correctly answered the last question of the final was .93, while the proportion for the bottom third was .89. The professor should decide not to include this question in the next final because the discrimination index indicates

A)negative discrimination.
B)chance level performance.
C)that students were incorrectly prepared.
D)that the item does not discriminate well.
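The extreme group method ranks test takers by total score, sets aside everyone in the middle, and computes d = p(top group) - p(bottom group). With the figures in this question, d = .93 - .89 = .04, close to zero. A minimal sketch, assuming thirds as the extreme groups (the function and its parameters are hypothetical):

```python
from typing import Sequence


def discrimination_index(item: Sequence[int], total: Sequence[float],
                         fraction: float = 1 / 3) -> float:
    """Extreme group discrimination index: p(top group) - p(bottom group).

    `item` holds 0/1 scores on one item; `total` holds the matching total test scores.
    Test takers in the middle of the ranking are not used.
    """
    if len(item) != len(total):
        raise ValueError("item and total must have the same length")
    # Rank people by total score; sorted() is stable, so ties keep their input order
    order = sorted(range(len(total)), key=lambda j: total[j])
    k = max(1, round(len(order) * fraction))  # size of each extreme group
    if 2 * k > len(order):
        raise ValueError("extreme groups would overlap")
    bottom = [item[j] for j in order[:k]]
    top = [item[j] for j in order[-k:]]
    return (sum(top) - sum(bottom)) / k


# Hypothetical six test takers: the item is passed mostly by high scorers
print(discrimination_index([0, 0, 1, 0, 1, 1], [1, 2, 3, 4, 5, 6]))
```

A value near zero (as in the .04 example) means the item barely separates strong from weak performers, and a negative value means low scorers pass it more often than high scorers.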
Question
The extreme group method and the point-biserial method are both used to estimate

A)reliability.
B)validity.
C)discriminability.
D)difficulty.
Question
Dr. H likes to start off his tests with a few easier items in order to boost the confidence of the test takers. This is an example of

A)human factors.
B)the psychometric properties of the test.
C)optimum item difficulty.
D)item difficulty.
Question
A study of an employment test examined whether scores on specific items assessing an individual's ability to work well on a team were strongly related to scores on the test as a whole. The purpose of the study was to evaluate

A)human factors.
B)optimum item difficulty.
C)item discriminability.
D)categories.
Question
What is the first step in developing criterion-referenced tests?

A)Getting expert agreement about how to construct the test
B)Clearly specifying the objectives in precise statements about what is to be learned
C)Identifying the types of people who are likely to take the test and either score very well or very poorly
D)Creating items with a wide range of difficulty and administering them to experts in the field
Question
Which of the following methods is used in the analysis of item discriminability?

A)test-retest
B)extreme group
C)characteristic curves
D)factor analysis
Question
For essay items, the reliability of the scoring procedure should be assessed by determining the association(s) between

A)the item score and score on the overall test.
B)the test score and scores on tests measuring similar constructs.
C)each item on the test.
D)two scores provided by independent scorers.
Question
Peaked conventional tests present items

A)from a wide range of difficulty levels.
B)at optimum difficulty levels.
C)at levels appropriate for the test taker.
D)mostly at or near average difficulty levels.
Question
The proportion of responses that are expected to be correct for each level of ability can be represented by

A)an item characteristic curve.
B)a discrimination index.
C)a test characteristic curve.
D)the optimum difficulty.
Question
How many distractors does this item contain?

A)0
B)1
C)3
D)4
Question
Which of the following is NOT one of DeVellis's (2016) guidelines for writing test items?

A)Consider mixing positively and negatively worded items.
B)Define clearly what you want to measure.
C)Consider using "double-barreled" items that convey two or more ideas at the same time.
D)Generate an item pool.
Question
To be reliable, a true-false test must

A)use simple items.
B)contain many items.
C)have a high guessing threshold.
D)contain few items.
Question
The most common sort of dichotomous format is

A)multiple choice.
B)Q-sort.
C)Likert.
D)true-false.
Question
In most situations, a good test should contain items

A)from a wide range of difficulty levels.
B)at optimum difficulty levels.
C)at levels appropriate for the test taker.
D)mostly at or near average difficulty levels.
Question
Items that retain their reliability over time tend to be those that focus on

A)skills.
B)abstract concepts.
C)easier concepts.
D)foundational knowledge.
Question
Why should tests include items from a variety of difficulty levels?

A)Students are less likely to guess when there is a wide range of item difficulty.
B)Good tests encourage students to do their best and a range of difficulty helps less confident students.
C)Good tests discriminate at a variety of difficulty levels.
D)Students like to have some easy items because they are more likely to respond correctly by chance.
Question
An item characteristic curve that rises gradually and then turns down for people at the highest levels of performance

A)is likely to occur when students are making a wild guess.
B)can happen when 'none of the above' is one of the multiple-choice options.
C)turns down at a point referred to as the antimode.
D)indicates an item with a high level of difficulty.
Question
The difference in proportion of correct responses for each item between the top third of the class and the bottom third of the class is an example of a(n)

A)point biserial correlation.
B)discrimination index.
C)item difficulty.
D)guessing threshold.
Question
The optimal difficulty level for items is usually about halfway between 100% of the respondents getting the item correct and

A)a 50/50 chance for choosing a correct response.
B)the level of success expected by chance alone.
C)the highest possible variance in correct responses for the item set.
D)zero respondents getting the item correct.
Question
Which of the following is true of the characteristic curve for a "good" test item?

A)It is normally distributed.
B)It is bimodal and positively skewed.
C)It has a gradual, positive slope.
D)It is negatively accelerated.
Question
In order to choose questions for a final version of a test, the examiners created a graph with difficulty on one axis and discriminability on the other. The examiners should use the questions that

A)fall above the .50 point on the discriminability axis.
B)fall below the .50 point on the discriminability axis.
C)fall between .30 and .70 on difficulty and above .30 on discriminability.
D)fall above the .50 point on discriminability and difficulty.
Question
Item analysis differs from the classical method of assessing reliability because

A)a smaller number of items leads to higher reliability in item analysis.
B)item analysis ignores total score.
C)item analysis describes results for individual items, whereas reliability describes results for the scale as a whole.
D)item analysis generally considers the validity of the test.
Question
Which type of test is especially helpful for evaluating progress in individualized programs of instruction?

A)peaked conventional
B)criterion-referenced
C)rectangular-referenced
D)dichotomous
Question
Your text presents empirical evidence that indicates school children tend to repeat the same kind of errors when given problems of a particular type. This highlights the

A)importance of ranking students.
B)need to provide corrective feedback.
C)concept of 'teaching to the test.'
D)ability of IRT procedures to initiate guidance.
Answers to the first three questions: A) polytomous format; C) guessing threshold; B) 50 correct.
4
Which item format can best be factor analyzed to find which ones group together?

A)multiple-choice
B)Likert
C)dichotomous
D)forced-choice
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
5
What describes the chances that a low-ability test taker will obtain each score?

A)acquiescence response set
B)the miss rate
C)guessing threshold
D)the moments method
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
6
The following is an item from an attitude scale: ​
     Physical punishment is essential in order to control children.
               Strongly disagree
               Disagree
               Neither agree or disagree
               Agree
               Strongly agree

This item is in the

A)category format.
B)Likert format.
C)dichotomous format.
D)polytomous format.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
7
One problem with the use of category rating scales is that

A)many respondents are confused by dichotomous formats.
B)responses are sometimes influenced by the context in which objects are rated.
C)rating scales must be at least 100 points in order to be meaningfully interpreted.
D)category rating scale data do not have ordinal scale property.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
8
A test format that is typically used for attitude measurement is the

A)checklist format.
B)dichotomous format.
C)category format.
D)Likert format.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
9
The difference between Likert scales and category formats is that

A)category formats are used only in health settings.
B)category formats tends to be dichotomous while Likert scales tends to be polytomous.
C)category formats tend to have a smaller number of choices.
D)Likert scales tend to have a smaller number of choices.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
10
When distractors are likely to be selected as alternative responses on multiple-choice tests,

A)validity is increased.
B)item reliability is increased.
C)item reliability is decreased.
D)guessing is reduced.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
11
One method for measuring chronic pain asks the respondent to group statements according to how accurately they describe his/her discomfort. This would be an example of the

A)Q-sort format.
B)checklist format.
C)Likert format.
D)category format.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
12
The tendency for test takers to agree on most of the items is called a(n)

A)guessing threshold.
B)acquiescence response set.
C)item difficulty.
D)the miss rate.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
13
Suppose you got 75 items correct on a 100-item, six alternative, multiple-choice exam. What would your score be after we corrected for guessing?

A)50
B)57
C)63
D)70
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
14
Suppose that you are taking a multiple choice test where there is no correction for guessing. If you aren't sure of the answer,

A)only guess if you have some confidence you are correct.
B)you should always guess on a speed test.
C)you should always guess.
D)you should never guess.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
15
Under what circumstance is it NOT to your advantage to guess on a multiple-choice exam?

A)when you are making a "wild guess" and a correction formula is being used
B)in any test situation where you are making a "wild guess"
C)when you can rule out one or more of the alternatives as being incorrect
D)when the guessing threshold is low
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
16
Distractors that are obviously incorrect

A)lower the reliability of the test.
B)increase the reliability of the test.
C)have no impact on the reliability of the test.
D)reduce the likelihood of correct guessing.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
17
True-false examinations use

A)a dichotomous format.
B)a polytomous format.
C)a Likert format.
D)a category format.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
18
In multiple choice examinations, incorrect alternatives are called

A)flags.
B)non-categories.
C)distractors.
D)miss rates.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
19
What format do some personality tests use because it requires an absolute judgment?

A)multiple-choice
B)Likert
C)dichotomous
D)category
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
20
In order to correct for guessing

A)a correction formula can be used.
B)distractors should be eliminated.
C)the number of items should be increased.
D)distractors should be increased.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
21
When teachers are initially told that the students they will be teaching are either not very imaginative or are very imaginative, ratings using an adjective checklist will tend to reflect this original assessment. This is an example of

A)the effect of context.
B)visual analogue.
C)low sample size.
D)forced choice effect.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
22
A multiple-choice test with five options has a chance performance level of

A).50.
B).25.
C).20.
D).10.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
23
Which type of item tends to lose reliability and become obsolete over time?

A)factual items
B)skill-based items
C)items based on abstract concepts
D)simple items
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
24
Which method involves scoring that is very time consuming?

A)dichotomous format.
B)visual analogue scale.
C)Likert scale.
D)multiple-choice format.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
25
Why have checklists fallen out of favor?

A)They are simplistic.
B)They are prone to error.
C)They are difficult to write well.
D)They cannot be validated.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
26
Which of the following item writing recommendations has research support?

A)All answer options should be plausible.
B)Items should cover important concepts and objectives.
C)All parts of an item or exercise should appear on the same page.
D)There should be an equal number of true and false statements.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
27
The optimum level of item difficulty for a five-alternative multiple choice item is

A).50.
B).60.
C).70.
D).80.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
28
The method of item analysis which looks at the correlation between performance on an item (correct or incorrect)and total test score is

A)the extreme group method.
B)the tetrachoric method.
C)the point-biserial method.
D)the item characteristic curve method.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
29
In the extreme group method of item analysis,

A)point-biserial correlations are used.
B)data from some test-takers are not used in the analysis.
C)only the performance of those who scored extremely well is studied.
D)distractors are eliminated.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
30
Which testing method is popular for measuring self-rated health?

A)q-sort technique
B)visual analogue scale
C)checklists
D)category formats
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
31
What is the impact of adding distractors on polytomous item reliability?

A)The number of distractors is inversely related to item reliability.
B)Large numbers of distractors can greatly increase reliability.
C)Adding distractors may not increase reliability if the distractors are implausible.
D)Reliability is optimized when there are 8 to 10 distractors.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
32
How do Likert format tests differ from tests made of dichotomous and polytomous items?

A)Likert format tests require far fewer items to achieve reliability and validity.
B)Likert format items quantify characteristics rather than classifying responses as correct or incorrect.
C)Likert format tests cannot be validated whereas dichotomous and polytomous item tests can be validated.
D)Likert format items require higher literacy levels than do dichotomous and polytomous items.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
33
If the five applicants for the chief financial officer position of ABC Company are highly qualified, the company should use a test that

A)has easier items.
B)discriminates 20% of the time.
C)contains mostly difficult items.
D)contains items ranging in difficulty from .30 to .70.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
34
If 50% of the individuals taking a particular test get a certain item correct, the difficulty (or easiness)level of that item would be

A).05.
B).25.
C).50.
D).10.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
35
For most tests, the maximum amount of information about differences between individuals can be obtained from items in the difficulty range of

A).30 to .70.
B).40 to .80.
C)between .55 and .85.
D)above .90.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
36
The optimal item difficulty of a six-alternative test is

A).50.
B).585.
C).60.
D).625.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
37
Which of the following is a disadvantage of true-false tests?

A)They are typically only useful with simple information.
B)They encourage memorization without understanding.
C)They are difficult to administer.
D)They encourage rapid responding.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
38
When Lupe argued that one of the questions on the five-alternative test was unfairly difficult, the teacher simply replied by saying that the item .60.difficulty was optimal at

A).50.
B).60.
C).625.
D).70.
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
39
Which of the following increases the likelihood that students will guess when they are not sure of the correct response on a multiple choice item?

A)when they expect a low grade
B)when the items are easy
C)when the course is a required course
D)when they dislike the subject
Unlock Deck
Unlock for access to all 86 flashcards in this deck.
Unlock Deck
k this deck
40
As the proportion of people who get an item on a test correct increases, the measure of item difficulty

A)decreases.
B)remains the same.
C)increases.
D)approaches chance.
41
Professor Plum created class intervals from the test scores for his class. He made a line graph using these intervals on the X-axis and the proportion of students who answered a particular question correctly on the Y-axis. The result is

A)a discrimination index.
B)a correlation index.
C)an item characteristic curve.
D)a histogram.
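You can reproduce Professor Plum's graph by grouping total scores into class intervals and then computing the proportion of test takers in each interval who passed the item. Below is a minimal Python sketch. It assumes made-up (total score, item correct) pairs and an interval width of 10. `records` and `icc_points` are hypothetical names:

```python
# Hypothetical data: (total test score, 1 if the item was answered correctly else 0)
records = [
    (12, 0), (15, 0), (18, 1), (22, 0), (25, 1),
    (28, 1), (31, 1), (35, 1), (38, 1), (41, 1),
]

def icc_points(records, width):
    """Return {interval lower bound: proportion correct} for one item."""
    bins = {}
    for total, correct in records:
        start = (total // width) * width  # lower bound of this score's class interval
        hits, count = bins.get(start, (0, 0))
        bins[start] = (hits + correct, count + 1)
    # Sort by interval so the points read left to right along the X-axis
    return {start: hits / count for start, (hits, count) in sorted(bins.items())}

WIDTH = 10
for start, p in icc_points(records, WIDTH).items():
    print(f"{start}-{start + WIDTH - 1}: {p:.2f}")
```

With these data, the proportion correct rises from .33 in the 10-19 interval to 1.00 at 30 and above. That gradual rise is the shape expected of a good item.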
42
The proportion of test takers who answer a "good" item correctly increases as a function of the

A)item characteristic curve.
B)total test score.
C)validity of the test.
D)item difficulty.
43
In order to evaluate a criterion-referenced test, the test was administered to a group of students who had studied a learning unit and to another group who had not. For each item on the test, the criterion for mastery would be

A)the point-biserial correlation.
B)below the antimode.
C)above the antimode.
D)the validity coefficient.
44
The approach to test construction in which the item characteristic curve for each individual item is analyzed is called

A)prophecy theory.
B)classical test theory.
C)item response theory.
D)item analysis theory.
45
In item analysis, the internal criterion against which items are evaluated is the

A)discrimination index.
B)total test score.
C)criterion.
D)predictor.
46
When 100% of test takers answer an item correctly, the item will have a

A)low difficulty index (0%).
B)high discriminability index.
C)discriminability index of approximately .5.
D)very low discriminability index.
47
Exhibit 6-1
Refer to Exhibit 6-1. Which item is inversely related to performance on the test?

A)item a
B)item b
C)item c
D)item d
E)item e
48
The average of a series of item characteristic curves is known as

A)the average characteristic curve.
B)the standard error of the item characteristic.
C)a test characteristic curve.
D)the variance ratio curve.
49
One of the major advantages of tests developed using item response theory is that they

A)can be easily adapted for computer administration.
B)are longer.
C)are easier to administer.
D)can be developed with little effort.
50
The least frequent score in a frequency polygon is the

A)negative discriminator.
B)discrimination point.
C)antimode.
D)criterion.
51
When test items are evaluated against total test score, we use a(n)

A)internal criterion.
B)external criterion.
C)multivariate analysis.
D)criterion-referenced test.
52
Exhibit 6-1
Refer to Exhibit 6-1. Which item discriminates at various levels of performance?

A)item a
B)item b
C)item c
D)item d
E)item e
53
Proponents of criterion-referenced tests have criticized item analysis procedures because they

A)cannot be used for criterion-referenced tests.
B)have statistical flaws.
C)do not provide information about the type of errors that students make.
D)have no relevance for educational tests.
54
Exhibit 6-1
Refer to Exhibit 6-1. Which item is unrelated to total test score performance?

A)item a
B)item b
C)item c
D)item d
E)item e
55
Exhibit 6-1
Refer to Exhibit 6-1. Which item discriminates well at low levels of performance but not at high levels?

A)item a
B)item b
C)item c
D)item d
E)item e
56
Exhibit 6-1
Refer to Exhibit 6-1. In constructing a test, we would most likely eliminate

A)item a
B)item b
C)item c
D)item d
E)item e
57
In an experimental psychology course, the proportion of the top third of the class that correctly answered the last question of the final exam was .93, while .89 of the bottom third answered it correctly. The professor should decide not to include this question in the next final because the discrimination index indicates

A)negative discrimination.
B)chance level performance.
C)that students were incorrectly prepared.
D)that the item does not discriminate well.
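The extreme group method compares the proportion of test takers who pass an item in the top group on total score with the proportion in the bottom group. Here the groups are thirds. For the figures on this card, D = .93 - .89 = .04, which is almost no discrimination. The following is a minimal Python sketch that computes D from raw data. The function name, the sample data, and the tie handling (ties keep their input order) are all assumptions:

```python
def discrimination_index(totals, item_correct, fraction=1/3):
    """Extreme group method: p(top group) - p(bottom group) for one item.

    totals: total test scores; item_correct: 1/0 on the item, in the same order.
    """
    n = len(totals)
    k = max(1, round(n * fraction))                     # size of each extreme group
    order = sorted(range(n), key=lambda i: totals[i])   # lowest to highest total
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_correct[i] for i in top) / k
    p_bottom = sum(item_correct[i] for i in bottom) / k
    return p_top - p_bottom

totals = [10, 20, 30, 40, 50, 60]
item = [0, 0, 1, 0, 1, 1]
print(discrimination_index(totals, item))  # top third all pass, bottom third all fail
```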
58
The extreme group method and the point-biserial method are both used to estimate

A)reliability.
B)validity.
C)discriminability.
D)difficulty.
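The point-biserial method correlates the dichotomous item score with the continuous total score. Below is a minimal Python sketch based on the standard formula r_pb = (M1 - M0) / s * sqrt(p * q). In this formula, M1 and M0 are the mean totals of those who passed and failed the item, s is the population standard deviation of the totals, and p and q are the pass and fail proportions. The data and function name are illustrative:

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item_correct, totals):
    """Point-biserial r between a 0/1 item and total test score.

    Requires at least one pass and one fail on the item.
    """
    p = mean(item_correct)  # proportion who passed the item
    q = 1 - p
    m1 = mean([t for t, c in zip(totals, item_correct) if c == 1])  # mean total, passers
    m0 = mean([t for t, c in zip(totals, item_correct) if c == 0])  # mean total, failers
    return (m1 - m0) / pstdev(totals) * sqrt(p * q)

print(round(point_biserial([0, 0, 1, 0, 1, 1], [10, 20, 30, 40, 50, 60]), 3))
```

For these made-up data, r_pb is about .68, so passing the item goes along with higher total scores.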
59
Dr. H likes to start off his tests with a few easier items in order to boost the confidence of the test takers. This is an example of

A)human factors.
B)the psychometric properties of the test.
C)optimum item difficulty.
D)item difficulty.
60
The developers of an employment test wanted to find out whether high scores on specific items assessing an individual's ability to work well in a team were strongly related to scores on the test as a whole. The purpose of the study was to evaluate

A)human factors.
B)optimum item difficulty.
C)item discriminability.
D)categories.
61
What is the first step in developing criterion-referenced tests?

A)Getting expert agreement about how to construct the test
B)Clearly specifying the objectives in precise statements about what is to be learned
C)Identifying the types of people who are likely to take the test and either score very well or very poorly
D)Creating items with a wide range of difficulty and administering them to experts in the field
62
Which of the following methods is used in the analysis of item discriminability?

A)test-retest
B)extreme group
C)characteristic curves
D)factor analysis
63
For essay items, the reliability of the scoring procedure should be assessed by determining the association(s) between

A)the item score and score on the overall test.
B)the test score and scores on tests measuring similar constructs.
C)each item on the test.
D)two scores provided by independent scorers.
64
Peaked conventional tests present items

A)from a wide range of difficulty levels.
B)at optimum difficulty levels.
C)at levels appropriate for the test taker.
D)mostly at or near average difficulty levels.
65
The proportion of responses that are expected to be correct for each level of ability can be represented by

A)an item characteristic curve.
B)a discrimination index.
C)a test characteristic curve.
D)the optimum difficulty.
66
How many distractors does this item contain?

A)0
B)1
C)3
D)4
67
Which of the following is NOT one of DeVellis's (2016) guidelines for writing test items?

A)Consider mixing positively and negatively worded items.
B)Define clearly what you want to measure.
C)Consider using "double-barreled" items that convey two or more ideas at the same time.
D)Generate an item pool.
68
To be reliable, a true-false test must

A)use simple items.
B)contain many items.
C)have a high guessing threshold.
D)contain few items.
69
The most common sort of dichotomous format is

A)multiple choice.
B)Q-sort.
C)Likert.
D)true-false.
70
In most situations, a good test should contain items

A)from a wide range of difficulty levels.
B)at optimum difficulty levels.
C)at levels appropriate for the test taker.
D)mostly at or near average difficulty levels.
71
Items that retain their reliability over time tend to be those that focus on

A)skills.
B)abstract concepts.
C)easier concepts.
D)foundational knowledge.
72
Why should tests include items from a variety of difficulty levels?

A)Students are less likely to guess when there is a wide range of item difficulty.
B)Good tests encourage students to do their best and a range of difficulty helps less confident students.
C)Good tests discriminate at a variety of difficulty levels.
D)Students like to have some easy items because they are more likely to respond correctly by chance.
73
An item characteristic curve that rises gradually and then turns down for people at the highest levels of performance

A)is likely to occur when students are making a wild guess.
B)can happen when 'none of the above' is one of the multiple choice options.
C)turns down at a point referred to as the antimode.
D)indicates an item with a high level of difficulty.
74
The difference in proportion of correct responses for each item between the top third of the class and the bottom third of the class is an example of a(n)

A)point-biserial correlation.
B)discrimination index.
C)item difficulty.
D)guessing threshold.
75
The optimal difficulty level for items is usually about halfway between 100% of the respondents getting the item correct and

A)a 50/50 chance for choosing a correct response.
B)the level of success expected by chance alone.
C)the highest possible variance in correct responses for the item set.
D)zero respondents getting the item correct.
76
Which of the following is true of the characteristic curve for a "good" test item?

A)It is normally distributed.
B)It is bimodal and positively skewed.
C)It has a gradual, positive slope.
D)It is negatively accelerated.
77
In order to choose questions for a final version of a test, the examiners created a graph with difficulty on one axis and discriminability on the other. The examiners should use the questions that

A)fall above the .50 point on the discriminability axis.
B)fall below the .50 point on the discriminability axis.
C)fall between .30 and .70 on difficulty and above .30 on discriminability.
D)fall above the .50 point on discriminability and difficulty.
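The selection rule in option C can be written as a simple filter over item statistics. Below is a minimal Python sketch. It assumes option C's cutoffs (difficulty from .30 to .70 and discriminability above .30) and uses made-up statistics for five hypothetical items:

```python
# Hypothetical item statistics: item label -> (difficulty p, discrimination index D)
item_stats = {
    "q1": (0.45, 0.42),
    "q2": (0.92, 0.10),
    "q3": (0.60, 0.25),
    "q4": (0.35, 0.38),
    "q5": (0.20, 0.50),
}

def select_items(stats, p_range=(0.30, 0.70), min_d=0.30):
    """Keep items whose difficulty lies in p_range and whose discrimination exceeds min_d."""
    lo, hi = p_range
    return [name for name, (p, d) in stats.items() if lo <= p <= hi and d > min_d]

print(select_items(item_stats))
```

Only q1 and q4 pass both cutoffs. q2 and q5 fall outside the difficulty range, and q3 does not discriminate well enough.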
78
Item analysis is different from the classical method of reliability because

A)a smaller number of items leads to higher reliability in item analysis.
B)item analysis ignores total score.
C)item analysis describes results for individual items, whereas reliability describes results for the scale as a whole.
D)item analysis generally considers the validity of the test.
79
Which type of test is especially helpful for evaluating progress in individualized programs of instruction?

A)peaked conventional
B)criterion-referenced
C)rectangular-referenced
D)dichotomous
80
Your text presents empirical evidence indicating that schoolchildren tend to repeat the same kinds of errors when given problems of a particular type. This highlights the

A)importance of ranking students.
B)need to provide corrective feedback.
C)concept of 'teaching to the test.'
D)ability of IRT procedures to initiate guidance.