Greetings
Since time is literally money in computer-based testing,
it is advisable to know how the difficulty of the multiple-choice items
included on a test affects the time candidates need to complete the
test.
Phil Higgins
Manager, Computer Based Testing
Item Difficulty and Time Usage
Computer-based testing provides the opportunity to track the
amount of time spent on each item: how long candidates take to respond initially and how long they later take to review items. To better understand
candidates' use of time, the relationship between item difficulty and time usage was studied.
For purposes of this study, the items were divided into
three groups based on their percent correct (the item p-value). Percent correct is
usually considered a measure of the difficulty of the item. Group 1 included the difficult items, which
fewer than 40% of the candidates answered correctly; Group 2 included the moderate items
that 40% to 80% answered correctly; and Group 3 included the easy items that over
80% of the candidates answered correctly.
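The grouping rule above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure. The function name `difficulty_group` and the sample values are hypothetical. The handling of an item at exactly 80% is also an assumption: the text says "over 80%" for Group 3 while the table labels that group "80% or higher", so this sketch follows the text and places exactly 80% in Group 2.

```python
def difficulty_group(percent_correct: float) -> int:
    """Return the difficulty group (1, 2, or 3) for an item's percent correct."""
    if not 0.0 <= percent_correct <= 100.0:
        raise ValueError("percent_correct must be between 0 and 100")
    if percent_correct < 40.0:
        return 1  # difficult items: fewer than 40% answered correctly
    if percent_correct <= 80.0:
        return 2  # moderate items (assumption: exactly 80% falls here)
    return 3  # easy items: over 80% answered correctly


# Hypothetical percent-correct values, for illustration only.
example_items = [35.0, 40.0, 62.5, 80.0, 91.0]
example_groups = [difficulty_group(p) for p in example_items]
print(example_groups)
```

If the "80% or higher" reading from the table is preferred, changing `<=` to `<` in the second comparison moves the boundary case into Group 3.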
An analysis of variance (ANOVA) found significant differences among the groups for both the
initial amount of time used per item (F = 7.05, p < .001) and the time used
for review per item (F = 15.13, p < .001). More time was required to answer and
review the more difficult items. The details of the analysis are shown in the
table below.
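For readers who want to see how an F statistic of this kind is formed, the following is a minimal sketch, assuming a one-way design with difficulty group as the single factor (the text does not specify the design). The function name `one_way_anova_f` and the timing values are hypothetical and are not the study's data. The sketch computes only F and its degrees of freedom; a p-value would come from the F distribution with those degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mean seconds per item for a handful of items in each group.
difficult = [70.0, 58.0, 66.0, 61.0]
moderate = [55.0, 60.0, 49.0, 57.0]
easy = [42.0, 47.0, 40.0, 45.0]

f_stat, df_b, df_w = one_way_anova_f([difficult, moderate, easy])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A larger F means the group means differ more than the spread within the groups would suggest by chance.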
While this is a logical outcome, it provides some insight
into the amount of time needed for an examination. When an examination is composed primarily of
items in the 40%-80% range of difficulty, more time is required than when the
test includes primarily easy items in the 80%-99% range of difficulty. With criterion-referenced testing, test item
difficulty tends to be targeted to the pass point, which, when expressed as a
percentage, is often around 60% correct.
Thus, a test with mostly easy items will require
less time to complete than a test that is well targeted or contains mostly difficult
items.
Not all candidates reviewed all items. In fact, many candidates reviewed very few
items. However, similar patterns of time
usage were found for all candidates:
easier items required less review time than moderate or difficult items.
This study used only one data set, so the results may
not generalize. It does, however, indicate
how much time candidates need to complete an examination, given
the difficulty of the items on the examination.
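As a rough illustration of how the group means in the table could inform a time estimate, the sketch below multiplies the number of items in each difficulty group by that group's mean initial and review time. The function name `estimate_test_seconds` and the two test compositions are hypothetical. Adding review time for every item is a simplifying assumption, since many candidates reviewed very few items, and the means come from a single data set, so the result describes an average candidate and is not a recommended time limit.

```python
# Mean seconds per item taken from the table: (initial response, review).
MEAN_SECONDS = {
    1: (61.47, 12.67),  # difficult items
    2: (55.56, 11.11),  # moderate items
    3: (43.93, 7.18),   # easy items
}


def estimate_test_seconds(item_counts):
    """Estimate total seconds for a test given {group: number_of_items}."""
    total = 0.0
    for group, count in item_counts.items():
        initial, review = MEAN_SECONDS[group]
        # Assumption: every item is both answered and reviewed once.
        total += count * (initial + review)
    return total


# Two hypothetical 100-item tests with different difficulty mixes.
well_targeted = {1: 10, 2: 70, 3: 20}
mostly_easy = {1: 0, 2: 20, 3: 80}

for name, counts in [("well targeted", well_targeted), ("mostly easy", mostly_easy)]:
    minutes = estimate_test_seconds(counts) / 60
    print(f"{name}: about {minutes:.0f} minutes")
```

Under these assumptions the mostly easy test comes out shorter than the well-targeted one, which matches the pattern described above.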
Descriptive Statistics for Time Usage by % Correct Item Groups, in Seconds

| | Group | Percent Correct | Mean seconds used | Std. Deviation | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Initial time to respond | 1 | less than 40% (difficult items) | 61.47 | 24.57 | 26.38 | 164.86 |
| | 2 | 40% to 80% correct (moderate items) | 55.56 | 22.60 | 18.28 | 131.26 |
| | 3 | 80% or higher correct (easy items) | 43.93 | 18.42 | 16.53 | 100.64 |
| | Total | Total | 54.22 | 22.97 | 16.53 | 164.86 |
| Review time to respond | 1 | less than 40% (difficult items) | 12.67 | 4.87 | 3.48 | 29.88 |
| | 2 | 40% to 80% correct (moderate items) | 11.11 | 5.36 | 3.11 | 27.83 |
| | 3 | 80% or higher correct (easy items) | 7.18 | 3.33 | 2.15 | 18.92 |
| | Total | Total | 10.54 | 5.19 | 2.15 | 29.88 |