MEASUREMENT RESEARCH ASSOCIATES
TEST INSIGHTS
May 2009
Greetings
 
Since time is literally money with computer-based testing, it is advisable to know how the difficulty of the multiple-choice items on a test affects the time candidates need to complete it.

Phil Higgins
Manager, Computer Based Testing


Item Difficulty and Time Usage
Computer-based testing makes it possible to track the time spent on each item: how long candidates take to respond initially and, later, how long they take to review items. To better understand candidates' use of time, the relationship between item difficulty and time usage was studied.
 
For purposes of this study, the items were divided into three groups based on their percent correct (p-value). Percent correct is usually considered to be a measure of the difficulty of the item.  Group 1 included the difficult items that fewer than 40% of the candidates answered correctly, Group 2 included the items that 40% to 80% answered correctly, and Group 3 included the items that over 80% of the candidates answered correctly.  An ANOVA found significant differences among the groups for both the initial amount of time used per item (F = 7.05, p < .001) and the time used for review per item (F = 15.13, p < .001). More time was required to answer and review the more difficult items. The details of the analysis are shown in the table.
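The grouping and comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item data are hypothetical (the study data are not reproduced here), the function names `difficulty_group` and `one_way_anova_f` are invented for this sketch, and an item at exactly 80% correct is placed in Group 2 to follow the wording of the text. The F statistic is computed by hand as the ratio of between-group to within-group mean squares; it does not reproduce the F values reported in the study.

```python
from statistics import mean


def difficulty_group(p_value):
    """Assign an item to a difficulty group from its proportion correct.

    Assumption: exactly 0.80 is treated as moderate (Group 2).
    """
    if p_value < 0.40:
        return 1  # difficult items
    if p_value <= 0.80:
        return 2  # moderate items
    return 3      # easy items


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of items
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical items: (proportion correct, mean seconds for the initial response)
items = [
    (0.25, 70.0), (0.35, 62.0), (0.38, 66.0),
    (0.45, 58.0), (0.60, 54.0), (0.75, 50.0),
    (0.85, 44.0), (0.90, 40.0), (0.95, 42.0),
]

times_by_group = {1: [], 2: [], 3: []}
for p_value, seconds in items:
    times_by_group[difficulty_group(p_value)].append(seconds)

f_stat = one_way_anova_f(list(times_by_group.values()))
print(f"F = {f_stat:.2f}")
```

The same calculation would be run a second time on review seconds per item to obtain the second comparison. Obtaining a p-value additionally requires the F distribution, which a statistics package would normally supply.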
 
While this is a logical outcome, it provides some insight into the amount of time needed for an examination.  When an examination is composed primarily of items in the 40%-80% range of difficulty, more time is required than when the test includes primarily easy items in the 80%-99% range of difficulty.  With criterion-referenced testing, test item difficulty tends to be targeted to the pass point, which, when expressed as a percentage, is often around 60% correct.  Thus, a test with mostly easy items will require less time to complete than a test that is well targeted or contains mostly difficult items.  
 
Not all candidates reviewed all items.  In fact, many candidates reviewed very few items.  However, similar patterns of time usage were found for all candidates: easier items required less review time than moderate or difficult items.
 
This study used only one data set, so the results may not generalize.  It does, however, indicate how the time candidates need to complete an examination can be estimated from the difficulty of the items on the examination.
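As a rough illustration of how the group means could feed a time estimate, the sketch below multiplies item counts by the mean seconds per item reported in the table in this article. The two test compositions and the function name `estimated_minutes` are hypothetical. Adding review time for every item is an assumption, since many candidates reviewed very few items, so the figure with review included is better read as a generous estimate.

```python
# Mean seconds per item, taken from the table in this article.
MEAN_SECONDS = {
    "difficult": {"initial": 61.47, "review": 12.67},
    "moderate":  {"initial": 55.56, "review": 11.11},
    "easy":      {"initial": 43.93, "review": 7.18},
}


def estimated_minutes(item_counts, include_review=True):
    """Estimate total testing time in minutes from item counts per group."""
    total_seconds = 0.0
    for group, count in item_counts.items():
        per_item = MEAN_SECONDS[group]["initial"]
        if include_review:
            # Assumption: every item is reviewed once.
            per_item += MEAN_SECONDS[group]["review"]
        total_seconds += count * per_item
    return total_seconds / 60.0


# Hypothetical 100-item test compositions.
well_targeted = {"difficult": 10, "moderate": 80, "easy": 10}
mostly_easy = {"difficult": 0, "moderate": 20, "easy": 80}

for name, counts in [("well targeted", well_targeted), ("mostly easy", mostly_easy)]:
    print(f"{name}: about {estimated_minutes(counts):.0f} minutes")
```

Because the means come from a single data set, an estimate like this is a planning aid rather than a prediction for any particular examination.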


Descriptive Statistics for Time Usage by % Correct Item Groups in Seconds

Group  Percent Correct                       Mean seconds used  Std. Deviation  Minimum  Maximum

Initial time to respond
1      less than 40% (difficult items)       61.47              24.57           26.38    164.86
2      40% to 80% correct (moderate items)   55.56              22.60           18.28    131.26
3      80% or higher correct (easy items)    43.93              18.42           16.53    100.64
Total  Total                                 54.22              22.97           16.53    164.86

Review time to respond
1      less than 40% (difficult items)       12.67              4.87            3.48     29.88
2      40% to 80% correct (moderate items)   11.11              5.36            3.11     27.83
3      80% or higher correct (easy items)    7.18               3.33            2.15     18.92
Total  Total                                 10.54              5.19            2.15     29.88


Measurement Research Associates, Inc.
505 North Lake Shore Dr., Suite 1304
Chicago, IL  60611
Phone: (312) 822-9648     Fax: (312) 822-9650


