Weighted Penalty Model for Content Balancing in CATS

Chingwei David Shin, Yuehmei Chen, Walter Denny Way, Len Swanson
April 2009

Abstract

This research report proposes a new model called the Weighted Penalty Model (WPM) for content balancing in computer adaptive testing. The WPM approach attempts to balance content properties across all content categories as well as other non-statistical constraints, while simultaneously considering item information at each item-selection level and the scarcity of items relative to some constraints. This is accomplished by assigning a penalty value to each eligible item in the item pool. An item is deemed more desirable for selection if (1) its penalty value is small and (2) it will not cause any constraint violation when administered. The purpose of this study is to present the WPM approach and demonstrate its performance using simulation with real item pool data.

Introduction

Content balancing has been one of the biggest concerns in the implementation of computerized adaptive testing (CAT). Ideally, the selection of the next item in a CAT should be based on the item information given the current estimate of proficiency, subject to the sometimes competing requirement of balancing the test content specifications. To date, some content balancing methods provide a way not only to balance content categories but also to balance other constraints, such as an overlap constraint, item set constraint, key distribution constraint, and others. These constraints are sometimes called non-statistical constraints. In contrast, item information, item difficulty, and item discrimination are statistical constraints.

Several methods of content balancing have been developed and studied, such as the Constrained CAT (CCAT; Kingsbury & Zara, 1989), the Weighted Deviations Model (WDM; Stocking & Swanson, 1993), and the shadow test approach (STA; van der Linden, 2005). The CCAT, a straightforward method, looks for the content category for which the cumulative percentage of administered items is currently farthest below its target percentage. The WDM method, which is much more complicated, balances the constraints by weighting each constraint and computing the deviations from the desired test properties using binary programming. (Note that constraints and properties are used interchangeably in this study.) The STA selects items from a shadow test, a linear test assembled prior to the selection of each item. The WDM and STA methods are similar in that both are based on projections of the future consequences of selecting an item. However, they differ in that the WDM calculates a projection of a weighted sum of the properties of the eventual test, whereas the STA calculates a projection (shadow test) of a realization of the full test.

In this study, we propose a content balancing method called the Weighted Penalty Model (WPM).
Based on an approach originally proposed by Segall and Davey (1995), this method attempts to balance content properties across all content categories as well as other non-statistical constraints. At the same time, it considers item information at each item-selection level and the scarcity of items relative to some constraints (that is, the degree to which items with properties associated with particular constraints are sufficiently represented in the pool). This is accomplished by assigning a penalty value to each eligible item in the item pool at each item-selection level. Items with smaller penalty values are deemed more desirable for selection. The penalty function used by WPM is an adjusted version of the original penalty function proposed by Segall and Davey (1995) and is referred to as the adjusted penalty function.

The purpose of this study is to introduce the WPM approach and demonstrate its performance using simulation with real item pool data and using empirical data from a large-scale placement CAT.

Weighted Penalty Model

The WPM is implemented by forming a list of items to be candidates for the next item administered. The WPM involves three stages: 1) calculating the weighted penalty value for each eligible item in the pool; 2) assigning each eligible item to one of several groups (we refer to these as "color groups"); and 3) forming a list of candidate items. If an item exposure control method is used, one of the candidate items from the list is selected based on the specific item exposure control method. Otherwise, the first item in the list is selected to be administered.

Calculating the Weighted Penalty Value

The definitions and formulas for calculating the weighted penalty value are as follows.

Definitions: For each constraint $j$, define
$\mathrm{Upper}_j$ as the upper bound of the proportion of items in the test that should have the property associated with constraint $j$;
$\mathrm{Lower}_j$ as the lower bound of the proportion of items in the test that should have the property associated with constraint $j$;
$\mathrm{Mid}_j$ as the midpoint between $\mathrm{Upper}_j$ and $\mathrm{Lower}_j$; and
$\mathrm{Prevalence}_j$ as the proportion of the items in the pool having the property associated with constraint $j$.

For example, if the test length is 20 items, the upper bound is 4 items, the lower bound is 0 items, the pool is 100 items, and 30 items in the pool have the property associated with constraint $j$, then $\mathrm{Upper}_j = 0.2$ (4/20), $\mathrm{Lower}_j = 0$ (0/20), $\mathrm{Mid}_j = 0.1$, and $\mathrm{Prevalence}_j = 0.3$ (30/100).
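The worked example above can be reproduced with a few lines of Python. This is a minimal sketch: the `Constraint` class and the `make_constraint` helper are hypothetical names introduced here for illustration and do not come from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """Bounds and pool prevalence for one constraint, all as proportions."""
    upper: float       # Upper_j: upper bound as a proportion of test length
    lower: float       # Lower_j: lower bound as a proportion of test length
    prevalence: float  # Prevalence_j: share of pool items with the property

    @property
    def mid(self) -> float:
        """Mid_j: midpoint between Upper_j and Lower_j."""
        return (self.upper + self.lower) / 2


def make_constraint(test_length, upper_count, lower_count, pool_size, pool_count):
    """Convert item counts into the proportions used by the penalty function."""
    return Constraint(
        upper=upper_count / test_length,
        lower=lower_count / test_length,
        prevalence=pool_count / pool_size,
    )


# The example from the text: 20-item test, bounds of 0 to 4 items,
# 30 of 100 pool items carry the property.
example = make_constraint(20, 4, 0, 100, 30)
print(example.upper, example.lower, example.mid, example.prevalence)
```

Storing the bounds as proportions rather than counts keeps them on the same scale as the expected proportion that the penalty steps compare them with.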

At any point in the test, to obtain the weighted penalty value for each item, the following steps are taken:

1. Compute $\mathrm{Prop}_j$, the expected proportion of items with constraint $j$ that will have been administered if all remaining items in the test are selected in proportion to their prevalence. That is,

$$\mathrm{Prop}_j = \frac{\mathit{nadm}_j + \mathrm{Prevalence}_j \times \mathit{nremaining}}{\mathit{testlength}}, \tag{1}$$

where $\mathit{nadm}_j$ is the number of items administered so far having this property, $\mathit{nremaining}$ is the number of items remaining to be administered in the test (including this one), and $\mathit{testlength}$ is the length of the test.

2. Compute $X_j$, the expected difference between $\mathrm{Prop}_j$ and the constraint target, $\mathrm{Mid}_j$, across the full length of the test. Thus,

$$X_j = \mathrm{Prop}_j - \mathrm{Mid}_j. \tag{2}$$

3. For each eligible item $i$, compute the penalty value for each constraint $j$ using one of Equations (3) to (5) below:

$$P_{ij} = \left( \frac{1}{k D_j} X_j^2 + \frac{D_j}{k} \right) Z_{ij}, \quad \text{if } \mathrm{Prop}_j < \mathrm{Lower}_j, \tag{3}$$

where $D_j$ is $\mathrm{Lower}_j - \mathrm{Mid}_j$, $k$ is arbitrary but has been chosen to be 2, and $Z_{ij}$ is 1 if item $i$ has property $j$; otherwise $Z_{ij}$ is 0.

$$P_{ij} = \left( \frac{1}{k A_j} X_j^2 + \frac{A_j}{k} \right) Z_{ij}, \quad \text{if } \mathrm{Prop}_j \ge \mathrm{Upper}_j, \tag{4}$$

where $A_j$ is $\mathrm{Upper}_j - \mathrm{Mid}_j$ and, again, $k$ is arbitrarily chosen to be 2.

$$P_{ij} = X_j Z_{ij}, \quad \text{if } \mathrm{Upper}_j > \mathrm{Prop}_j \ge \mathrm{Lower}_j. \tag{5}$$

4. For each item, compute the total content penalty value that takes into account all the content constraints of item $i$:

$$F_i = \sum_{j=1}^{J} P_{ij} w_j, \tag{6}$$

where $w_j$ is the weight for constraint $j$.

5. Standardize the total content constraint penalty value:

$$F'_i = \frac{F_i - \min(F)}{\max(F) - \min(F)}, \tag{7}$$

where $\min(F)$ and $\max(F)$ are the minimum and maximum $F_i$ over all eligible items, respectively.

6. Given $\hat{\theta}$ as the current estimate of ability, compute the standardized item information value for each item:

$$SI_i(\hat{\theta}) = \frac{I_i(\hat{\theta})}{I_{\max}(\hat{\theta})}, \tag{8}$$

where $I_i(\hat{\theta})$ is the information value of item $i$ given $\hat{\theta}$ and $I_{\max}(\hat{\theta})$ is the maximum information value across all eligible items given $\hat{\theta}$.

7. Compute the information penalty value that takes into account the information:

$$F''_i = -\left[ SI_i(\hat{\theta}) \right]^2. \tag{9}$$

8. Finally, compute the weighted penalty value:

$$F^{*}_i = w' F'_i + w'' F''_i, \tag{10}$$

where $w'$ and $w''$ are the weights for $F'_i$ and $F''_i$, respectively.

The weights $w'$ and $w''$, referred to as the content constraint weight and the item information weight, respectively, can differ across the sequence of items selected. For each item selected, these weights act as control parameters (van der Linden, 2005), which control the trade-off between the content constraint and the item information. We have found it useful to set $w''$ using functions of the item sequence number in the test. The use of different functions results in various patterns for the relative weights of $w''$ versus $w'$.
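The eight steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. It takes the quadratic branches of Equations (3) and (4) to be (X²/(kD) + D/k)·Z and (X²/(kA) + A/k)·Z, and the information penalty of Equation (9) to be the negative squared standardized information. The function names and the dictionary layout are hypothetical. The sketch also assumes Upper > Lower for every constraint, because the report does not say how equal bounds (where A and D are zero) are handled.

```python
def expected_proportion(n_adm, prevalence, n_remaining, test_length):
    """Equation (1): expected proportion of items with the property."""
    return (n_adm + prevalence * n_remaining) / test_length


def constraint_penalty(prop, lower, upper, has_property, k=2.0):
    """Equations (2)-(5): penalty of one item on one constraint."""
    if upper <= lower:
        raise ValueError("this sketch assumes upper > lower")
    if not has_property:              # Z_ij = 0
        return 0.0
    mid = (lower + upper) / 2
    x = prop - mid                    # Equation (2)
    if prop < lower:                  # Equation (3), D_j = Lower_j - Mid_j
        d = lower - mid
        return x * x / (k * d) + d / k
    if prop >= upper:                 # Equation (4), A_j = Upper_j - Mid_j
        a = upper - mid
        return x * x / (k * a) + a / k
    return x                          # Equation (5)


def weighted_penalties(items, constraints, n_remaining, test_length,
                       w_content, w_info, k=2.0):
    """Equations (6)-(10) for every eligible item.

    items: list of dicts with 'props' (set of constraint names) and 'info'.
    constraints: name -> dict with lower, upper, prevalence, weight, n_adm.
    """
    totals = []
    for item in items:
        total = 0.0
        for name, c in constraints.items():
            prop = expected_proportion(c["n_adm"], c["prevalence"],
                                       n_remaining, test_length)
            p = constraint_penalty(prop, c["lower"], c["upper"],
                                   name in item["props"], k)
            total += p * c["weight"]              # Equation (6)
        totals.append(total)
    lo, span = min(totals), max(totals) - min(totals)
    # Equation (7); all items tie at 0 when every total is identical.
    f_content = [(t - lo) / span if span > 0 else 0.0 for t in totals]
    max_info = max(item["info"] for item in items)
    f_info = [-(item["info"] / max_info) ** 2 for item in items]  # (8), (9)
    return [w_content * fc + w_info * fi                          # (10)
            for fc, fi in zip(f_content, f_info)]


# One constraint from the earlier example (Lower 0, Upper 0.2, Prevalence 0.3),
# with 1 matching item already administered and 18 items remaining of 20.
constraints = {"C": {"lower": 0.0, "upper": 0.2, "prevalence": 0.3,
                     "weight": 1.0, "n_adm": 1}}
items = [{"props": {"C"}, "info": 1.0}, {"props": set(), "info": 0.5}]
result = weighted_penalties(items, constraints, n_remaining=18,
                            test_length=20, w_content=5.0, w_info=2.0)
print(result)
```

In this example the expected proportion (0.32) already exceeds the upper bound, so the more informative item that carries the property receives the larger weighted penalty and the other item is preferred. With k = 2, the assumed quadratic branches meet the linear branch of Equation (5) at the bounds with matching value and slope, which may explain the choice of k.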

Figure 1 illustrates four different information weight patterns. Two of the patterns are based on a logistic function (logistic and logistic+2). In these patterns, the information weight increases slowly at the beginning and at the end but rapidly in the middle. Two additional patterns are based on a quadratic function (quadratic and quadratic+2); in these patterns, the information weight increases slowly in the beginning and middle but rapidly at the end. The content constraint weight is set to a constant value of 5. When the value of the content constraint weight is larger than the value of the information weight, the CAT algorithm will tend to select the items that better fit the content constraints. When the value of the content constraint weight is smaller than the value of the information weight, the CAT algorithm will tend to select items that maximize information.

[Figure 1. Alternate Information Weights as a Function of Item Sequence Number. Horizontal axis: Item Number (1 to 20); vertical axis: Information Weight Values (0 to 14); curves: logistic, quadratic, logistic+2, quadratic+2, and the content constraint weight.]

These weights temper the sometimes undesirable characteristic of maximum-information-based item selection, that is, the tendency to choose only those items with the most desirable statistical properties. Employing a function that emphasizes content over information at the beginning of the adaptive test is an idea that is different in practice but similar in concept to the a-stratified item-selection method (Chang, Qian, & Ying, 2001). (In the a-stratified item-selection method, the items are stratified into a number of levels based on the a-parameters. The early stages of a test use items with lower a values and later stages use items with higher a values.) The flexibility and ease with which $w''$ can be varied to change the relative emphasis of content and information is a distinct advantage of the WPM.

Assigning Items into Different Color Groups

Because an item is typically associated with more than one constraint, some of the constraints associated with a specific item are often under their corresponding lower bounds (i.e., the item is more desirable to select) while other constraints associated with the same item are at or beyond their corresponding upper bounds (i.e., the item is less desirable to select). Given such items, which carry both kinds of constraints, an algorithm based on penalty values alone would likely select one of them and cause an upper-bound violation. A grouping method was developed to avoid selecting an item that would cause a content violation while there are still other items that would not cause the same issue if selected.

In the grouping method, a flag is first assigned to each of the constraints based on the number of items that have been administered so far and the constraint's upper and lower bounds. To assign a flag to each of the constraints at each item-selection level, the following rules are used:

1. If the lower bound of the constraint has not been reached, A is assigned to this specific constraint;
2. If the lower bound of the constraint has been reached but not the upper bound, B is assigned to this specific constraint; and
3. If the upper bound has been either reached or exceeded, C is assigned to this specific constraint.

After all of the constraints have been assigned flags, each eligible item in the pool is placed in a color group based on the flags of its associated constraints. The rules are:

1. If the flags of the associated constraints for an item are all A or a combination of A and B, this item is assigned to the green group;
2. If the flags of the associated constraints for an item are either a combination of A, B, and C or a combination of A and C, this item is assigned to the orange group;
3. If the flags of the associated constraints for an item are all B, this item is assigned to the yellow group; and
4. If the flags of the associated constraints for an item are either a combination of C and B or all C, this item is assigned to the red group.

Forming a List

After all the eligible items in the pool have been assigned to color groups, the list is formed according to the following rules:

1. Between color groups, the order is green, orange, yellow, and red; and
2. Within each color group, the items are ordered by their weighted penalty values from smallest to largest.

Item-selection Procedure

After a list of items has been formed using the WPM, one of the items from the list is selected to be administered according to the item exposure control method in use. In this study, two item exposure control methods are adopted: the Conditional Randomesque (CR) method and the Stocking and Lewis Conditional Multinomial (SLCM; Stocking & Lewis, 2000) method.

The randomesque strategy (Kingsbury & Zara, 1989) randomly selects the next item to be administered from the group of the most informative items, given the current estimated theta, where the group size is predetermined (e.g., 2, 3, 4, 10). The CR method in this study is a variation of the regular randomesque strategy. Given the current estimated theta, the CR method selects the next item from a group of items, where the group size is predetermined for that ability range. (For example, 3, 4, 4, 5, 4, 3 are the group sizes for the 6 theta ranges if the whole theta scale is divided into 6 ranges.) The remaining items in that group that are not selected are blocked from further item selection. In doing so, the CR strategy allows a preset maximum exposure rate for each ability range to be stipulated and provides a reasonable assurance that the maximum exposure rate will be constrained to that level.

The SLCM strategy directly controls the item exposure rate conditional on estimated theta by deriving an exposure-control parameter for each item at each ability level. The exposure-control parameter takes values from 0 to 1. Once the list of items has been formed using the WPM method, the first k items from the list are selected for further consideration.
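The flag, color-group, and list-ordering rules described above can be sketched as follows. This is only an illustration of the stated rules before any exposure control is applied. The function names and data layout are hypothetical. Bounds and counts are taken as numbers of items, and every item is assumed to have at least one associated constraint.

```python
GROUP_ORDER = {"green": 0, "orange": 1, "yellow": 2, "red": 3}


def flag(n_adm, lower_count, upper_count):
    """Flag one constraint from the number of matching items administered."""
    if n_adm < lower_count:
        return "A"      # lower bound not yet reached
    if n_adm < upper_count:
        return "B"      # lower bound reached, upper bound not reached
    return "C"          # upper bound reached or exceeded


def color_group(flags):
    """Map the flags of an item's constraints to a color group."""
    seen = set(flags)
    if not seen:
        raise ValueError("item has no associated constraints")
    if "C" not in seen:
        return "yellow" if seen == {"B"} else "green"   # all B / A or A+B
    return "orange" if "A" in seen else "red"           # A with C / B+C or C


def candidate_list(items, counts, bounds, penalties):
    """Order item ids by color group, then by weighted penalty (ascending).

    items: item_id -> list of constraint names
    counts: constraint -> number of matching items administered so far
    bounds: constraint -> (lower_count, upper_count)
    penalties: item_id -> weighted penalty value
    """
    flags = {c: flag(counts[c], *bounds[c]) for c in bounds}

    def sort_key(item_id):
        group = color_group(flags[c] for c in items[item_id])
        return (GROUP_ORDER[group], penalties[item_id])

    return sorted(items, key=sort_key)


bounds = {"C1": (1, 2), "C2": (0, 1), "C3": (1, 1)}
counts = {"C1": 0, "C2": 0, "C3": 1}          # flags: C1 = A, C2 = B, C3 = C
items = {"i1": ["C1"], "i2": ["C1", "C3"], "i3": ["C2"],
         "i4": ["C2", "C3"], "i5": ["C1", "C2"]}
penalties = {"i1": 0.4, "i2": -1.0, "i3": 0.0, "i4": -2.0, "i5": 0.1}
ordered = candidate_list(items, counts, bounds, penalties)
print(ordered)
```

Note that item `i4` has the smallest penalty but still sorts last, because one of its constraints has already reached its upper bound and none is below its lower bound.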

To randomly select one item from the k items, first, a cumulative multinomial distribution is formed based on the exposure-control parameters of those k items. Then, a random number is generated. The item corresponding to the random number in the cumulative multinomial distribution is selected to be administered. All items preceding the one administered are blocked from item selection for the rest of the test. (See Stocking & Lewis, 2000, for more details.) The value of k is predetermined based on the pool size and the test length.

In addition to the two item exposure control methods, another item-selection method adopted in this study directly selects the first item from the list without applying any item exposure control method. This is referred to as the None_IEC method in this study.

Simulation Study

In this section, WPM is demonstrated through simulation using a real item pool from a large-scale placement CAT. The study factors, data, simulation design, and evaluation criteria are described in the following sections.

Study Factors

The factors investigated in this study include the three item-selection methods mentioned previously and two item information weight patterns, which results in a total of six study conditions. For the item information weight $w''$ in Equation (10) at each item-selection level, two different information weight patterns, Logistic+2 and Quadratic+2, are studied. Figure 2 shows the two patterns for a 12-item test. The content constraint weight is set to a constant value of 5. When the value of the content constraint weight is larger than the value of the information weight, the CAT algorithm tends to select the items that better fit the content constraints. When the value of the content constraint weight is smaller than the value of the information weight, the CAT algorithm tends to select items that provide more information.
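The report does not state the functional forms behind the weight patterns. As a rough sketch, the quadratic form below is an assumption inferred from the twelve Quadratic+2 values listed in the Empirical Data Analyses section: a weight that rises from 2 at the first item to 12 at the last item with the square of the relative item position. The function name and its parameters are hypothetical.

```python
def quadratic_plus_2(position, test_length, w_min=2.0, w_max=12.0):
    """Assumed Quadratic+2 information weight for a 1-based item position."""
    frac = (position - 1) / (test_length - 1)
    return w_min + (w_max - w_min) * frac ** 2


# Values reported for the 12-item empirical CAT.
reported = [2.06, 2.083, 2.311, 2.744, 3.322, 4.066,
            4.975, 6.05, 7.289, 8.694, 10.264, 12]

computed = [quadratic_plus_2(t, 12) for t in range(1, 13)]
matches = [abs(c - r) < 6e-4 for c, r in zip(computed, reported)]

# First item whose information weight exceeds the content weight of 5.
crossover = next(t for t, w in enumerate(computed, start=1) if w > 5.0)

print([round(w, 3) for w in computed])
print(sum(matches), crossover)
```

Under this assumed form, ten of the twelve reported values are reproduced to three decimals. The first and third reported values (2.06 and 2.311) differ from the computed 2.000 and 2.331, so the form should be read as an approximation rather than the authors' definition. With a content constraint weight of 5, the assumed information weight first exceeds the content weight at the eighth item.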

[Figure 2. Information weight curves. Horizontal axis: Items (1 to 12); vertical axis: Weight (0 to 14); curves: Logistic+2, Quadratic+2, and the content constraint weight.]

Data

The item pool contained 565 items. The mean of the discrimination parameters (a) was 1.163, with SD = 0.486. The mean of the difficulty parameters (b) was 0.044, with SD = 1.100. The mean of the guessing parameters (c) was 0.198, with SD = 0.092. Note that items could be indicated as overlapping with other items; once one item within an overlap set was administered, the other items in the set were blocked from further consideration. In this item pool, an item might belong to multiple overlap groups or to none at all.

The length of the CAT is fixed at 12 items. There are 21 constraints for the 12-item CAT, as shown in Table 1. The first 17 constraints are content constraints and the last four are key distribution constraints.

Table 1. Constraints

Constraint  Weight  Lower  Upper
C1          5       0      1
C2          15      1      1
C3          10      0      1
C4          10      0      1
C5          20      1      1
C6          15      1      1
C7          15      1      1
C8          5       0      1
C9          15      1      1
C10         15      1      1
C11         15      1      2
C12         5       0      1
C13         5       0      1
C14         10      1      1
C15         5       0      1
C16         5       0      2
C17         5       0      1
C18         0.5     2      5
C19         0.5     2      5
C20         0.5     2      5
C21         0.5     2      5

Simulation Design

In this study, the ability values are estimated using maximum likelihood estimation. The maximum and minimum theta points were 5.0 and -5.0, respectively. For both the SLCM and CR methods, the ability scale was divided into 10 theta ranges by 9 cut points, which were -1.483, -0.865, -0.479, -0.165, 0.12, 0.395, 0.679, 1.003, and 1.449. Two thousand simulees were generated in this study from a normal distribution with mean -0.59 and SD 1.37, calculated from the empirical sample distribution of the large-scale placement CAT. The predefined group sizes for the CR method are all 4 for the 10 theta ranges in order to control the maximum exposure rate around 0.25. The SLCM exposure-control parameters are generated through simulation with k = 15 and a desired maximum exposure rate of 0.25.

Evaluation Criteria

Five criteria were used to assess the performance of the WPM approach:

(1) Overall bias for theta estimation,

$$\mathrm{Bias} = \sum_{i=1}^{N} \left( \hat{\theta}_i - \theta_i \right) / N, \tag{11}$$

where $N$ is the number of simulees, and $\theta_i$ and $\hat{\theta}_i$ are the true and estimated theta for simulee $i$, respectively;

(2) The mean square error (MSE) for theta estimation,

$$\mathrm{MSE} = \sum_{i=1}^{N} \left( \hat{\theta}_i - \theta_i \right)^2 / N; \tag{12}$$

(3) The correlation between true theta and estimated theta;

(4) The average conditional standard error of measurement (CSEM); and

(5) The percentage of tests that match the target properties.

Results

Table 2 lists the results of the simulation study. It shows that the WPM method worked very well on content balancing for each study condition, with on-target rates at or near 100%. This indicates that the WPM method handled content balancing very well for the two information weight patterns, Logistic+2 and Quadratic+2, with or without item exposure control. Within each item exposure control method, the measurement precision obtained with Logistic+2 is comparable to that obtained with Quadratic+2. Therefore, the two information weight patterns yielded similar results for this CAT design with the real item pool. Under each study condition, the bias value is very small. However, a certain amount of precision loss is expected with any of the item exposure control methods. As expected, the None_IEC method had better measurement precision, in terms of smaller MSE, higher correlation between true and estimated thetas, and smaller CSEM, than the other two IEC methods. For either of the information weight patterns, the CR method had slightly higher correlation values and slightly smaller MSE values than the SLCM method. In contrast, the SLCM method had slightly smaller CSEM values.
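The first three evaluation criteria can be computed from paired true and estimated thetas as in the sketch below. The function names and the three-simulee data are invented for illustration, and the sign convention for bias (estimate minus true value) is an assumption.

```python
from math import sqrt


def bias(true, est):
    """Equation (11): mean signed error (estimate minus true, by assumption)."""
    return sum(e - t for t, e in zip(true, est)) / len(true)


def mse(true, est):
    """Equation (12): mean squared error of the theta estimates."""
    return sum((e - t) ** 2 for t, e in zip(true, est)) / len(true)


def correlation(true, est):
    """Pearson correlation between true and estimated thetas."""
    n = len(true)
    mean_t, mean_e = sum(true) / n, sum(est) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true, est))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov / sqrt(var_t * var_e)


# Hypothetical simulees, used only to exercise the functions.
true_theta = [-1.0, 0.0, 1.0]
est_theta = [-0.5, 0.2, 0.6]
print(bias(true_theta, est_theta),
      mse(true_theta, est_theta),
      correlation(true_theta, est_theta))
```

Bias averages signed errors, so positive and negative errors can cancel. MSE does not allow such cancellation, which is why a condition can show near-zero bias alongside a clearly nonzero MSE.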

Table 2. Results of Simulation

Item exposure control method  None_IEC                 CR                       SLCM
Info weight vector            Logistic+2  Quadratic+2  Logistic+2  Quadratic+2  Logistic+2  Quadratic+2
Bias                          0.0096      0.0163       0.0139      0.0010       0.0000      0.0093
MSE                           0.3218      0.3320       0.4446      0.4413       0.5148      0.4809
Correlation                   0.9242      0.9208       0.9046      0.9059       0.8889      0.8965
CSEM                          0.3291      0.3306       0.3647      0.3663       0.3626      0.3629
On-target rate                99.70%      100%         99.90%      99.95%       99.85%      99.95%

Table 3 lists the results with respect to item exposure control. To evaluate item exposure control, maximum item exposure rates and pool usage were calculated for the six study conditions. For the None_IEC method, the maximum item exposure rate was about 0.5 for both information weight patterns, which means that about one out of every two examinees would see the most exposed item. The maximum item exposure rate for the CR method was 0.2575 for both information weight patterns, which was close to the preset level of 0.25. For the SLCM method, the maximum item exposure rate was about 0.21 for both information weight patterns, which is about 0.04 below the preset level. For this study, the SLCM method had the best fit to the preset maximum exposure rate when the same level was set for both the CR and the SLCM methods.

Pool usage was expressed through the distribution of the item usage rate. Note that the average item usage rate is 2.12%, based on a 12-item test with 565 items in the pool (12 divided by 565 is about 0.0212). When no item exposure control method was applied, nearly 70% of the items in the pool were not used and about 0.7% of the items had an item usage rate beyond 30%. Within either of the IEC methods, the two information weight patterns had similar results for the distribution of the item usage rates. The CR method significantly reduced the zero item usage rate from 70% to 18% and increased the number of items in the two item usage categories 0%~2% and 2%~10%. The SLCM method reduced the zero item usage rate from 70% to 33%, and also increased the number of items in the two categories 0%~2% and 2%~10%. The CR method had better performance regarding the zero item usage rate in this study.

Table 3. Results of Item Exposure Control

Item exposure control method  None_IEC                 CR                       SLCM
Info weight vector            Logistic+2  Quadratic+2  Logistic+2  Quadratic+2  Logistic+2  Quadratic+2
Max_IE                        0.4860      0.5100       0.2575      0.2575       0.2120      0.2150
Item usage 0%                 69.56%      69.20%       18.23%      17.52%       32.74%      33.45%
Item usage 0%~2%              13.63%      13.81%       48.50%      48.14%       32.39%      30.80%
Item usage 2%~10%             7.79%       8.67%        30.97%      31.68%       32.04%      32.92%
Item usage 10%~20%            6.55%       5.84%        1.42%       1.77%        2.65%       2.65%
Item usage 20%~30%            1.77%       1.77%        0.88%       0.88%        0.18%       0.18%
Item usage 30% and up         0.70%       0.71%        0           0            0           0

Empirical Data Analyses

The WPM has been adopted by a large-scale CAT program. This large-scale CAT program has the same CAT design as the simulation study in this paper. The initial empirical data of 1,066 examinees were available from this large-scale CAT program. The empirical CAT used Quadratic+2 as the information weight vector and the CR method for item exposure control. The values for the information weight vector are: 2.06, 2.083, 2.311, 2.744, 3.322, 4.066, 4.975, 6.05, 7.289, 8.694, 10.264, and 12.

The empirical data analysis results are presented in Table 4. In Table 4, C1 to C21 are the 21 constraints, and Lower and Upper give the lower and upper limits of the number of items in a CAT associated with the constraint in the first column. The on-target rate shows the percentage of examinees whose tests meet that specific constraint. The six columns on the right side of Table 4 present the percentage of examinees who have the specific number of items (0 to 5) associated with the constraint in the first column. For example, there should be at least 1 item and at most 2 items in a CAT that have the constraint C11. For those 87 examinees, 5.75% of examinees had 1 item associated with C11 and 94.25% of examinees had 2 items associated with C11; therefore, all examinees (100%) had reached the targeted number of items (either 1 or 2) associated with C11. The results show that the on-target rates were 100% for all constraints, which indicates that the WPM method worked well for this specific CAT design.

Table 4. Empirical Data Analysis Results: Cross-tabulation of Percent of Examinees by Number of Items in the Constraints

Constraint  Lower  Upper  On_Target_Rate  0       1        2       3       4       5
C1          0      1      100.00%         44.83%  55.17%   0.00%   0.00%   0.00%   0.00%
C2          1      1      100.00%         0.00%   100.00%  0.00%   0.00%   0.00%   0.00%
C3          0      1      100.00%         26.44%  73.56%   0.00%   0.00%   0.00%   0.00%
C4          0      1      100.00%         14.94%  85.06%   0.00%   0.00%   0.00%   0.00%
C5          1      1      100.00%         0.00%   100.00%  0.00%   0.00%   0.00%   0.00%
C6          1      1      100.00%         0.00%   100.00%  0.00%   0.00%   0.00%   0.00%
C7          1      1      100.00%         0.00%   100.00%  0.00%   0.00%   0.00%   0.00%
C8          0      1      100.00%         68.97%  31.03%   0.00%   0.00%   0.00%   0.00%
C9          1      1      100.00%         0.00%   100.00%  0.00%   0.00%   0.00%   0.00%
C10         1      1      100.00%         0.00%   100.00%  0.00%   0.00%   0.00%   0.00%
C11         1      2      100.00%         0.00%   5.75%    94.25%  0.00%   0.00%   0.00%
C12         0      1      100.00%         78.16%  21.84%   0.00%   0.00%   0.00%   0.00%
C13         0      1      100.00%         86.21%  13.79%   0.00%   0.00%   0.00%   0.00%
C14         1      1      100.00%         0.00%   100.00%  0.00%   0.00%   0.00%   0.00%
C15         0      1      100.00%         74.71%  25.29%   0.00%   0.00%   0.00%   0.00%
C16         0      2      100.00%         26.44%  73.56%   0.00%   0.00%   0.00%   0.00%
C17         0      1      100.00%         11.49%  88.51%   0.00%   0.00%   0.00%   0.00%
C18         2      5      100.00%         0.00%   0.00%    24.14%  45.98%  24.14%  5.75%
C19         2      5      100.00%         0.00%   0.00%    19.54%  57.47%  19.54%  3.45%
C20         2      5      100.00%         0.00%   0.00%    25.29%  55.17%  16.09%  3.45%
C21         2      5      100.00%         0.00%   0.00%    28.74%  59.77%  10.34%  1.15%
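The on-target rate for a constraint can be computed from the per-examinee item counts, as in this small sketch. The function name is hypothetical. The counts used for C11 (5 examinees with one item and 82 with two) are not reported directly; they are back-calculated here, as an assumption, from the percentages given for the 87 examinees.

```python
def on_target_rate(item_counts, lower, upper):
    """Percentage of tests whose item count lies within [lower, upper]."""
    hits = sum(1 for n in item_counts if lower <= n <= upper)
    return 100.0 * hits / len(item_counts)


# Assumed per-examinee counts of C11 items for 87 examinees.
counts_c11 = [1] * 5 + [2] * 82

rate = on_target_rate(counts_c11, lower=1, upper=2)
share_one = round(100.0 * counts_c11.count(1) / len(counts_c11), 2)
share_two = round(100.0 * counts_c11.count(2) / len(counts_c11), 2)
print(rate, share_one, share_two)
```

With these assumed counts, the shares round to the 5.75% and 94.25% shown in the C11 row, and every test falls within the bounds of 1 to 2 items.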

Conclusion

In summary, the WPM method handled content balancing very well in this study, with or without item exposure control, for both the empirical and simulated data. The IEC results indicated that although the SLCM method provided lower maximum item exposure rates, the CR method had great utility for increasing pool utilization in this study. As expected, both of the IEC methods had a certain amount of precision loss compared with the None_IEC method.

References

Chang, H., Qian, J., & Ying, Z. (2001). a-Stratified Multistage Computerized Adaptive Testing with b Blocking. Applied Psychological Measurement, 25(4), 333-341.

Kingsbury, G., & Zara, A. (1989). Procedures for Selecting Items for Computerized Adaptive Tests. Applied Measurement in Education, 2(4), 359-375.

Segall, D. O., & Davey, T. C. (1995, June). Some New Methods for Content Balancing Adaptive Tests. Paper presented at the annual meeting of the Psychometric Society, Minneapolis, MN.

Stocking, M. L., & Lewis, C. (2000). Methods of Controlling the Exposure of Items in CAT. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized Adaptive Testing: Theory and Practice (pp. 163-182). Norwell, MA: Kluwer.

Stocking, M. L., & Swanson, L. (1993). A Method for Severely Constrained Item Selection in Adaptive Testing. Applied Psychological Measurement, 17, 277-292.

van der Linden, W. J. (2005). A Comparison of Item-Selection Methods for Adaptive Tests with Content Constraints. Journal of Educational Measurement, 42(3), 283-302.