Data mining is finding useful relationships in large datasets. "When you mine data (by 'drilling down'), you use data to improve your business by predicting and understanding behavior." (Peter Frometa, SPSS Inc., 2001)
According to a press release, "in May 1998, more than 20 key players in the data mining market met to discuss the first draft of a new process model, CRISP-DM ("CRoss-Industry Standard Process for Data Mining"). This is designed to help businesses plan and work through the complete data mining process - from problem specification to deployment of results. The core consortium consists of NCR, ISL, Daimler-Benz and OHRA. At the centre of the CRISP-DM project is a Special Interest Group (SIG) of data mining service suppliers and large-scale commercial users."
Data mining employs a six-stage approach to extracting meaning from business data. This parallels Rasch-based approaches to measurement construction in the social sciences. The Table below focusses on the Data Cleaning component of data mining. Cleaning data in this way is in marked contrast to the conventional "data is inviolable" approach of social science research.
The Figure shows the six phases of a data mining process. The sequence of the phases is not rigid. Moving back and forth between different phases is always required. The outcome of each phase determines which phase, or which particular task of a phase, has to be performed next. The arrows indicate the most important and frequent dependencies between phases.
The outer circle symbolizes the cyclical nature of data mining itself. Data mining is not over once a solution is deployed. The lessons learned during the process, and from the deployed solution, can trigger new, often more focused business questions. Subsequent data mining processes will benefit from the experiences of previous ones. In the following, we outline each phase briefly:
1. Business understanding
This initial phase focuses on understanding the project objectives and requirements from a business perspective, then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.
2. Data understanding
starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data or to detect interesting subsets with hidden information.
3. Data preparation
constructs the final dataset from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. Tasks include table, record and attribute selection as well as transformation and cleaning of data for modeling tools.
4. Modeling
selects and applies modeling techniques and calibrates their parameters to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often necessary.
5. Evaluation
thoroughly reviews the model and the steps executed to construct the model to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. A decision on the use of the data mining results should be reached.
6. Deployment
organizes and presents the knowledge gained in a way that the customer can use. It often involves applying "live" models within an organization's decision-making processes, for example in real-time personalization of Web pages or repeated scoring of marketing databases. However, depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise. In many cases it is the customer, not the data analyst, who carries out the deployment steps. However, even if the analyst will not carry out the deployment effort, it is important for the customer to understand up-front what actions need to be carried out in order to actually make use of the created models.
Excerpted from CRISP-DM 1.0 Step-by-step data mining guide (2000)
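The back-and-forth movement between phases described above can be sketched as a small lookup of "frequent next phases". This is only an illustration: the arrow set in `FREQUENT_NEXT` is an assumed reading of the CRISP-DM figure (which is not reproduced here), and the names `Phase`, `FREQUENT_NEXT` and `unusual_steps` are hypothetical. Because the sequence of phases is not rigid, the sketch flags unusual steps rather than forbidding them.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumed arrows (most frequent dependencies), read off the CRISP-DM figure.
# The Deployment -> Business Understanding entry stands for the outer circle.
FREQUENT_NEXT = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},
}


def unusual_steps(path):
    """Return the (from, to) steps in `path` that follow none of the frequent arrows."""
    return [(a, b) for a, b in zip(path, path[1:]) if b not in FREQUENT_NEXT[a]]


# A hypothetical project that steps back once from modeling to data preparation.
project = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.DATA_PREPARATION,
    Phase.MODELING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(unusual_steps(project))
```

An empty result means every step followed one of the assumed arrows; a non-empty result is a prompt to document why the project moved that way, not an error.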
|Task||Clean data|
Raise the data quality to the level required by the selected analysis techniques. This may involve selection of clean subsets of the data, the insertion of suitable defaults or more ambitious techniques such as the estimation of missing data by data modeling.
|Output||Data cleaning report|
Describe the decisions and actions that were taken to address the data quality problems. The report should also address what data quality issues are still outstanding and what possible effects they could have on the results.
|Activities||Reconsider how to deal with observed types of noise.|
Correct, remove or ignore noise.
Decide how to deal with special values and their meaning.
Reconsider data selection criteria in light of experiences of data cleaning (i.e., one may wish to include or exclude other sets of data).
|Good Idea!||Remember that some fields may be irrelevant to the data mining goals and therefore noise in those fields has no significance. However, if noise is ignored for these reasons, it should be fully documented as the circumstances may change later!|
|Excerpted from CRISP-DM 1.0 Step-by-step data mining guide (2000)|
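The three cleaning options named in the task (a clean subset, suitable defaults, estimation of missing data) and the accompanying data cleaning report might look as follows in a minimal sketch. The records, field names and helper functions are all hypothetical, and estimating a missing value by the mean of the observed values is only the simplest stand-in for "data modeling"; a real project would likely use something more ambitious.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000, "region": "N"},
    {"id": 2, "age": None, "income": 61000, "region": "S"},
    {"id": 3, "age": 45, "income": None, "region": None},
    {"id": 4, "age": 29, "income": 48000, "region": "N"},
]


def clean_subset(rows, fields):
    """Keep only the rows that have no missing value in the given fields."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def insert_default(rows, field, default):
    """Return copies of the rows with a missing `field` replaced by `default`."""
    return [{**r, field: default if r[field] is None else r[field]} for r in rows]


def impute_mean(rows, field):
    """Return copies of the rows with a missing `field` estimated by the observed mean."""
    estimate = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: estimate if r[field] is None else r[field]} for r in rows]


report = []  # data cleaning report: one entry per decision taken

complete = clean_subset(records, ["age", "income", "region"])
report.append(f"clean subset: {len(complete)} of {len(records)} records are complete")

cleaned = insert_default(records, "region", "UNKNOWN")
report.append("region: missing values set to the default 'UNKNOWN'")

cleaned = impute_mean(cleaned, "age")
report.append("age: missing values estimated by the mean of observed ages")

# Income is deliberately left alone and recorded as an outstanding issue.
outstanding = sum(r["income"] is None for r in cleaned)
report.append(f"income: {outstanding} missing value(s) still outstanding")

print("\n".join(report))
```

Each helper returns new rows instead of changing `records`, so the raw data stay available if the selection criteria are reconsidered later.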
|Data Mining (CRISP-DM)||Rasch Measurement|
|1. Business Understanding: Determine business objectives. Determine data mining goals. "You must have a clear idea of what success would be."||1. Conceptualize the latent variable: What to measure? How to do it? What marks success?|
|2. Data Understanding: "Do the data match your objectives?"||2. Collect relevant data|
|3. Data Preparation||3. Organize data|
|4. Modeling||4. Construct measures: Select measurement model. Explicable data fit?|
|5. Evaluation: "Can results be repeated and verified by someone else?"||5. Evaluate results|
|6. Deployment: "Communicate! Impress! Compel!"||6. Utilize measures|
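For the "Construct measures" step, selecting a measurement model and asking whether the data fit can be illustrated with the dichotomous Rasch model, in which the probability of success depends only on the difference between person ability and item difficulty (both in logits). The sketch below is an assumption-laden illustration: the function names and the example measures are hypothetical, the measures are taken as given rather than estimated, and the fit statistic shown is the usual outfit mean-square (the mean of squared standardized residuals).

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def outfit_mean_square(responses, abilities, difficulty):
    """Mean of squared standardized residuals for one item.

    `responses` maps person -> 0/1 score; `abilities` maps person -> measure.
    Values near 1.0 indicate data that fit the model as expected.
    """
    squared = []
    for person, score in responses.items():
        p = rasch_probability(abilities[person], difficulty)
        squared.append((score - p) ** 2 / (p * (1.0 - p)))
    return sum(squared) / len(squared)


# Hypothetical person measures and responses to one item of difficulty 0.5 logits.
abilities = {"ann": -1.0, "bob": 0.5, "cy": 2.0}
responses = {"ann": 0, "bob": 1, "cy": 1}

print(rasch_probability(0.0, 0.0))  # 0.5 when ability equals difficulty
print(outfit_mean_square(responses, abilities, 0.5))
```

A mean-square far above 1.0 would signal responses less predictable than the model expects, which is the kind of misfit that calls for an explanation before the measures are used.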
Data Mining and Rasch Measurement CRISP-DM, Linacre J.M. Rasch Measurement Transactions, 2001, 15:2 p. 826-7
The URL of this page is www.rasch.org/rmt/rmt152f.htm