"PageRank" is a widely-acclaimed algorithm used for determining the ranking or ordering of web pages by web search engines (Langville & Meyer, 2006; Chung, 2008). It was originally developed at Stanford University by Larry Page and Sergey Brin, who later started Google, Inc (Page & Brin, 1998). The algorithm has become a highly celebrated mathematical tool for analyzing networks of all kinds, including biological and social networks (Chung, 2008). The algorithm was the topic of a featured talk at the 2008 Joint Mathematics Meetings (joint meeting of the Mathematical Association of America and The American Mathematical Society) in San Diego.
The PageRank algorithm has much in common with one particular algorithm for estimating the item parameters of the Rasch model - the eigenvector (EVM) algorithm described by Garner and Engelhard (2002, etc.). Both methods depend on pairwise comparisons. Both methods also use the eigenvector of a matrix derived from pairwise comparisons.
An Illustrative Data Set
Persons | ||||||||||
Items | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
3 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
4 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
We can represent this data matrix graphically as a directed multi-graph, shown here, with nodes representing items. An arrow (a directed edge) from node j to node i indicates that someone got item i correct and item j wrong. So, the four arrows between items 1 and 2 indicate that 3 people succeeded on item 1, but failed on item 2, and one person succeeded on item 2 but failed on item 1.
The Eigenvector Method (EVM)
The EVM begins with creation of a paired comparison matrix similar to that now implemented in RUMM2020. A paired comparison matrix is constructed. Each entry, b_{ij}, representing the number of people who got item i right and item j wrong divided by the number of people who got item i wrong and item j right. This is an estimate of the difference in difficulties between items i and j on an odds-scale. b_{ii} are set to 1.
We can determine the Rasch item difficulties from this matrix by computing the eigenvector associated with the maximum eigenvalue, exponentiating the eigenvector, and subtracting out their mean, so that their sum is zero. The Rasch item difficulties are: -1.018, -.103, .268, .853.
The PageRank Algorithm
The PageRank algorithm is designed to assign weights to web pages so that the pages may be ranked in order of popularity. The web network we'll represent is the one depicted in Figure 1. The web pages A1, A2, A3, and A4 are represented by the four circles and an arrow would represent the fact that one web page links to the other web page. For example, web page A2 has four links to web page A1.
Each web site has a "PageRank" value associated with it. Let a1, a2, a3, a4 be those values. The PageRank value is conceptualized as the sum:
Nj is the total number of links out of Aj to any other web pages (6 for A2) and Mij is the number of links out of Aj to Ai (3 from A2 to A1) Thus each page in the system contributes a portion of their value to the page they reference. Then, for our illustrative dataset,
A convenient way to solve these simultaneous equations is by matrix algebra and eigenvalues, adding a constraint to make the equations identifiable and the unknowns non-zero. A solution is a1 = -.65, a2=-.49, a3=-.45, and a4=-.36. Thus, the easiest-to-link web page is A1, followed by A2, then A3, then A4.
Observations and Possibilities
There is much excitement over PageRank. Chung points out that PageRank is a "well defined operator on any given graph" (Chung, 2008), and describes the relationship between PageRank and random walks, spectral graph theory, spectral geometry, combinatorics, probability, and linear algebra. Langville and Meyer (2006) state that "models exploiting the Web's hyperlink structure are called link analysis models. The impact that these link analysis models have had is truly awesome."
The plot shows the relationship between the Rasch item difficulties and the PageRank weights for our illustrative dataset. This suggests that the relationship is ogival, but close to linear for practical purposes. Notice that the coefficients of the PageRank equations are in the range 0-1, representing empirical probabilities. Rasch theory suggests that expressing those coefficients as log-odds, log_{e}(coefficient/(1-coefficent)), would be an immediate improvement to the PageRank equations. Here is a fruitful area of research for a graduate student looking for a dissertation topic.
Tools being developed for PageRank analysis and validation may well prove productive for Rasch measurement. These include techniques for accommodating missing and extreme comparisons, and also for identifying "cut vertices", web pages whose omission causes disconnected subsets of web pages in the analysis. These would correspond to elements (items, persons, raters) on which the linkage in judging plans, adaptive-testing or equating analysis depends.
Mary Garner, Kennesaw State University, Georgia
Chung, Fan. (2008). The Mathematics of PageRank. Presentation at the Joint Mathematics Meetings. San Diego, CA. www.math.ucsd.edu/~fan/
Garner, M., & Engelhard, G. (2002). An Eigenvector Method For Estimating Item Parameters Of The Dichotomous And Polytomous Rasch models. Journal of Applied Measurement, 3(2), 107-128.
Langville, A.N., & Meyer, C.D. (2006). Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press.
Page, L., & Brin, S. (1998). The Anatomy of a Large Scale Hypertextual Search Engine. Proceedings of the Seventh International Web Conference. infolab.stanford.edu/pub/papers/google.pdf
Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14, 58-94. www.rasch.org/memo18.htm
Gerner M. (2009) Google's PageRank Algorithm and the Rasch Measurement Model, Rasch Measurement Transactions, 2009, 23:2, 1201-2
Forum | Rasch Measurement Forum to discuss any Rasch-related topic |
Go to Top of Page
Go to index of all Rasch Measurement Transactions
AERA members: Join the Rasch Measurement SIG and receive the printed version of RMT
Some back issues of RMT are available as bound volumes
Subscribe to Journal of Applied Measurement
Go to Institute for Objective Measurement Home Page. The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.
Coming Rasch-related Events | |
---|---|
Oct. 4 - Nov. 8, 2024, Fri.-Fri. | On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com |
Jan. 17 - Feb. 21, 2025, Fri.-Fri. | On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com |
May 16 - June 20, 2025, Fri.-Fri. | On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com |
June 20 - July 18, 2025, Fri.-Fri. | On-line workshop: Rasch Measurement - Further Topics (E. Smith, Facets), www.statistics.com |
Oct. 3 - Nov. 7, 2025, Fri.-Fri. | On-line workshop: Rasch Measurement - Core Topics (E. Smith, Winsteps), www.statistics.com |
The URL of this page is www.rasch.org/rmt/rmt232a.htm
Website: www.rasch.org/rmt/contents.htm