Permutation Tests


Mantel Statistic

• A method for examining the internal structure of a matrix. Hubert (1978, 1987) presents a number of indices that capture internal structural characteristics within two or more matrices.

• Results in a one-tailed p-value.

• For each pair of matrices ($A_q$ and $A_r$) in $A$, with entries $a_{ij}^{(q)}$ and $a_{ij}^{(r)}$, a product index can be computed directly as

\[ \Gamma_{qr} = \sum_{i=1}^{n} \sum_{j \ne i} a_{ij}^{(q)} a_{ij}^{(r)}, \qquad q = 1, \ldots, Q-1, \quad r = q+1, \ldots, Q. \]

That is, add the products of corresponding entries, ignoring the diagonal, for q = 1 and r = 2, ..., Q; for q = 2 and r = 3, ..., Q; and so on. This statistic is typically attributed to Mantel (1967). A test of agreement (or concordance) between the two matrices can be performed by generating a distribution of indices across all possible permutations (permuting both rows and columns) of the n targets. More formally, we define $\Psi$ as the set of all n! permutations, and $\psi(k)$ as the target in position k of the permutation $\psi \in \Psi$. Thus, for the identity permutation, $\psi_I \in \Psi$, we have $\psi_I(1) = 1$, $\psi_I(2) = 2$, ..., $\psi_I(n) = n$. Now suppose we randomly select a permutation, $\psi \in \Psi$, and apply it to the rows and, simultaneously, the columns of $A_r$. We could then compute an index between $A_q$ and the permuted matrix $A_r(\psi)$ as

\[ \Gamma_{qr}(\psi) = \sum_{i=1}^{n} \sum_{j \ne i} a_{ij}^{(q)} a_{\psi(i)\psi(j)}^{(r)}. \]

If computer resources make it feasible to evaluate each of the n! = n*(n - 1)*...*2*1 permutations of the rows and columns of $A_r$, then the complete distribution of the Mantel statistic can be determined and $\Gamma_{qr}$ can be located within that distribution.
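The enumeration described above can be sketched in a few lines of Python. This is a minimal illustration, not the program referred to on this page: the function names `mantel_index` and `exact_mantel_test` are hypothetical, matrices are assumed to be square lists of lists, and "sufficiently extreme" is assumed to mean "at least as large as the observed index" (the one-tailed direction).

```python
from itertools import permutations

def mantel_index(a_q, a_r, perm=None):
    """Sum of products of corresponding off-diagonal entries.

    perm[k] is the (0-based) target placed in position k; it is applied to
    the rows and columns of a_r simultaneously. None means the identity.
    """
    n = len(a_q)
    if perm is None:
        perm = range(n)
    return sum(a_q[i][j] * a_r[perm[i]][perm[j]]
               for i in range(n) for j in range(n) if i != j)

def exact_mantel_test(a_q, a_r):
    """Return (observed index, one-tailed p) by enumerating all n! permutations."""
    n = len(a_q)
    observed = mantel_index(a_q, a_r)
    count = total = 0
    for perm in permutations(range(n)):
        total += 1
        # Assumption: larger indices count as more extreme (one-tailed test).
        if mantel_index(a_q, a_r, perm) >= observed:
            count += 1
    return observed, count / total
```

Because every one of the n! permutations is visited, the running time grows factorially with n, which is why the approach is only practical for small matrices.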

Hubert (1978) observed that a potential limitation of the Mantel-type indices as described is that they are not invariant under monotone (order-preserving) transformations of the data. Because the statistics are computed based on one-to-one products of corresponding elements of matrices, different significance level conclusions could be obtained under different transformations. This limitation has some important implications for analysis of ESP confusion matrices if the goal is to assess similarity of internal structure among two or more matrices.

[Side Note: Concordance best on gifted subjects rather than on whole population.] If the value of $\Gamma_{qr}$ is sufficiently extreme with respect to the distribution, then we reject the null hypothesis of random labeling of the rows and columns of the matrices. Rejecting the null hypothesis suggests concordance between the two matrices with respect to their patterning of large and small elements. Complete enumeration of the distribution is practical for $n \le 13$ on current microcomputer hardware platforms. For larger matrices, a Monte Carlo sampling procedure can perform an approximate permutation test: a large number of permutations are randomly selected from $\Psi$ and used to obtain a sampling distribution for the Mantel statistic.
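A Monte Carlo version of the test might look like the sketch below. The function names, the default of 9999 samples, and the seed parameter are all illustrative assumptions. Counting the observed arrangement as one member of the reference set, which gives (hits + 1) / (samples + 1), is a common convention assumed here, not something stated in the text.

```python
import random

def mantel_index(a_q, a_r, perm):
    """Sum of products of off-diagonal entries, with perm applied to a_r."""
    n = len(a_q)
    return sum(a_q[i][j] * a_r[perm[i]][perm[j]]
               for i in range(n) for j in range(n) if i != j)

def monte_carlo_mantel(a_q, a_r, n_samples=9999, seed=0):
    """Approximate one-tailed p-value from randomly sampled permutations."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    n = len(a_q)
    identity = list(range(n))
    observed = mantel_index(a_q, a_r, identity)
    perm = identity[:]
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(perm)              # uniform random relabeling of the targets
        if mantel_index(a_q, a_r, perm) >= observed:
            hits += 1
    # Assumed convention: include the observed arrangement in the reference set.
    return observed, (hits + 1) / (n_samples + 1)
```

The cost is now proportional to the number of samples instead of n!, at the price of a p-value that is only an estimate.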

The Mantel statistic can be characterized as an unnormalized Pearson correlation coefficient between the off-diagonal elements in $A_q$ and $A_r$. If the elements in a pair of matrices are converted to ranks prior to analysis, then it is a straightforward exercise to construct variants of $\Gamma_{qr}$ that represent nonparametric measures of association, such as Spearman's rank correlation coefficient or Kendall's coefficient of concordance. Thus, the permutation paradigm is rather comprehensive in its scope, encompassing many well-known statistics, yet providing the flexibility for a variety of interesting alternatives.
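As one illustration of the rank-based variant, the sketch below converts the off-diagonal entries of each matrix to ranks and then computes an ordinary Pearson correlation on those ranks, which is Spearman's coefficient. All function names are hypothetical, and giving tied values the mean of their ranks is an assumption about how ties would be handled.

```python
from math import sqrt

def off_diagonal(a):
    """Flatten a square matrix row by row, skipping the diagonal."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(n) if i != j]

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)

def spearman_off_diagonal(a_q, a_r):
    """Spearman correlation between the off-diagonal entries of two matrices."""
    return pearson(average_ranks(off_diagonal(a_q)),
                   average_ranks(off_diagonal(a_r)))
```

The same permutation machinery applies unchanged: permute the rows and columns of the second matrix, recompute the coefficient, and compare it with the observed value.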


HUBERT, L. J. (1987). Assignment methods in combinatorial data analysis. New York: Marcel Dekker.

Try a little program. Fair warning: This program uses total enumeration. This is also written in a relatively slow language for number-crunching, especially where the Triad Test is concerned. Even so, it can handle up to 15 x 15 matrices--but I wouldn't recommend more than 10 for Mantel or 8 for Triad.


• As with the Mantel test, a method for examining the internal structure of a matrix; the resulting p-value is one-tailed.

• The test compares the within-stimulus gradient among triads of targets across two matrices. Specifically, for each two-way confusion matrix $C_q$, with entries $c_{hi}^{(q)}$, a three-way array is created as follows:

\[ d_{hij}^{(q)} = \operatorname{sign}\bigl(c_{hi}^{(q)} - c_{hj}^{(q)}\bigr) \quad \text{for distinct } h, i, j, \]

where $\operatorname{sign}(x)$ is 1 if $x > 0$, 0 if $x = 0$, and $-1$ if $x < 0$.

This equation requires the comparison of different column entries (i and j) in the same row (h). By keeping h, i, and j distinct, we avoid comparisons with diagonal entries. For each pair of matrices ($C_q$ and $C_r$) in a family of matrices, $C$, an index based on the within-stimulus order relationships of confusions can be computed as

\[ \Phi_{qr} = \sum_{h} \sum_{i \ne h} \sum_{j \ne h, i} d_{hij}^{(q)} d_{hij}^{(r)}. \]

Here, we tally the similar and dissimilar relationships of corresponding pairs of entries in both matrices, for all $C_q$, $C_r$ in $C$. A pairwise comparison between the within-stimulus structures of two matrices can be conducted by calculating the within-stimulus triad test indices for the set of all permutations, $\psi \in \Psi$, of the second matrix. (To conduct an approximate test, we would randomly sample from the set of all permutations, $\Psi$.) Formally stated,

\[ \Phi_{qr}(\psi) = \sum_{h} \sum_{i \ne h} \sum_{j \ne h, i} d_{hij}^{(q)} d_{\psi(h)\psi(i)\psi(j)}^{(r)}. \]

If $\Phi_{qr}$ is sufficiently large with respect to the complete distribution, or to a sample from it, then the null hypothesis of no structural agreement among the triadic relationships is rejected. As the name implies, the relationships of responses (traveling down the permutation) to each particular target are examined, effectively looking at a confusion gradient along the permutation. The triads of targets are taken from both matrices so that the confusion gradients in the two matrices can be compared with each other. The null hypothesis states that the two matrices exhibit no concordance with respect to the patterning of elements within their rows, i.e., confusion in the matrices does not follow a similar pattern across the targets.
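A small Python sketch of the triad index and its exact test follows. It assumes that each within-row comparison is scored as the sign of the difference between two column entries (+1, 0, or -1), so that agreeing order relations add to the index and disagreeing ones subtract from it. The function names are hypothetical, and this is not the program mentioned on this page.

```python
from itertools import permutations

def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def triad_index(c_q, c_r, perm=None):
    """Tally agreement of within-row order relations over distinct h, i, j.

    perm relabels the rows and columns of c_r; None means the identity.
    """
    n = len(c_q)
    if perm is None:
        perm = range(n)
    total = 0
    for h in range(n):
        for i in range(n):
            for j in range(n):
                if h != i and h != j and i != j:
                    d_q = sign(c_q[h][i] - c_q[h][j])
                    d_r = sign(c_r[perm[h]][perm[i]] - c_r[perm[h]][perm[j]])
                    total += d_q * d_r
    return total

def exact_triad_test(c_q, c_r):
    """Return (observed index, one-tailed p) over all n! permutations."""
    n = len(c_q)
    observed = triad_index(c_q, c_r)
    count = total = 0
    for perm in permutations(range(n)):
        total += 1
        if triad_index(c_q, c_r, perm) >= observed:
            count += 1
    return observed, count / total
```

Each evaluation of the index costs on the order of n^3 operations, compared with n^2 for the Mantel index, which is consistent with the warning that the triad test is the slower of the two under total enumeration.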

Also see Maximizing Gradient Indices, Chapter 9, in relation to seriation in the monograph.




• Parametric

• Utts (1993, p. 77) described an exact test presented by Scott (1972, p. 87), which has been used rather extensively in the parapsychological literature. The data are first arranged into a matrix (rows labeled by stimulus, columns labeled by response) with "hits" recorded along the diagonal. The diagonal entries are summed to give the trace. The columns, and only the columns, are then reordered in all n! = n*(n-1)*...*2*1 possible permutations, with the rows remaining in their original order, and the trace is summed for each reordering. The statistic, which requires a random presentation of targets, is the proportion of sums that are as good as or better than the sum for the original "correct" ordering. This statistic was used by Targ (1994) in a remote-viewing replication study. Moreover, this counting measure has been discussed with regard to data analysis from Princeton Engineering Anomalies Research (Dobyns, Dunne, Jahn, & Nelson, 1992; Hansen, Utts, & Markwick, 1992), particularly with regard to the relevance of the diagonal entries.
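The column-permutation procedure can be sketched as follows. The function name `scott_exact_test` is hypothetical, and "as good as or better than" is assumed to mean "greater than or equal to", which fits a matrix whose entries count hits; for data where smaller sums are better the comparison would be reversed.

```python
from itertools import permutations

def scott_exact_test(m):
    """Return (observed trace, proportion of column orderings with trace >= observed).

    m is a square matrix: rows are stimuli, columns are responses, and the
    original column order puts the "hits" on the diagonal.
    """
    n = len(m)
    observed = sum(m[i][i] for i in range(n))
    count = total = 0
    for cols in permutations(range(n)):
        total += 1
        # Rows stay fixed; only the assignment of columns to rows changes.
        trace = sum(m[i][cols[i]] for i in range(n))
        if trace >= observed:
            count += 1
    return observed, count / total
```

The original ordering is itself one of the n! orderings, so the smallest proportion this procedure can return is 1/n!.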

Try a little program. Fair warning: This program uses total enumeration. This is also written in a relatively slow language for number-crunching. Even so, it can handle up to 15 x 15 matrices.

DOBYNS, Y.H., DUNNE, B.J., JAHN, R.G., & NELSON, R.D. (1992). Response to Hansen, Utts, and Markwick: statistical and methodological problems of the PEAR remote viewing (sic) experiments. Journal of Parapsychology, 56, 115-146.

HANSEN, G.P., UTTS, J., & MARKWICK, B. (1992). Critique of PEAR remote-viewing experiments. Journal of Parapsychology, 56, 97-114.

SCOTT, C. (1972). On the evaluation of verbal material in parapsychology: A discussion of Dr. Pratt's monograph. Journal of the Society for Psychical Research, 46, 79-90.

UTTS, J. M. (1993). Analyzing free response data: A progress report, in L. Coly & D. S. McMahon (Eds.) Psi Research Methodology: A Re-examination, Proceedings of an International Conference. New York: Parapsychology Foundation, pp. 71-83.

Preferential Matching Exact Test

• The Preferential Matching Exact Test was introduced to the parapsychological literature by Morris (1972). Morris discusses the advantages both of dealing with individual units of information and of dealing with whole responses. The preferential matching exact test is intended for use with whole responses, as is the preferential matching procedure developed by Stuart (1942). (Evaluating material in terms of units of information entails "critical ratio" (CR or z) methods, in which the observed deviation from chance is divided by an estimate of the expected standard deviation.) In this test, the free-response material is considered and ranked as a whole, rather than atomistically broken into smaller units of information to be matched against characteristics of the targets. The probability of concern is the number of ways to obtain the observed sum of ranks or less, divided by the number of ways all possible sums could be achieved. Naturally, the desirable circumstance is when the number of possible ranks is equal to the number of trials.

• Solfvin, Kelly and Burdick (1978) extended and generalized the work of Morris. Burdick and Kelly (1977) categorized the preferential ranking and rating methods as one of the two main subclasses of holistic approaches to analysis of parapsychological data, the other subclass being forced-choice techniques. The full model is:

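\[ P = \frac{1}{R^{N}} \sum_{s=N}^{M} \sum_{\varepsilon=0}^{\lfloor (s-N)/R \rfloor} (-1)^{\varepsilon} \binom{N}{\varepsilon} \binom{s - R\varepsilon - 1}{N - 1}, \]

where N = number of trials, M = obtained sum of ranks (the minimum value is N if ranks begin at 1), R = number of possible ranks, and $\varepsilon$ = an index running from 0 to the greatest integer not exceeding (s - N)/R, while s varies from N to M. (That is, s is M or less.) In this formula, the number of ways to obtain a particular sum, s, is given by the Uspensky term:

\[ \sum_{\varepsilon=0}^{\lfloor (s-N)/R \rfloor} (-1)^{\varepsilon} \binom{N}{\varepsilon} \binom{s - R\varepsilon - 1}{N - 1}. \]

Hence, for the observed sum of ranks, the Uspensky term gives the number of ways that sum could be obtained by simply letting s = M. This sum-of-ranks statistic is approximately normal:

\[ z = \frac{M - U_M \pm 0.5}{\sigma_M}, \]

where $U_M = N(R + 1)/2$, $\sigma_M = \sqrt{N(R^2 - 1)/12}$, and 0.5 is the usual continuity correction, taken with the sign opposite to that of $(M - U_M)$.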
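The counting argument can be checked on small cases with exact integer arithmetic. The sketch below is an illustration under stated assumptions, not the published procedure: the function names are hypothetical, the count uses the standard inclusion-exclusion formula for the number of ways N values in 1..R can sum to s, and the standard deviation in `normal_z` assumes that each rank is equally likely and that trials are independent.

```python
from fractions import Fraction
from math import comb, sqrt

def uspensky_count(s, n_trials, n_ranks):
    """Number of ways n_trials ranks, each in 1..n_ranks, can sum to s."""
    if s < n_trials or s > n_trials * n_ranks:
        return 0
    top = (s - n_trials) // n_ranks      # greatest integer <= (s - N) / R
    return sum((-1) ** e * comb(n_trials, e)
               * comb(s - n_ranks * e - 1, n_trials - 1)
               for e in range(top + 1))

def exact_probability(m, n_trials, n_ranks):
    """Exact probability of a sum of ranks of m or less, as a Fraction."""
    ways = sum(uspensky_count(s, n_trials, n_ranks)
               for s in range(n_trials, m + 1))
    return Fraction(ways, n_ranks ** n_trials)

def normal_z(m, n_trials, n_ranks):
    """Normal approximation with a 0.5 continuity correction toward the mean."""
    mean = n_trials * (n_ranks + 1) / 2
    # Assumption: independent, equally likely ranks, so variance = N(R^2 - 1)/12.
    sd = sqrt(n_trials * (n_ranks ** 2 - 1) / 12)
    diff = m - mean
    correction = -0.5 if diff > 0 else (0.5 if diff < 0 else 0.0)
    return (diff + correction) / sd
```

With two trials and six ranks the count reproduces the familiar two-dice totals, which makes a convenient sanity check before applying the formula to larger N.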

In a confusion matrix, we would have n targets (objects) ranked on a scale of 1 to R, with the similarity of each target to each other target being ranked. That is, the number of trials N is n². Consider the Schlitz and Gruber (1980) transcontinental remote viewing data, for which five judges ranked each of twenty transcripts against each of ten sites (targets/protocols), making the procedure a forced-choice judging process. (Each site generated two transcripts: one from the percipient, E1 (M. S.), and one from the agent, E2 (E. R. G.).) The data are presented in a 10x10 asymmetric similarity matrix, much like confusion data. That gives N = 5*2*100 = 1000 and R = 10. The lowest possible sum of ranks is 1000, while the highest possible sum of ranks is N*R = 10000. (The S&G data add up to 5529.) The combinatorial term in the Uspensky (s = M) equation can be simplified:

\[ \binom{N}{\varepsilon} \binom{s - R\varepsilon - 1}{N - 1} = \frac{N\,(s - R\varepsilon - 1)(s - R\varepsilon - 2) \cdots (s - R\varepsilon - N + 1)}{\varepsilon!\,(N - \varepsilon)!}. \]

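Even with this simplification, the combinatorial term is

\[ \frac{1000\,(5529 - 10\varepsilon - 1) \cdots (5529 - 10\varepsilon - 1000 + 1)}{\varepsilon!\,(1000 - \varepsilon)!} = \frac{1000\,(5528 - 10\varepsilon) \cdots (4530 - 10\varepsilon)}{\varepsilon!\,(1000 - \varepsilon)!}, \]

where $\varepsilon$ varies from 0 to $\lfloor (M - N)/R \rfloor = \lfloor (5529 - 1000)/10 \rfloor = \lfloor 4529/10 \rfloor = \lfloor 452.9 \rfloor = 452$.

Moreover, for the computation of the actual probability, we have $R^N = 10^{1000}$.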
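Given the size of these numbers, the normal approximation is the easier route for the S&G totals. The worked arithmetic below is a sketch that assumes each rank is equally likely and that the 1000 rankings are independent, so that the standard deviation of the sum is $\sqrt{N(R^2 - 1)/12}$; that expression is an assumption made here for illustration and is not quoted from the cited papers.

\[ U_M = \frac{1000\,(10 + 1)}{2} = 5500, \qquad \sigma_M = \sqrt{\frac{1000\,(10^2 - 1)}{12}} = \sqrt{8250} \approx 90.83, \]

\[ z = \frac{5529 - 5500 - 0.5}{90.83} \approx 0.31. \]

Under these assumptions the observed sum of 5529 lies about a third of a standard deviation from its chance expectation of 5500.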


MORRIS, R.L. (1972). An exact method for evaluating preferentially matched free-response material. Journal of the American Society for Psychical Research, 66, 401-407.

SCHLITZ, M., & GRUBER, E. (1980). Transcontinental Remote Viewing. Journal of Parapsychology, 44, 305-317.

SOLFVIN, G.F., KELLY, E.F., & BURDICK, D.S. (1978). Some new methods of analysis for preferential-ranking data. Journal of the American Society for Psychical Research, 72, 93-109.

STUART, C.E. (1942). An ESP test with drawings. Journal of Parapsychology, 6, 20-43.
