Resampling: Everyday Statistical Tool

                    PROBABILITY AND STATISTICS THE RESAMPLING WAY

                Stats for Poets, Politicians - and Statisticians

                        Julian L. Simon and Peter Bruce

         For more information contact Resampling Stats, 612 N. Jackson St.,
         Arlington, VA 22201
             Probability theory and its offspring, inferential

        statistics, constitute perhaps the most frustrating branch of

        human knowledge.

             Right from its beginnings in the seventeenth century, the

        great mathematical discoverers knew that the probabilistic way of

        thinking -- which we'll call "prob-stats" for short -- offers

        enormous power to improve our decisions and the quality of our

        lives.  Prob-stats can aid a jury's deliberation about whether to

        find guilty a person charged with murder...reveal if a new drug

        boosts survival from a cancer...help steer a spacecraft to

        Saturn...inform the manager when to take a pitcher out of the

        baseball game...aid a wildcatter to calculate how much to invest

        in an oil well...and a zillion other good things, too.

             Yet until very recently, when the resampling method came

        along, scholars were unable to convert this powerful body of

        theory into a tool that laypersons could and would use freely in

        daily work and personal life.  Instead, only professional

        statisticians feel themselves in comfortable command of the prob-

        stats way of thinking.  And the most frequent application is by

        social and medical scientists, who know that prob-stats is

        indispensable to their work yet too often fear and misuse it.

             Prob-stats continues to be the bane of students, most of

        whom consider the statistics course a painful rite of passage --

        like fraternity paddling -- on the way to an academic degree.

        Even among those who study it, most close the book at the end of

        the semester and happily put prob-stats out of their minds

        forever.

             The statistical community has made valiant attempts to

        ameliorate this sad situation.  Great statisticians have

        struggled to find interesting and understandable ways to teach

        prob-stats.  Learned committees and professional associations

        have wrung their hands in despair, and spent millions of dollars

        to create television series and text books.  Chance magazine

        imaginatively demonstrates and explains the exciting uses and

        benefits of prob-stats.

             Despite successes, these campaigns to promote prob-stats

        have largely failed.  The enterprise smashes up against an

        impenetrable wall - the body of complex algebra and tables that

        only a rare expert understands right down to the foundations.

        Almost no one can write the formula for the "Normal" distribution

        that is at the heart of most statistical tests.  Even fewer

        understand its meaning; yet without such understanding, there can

        be only rote learning.

             Almost every student of probability and statistics simply

        memorizes the rules.  Most users of prob-stats select their

        methods blindly, understanding little or nothing of the basis for

        choosing one method rather than another.  This often leads to

        wildly inappropriate practices, and contributes to the damnation

        of statistics.  Indeed, in the last decade or so, the

        discipline's graybeards have decided that prob-stats is just too

        tough a nut to crack, and have concluded that students should be

        taught mainly descriptive statistics - tables and graphs - rather

        than how to draw inferences probabilistically, which is really

        the heart of statistics.

             The new resampling method, in combination with the personal

        computer, promises to change all this.  Resampling may finally

        realize the great potential of statistics and probability.

        Resampling estimates probabilities by numerical experiments

        instead of with formulae - by flipping coins or picking numbers

        from a hat, or with the same operations simulated on a computer.

        And the computer language-program RESAMPLING STATS performs these

        operations in a transparently clear and simple fashion.

             The best mathematicians now accept resampling theoretically.

        And controlled studies show that people ranging from engineers

        and scientists down to seventh graders quickly learn to handle

        more problems correctly with it than with conventional methods.

        Furthermore,

        in contrast to the older conventional statistics, which is a

        painful and humiliating experience for most students at all

        levels, the published studies show that students enjoy resampling

        statistics.

           THE REAPPEARANCE OF RESAMPLING IN THE HISTORY OF STATISTICS

             Resampling returns to a very old tradition.  In ancient

        times, mathematics in general, and statistics in particular,

        developed from the needs of governments and rich men to count

        their armies and flocks, and to enumerate the taxpayers and their

        possessions.  Up until the beginning of the twentieth century,

        the term "statistic" meant "state-istics", the number of

        something the "state" was interested in -- soldiers, births, or

        what-have-you.  Even today, the term "statistic" usually means

        the quantity of something, such as the important statistics for

        the United States in the Statistical Abstract of the United

        States.  These numbers are now known as "descriptive statistics,"
        in contrast to "inferential statistics" which is the science that

        tells us how reliable a set of descriptive statistics is.

             Another stream of thought appeared by way of gambling in

        France in the 17th century.  Throughout history people had

        learned about the odds in gambling games by experimental trial-

        and-error experience.  To find the chance of a given hand

        occurring in a card game, a person would deal out a great many

        hands and count the proportion of times that the hand in question

        occurred.  That was the resampling method, plain and simple.

             Then in the year 1654, the French nobleman Chevalier de Mere

        asked the great mathematician and philosopher Blaise Pascal to

        help him deduce what the odds ought to be in some gambling games.

        Pascal, the famous Pierre de Fermat, and others went on from there

        to develop analytic probability theory, and Jacob Bernoulli and

        Abraham DeMoivre initiated the formal theory of statistics.  The

        experimental method disappeared into mathematical obscurity

        except for its use when a problem was too difficult to be

        answered theoretically, as happened from time to time in the

        development of statistical tests -- for example, the development

        of the famous t-test by "Student", the pen-name of  William S.

        Gosset -- and the World War II "Monte Carlo" simulations for

        complex military "operations research" problems such as how best

        to search for submarines with airplanes.

             Later on, these two streams of thought -- descriptive

        statistics and probability theory -- joined together.  Users of

        descriptive statistics wondered about the accuracy of the data

        originating from both sample surveys and experiments.  Therefore,

        statisticians applied the theory of probability to assessing the

        accuracy of data and created the theory of inferential

        statistics.

                            HOW RESAMPLING DEVELOPED

             Too much book-learning, too little understanding.  The

        students had swallowed but not digested a bundle of statistical

        ideas which now misled them, taught by professors who valued

        fancy mathematics even if useless or wrong.  It was the spring of

        1967 at the University of Illinois, in my (Simon's) course in

        research methods with four graduate students working toward the

        PhD degree.  I required each student to start and finish an

        empirical research project as a class project.  Now the students

        were presenting their work in class.  Each used wildly wrong

        statistical tests to analyze their data.

             "Why do you use the technique of cluster analysis?" I asked

        Moe Taher (a pseudonym).

             "I want to be up-to-date," said Taher.

             "How much statistics have you studied?" I asked.

             "Two undergraduate and three graduate courses," Taher

        answered proudly.

             It was the usual statistical disaster.  A simple count of

        the positive and negative cases in Taher's sample was enough to

        reveal a clear-cut conclusion.  The fancy method Taher used was

        window-dressing, and inappropriate to boot.  It was the same sad

        story with the other three students.

             All had had several courses in statistics.  But when the

        time came to apply even the simplest statistical ideas and tests

        in their research projects, they were lost.  Their courses had

        plainly failed to equip them with basic usable statistical tools.

             So I wondered:  How could we teach the students to distill

        the meaning from their data?  Simple statistical methods suffice

        in most cases.  But by chasing after the latest sophisticated

        fashions the students overlook these simple methods, and instead

        use unsound methods.

             I remembered trying to teach a friend a concept in

        elementary statistics by illustrating it with some coin flips.

        Given that the students' data had a random element, could not the

        data and the events that underlie the data be "modeled" with

        coins or cards or random numbers, doing away with any need for

        complicated formulas?

             Next class I shelved the scheduled topics, and tried out

        some problems using the resampling method (though that label had

        not yet been invented).  First the students estimated the chance

        of getting two pairs in a poker hand by dealing out hands.  Then

        I asked them the likelihood of getting three girls in a four-

        child family.  After they recognized that they did not know the

        correct formula, I demanded an answer anyway.  After suggesting

        some interesting other ideas -- we'll come to them later -- one

        of the students eventually suggested flipping coins.

             With that the class was off to the races.  Soon the students

        were inventing ingenious ways to get answers -- and sound answers

        -- to even subtle questions in probability and statistics by

        flipping coins and using random numbers.  The very next two-hour

        seminar the students re-discovered an advanced technique

        originally invented by the great statistician Ronald A. Fisher.

             The outcome of these experiments was resampling.  Even

        before this time, though I had not known of it, resampling had

        been suggested for one particular case in inferential statistics

        in technical articles in the Annals of Mathematical Statistics by

        Meyer Dwass in 1957, and in the Journal of the American

        Statistical Association by J. H. Chung and D. A. S. Fraser in 1958.

        They preceded me in applying sampling methods to the problem of

        deciding whether two sample means differ from each other, basing

        the procedure on Fisher's famous "randomization" test. The new

        idea I contributed was handling all (or at least most) problems

        by resampling.  And to that end, I taught a systematic procedure

        for carrying out resampling procedures and illustrated it for a

        variety of problems, while also teaching conventional methods in

        parallel.

             Then it was natural to wonder:  Could even children learn

        this powerful way of dealing with the world's uncertainty?  Max

        Beberman, the guru of the "new math" who then headed the

        mathematics department in the University of Illinois High School,

        quickly agreed that the method had promise, and suggested

        teaching the method to a class of volunteer juniors and seniors.

        The kids had a ball.  In six class hours they were able to

        discover solutions and generate correct numerical answers for the

        entire range of problems ordinarily found in a semester-long

        university introductory statistics class.  Furthermore, the

        students loved the work.

             A semester-long university class in statistics, with

        resampling and the conventional method taught side by side, came

        next.  But students complained that dealing cards, flipping

        coins, and consulting tables of random numbers gets tiresome.  So

        in 1973 I developed a computer language that would do with the

        computer what one's hands do with cards or dice.  The RESAMPLING

        STATS program, which handles all problems in statistics and

        probability with only about twenty commands that mimic operations

        with cards, dice, or random numbers, is a simple language

        requiring no computer experience.  Even 7th graders quickly

        understand and use it, though it is powerful enough for

        scientific and industrial use. (It does, however, provide a

        painless introduction to computers, which is a valuable

        educational bonus.)

             A major sub-part of the general resampling method - the

        "bootstrap", which was independently developed by Bradley Efron

        in the 1970s - has now swept the field of statistics to an

        extraordinary extent.  The New York Times had this to say:

             "A new technique that involves powerful computer
             calculations is greatly enhancing the statistical analysis
             of problems in virtually all fields of science.  The method,
             which is now surging into practical use after a decade of
             refinement, allows statisticians to determine more
             accurately the reliability of data analysis in subjects
             ranging from politics to medicine to particle physics...

             "`There's no question but that it's very, very important'
             said Frederick Mosteller, a statistician at Harvard
             University...Jerome H. Friedman, a Stanford statistician who
             has used the new method, called it `the most important new
             idea in statistics in the last 20 years, and probably the
             last 50'.  He added, `Eventually, it will take over the
             field, I think.'" (Nov. 8, 1988, C1, C6)

             Resampling is best understood by seeing it being learned.

        The instructor walks into a new class and immediately asks, say,

        "What are the chances if I have four children that three of those

        children will be girls?"  Someone says "Put a bunch of kids into a

        hat and pick out four at a time".  Teach says, "Sounds fine in

        theory, but it might be a bit difficult to actually carry

        out...How about some other suggestions?"

             Someone else says,  "Have four kids and see what you get."

        Teach says, "Sounds good.  But let's say you have four children

        once.  Is that going to be enough to give you a decent answer?"

        So they discuss how big a sample is needed, which brings out the

        important principle of variability in the samples you draw.

        Teach then praises this idea because it focuses attention on

        learning from experiment, one of the key methods of science.  S/he

        points out, however, that it could take a while to have a hundred

        families, plus some energy and money, so it doesn't seem to be

        practical at the moment.  Teach asks for another suggestion.

             Someone suggests taking a survey of families with four

        children.  Teach praises this idea, too, because it focuses on

        getting an answer by going out and looking at the world.  But

        what if a faster answer is needed?

             Someone else wonders if it is possible to "do something that

        is like having kids.  Put an equal number of red and black balls

        in a pot, and pull four of them out.  That would be like a

        family."  This kicks off discussion about how many balls are

        needed, and how they should be drawn, which brings out some of

        the main concepts in probability - sampling with or without

        replacement, independence, and the like.

             Then somebody wonders whether the chance of having a girl

        the first time you have a child is the same as the chance of a

        red ball from an urn with even numbers of red and black balls, an

        important question indeed.  This leads to discussion of whether

        50-50 is a good approximation.  This brings up the question of

        the purpose of the estimate, and the instructor suggests that a

        clothing manufacturer wants to know how many sets of matched

        girls dresses to make.

             Coins seem easier to use than balls, all agree.  And Teach

        commissions one student to orchestrate the rest of the class in a

        coin-flipping exercise.  Then the question arises:  Is one sample

        of (say) thirty coin-flip "families" enough?  The exercise is

        repeated several times, and the class is impressed with the

        variability from one sample of thirty to the next.  Once again

        the focus is upon variability, perhaps the most important idea

        inherent in prob-stats.
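
             Readers who want to try the classroom exercise at a computer

        rather than with coins can use the minimal sketch below, written in

        ordinary Python rather than the RESAMPLING STATS language; it

        assumes a 50-50 chance of a girl on each birth, and the function

        names are ours, purely for illustration.

        import random

        def one_family():
            # One four-child "family": 1 stands for a girl, 0 for a boy (50-50 assumed)
            return sum(random.randint(0, 1) for _ in range(4))

        def proportion_three_girls(n_families):
            # Proportion of simulated families with exactly three girls
            # (counting "three or more" would be an equally easy variation)
            return sum(one_family() == 3 for _ in range(n_families)) / n_families

        # A few samples of thirty families show the trial-to-trial variability
        # the class notices; a much larger sample settles near the long-run answer.
        for _ in range(5):
            print(proportion_three_girls(30))
        print(proportion_three_girls(10000))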

             Or another example:  The instructor asks, "What are the

        chances that basketball player Magic Johnson, who averages 47

        percent success in shooting, will miss 8 of his next 10 shots?"

        The class shouts out joking suggestions such as "Go watch Magic,"

        and "Try it yourself on the court."  Teach responds, "Excellent

        ideas, good scientific thinking, but not feasible now.  What else

        could we do?"

             Soon someone - say, Adam - suggests flipping a coin.  This

        leads to instructive discussion about whether the 50-50 coin is a

        good approximation, and whether ten coins flipped once give the

        same answer as one coin flipped ten times.  Eventually all agree

        that trying it both ways is the best way to answer the question.

             Teach then invites Adam up front to run the simulation.  Adam

        directs each of the 30 students to flip ten coins and count the

        resulting heads.  Teach writes the results on the board.  The

        proportion of the thirty trials in which there are 8 or more

        misses is then counted.  The students discuss this result in the

        light of the  variability from one sample of ten shots to another

        - an important and now-obvious idea.  And Teach points out how

        the same procedure is at the heart of industrial quality control.

             After a while someone complains that flipping coins and

        dealing cards is wearisome.  Aha!  Now Teach breaks out the

        computer and suggests doing the task faster, more accurately, and

        more pleasurably with the following computer instructions:

           REPEAT 100              Obtain a hundred simulation trials
              GENERATE 10 1,100 A  Generate 10 numbers randomly between
                                      1 and 100
              COUNT A 1,53 B       Count the number of misses in the trial
                                      (Magic's shooting average is 47%
                                      hits, 53% misses)
              SCORE B Z            Record the result of the trial
           END                     End the repeat loop for a single trial
           COUNT Z 8,10 K          Count the number of trials with 8 or
                                      more misses

        The histogram (Figure 1) shows the results of 100 trials, and

        Figure 2 shows the results of 1000 trials.  The amount of

        variability obviously diminishes as the number of trials

        increases, an important lesson.

                                 Figures 1 and 2

             Then Teach asks:  "If you see Magic Johnson miss 8 of 10

        shots after he has returned from an injury, should you think that

        he is in a shooting slump?"  Now the probability problem has

        become a problem in statistical inference -- testing the

        hypothesis that Magic is in a slump.  And with proper

        interpretation the same computer program yields the appropriate

        answer -- about 6.5 percent of the time Magic will miss 8 or more

        shots out of 10, even if he is not in a slump.  So don't take him

        out or stop him from shooting.  Understanding this sort of

        variability over time is the key to Japanese quality control.
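
             The same experiment can also be run outside the RESAMPLING

        STATS language.  The rough Python equivalent below is ours, for

        illustration only; the 53 percent miss rate and the cutoff of eight

        misses come from the text, and the printed proportion will wobble a

        little from one run to the next.

        import random

        TRIALS = 1000
        misses_per_trial = []
        for _ in range(TRIALS):
            # One trial: ten shots, with the numbers 1-53 standing for misses
            shots = [random.randint(1, 100) for _ in range(10)]
            misses_per_trial.append(sum(s <= 53 for s in shots))

        # Proportion of trials with 8 or more misses out of 10
        print(sum(m >= 8 for m in misses_per_trial) / TRIALS)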

             Now the instructor changes the question again and asks:  "If

        you observe a player - call him Houdini - succeed with 47 of 100

        shots, how likely is it that if you were to observe the same

        player take a great number of shots - a thousand or ten thousand

        - his long-run average would turn out to be 53 per cent or

        higher?"  A sample of 47 baskets out of 100 shots could come from

        players of quite different "true" shooting percentages.

        Resampling can help us make transparent several different

        approaches to this problem in "inverse probability".

             Clearly, we need to have some idea of how much variation

        there is in samples from shooters like Houdini.  If we have no

        other information, we might reasonably proceed as if the 47/100

        sample is our best estimate of Houdini's "true" shooting

        percentage.  We could then take repeated samples from a 47%

        shooting machine to estimate how great the variation is from

        shooters with long-run averages in that vicinity, from which we

        could estimate the likelihood that the "true" average is 53%.

        (This is the well-known "confidence interval" approach.  In

        truth, the logic is a bit murky, but that is seldom a handicap in

        daily practice.)

             Alternatively, we might be interested in a particular

        shooting percentage - say, Houdini's lifetime average before a

        shoulder injury.  In such a case, we might want to know whether

        the 47 for 100 is just a spell of below-average shooting, or an

        indication that the injury has affected his play.   In this

        situation, we could repeatedly sample from a 53% shooting machine

        to see how likely a 47/100 sample is.  Using this "hypothesis-

        testing" approach, if we find that the 47/100 is very

        unusual, we conclude that the injury is hampering Houdini; if

        not, not.
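
             A minimal Python sketch of this hypothesis-testing procedure

        (ours, not a prescribed implementation) draws many samples of 100

        shots from a hypothetical 53 percent shooting machine and asks how

        often the sample comes out as poor as 47 baskets or fewer; the

        confidence-interval approach described earlier is the mirror image,

        drawing instead from a 47 percent machine.

        import random

        def baskets_in_sample(true_rate, n_shots=100):
            # One simulated sample of shots from a machine with the given long-run rate
            return sum(random.random() < true_rate for _ in range(n_shots))

        TRIALS = 10000
        as_poor = sum(baskets_in_sample(0.53) <= 47 for _ in range(TRIALS)) / TRIALS
        print(as_poor)   # how often a true 53% shooter makes 47 or fewer of 100 shots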

             Consider still another possibility:  If Houdini is a rookie

        with no history in the league, we might want to apply additional

        knowledge about how often 53% shooters are encountered in the

        league. Here we might bring in information about the "distribution"

        of the averages of other players, or of other rookies, to see how

        likely a 47/100 sample is in light of such a distribution - a

        "Bayesian" approach to the matter.

             The resampling approach to problems like this one helps

        clarify the problem.  Because there are no formulae to fall back

        upon, you are forced to think hard about how best to proceed.

        Foregoing these crutches may make the problem at hand seem

        confusing and difficult, which is sometimes distressing. But in

        the long run it is also the better way, because it forces you to

        come to terms with the subtle nature of such problems rather than

        sweeping these subtle difficulties under the carpet.  You will

        then be in a better position to choose a step-by-step logical

        procedure which fits the circumstances.

             To repeat, in the absence of black-box computer programs and

        cryptic tables, the resampling approach forces you to directly

        address the problem at hand.  Then, instead of asking "Which

        formula should I use?" students ask such questions as "Why is

        something `significant' if it occurs 4% of the time by chance,

        yet not `significant' if a random process produces it 8% of the

        time?"

                          MAKING THE PROCEDURE MORE PRECISE

             Let's get a bit more precise and systematic.  Let us define

        resampling to include problems in inferential statistics

        as well as problems in probability, with this "operational

        definition":  With the entire set of data you have in hand, or

        with the given data-generating mechanism (such as a die) that is

        a model of the process you wish to understand, produce new

        samples of simulated data, and examine the results of those

        samples.  That's it in a nutshell.  In some cases, it may also be

        appropriate to amplify this procedure with additional assumptions

        that you deem appropriate.

             Problems in pure probability may at first seem different in

        nature than problems in statistical inference.  But the same

        logic as stated in the definition above applies to both varieties

        of problems.  The only difference is that in probability problems

        the "model" is known in advance -- say, the model implicit in a

        deck of cards plus a game's rules for dealing and counting the

        results -- rather than the model being assumed to be best

        estimated by the observed data, as in resampling statistics.

        Many problems in probability -- all of them, we conjecture --

        have a corresponding flip-side "shadow" or "dual" problem in

        statistics, and vice versa; the basketball case above is an

        example.

                              THE GENERAL PROCEDURE

             The steps in solving the particular problems above have been

        chosen to fit the specific facts.  We can also describe the steps

        in a more general fashion.  The generalized procedure simulates

        what we are doing when we estimate a probability using resampling

        problem-solving operations.

             Step A.  Construct a simulated "universe" of random numbers

        or cards or dice or another randomizing mechanism whose

        composition is similar to the universe whose behavior we wish to

        describe and investigate.  The term "universe" refers to the

        system that is relevant for a single simple event.  For example:

             a)  A coin with two sides, or two sets of random numbers "1-

        5" and "6-0", simulates the system that produces a single male or

        female birth, when we are estimating the probability of three

        girls in the first four children.  Notice that in this universe

        the probability of a girl remains the same from trial event to

        trial event -- that is, the trials are independent --

        demonstrating a universe from which we sample with

        replacement.

             b)  An urn containing a hundred balls, 47 red and 53 black,

        simulates the system that produces 47 baskets out of 100 shots.

             Hard thinking is required in order to determine the

        appropriate "real" universe whose properties interest you.

             Step(s) B.  Specify the procedure that produces a pseudo-

        sample which simulates the real-life sample in which we are

        interested.  That is, you must specify the procedural rules by

        which the sample is drawn from the simulated universe.  These

        rules must correspond to the behavior of the real universe in

        which you are interested.  To put it another way, the simulation

        procedure must produce simple experimental events with the same

        probabilities that the simple events have in the real world.  For

        example:

             a)  In the case of three daughters in four children, you can

        draw a card and then replace it if you are using a deck of red

        and black cards.  Or if you are using a random-numbers table, the

        random numbers automatically simulate replacement.  Just as the

        chances of having a boy or a girl do not change depending on the

        sex of the preceding child, so we want to ensure through

        replacement that the chances do not change each time we choose

        from the deck of cards.

             b)  In the case of Magic Johnson's shooting, the procedure

        is to consider the numbers 1-47 as "baskets", and 48-100 as

        "misses".

             Recording the outcome of the sampling must be indicated as

        part of this step, e.g. "record `yes' if a girl or a basket,

        `no' if a boy or a miss."

             Step(s) C.  If several simple events must be combined into a

        composite event, and if the composite event was not described in

        the procedure in step B, describe it now.  For example:

             a)  For the three girls in four children, the procedure for

        each simple event of a single birth was described in step B.  Now

        we must specify repeating the simple event four times, and

        determine whether the outcome is or is not three girls.

             b)  In the case of Magic Johnson's ten shots, we must draw

        ten numbers to make up a sample of shots, and examine whether

        there are 8 or more misses.

             Recording of "three or more girls" or "two or fewer girls",

        and "8 or more misses" or "7 or fewer", is part of this step.

        This record indicates the results of all the trials and is the

        basis for a tabulation of the final result.

             Step(s) D.  Calculate the estimate from the tabulation of

        outcomes of the resampling trials.  For example: the proportion

        of a) "yes" trials, or b) trials with "8 or more" misses,

        estimates the likelihood we set out to estimate.
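
             The steps can also be written down as a short, generic

        recipe.  The sketch below, in Python and with names of our own

        choosing, fills in steps A through D for the three-girls example;

        only the simulated universe and the composite-event test would

        change for another problem.

        import random

        # Step A: a simulated universe for one simple event (one birth)
        universe = ["girl", "boy"]

        def simple_event():
            # Step B: draw from the universe with the real-world probabilities;
            # random.choice leaves the universe unchanged, so trials stay independent
            return random.choice(universe)

        def composite_event():
            # Step C: combine four births into a family and test the outcome of interest
            children = [simple_event() for _ in range(4)]
            return children.count("girl") == 3

        # Step D: tabulate the outcomes of many resampling trials
        TRIALS = 10000
        print(sum(composite_event() for _ in range(TRIALS)) / TRIALS)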

             There is indeed more than one way to skin a cat (ugh!).  And

        there is always more than one way to correctly estimate a given

        probability.  Therefore, when reading through the list of steps

        used to estimate a given probability, please keep in mind that a

        particular list is not sacred or unique; other sets of steps will

        also do the trick.

             Let's consider an extended example, my study in the 1960s of

        the price of liquor in the sixteen "monopoly" states (where the

        state government owns the retail liquor stores) compared to the

        twenty-six states in which retail liquor stores are privately

        owned. (Some states were omitted for technical reasons.)  This

        problem in " statistical hypothesis testing" would conventionally

        be handled with Student's t-test, but with much less theoretical

        justification than the resampling method possesses here.

             These are the representative 1961 prices of a fifth of

        Seagram 7 Crown whiskey in the two sets of states:

                  16 monopoly states: $4.65, $4.55, $4.11, $4.15, $4.20,

             $4.55, $3.80, $4.00, $4.19, $4.75, $4.74, $4.50, $4.10,

             $4.00, $5.05, $4.20

                  26 private-ownership states:  $4.82, $5.29, $4.89,

             $4.95, $4.55, $4.90, $5.25, $5.30, $4.29, $4.85, $4.54,

             $4.75, $4.85, $4.85, $4.50, $4.75, $4.79, $4.85, $4.79,

             $4.95, $4.95, $4.75, $5.20, $5.10, $4.80, $4.29.

             A social-scientific study properly begins with a general

        question about the nature of the social world such as:  Does

        state monopoly affect prices?  The scientist then must transform

        this question into a form that s/he can study scientifically. In

        this case, the question was translated into a comparison of these

        two sets of data for a single brand as collected from a trade

        publication.  If the answer is not completely obvious from casual

        inspection of the data because of variation within the two

        samples - as in the case here, where the two samples overlap -

        the researcher may turn to inferential statistics for help.

             The first step in using probability and statistics is to

        translate the scientific question into a statistical question.

        Once you know exactly which prob-stats question you want to ask -

        - that is, exactly which probability you want to determine -- the

        rest of the work is relatively easy.  The stage at which you are

        most likely to make mistakes is in stating the question you want

        to answer in probabilistic terms.  Though this step is difficult,

        it involves no mathematics.  Rather, this step requires only

        hard, clear thinking.  You cannot beg off by saying "I have no

        brain for math!"  The need is for a brain to do clear thinking,

        rather than a brain for mathematics.  People using conventional

        methods avoid this hard thinking by simply grabbing the formula

        for some test without understanding why they choose that test.

        But resampling pushes you to do this thinking explicitly.

             The scientific question here is whether the prices in the

        two sets of states are systematically different.  In statistical
        terms, we wish to "test the hypothesis" that there is a "true"

        difference between the groups of states based on their mode of

        liquor distribution - that is, a difference that is not just the

        result of happenstance -- or whether the observed differences

        might well have occurred by chance.  In other words, we are

        interested only in whether the two sub-groups of states are

        "truly" different in their liquor prices, or whether the

        difference we observe is likely to have been produced by chance

        variability.

             The resampling method proceeds as follows:  We consider that

        the entire "universe" of possible prices consists of the set of

        events that have been observed, because that is all the

        information that is available about the universe.  We therefore

        write each of the forty-two observed state prices on a separate

        card, and shuffle the cards together; the deck now simulates a

        situation in which each state has the same chance as any other

        state of being dealt into a given pile.  We can now examine, on

        the "null hypothesis" assumption that the two groups of states do

        not really reflect different price-setting mechanisms but rather

        differ only by chance, how often that universe produces by chance

        groups with results as different as those we actually observed in

        1961.  (In this case, unlike many others, the states constitute

        the entire universe we are interested in, rather than being a

        sample taken from some larger universe, as is the case when one

        does a biological experiment or surveys a small sample drawn from

        the entire U. S. population, say.)

             From the simulated universe we repeatedly deal groups of 16

        and 26 cards without replacing the cards, to represent

        hypothetical monopoly-state and private-state samples.  We

        sample without replacement (and hence for convenience we need
        only look at the 16-state group since, once it is chosen, the average

        of the remaining 26 is also fixed) because there are only 42

        actual states for which data is available, and hence we are not

        making inferences to a larger, infinite universe.  Instead, we

        have the entire universe at hand.

             The probability that prices in the monopoly states are

        "really" lower than those in the private-enterprise states may be

        estimated by the proportion of times that the sum (or average) of

        those randomly-chosen sixteen prices from the simulated universe

        is less than (or equal to) the sum (or average) of the actual

        sixteen prices.  If we were often to obtain a difference between

        the randomly-chosen groups equal to or greater than that actually

        observed in 1961, we would conclude that the observed difference

        could well be due to chance variation.

             This logic may not be immediately obvious to the newcomer to

        statistics.  It is fairly subtle, and requires a bit of practice,

        even with the resampling method, to bring it to the fore.  But

        once you understand this way of thinking, you will have reached

        the heart of inferential statistics.

             The steps again:

             Step A.  Write each of the 42 prices on a card and shuffle.

             Steps B and C (combined in this case):  Deal the cards into

        groups of 16 and 26 cards.  Then calculate the mean price

        difference between the groups, and compare the experimental-trial

        difference to the observed mean difference of $4.84 - $4.35 =

        $.49; if it is as great as or greater than $.49, write "yes",

        otherwise "no".

             Step D.  Repeat step B-C a hundred or a thousand times.

        Calculate the proportion "yes", which estimates the probability

        we seek.
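
             Here, purely as our own illustration (the original work used

        cards and, later, the RESAMPLING STATS program), is a Python sketch

        of these steps: shuffle the 42 observed prices, deal 16 of them

        into a pretend monopoly group, and count how often the chance

        difference in group means is at least as large as the observed $.49.

        import random

        monopoly = [4.65, 4.55, 4.11, 4.15, 4.20, 4.55, 3.80, 4.00,
                    4.19, 4.75, 4.74, 4.50, 4.10, 4.00, 5.05, 4.20]
        private = [4.82, 5.29, 4.89, 4.95, 4.55, 4.90, 5.25, 5.30, 4.29,
                   4.85, 4.54, 4.75, 4.85, 4.85, 4.50, 4.75, 4.79, 4.85,
                   4.79, 4.95, 4.95, 4.75, 5.20, 5.10, 4.80, 4.29]

        def mean(xs):
            return sum(xs) / len(xs)

        observed = mean(private) - mean(monopoly)      # about $.49

        prices = monopoly + private                    # Step A: the 42-card "deck"
        TRIALS = 10000
        count = 0
        for _ in range(TRIALS):
            random.shuffle(prices)                     # Steps B-C: shuffle and deal
            group16, group26 = prices[:16], prices[16:]
            if mean(group26) - mean(group16) >= observed:
                count += 1

        print(count / TRIALS)                          # Step D: proportion of "yes" trials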
             The estimate -- not even once in 10,000 trials (see Figure

        3) -- shows that it would be very unlikely that two groups with

        mean prices as different as were observed would happen by chance

        from the universe of 42 observed prices.  So we "reject the null

        hypothesis" and instead accept the proposition that the type of

        liquor distribution system influences the prices that consumers

        pay.

                                    Figure 3

             Under the supervision of Kenneth Travers and me at the

        University of Illinois during the early 1970s, PhD candidates

        Carolyn Shevokas and David Atkinson studied how well students

        learned resampling, working with experimental and control groups

        of junior college and four-year college students.  Both found

        that with resampling methods -- even without the help of computer

        simulation -- students produce a larger number of correct answers

        to numerical problems than do students taught with conventional

        methods.  Furthermore, attitude tests as well as teacher

        evaluations showed that students enjoy the subject much more, and

        are much more enthusiastic about it than students taught with
        conventional methods are.

             It is an exciting experience to watch graduate engineers or

        high-school boys and girls as young as 7th grade re-invent from

        scratch the resampling substitutes for the conventional tests

        that drive college students into confusion and despair.  Within

        six or nine hours of instruction students are generally able to

        handle problems usually dealt with only in advanced university

        courses.

             The computer-intensive resampling method also provides a

        painless and attractive introduction to the use of computers. And

        it can increase teacher productivity in the school and university

        systems while giving students real hands-on practice.
             Monte Carlo methods have long been used to teach

        conventional methods.  Resampling has nothing to do with the

        teaching of conventional "parametric" methods, however.  Rather,

        resampling is an entirely different method, and one of its

        strengths is that it does not depend upon the assumption that the

        data resemble the "Normal" distribution.  Resampling is the

        method of choice for dealing with a wide variety of everyday

        statistical problems -- perhaps most of them.

             To repeat, the purpose of resampling is not to teach

        conventional statistics.  Rather, resampling breaks completely

        with the conventional thinking that dominated the field until the

        past decade, rather than being a supplement to it or an aid to

        teaching it.

             For those in academia and business who may use statistics in

        their work but who will never study conventional analytic methods

        to the point of practical mastery -- that is, almost all --

        resampling is a functional and easily-learned alternative.  But

        resampling is not intended to displace analytic methods for those

        who would be mathematical statisticians.  For them, resampling

        can help them understand analytic methods better.  And it may be

        especially useful for the introduction to statistics of

        mathematically-disadvantaged students.  (The method is in no way

        intellectually inferior to analytic methods, however; it is

        logically satisfactory as well as intuitively compelling.)

             Though we and the mathematical statisticians who have

        developed the bootstrap element in resampling (following Efron's

        work in the 1970s) have an identical intellectual foundation,

        they and we are pointed in different directions.  They see their

        work as intended mainly for complex and difficult problems; we

        view resampling as a tool for all (or almost all) tasks in prob-
        stats.  Our interest is in providing a powerful tool that

        researchers and decision-makers rather than statisticians can use

        with small chance of error and with total understanding of the

        process.

             Like all innovations, resampling has encountered massive

        resistance.  The resistance has largely been conquered with

        respect to mathematical statistics and advanced applications.

        But instruction in the use of resampling at an introductory

        level, intended for simple as well as complex problems, still

        faces a mix of apathy and hostility.

                                   CONCLUSION

             Estimating probabilities with conventional mathematical

        methods is often so complex that the process scares many people.

        And properly so, because the difficulties lead to frequent

        errors.  The statistical profession has long expressed grave

        concern about the widespread use of conventional tests whose

        foundations are poorly understood.  The recent ready availability

        of statistical computer packages that can easily perform these

        tests with a single command, irrespective of whether the user

        understands what is going on or whether the test is appropriate,

        has exacerbated this problem.

             Probabilistic analysis is crucial, however.  Judgments about

        whether to allow a new medicine on the market, or whether to re-

        adjust a screw machine, require more than eyeballing the data to

        assess chance variability.  But until now, the practice and

        teaching of probabilistic statistics, with its abstruse structure

        of mathematical formulas cum tables of values based on

        restrictive assumptions concerning data distributions -- all of

        which separate the user from the actual data or physical process

        under consideration -- have not kept pace with recent

        developments in the practice and teaching of descriptive

        statistics.

             Beneath every formal statistical procedure there lies a

        physical process.  Resampling methods allow one to work directly

        with the underlying physical model by simulating it.  The term

        "resampling" refers to the use of the given data, or a data

        generating mechanism such as a die, to produce new samples, the

        results of which can then be examined.

             The resampling method enables people to obtain the benefits

        of statistics and probability theory without the shortcomings of

        conventional methods, because it is free of mathematical formulas

        and restrictive assumptions and is easy to understand and use,

        especially in conjunction with the computer language and program

        RESAMPLING STATS.


                                      [BOX1]

                    THE RESAMPLING STATS LANGUAGE AND PROGRAM

                      COMPARED TO BASIC AND OTHER LANGUAGES

             The computer language and program RESAMPLING STATS enable

        the user to perform experimental trials in the simplest, as well

        as more complex, Monte Carlo simulations of problems in

        probability and statistics.  Most of the twenty or so commands in

        RESAMPLING STATS mimic the operations one would make in

        conducting such trials with dice or cards; for example, SHUFFLE

        randomizes a set of elements.  (The rest of the commands are such

        as IF, END, and PRINT.)  This correspondence between the

        computer operations and the physical operations that one would

        undertake in a simulation with an urn or playing cards or

        whatever, which in turn correspond to the physical elements in

        the real situation being modeled, greatly helps the user to

        understand exactly what needs to be done with the computer to

        arrive at a sound answer.

             RESAMPLING STATS employs a fundamentally different logic

        than do standard programming languages such as BASIC and PASCAL.

        (APL is the only language with a similar logic.)  Standard

        languages imitate mathematical operations by making a variable -

        a single number at a time - the unit that is worked with.  In

        contrast, RESAMPLING STATS works with a collection of numbers - a

        vector.  This enables each operation to be completed in one pass,

        whereas in other languages there must be repeat loops until each

        element in the vector is processed.
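
             The distinction can be seen in miniature in any language that

        offers both styles.  Here is a small Python illustration of our

        own -- it is not a sample of RESAMPLING STATS or BASIC --

        contrasting element-at-a-time processing with a single

        whole-collection operation.

        sample = [1, 2, 1, 1, 2]       # a trial sample of coin flips coded as 1s and 2s

        # Element at a time, in the style of BASIC or PASCAL: an explicit loop
        heads = 0
        for flip in sample:
            if flip == 1:
                heads += 1

        # The whole collection at once, in the spirit of the COUNT command
        heads_again = sample.count(1)

        print(heads, heads_again)      # both report 3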
             Furthermore, whereas other languages name the variable,

        RESAMPLING STATS names locations, and moves otherwise-nameless

        collections from location to location.  In computer logic this

        may not be a meaningful distinction.  But it is as much a working

        distinction as between a) a set of instructions that tell how to

        process tourist group 37 - first show them where the bar is, have

        their suitcases put away, and get them onto the bus, and b)

        instructions that tell what to do with whoever is in hotel A on

        January 1 and move them to hotel B, then process whoever is in

        hotel B on January 2 and then move them to hotel C, and so on.

             RESAMPLING STATS programs are much shorter and clearer than

        BASIC programs.  Typically, only about half as many instructions

        are needed.  Here is an example of the same problem written in

        the two languages, selected for illustration because it is the

        very first problem in the 1987 book THE ART AND TECHNIQUES OF

        SIMULATION, by Mrudulla Gnanadesikan, Richard L. Scheaffer, and

        Jim Swift, prepared by the American Statistical Association for

        use in high schools.

             "Outcomes with a Fair Coin: What are the numbers of heads (or

        tails) you can expect to get if you flip a given number of coins?"

        Please notice that there is a statistics problem closely related

        to this probability problem, with the same program used

        to solve it.  For example:  "You have a device that produces (say)

        a sample of 15 successes in 20 attempts.  How likely is it that

        the long-run ("true") rate for the device is 50% successes (or

        less)?"

              The BASIC program of Gnanadesikan et al. is as follows:

        BASIC Program to Simulate Trials with Repeated Coin Tosses

        80  INPUT "ENTER THE NUMBER OF KEY COMPONENTS";N
        100  INPUT "ENTER THE NUMBER OF TRIALS";NT
        120  DIM T$(NT,N),C(2*N)
        140  FOR i = 1 TO NT
        150  LET NH = 0
        160  FOR J = 1 TO N
        170  LET X = RND (1)
        180  IF X < .5 THEN 220
        190  T$ (I,J) = "H"
        200  NH = NH + 1
        210  GOTO 230
        220  T$ (I,J) = "T"
        230  IF J = N THEN 260
        250  GOTO 270
        270  NEXT J
        280  C(NH + 1) = C(NH + 1) + 1
        290  NEXT I
        330  FOR K = 1 TO N + 1
        350  NEXT K
        360  END

             The BASIC program is written in general form and does not

        specify a particular number of coins and heads, as RESAMPLING

        STATS does.  (It has been simplified by removing the many "print"

        statements.)  Here is the RESAMPLING STATS program that does the

        same job, for a sample of five coins.

        REPEAT 100            Run a hundred simulation trials
           GENERATE 5 1,2 A   Generate randomly a sample of
                                 five "1"s and "2"s
           COUNT A 1 B        Count the number of "1"s in this trial sample
           SCORE B Z          Record the results in vector Z
        END                   End the repeat loop
        HISTOGRAM Z           Graph the results, and also produce a
                                 table of results with their relative and
                                 cumulative frequencies

        The results of this RESAMPLING STATS program are in Figure B1.

                                    Figure B1

                Do you agree that the RESAMPLING STATS program is not only

        much shorter and easier to write, but also is much more obvious

        to your intuition?

             Even more important, the above program in RESAMPLING STATS

        language is written by the user, which leads to learning

        about both statistics and computers.  In contrast, the BASIC

        program given by Gnanadesikan et al. is pre-written by the

        authors, and all the user does is fill in the parameters.  The

        students therefore do not learn what is necessary to develop an

        abstract model of the real-life situation, or write a computer

        program to simulate that model, both of which are crucial steps

        in the learning process.

                                  [END OF BOX]

                                      [BOX 2]

                           THE PROS AND CONS OF RESAMPLING

             1) Does Resampling Produce Correct Estimates?

             If one does not make enough experimental trials with the

        resampling method, of course, the answer arrived at may not be

        sufficiently exact.  For example, only ten experimental bridge

        hands might well produce far too high or too low an estimate of

        the probability of five or more spades.  But a reasonably large

        number of experimental bridge hands should arrive at an answer

        which is close enough for any purpose. There are also some

        statistical situations in which resampling yields poorer

        estimates about the unknown population than does the conventional

        parametric method, usually "bootstrap" confidence-interval

        estimates made from small samples, especially of yes-or-no data. But

        on the whole, resampling methods yield "unbiased" estimates, and

        no less often than conventional methods do.  Perhaps most

        important, the user is more likely to arrive at sound answers with

        resampling because s/he can understand what s/he is doing,

        instead of grabbing the wrong formula in error.

             2) Do Students Learn to Reach Sound Answers?

             In the 1970s, Kenneth Travers, who was responsible for

        secondary mathematics at the College of Education at the

        University of Illinois, and Simon organized systematic controlled

        experimental tests of the method.  Carolyn Shevokas's thesis

        studied junior college students who had little aptitude for

        mathematics.  She taught the resampling approach to two groups of

        students (one with and one without computer), and taught the

        conventional approach to a "control" group.  She then tested the

        groups on problems that could be done either analytically or by

        resampling.  Students taught with the resampling method were able

        to solve more than twice as many problems correctly as students

        who were taught the conventional approach.

             David Atkinson taught the resampling approach and the

        conventional approach to matched classes in general mathematics

        at a small college.  The students who learned the resampling

        method did better on the final exam with questions about general

        statistical understanding.  They also did much better solving

        actual problems, producing 73 percent more correct answers than

        the conventionally-taught control group.

             These experiments are strong evidence that students who

        learn the resampling method are able to solve problems better

        than are conventionally taught students.

             3) Can Resampling Be Learned Rapidly?

             Students as young as junior high school, taught by a variety

        of instructors, and in languages other than English, have in a

        matter of six short hours learned how to handle problems that

        students taught conventionally do not learn until advanced

        university courses.  In Simon's first university class, only a

        small fraction of total class time -- perhaps an eighth -- was

        devoted to the resampling method as compared to seven-eighths

        spent on the conventional method.  Yet, the tested students

        learned to solve problems more correctly, and solved more

        problems, with the resampling method than with the conventional

        method.  This suggests that resampling is learned much faster

        than the conventional method.

             In the Shevokas and Atkinson experiments the same amount of

        time was devoted to both methods but the resampling method

        achieved better results.  In those experiments learning with the

        resampling method was at least as fast as with the conventional method,

        and probably considerably faster.

             4) Is the Resampling Method Interesting and Enjoyable?

             Shevokas asked her groups of students for their

        opinions and attitudes about the section of the course devoted to

        statistics and probability.  The attitudes of the students who

        learned the resampling method were far more positive -- they

        found the work much more interesting and enjoyable -- than the

        attitudes of the students taught with the standard method.  And

        the attitudes of the resampling students toward mathematics in

        general improved during the weeks of instruction while the

        attitudes of the students taught conventionally changed for the

        worse.

             Shevokas summed up the students' reactions as follows:

        "Students in the experimental (resampling) classes were much more

        enthusiastic during class hours than those in the control group,

        they responded more, made more suggestions, and seemed to be much

        more involved".

             Gideon Keren taught the resampling approach for just six

        hours to 14- and 15-year old high school students in Jerusalem.

        The students knew that they would not be tested on this material.

        Yet Keren reported that the students were very much interested.

        Between the second and third class, two students asked to join

        the class even though it was their free period!  And as the

        instructor, Keren enjoyed teaching this material because the

        students were enjoying themselves.

             Atkinson's resampling students had "more favorable opinions,

        and more favorable changes in opinions" about mathematics

        generally than the conventionally-taught students, according to

        an attitude questionnaire.  And with respect to the study of

        statistics in particular, the resampling students had much more

        positive attitudes than did the conventionally-taught students.

             The experiments comparing the resampling method against

        conventional methods show that students enjoy learning statistics

        and probability this way.  And they don't show the panic about

        this subject often shown by many others.  This contrasts sharply

        with the less positive reactions of students learning by

        conventional methods, even when the same teachers teach both

        methods in the experiment.

                                  [END OF BOX]

        Additional Readings

        Edgington, Eugene S., Randomization Tests, Marcel Dekker, New
        York, 1980.

        Efron, Bradley, and Diaconis, Persi, "Computer-Intensive Methods
        in Statistics," Scientific American, May 1983, pp. 116-130.

        Noreen, Eric W., Computer Intensive Methods for Testing
        Hypotheses, Wiley, 1989.

        Simon, Julian L., Basic Research Methods in Social Science,
        Random House, New York, 1969 (3rd edition, 1985, with Paul
        Burstein).

        Simon, Julian L., Atkinson, David T., and Shevokas, Carolyn,
        "Probability and Statistics: Experimental Results of a Radically
        Different Teaching Method," American Mathematical Monthly, v. 83,
        no. 9, Nov. 1976.

        Simon, Julian L., Resampling: Probability and Statistics a
        Radically Different Way (unpublished manuscript available from
        the author).

                         FIGURE FOR BOX (B1)

  40+
    +                               *
    +                               *
F   +                               *
r   +                               *
e 30+                               *
q   +                     *         *
u   +                     *         *
e   +                     *         *
n   +                     *         *
c 20+                     *         *
y   +                     *         *         *
    +                     *         *         *
*   +                     *         *         *
    +           *         *         *         *
Z 10+           *         *         *         *
1   +           *         *         *         *
    +           *         *         *         *
    + *         *         *         *         *
    + *         *         *         *         *         *
   0+-----------------------------------------------------
      |^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|
      0         1         2         3         4         5

                   Number of Heads

                             FIGURE 1
                    Magic Johnson, 47% shooter
                            100 trials

  40+
    +
    +
F   +
r   +
e 30+
q   +
u   +
e   +                *
n   +           *    *
c 20+           *    *    *
y   +           *    *    *
    +           *    *    *    *
*   +           *    *    *    *
    +           *    *    *    *
Z 10+           *    *    *    *
2   +      *    *    *    *    *    *
    +      *    *    *    *    *    *
    + *    *    *    *    *    *    *
    + *    *    *    *    *    *    *    *
   0+-------------------------------------------
      |^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|
      2         4         6         8        10

                Misses in 10 shots

                             FIGURE 2
                    Magic Johnson, 47% shooter
                            1000 trials

  400+
     +
     +
F    +
r    +
e 300+
q    +
u    +
e    +
n    +                          *    *
c 200+                     *    *    *
y    +                     *    *    *
     +                     *    *    *
*    +                     *    *    *    *
     +                     *    *    *    *
Z 100+                *    *    *    *    *
3    +                *    *    *    *    *
     +                *    *    *    *    *    *
     +                *    *    *    *    *    *
     +           *    *    *    *    *    *    *    *
    0+-----------------------------------------------------
       |^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|
       0         2         4         6         8        10

                         Misses in 10 Shots

                             FIGURE 3
                          Liquor Prices
                          10,000 trials

  1000+
      +
      +
F     +
r     +
e  750+
q     +
u     +                             ****
e     +                             **** *
n     +                           ********
c  500+                           **********
y     +                         ************
      +                         *************
*     +                        ***************
      +                       ****************
Z  250+                       *****************
4     +                     ********************
      +                     *********************
      +                   *************************
      +                *******************************
     0+---------------------------------------------------------------
        |^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|^^^^^^^^^|
       -60       -40       -20        0        20        40        60
                                   Cents