PROBABILITY AND STATISTICS: EXPERIMENTAL RESULTS OF A RADICALLY DIFFERENT TEACHING METHOD

Julian L. Simon, David T. Atkinson and Carolyn Shevokas

Reprinted from the American Mathematical Monthly, Vol. 83, No. 9, November 1976, pp. 733-739.

Introduction. With the Monte Carlo method, students from high school to graduate school can quickly acquire the ability to handle probabilistic problems of daily living or scientific research. And the students understand what they are doing, with little danger of the formula-grabbing which too often afflicts conventional analytic methods.

The Illinois procedure for teaching the Monte Carlo method has been used since 1965 with success: (a) for teaching research methods to graduate students in several fields at the University of Illinois who have already had one or several conventional statistics courses, but who nevertheless find themselves insufficiently equipped to handle the statistical problems in research projects; (b) with undergraduates in research methods courses; (c) with undergraduates as part of conventional statistics courses; and (d) with high school students down to age 13 or 14, in the U.S. (Simon and Holmes, 1969), in Israel and in Puerto Rico.

The results seem successful to the teachers and to the students, as evidenced by the teachers' judgments and the students' answers to informal questionnaires. But such "soft" evidence is insufficient to convince skeptics -- which is perhaps as it ought to be. Harder evidence is therefore needed. To supply that evidence is the task of this paper. We first recapitulate the method and its logic. Then we describe three experiments that test the value of the method in a variety of class settings.

The Monte Carlo method is not offered as a successor to analytic methods. Rather, it can be an underpinning for analytic teaching to help students understand analytic methods better. But it is also a workable and easily-taught alternative for students who will never study conventional analytic methods to the point of practical mastery -- and this includes most students at all educational levels. It may be especially useful for the introduction to statistics of mathematically-disadvantaged students. (But please do not infer from this that the method is intellectually inferior; the method is logically acceptable and intuitively instructive for all students.)

It must be emphasized that the Monte Carlo method as described here really is intended as an alternative to conventional analytic methods in actual problem-solving practice. This method is not a pedagogical device for improving the teaching of conventional methods. This is quite different from the past use of the Monte Carlo method to help teach sampling theory, the binomial theorem and the central limit theorem. The point that is usually hardest to convey to teachers of statistics is that the method suggested here really is a complete break with conventional thinking, rather than a supplement to it or an aid in teaching it. That is, the simple Monte Carlo method described here is complete in itself for handling most -- perhaps all -- problems in probability and statistics. The Monte Carlo method always provides a logically acceptable solution.
But more specifically with respect to statistical hypothesis testing, the Monte Carlo tests based on a randomization logic have properties that statisticians are now finding attractive because they are more robust than traditional parametric tests. (For the test of differences in means between two groups based on Fisher's randomization test, see Dwass, 1957, and Chung and Fraser, 1958; for a variety of other tests see Simon, 1969, Chapters 23 and 24.) Hence the Monte Carlo test is often a better scientific choice than the conventional test -- in addition to its pedagogical advantages.

To illustrate the method, here is a sample question and examination answer by a high school student (one who qualified for an experimental course) after just six hours of classroom instruction:

John tells you that with his old method of shooting foul shots in basketball his average (over a long period of time) was .6. Then he tried a new method of shooting and scored successes with nine of his first ten shots. Should he conclude that the new method is really better than the old method?

Student A.S.: (a) Take twelve hearts to represent hits in shooting and eight spades to be misses; this is John's old probability in scoring. (b) Shuffle, draw a card and record "hit" or "miss", replace it and shuffle. (c) Repeat ten times altogether for one trial. (d) See how many times nine hits or more come up on ten tries.

1. Hit, hit, miss, miss, h, h, h, h, h, m: 7/10 hits    9. 6/10
2. 7/10    10. 9/10
3. 8/10    11. 5/10
4. 4/10    12. 7/10
5. 6/10    13. 8/10
6. 7/10    14. 6/10
7. 7/10    15. 8/10

Only 1 time in 15 times will 9/10 shots be made by the old .6 chance, so it seems probable here that John's new method helped.

The Monte Carlo method is not explained by the instructor. Rather it is discovered by the students. With a bit of guidance the students invent, from scratch, the procedures for solution. For example, at the beginning of the first class the instructor may ask, "What are the chances that if you have four children three of them will be girls?" A few students do some calculations without success (in a naive class); the rest fidget. Then the students say that they don't know how to get the answer. The instructor presses the class to think of some way to come up with an answer. Someone suggests in jest that everyone in the class should go out and have four children. The instructor chooses to take this seriously. He says that this is a very good suggestion, though it has some obvious drawbacks. Someone suggests substituting a coin for a birth. This raises the issue of whether it is reasonable to approximate a 106:100 event with a 50-50 coin, and what is reasonable under various conditions. The instructor points out that the class still has no answer. Someone suggests that each student throw four coins. Someone else amends this by saying that four flips of one coin are just as good. The instructor questions whether the two methods are equivalent, and the class eventually agrees that they are. Finally, each student performs a trial, the data are collected, and an estimate is made. Someone wonders how good the estimate is. Someone else suggests that the experiment be conducted several more times to see how much variation there is. The meaning of the concept "chances" comes up in the discussion, and "probability" is defined pragmatically. By this process of self-discovery, students develop useful operating definitions of other necessary concepts such as "universe," "trial," "estimate," and so on.
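For readers who want to see the same class experiment in program form, here is a minimal sketch, written by us in Python purely as an illustration (the class, of course, used real coins). It reads the question as asking for exactly three girls and accepts the 50-50 coin approximation discussed above:

    import random

    def girls_in_one_family(children=4):
        # One trial: flip a fair coin once per child; "heads" stands for a girl.
        # The 50-50 coin is the class's approximation to the roughly 106:100
        # boy-girl birth ratio raised in the discussion.
        return sum(random.random() < 0.5 for _ in range(children))

    trials = 10000
    successes = sum(girls_in_one_family() == 3 for _ in range(trials))
    print(successes / trials)   # should come out near the exact value 4/16 = .25

Repeating the run a few times shows exactly the trial-to-trial variation that the class discussion raises.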
And together they invent -- after false starts and class corrections -- sound approaches to easy and not-so-easy problems in probability and statistics. For example, with a bit of guidance, an average university class can be brought to re-invent such devices as a Monte Carlo version of Fisher's randomization test. In an earlier report (Simon, 1969, Chapters 23 and 24) the flexibility and range of the Monte Carlo method are shown in problems ranging from permutations to correlation to randomization tests. In this manner, the students learn more than how to do problems. They gain the excitement of true intellectual discovery. And they come to understand something of the nature of mathematics and its creation.

Though the experience of shuffling cards and counting tabled random numbers is educational at first, it tends to be a nuisance after a while, and a deterrent to the use of the method. Furthermore, in some problems, the sample size required for the desired accuracy makes such hand methods onerous if not impossible. Therefore, a computer program, SIMPLE, has been developed that will perform the necessary operations rapidly and yet can be used immediately by a person with absolutely no computer experience. The SIMPLE program is also designed to be used as the method of choice for sophisticated statisticians in many sorts of applications. This program is described in Simon and Weidenfeld (1974), and materials are available upon request.

A systematic Monte Carlo method is taught at the University of Illinois; this is an important difference from some examples in the literature of ad hoc Monte Carlo problem solution, e.g., Zelinka, 1973. The student is taught to work in a series of discrete steps. The first step is the construction of the universe whose behavior one is interested in. The second step (or set of steps) is the drawing of a sample from that universe. The third step is the computation of the statistic of interest, and, in inferential statistics, comparison of the experimental statistic to the "observed" or "bench-mark" statistic. The fourth step is the repetition of the sampling procedure a large number of times. And the fifth step is the calculation of the proportion of "successes" to experimental trials, which estimates the probability of the event in which one is interested. (These five steps are written out in program form in the sketch below.)

The experiments. Three controlled experimental tests of the pedagogical efficiency of the Monte Carlo method have now been completed.

The University of Illinois Experiment: The experimental situation was a one-semester elementary statistics class of 25 undergraduates, mostly economics and business majors, in 1973 at the University of Illinois. The course, taught by Simon, was primarily a conventional statistics course, using a conventional text (Spurr and Bonini, rev. ed., 1973); the Monte Carlo method was taught only as a supplement, with no reading on it other than the simulation chapter in Spurr and Bonini, the Zelinka (1973) article, and suggested reading in Simon (1969, Chapters 23-25). All problems that were treated by the Monte Carlo method in class were also demonstrated by analytic methods, whereas many problems were solved by analytic methods that were not treated in class by the Monte Carlo method. Therefore, analytic methods had a very large advantage over the Monte Carlo method in student time and attention, both in reading and in class.
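The sketch below applies the five steps to the foul-shot question given earlier. It is only our illustration, written in Python; it is not the SIMPLE program, and the student quoted above of course worked the same steps with a deck of cards:

    import random

    # Step 1: construct the universe -- John's old shooting ability, a .6 chance
    # of a hit (the student's twelve hearts and eight spades).
    universe = ["hit"] * 12 + ["miss"] * 8

    def one_trial():
        # Step 2: draw a sample of ten shots, with replacement, from the universe.
        shots = [random.choice(universe) for _ in range(10)]
        # Step 3: compute the statistic of interest and compare it with the
        # observed, bench-mark result of nine hits in ten shots.
        return shots.count("hit") >= 9

    # Step 4: repeat the sampling procedure a large number of times.
    trials = 10000
    successes = sum(one_trial() for _ in range(trials))

    # Step 5: the proportion of successes estimates the probability that the old
    # .6 shooter would make nine or more of ten shots by chance alone.
    print(successes / trials)   # roughly .05; compare the student's 1 in 15

A hand count of fifteen card trials, as in the student's answer, is the same procedure carried out with a much smaller sample.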
Among the ten questions on the final exam (of which the student had to answer eight), there were four that the student could choose whether to answer by analytic methods or by Monte Carlo; the question given earlier is an example of these four questions. The choices of method by the students on the optional-method questions give an indication of the usefulness of the Monte Carlo method. Some additional conditions relevant to the experiment: The students could bring books and notes. (A closed-book exam, where formulae had to be remembered, would disadvantage analytic methods relative to Monte Carlo methods.) And the four optional-method problems were extremely simple ones for the use of analytic methods. (Complex problems would tend to improve the relative performance of the Monte Carlo method, because complexity is its comparative advantage.) The results were as follows:

1. Almost every student used the Monte Carlo method for some question. This is the most exciting result of the experiment, because it suggests that the method has some usefulness to almost everyone. In total, more than half of the answers used the Monte Carlo method (44 of 86).

2. Almost every student did some questions by analytic methods. This implies that teaching the Monte Carlo method does not prevent the learning and use of analytic methods: that is, Monte Carlo does not drive out analytic methods. This is also a gratifying result.

3. There is a slightly greater propensity for students who did better on the examination as a whole to do a larger proportion of problems by the Monte Carlo method. But the relationship is certainly not strong, which suggests that the Monte Carlo method is useful both to the good and to the less-good students. (And the lack of a strong relationship also implies that we need not worry that the students who got better scores on the exam did so because they used the Monte Carlo method more extensively and were therefore graded more easily.)

4. On each question some students used analytic methods and others used Monte Carlo methods. This shows that the Monte Carlo method is not specialized to some sorts of problems in the minds and practices of the students.

5. The average grades that the students received were higher on the questions answered with the Monte Carlo method than on those questions answered with analytic methods -- 9.1 versus 7.5 on a scale of 10.

Polk Community College Experiment: At Polk Community College, Winter Haven, Florida, in 1974, Shevokas taught a 6-week, 17-class-hour unit in probability and statistics to separate classes of General Mathematics in three ways: conventional analytic method, Monte Carlo method with computer, and Monte Carlo method without computer. The enrollments were 19, 39, and 13, respectively. Beforehand, the groups were given a Cooperative Arithmetic Achievement Test (Educational Testing Service, 1962) and two attitude-toward-mathematics tests (Aiken and Dreger, 1961; McCallon and Brown, 1971; sample item: "The feeling I have toward math is a good feeling"). The differences in results among groups were not statistically significant, so we can safely consider that the groups were similar to start with. Only a mini-computer was available, and hence the types of programs that could be offered were not satisfactory. And the computer group had less time to learn probability because of the time devoted to learning about the computer programs.
For these and other reasons we would have liked to confine our attention to the non-computer aspects of the experiment, but we include the with-computer group to increase the Monte Carlo sample size. The conventional analytic group was assigned two conventional chapters on probability and statistics in a basic text (Meserve and Sobel, 1973); the Monte Carlo groups were given duplicated reading materials prepared by Shevokas.

1. All students were given the same seven-question exam on completion of the probability unit; a typical question was: "Suppose a machine produces bolts, 10% of which are defective. Find the probability that a box of three bolts contains at least one defective bolt." The mean scores, on a basis of 100, were: conventional, 35.8; Monte Carlo no-computer, 58.5; Monte Carlo with-computer, 50.8. While one could wish for higher scores altogether, the Monte Carlo groups did better. The difference between the Monte Carlo and conventional groups is statistically significant, but even more important, it is of an educationally significant magnitude; the Monte Carlo no-computer group got 62% higher scores than the conventional group.

2. The two attitude-toward-mathematics scales were administered again afterwards. The Monte Carlo groups showed more favorable attitudes than the conventional group, with the non-computer Monte Carlo group being most favorable; considering the two scales together, the post-scores differ significantly among the groups. Perhaps most interesting, the mean changes from "before" to "after" for the Aiken-Dreger and McCallon-Brown scales were: conventional, -5% and -9%; Monte Carlo with-computer, 0% and -8%; Monte Carlo no-computer, +22% and +8%. To put it more concretely, five of 19 conventional-group students had an improved attitude and 13 a worsened attitude (one tie); among the Monte Carlo no-computer group, 8 students had improved attitudes and 5 worsened. (The attitudes of the Monte Carlo with-computer group were apparently harmed by their need to spend extra hours on campus to use the computer.)

3. It is an important result that despite an initially cool attitude toward the no-computer Monte Carlo method, the teacher came to enjoy teaching the Monte Carlo method much more than the conventional method, because the students reacted to the Monte Carlo work in an interested and enthusiastic manner.

Olivet Nazarene College Experiment: At Olivet Nazarene (four-year) College, Kankakee, Illinois, during the second half of each semester in 1974-1975, one class in Mathematics for General Education was taught probability and statistics by Atkinson in a conventional analytic fashion, while a second class was taught the Monte Carlo method. Class size was 21 students in each section the first semester; in the second semester there were 37 and 34 students, respectively, in the Monte Carlo and conventional sections. As in the case of the Polk Community College experiment, students in this course generally have low skills and little interest in mathematics. Comparable duplicated reading materials prepared by Atkinson on the conventional and Monte Carlo methods were distributed to the respective classes.

1. In the first semester two pre-exam quizzes were given to each group, whereas three quizzes were given in the second semester. These quizzes each contained one, two or three probability or statistics problems. On each quiz the Monte Carlo section did better than the conventional group, achieving class mean scores as much as twice as high as those of the conventional group.
2. The first part of the final exam the first semester was "conceptual." It asked the student to analyze problem data and describe the population, the hypotheses, and so on. The conventional group did better, getting a mean score of 47.9 compared to 40.8 for the Monte Carlo group (t = 1.25). The analogous first part of the second semester's final exam was a 20-question multiple-choice test on the concepts of hypothesis testing. This time the Monte Carlo group did better, 60.3 to 51.8 (t = 2.06).

3. The most important measure of performance was the second part of the final exam, containing, respectively, three and four problems in the first and second semesters. Mean scores were: Semester 1: Monte Carlo, 69.5; conventional, 59.4 (t = 1.7). Semester 2: Monte Carlo, 67.6; conventional, 56.6 (t = 2.06). Inspection of second-semester tests showed that the Monte Carlo group did better on each and every question.

4. If one scores answers only as "right" or "wrong," in the second semester 45.9% of the Monte Carlo students answered at least two questions correctly, whereas among the conventional group only 26.5% got two or more questions right. And the Monte Carlo group got 34.4% of the total questions right whereas the conventional group got 19.8% of the questions right. Comparative scoring of Monte Carlo and analytic answers requires some judgment. But the fact that the teachers in the Polk and Olivet Nazarene experiments (though not in the Illinois experiment) were not initially in favor of the non-computer Monte Carlo method provides some protection against scoring bias.

5. The Monte Carlo section had less mathematical ability than the conventional section in both semesters; the Monte Carlo groups had lower mean scores on the ACT math test, two quizzes, and the midterm exam on the algebra material taught in the first half of the semesters, some of the differences being statistically significant. Hence the better performance shown by the Monte Carlo groups on the probability and statistics material came despite a lower endowment of mathematical ability.

6. A twenty-question attitude-toward-mathematics scale similar to the Aiken-Dreger scale was given before and after the probability-statistics unit. In both semesters the Monte Carlo groups began with less favorable attitudes. But by the end of the experiment the Monte Carlo groups' attitudes toward math were more favorable than those of the conventional groups.

7. An attitude-toward-probability-and-statistics scale was given after the probability-statistics instruction. In the first semester, eight of ten questions were answered more favorably by the Monte Carlo group, with substantively large and statistically significant differences; the other two differences were tiny, and those questions referred to future plans rather than attitudes. In the second semester, 15 of 17 attitudes were more favorable in the Monte Carlo group, most of the differences being large and all with t > 1; the two other questions were very slightly more favorable in the conventional group, with t < .23.

8. The teacher's subjective evaluation, as in other classes where the Monte Carlo method has been taught, is that the students seem relatively interested in and enthusiastic about the material, with a great deal of class discussion. This made for an enjoyable experience for the teacher, despite initial doubts about the value of the Monte Carlo method.
Conclusions. Taken as a whole, the evidence shows that the Monte Carlo method is a tool that students can and will use to arrive at correct answers to probabilistic-statistical problems. Therefore, it would seem to make sense to teach students to do standard probabilistic questions with the Monte Carlo method. In a conventional university probability or statistics course, this implies teaching the Monte Carlo method along with the analytic methods. In high school or college situations in which the student will not get a course or even a long section on probability and statistics, this implies teaching a block of 6-10 hours of the Monte Carlo method in the basic mathematics course so that the student will have at least some tools at his disposal.

If one has to make a pedagogical choice between analytic and Monte Carlo methods, it would seem that Monte Carlo is the method of choice on a "cost/benefit" basis -- that is, it yields more usable output per unit of learning input. But luckily one does not usually have to make such a choice, because there can be plenty of time in the conventional elementary course for the Monte Carlo method to be treated along with the analytic method. And in a situation where the Monte Carlo method and only the Monte Carlo method might be taught -- say in a high school or junior college -- the conventional method usually has no real opportunity at present to receive the attention it must receive if students are to acquire a usable tool, and hence the conventional approach is not a real alternative to the Monte Carlo method.

Lest this be unclear or seem to equivocate: Where there is limited time, or where students will not be able to grasp conventional methods firmly, we advocate teaching the Monte Carlo approach, and perhaps that only. Where there is more time, and where students will be able to learn conventional methods well, we advocate (a) teaching Monte Carlo methods at the very beginning as an introduction to statistical thinking and practice; and (b) afterwards teaching the Monte Carlo method alongside the conventional method as alternative solutions to the same problems, to help students learn analytic methods and to give them an alternative tool for their use.

Teaching the Monte Carlo method also has additional pedagogical advantages. It produces (in fact, demands) a high level of class participation and teacher-student interaction. This makes the class time lively and enjoyable. The method also leads students to discover for themselves the intuitive meaning of fundamental concepts such as independence. And it increases their readiness to challenge the validity of the underlying data, which they must receive in raw form for the Monte Carlo method rather than in the defect-hiding summary form -- for example, "a population with μ = 100 and σ = 10" -- in which conventional problems are usually stated.

The advantage of the Monte Carlo method seems to stem from its greater simplicity in a fundamental, intuitive sense: it has fewer "working parts," and the student never needs to take anything on faith, especially the sort of faith that is required by analytic methods that work by way of the central limit theorem ("It is shown in advanced texts that...").

Do we not owe it to our students and ourselves to give the Monte Carlo method at least a hearing and a try?
Acknowledgment. Kenneth Travers supervised Atkinson's and Shevokas's theses at the College of Education of the University of Illinois, from which their results are drawn; we are grateful for Travers's important contribution to this work. We also appreciate helpful comments on an earlier draft from Bob Bohrer.

References

D. T. Atkinson, A Comparison of the Teaching of Statistical Inference by Monte Carlo and Analytical Methods, Ph.D. thesis, University of Illinois, 1975.

J. H. Chung and D. A. S. Fraser, Randomization tests for a two-sample problem, J. Amer. Statist. Assoc., 53 (September 1958) 729-735.

Meyer Dwass, Modified randomization tests for nonparametric hypotheses, Ann. Math. Statist., 28 (March 1957) 181-187.

Educational Testing Service, Cooperative Mathematics Tests, Arithmetic, Form A, Princeton, N.J., 1962.

A. L. Edwards, Techniques of Attitude Scale Construction, Appleton-Century-Crofts, New York, 1957.

E. L. McCallon and J. D. Brown, A semantic differential instrument for measuring attitude toward mathematics, J. Experimental Education, 39 (Summer 1971) 69-79.

B. E. Meserve and M. A. Sobel, Introduction to Mathematics, 3rd ed., Prentice-Hall, Englewood Cliffs, N.J., 1973.

Carolyn Shevokas, Using a Computer-Oriented Monte Carlo Approach to Teach Probability and Statistics in a Community College General Mathematics Course, Ph.D. thesis, University of Illinois, 1974.

J. L. Simon, Basic Research Methods in Social Science, Random House, New York, 1969.

_____, with the assistance of Allen Holmes, A really new way to teach (and do) probability and statistics, The Mathematics Teacher, 62 (April 1969) 283-288.

_____, and Dan Weidenfeld, SIMPLE: Computer Program for Monte Carlo Statistics Teaching, mimeo, 1975.

W. A. Spurr and C. P. Bonini, Statistical Analysis for Business Decisions, rev. ed., Irwin, Homewood, Ill., 1973.

Martha Zelinka, How many games to complete a World Series?, in F. Mosteller et al. (eds.), Statistics by Example, Addison-Wesley, Reading, Mass., 1973.

DEPARTMENT OF ECONOMICS, UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN, URBANA, IL 61801

DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE, OLIVET NAZARENE COLLEGE, KANKAKEE, IL 60901

DEPARTMENT OF MATHEMATICS AND PHYSICAL SCIENCE, THORNTON COMMUNITY COLLEGE, SOUTH HOLLAND, IL 60473