Say No to "Pseudo-Psychology"

Chapter 5 Correlation and Causation: Using the "Oven Method" for Contraception

A few years ago, a large-scale study was conducted in Taiwan to investigate which factors were associated with contraceptive use. A large research team of sociologists and physicians collected data on a wide range of environmental and behavioral variables, looking for the variable that best predicted contraceptive use. After analyzing the data, the researchers found that the variable most strongly associated with contraceptive use was the number of electrical appliances (ovens, fans, etc.) in the household (Li, 1975). This result probably won't lead you to propose handing out free ovens in high schools to reduce teenage pregnancy. But why not? The correlation between appliances and contraceptive use was high, and of all the variables measured, this one was the single best predictor. I hope your answer is that the issue is the nature of the relationship between the two variables, not its strength. Launching a free-oven program would presuppose that ovens cause contraceptive use. In practice we would dismiss this suggestion as absurd, at least in an example as obvious as this one, because we recognize that two variables can be correlated without one causing the other.

In this example, we can guess that the relationship exists because "contraceptive use" and "number of household appliances" are both linked to other variables. Education might be one such linking variable. We know that educational level is related to both contraceptive use and socioeconomic status. Add the fact that better-off households tend to own more appliances, and we have the link. Of course, other variables may also help produce the relationship. But however strong the correlation between the number of household appliances and contraceptive use, it does not indicate a causal relationship between them.

The contraception example makes the main idea of this chapter easy to grasp: correlation does not necessarily imply causation. In this chapter we will discuss two major problems that prevent us from drawing causal inferences from correlations: the third-variable problem and the directionality problem. We will also discuss how selection bias can give rise to third-variable problems. The limitations of correlational evidence are not always as easy to see as in the oven example. When a causal link seems obvious to us, when we hold strong prior biases, or when our interpretations are driven by a favored theory, it is easy to mistake a correlation for evidence of causation.

In the early 20th century, tens of thousands of people in the southern United States fell ill and died from a disease called pellagra (approximately 100,000 deaths per year). Pellagra, whose main symptoms were dizziness, lethargy, sores, vomiting, and severe diarrhea, was thought to be an infectious disease caused by an unknown microorganism (Chase, 1977, p. 205). It is not surprising, then, that many physicians of the National Association for the Study of Pellagra were impressed by evidence that the disease was linked to sanitary conditions. In Spartanburg, South Carolina, households with indoor plumbing and good sewerage seemed to be consistently free of pellagra. This correlation seemed to support the idea that, where sanitation was poor, the disease was spread through the excrement of pellagra victims.

A physician named Joseph Goldberger, who conducted many studies of pellagra under the direction of the U.S. Surgeon General, was very skeptical of this explanation. He believed that pellagra was caused by a nutritionally inadequate diet, in short, by the widespread poverty of the American South. Many victims lived on a high-carbohydrate, very low-protein diet, with very little meat, eggs, or milk and large amounts of corn, grits, and mush. Goldberger argued that the correlation between sewage conditions and pellagra did not reflect a causal relationship at all (just as in the oven and contraception example). He believed the correlation arose because households with sanitary plumbing were also generally better off financially, and this economic difference was reflected in their diets, which contained more animal protein.

But wait a minute! Why should Goldberger's causal inference be the correct one? After all, both sides were simply sitting back and inferring the cause of pellagra from correlational data. Why couldn't the physicians of the Association say that Goldberger's correlation was just as misleading? Why was Goldberger able to overturn the hypothesis that pellagra was caused by a microorganism spread through the excrement of victims where sewage treatment was poor? The answer involves a small detail of Goldberger's work that I have not yet mentioned: Goldberger ate the excrement of pellagra patients.

Goldberger had a kind of evidence that comes not merely from observing correlations but from actually manipulating the critical variable (the logic of manipulation and control is discussed further in the next chapter). This approach often involves creating conditions that rarely occur naturally, and it would be an understatement to say that the conditions Goldberger arranged do not occur naturally. Convinced that pellagra was not contagious and was not transmitted through patients' bodily fluids, Goldberger injected himself with blood from a pellagra patient and swallowed secretions from patients' noses and throats. In addition, he selected two patients, one with scaling skin lesions and one with diarrhea. He scraped the scales from the lesions, mixed them with 4 ml of the patients' urine, added an equal amount of liquid feces, and rolled the mixture into small pills with four pinches of flour. The pills were taken voluntarily by Goldberger, his assistant, and his wife (Bronfenbrenner & Mahoney, 1975, p. 11).

Neither Goldberger nor any of the other volunteers developed pellagra. In short, Goldberger created ideal conditions for transmitting an infectious disease, and nothing happened. His manipulation of the causal mechanism proposed by others showed that it did not work, but he still needed to test the causal mechanism he himself proposed. He selected two groups of pellagra-free prisoners at a Mississippi prison farm who volunteered for the experiment. One group was given the high-carbohydrate, low-protein diet that Goldberger suspected caused pellagra; the other group was given a more nutritionally balanced diet. Five months later, the low-protein group had developed pellagra, while the other group showed no signs of the disease. Goldberger's theory was nonetheless resisted, in part by people who for political reasons were unwilling to acknowledge the existence of poverty. After a long struggle his hypothesis was finally accepted, because no other hypothesis fit the experimental evidence as well.

The history of pellagra shows the terrible human cost of basing social and economic policy on mistaken inferences from correlational studies. That does not mean we should never use correlational evidence. On the contrary, in many cases we have no choice but to rely on correlation (see Chapter 8), and in some cases correlation is all we need (for example, when the goal is prediction rather than identifying causes). Scientists often have to solve problems with incomplete knowledge. What matters is that we apply correlational evidence with caution. Cases like pellagra and sewage occur frequently in every area of psychological research. The example also illustrates the "third-variable problem": a correlation between two variables (here, the incidence of pellagra and the quality of sewage treatment) does not mean that there is a direct causal relationship between them. The correlation may arise because each of the two variables is related to a third variable that was not measured, in this case diet. A correlation like the one between sewage conditions and pellagra is often called a "spurious correlation": it occurs not because of a direct causal link between the two measured variables, but because each of them is related to a third variable.
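To make the third-variable structure concrete, here is a minimal Python sketch built on invented numbers (it is not Goldberger's data). It assumes a hypothetical unmeasured variable, household income, that raises both plumbing quality and dietary protein, and it computes pellagra risk from diet alone. Plumbing never appears in the risk formula, yet it correlates with risk in the observational data; when plumbing is instead assigned at random, mimicking a manipulation, the correlation disappears. The function names `simulate` and `pearson_r` and all coefficients are illustrative assumptions.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, randomize_plumbing, seed=0):
    """Return (plumbing, pellagra_risk) for n hypothetical households.

    Income is the unmeasured third variable. Risk is computed from diet
    alone; plumbing never enters the risk formula.
    """
    rng = random.Random(seed)
    plumbing, risk = [], []
    for _ in range(n):
        income = rng.gauss(0, 1)
        if randomize_plumbing:
            # Experimental arm: plumbing is assigned independently of income.
            p = rng.gauss(0, 1)
        else:
            # Observational data: wealthier households have better plumbing.
            p = income + rng.gauss(0, 1)
        diet_protein = income + rng.gauss(0, 1)  # wealthier households eat more protein
        r = -diet_protein + rng.gauss(0, 1)      # less protein means higher risk
        plumbing.append(p)
        risk.append(r)
    return plumbing, risk


observed = pearson_r(*simulate(20_000, randomize_plumbing=False))
experimental = pearson_r(*simulate(20_000, randomize_plumbing=True))
print(f"observational r = {observed:.2f}")
print(f"experimental  r = {experimental:.2f}")
```

With these arbitrary coefficients the observational correlation should come out near -0.4 and the randomized one near zero; the exact values depend on the seed.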

Let's look at a real-life example. For years there has been heated debate about the quality of education in public versus private schools. Some of the conclusions drawn in this debate vividly demonstrate the pitfalls of inferring causation from correlational evidence. Whether private schools are better or worse than public schools is an empirical question that can be answered with the survey research methods of the social sciences. That it is an answerable scientific question does not, however, make it a simple one. Advocates of private schools implicitly recognize that it is an empirical question, because in defending their views they often cite the empirical fact that private school students outperform public school students. That fact is not in dispute; educational statistics from many studies consistently show it. The question is whether these achievement data justify concluding that private schooling itself causes higher achievement.

Test scores are a function of many different variables that are correlated with one another. To assess the relative quality of public and private schools, we need more than the simple correlation between school type and academic achievement. For example, academic achievement is related to many indicators of family background, such as parents' education, parents' occupations, socioeconomic status, the number of books in the home, and other factors. These characteristics are also related to whether children attend private schools. Family background is therefore a potential third variable that may account for the relationship between school type and achievement. In short, the association may have nothing to do with school quality; it may arise simply because children from advantaged families both achieve more and are more likely to attend private schools.

Fortunately, there are more sophisticated correlational techniques, such as multiple regression, partial correlation, and path analysis (statistical methods that psychologists helped develop), which recalculate the correlation between two variables after removing the influence of other variables specified as covariates. Ellis Page and Timothy Keith of Duke University (Page & Keith, 1981) used such techniques to analyze a large body of data on high school students, collected with funding from the National Center for Education Statistics (NCES). They found that once variables reflecting students' family background and general intellectual ability were controlled, there was little relationship between academic achievement and school type. Other researchers have confirmed this finding (Berliner & Biddle, 1995; Carnoy, Jacobsen, Mishel, & Rothstein, 2005).
Clearly, then, arguing from this correlation that private schools improve achievement is no different from arguing that birth control requires an oven. Academic achievement is associated with private schooling not because of any direct causal mechanism, but because the family backgrounds and general cognitive ability of private school students differ from those of public school students.

These more sophisticated correlational techniques remove the influence of a third variable, but they do not always weaken the original correlation. Sometimes the correlation between two variables survives after a third variable has been partialed out, and that result is informative in itself: it shows that the original correlation was not a spurious one produced by that particular third variable. Of course, it does not rule out that some other variable produces a spurious correlation.

Thomas, Alexander, & Eckland (1979) provide a good example of this kind of analysis. They found that whether a high school student attends college is related to the socioeconomic status of the student's family. This is an important finding, because it challenges a core value of our society: that achievement depends on individual ability. It suggests instead that success depends on economic status. Before drawing this conclusion, however, we must consider an alternative hypothesis: that the correlation between college attendance and socioeconomic status is spurious. One obvious third variable is academic ability, which may be related to both college attendance and socioeconomic status.
If that were the whole story, partialing out academic ability would make the correlation disappear. After controlling for academic ability, however, the researchers found that the association between college attendance and socioeconomic status remained significant. The greater likelihood that children from higher-income families attend college is therefore not entirely attributable to differences in academic ability. This finding does not rule out the possibility that some other variable accounts for the association, but being able to eliminate academic ability as an explanation through such a reanalysis is itself of considerable theoretical and practical importance.

Anderson & Anderson (1996) show how partial correlation can be used to test competing theories against data, in their case theories of regional differences in violence. Previous research had shown that violent crime is more common in the southern United States than in the North, and Anderson and Anderson tested the "heat hypothesis": that uncomfortably high temperatures increase aggressive motivation and aggressive behavior (p. 740). Not surprisingly, they found a correlation between a city's average temperature and its violent crime rate. After statistically controlling for variables such as unemployment rate, per capita income, poverty rate, education level, and population size, the correlation between temperature and violent crime remained significant. This considerably strengthens the case for the heat hypothesis.
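The first-order partial correlation behind analyses like these can be sketched in a few lines. The formula is the standard one, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)); the data are hypothetical. In the "spurious" case a measured third variable z drives both x and y with no direct path between them, so the partial correlation falls to about zero. In the "genuine" case there is also a direct x-to-y path (an arbitrary strength of 0.5), so, as in the analyses just described, a correlation survives the control. Studies that control several variables at once, as Anderson and Anderson did, would use multiple regression rather than this single-covariate formula.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def partial_r(x, y, z):
    """First-order partial correlation of x and y with z held constant."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def simulate(n, direct_effect, seed=0):
    """Hypothetical data in which a measured third variable z drives x and y.

    direct_effect is the strength of a genuine x -> y path; 0 means the
    x-y correlation is produced entirely by z.
    """
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = z + rng.gauss(0, 1)
        y = z + direct_effect * x + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


spurious = simulate(20_000, direct_effect=0.0)
genuine = simulate(20_000, direct_effect=0.5)
for label, (x, y, z) in [("spurious", spurious), ("genuine", genuine)]:
    print(f"{label}: raw r = {pearson_r(x, y):.2f}, partial r = {partial_r(x, y, z):.2f}")
```

Under these assumptions the spurious case should show a raw r of roughly 0.5 that drops to about zero, while the genuine case keeps a partial r of roughly 0.45.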
When variables can be manipulated so that sound causal inferences are possible, there is no reason to settle for causal inferences based on correlational evidence alone. Unfortunately, drawing conclusions from correlations alone is common when it comes to psychological topics, and as psychological knowledge is applied more and more to real social problems, the cost of this tendency keeps growing. A well-known example from educational psychology illustrates the point.

Since the scientific study of reading began about 100 years ago, researchers have known that eye movement patterns are correlated with reading ability. Poor readers show erratic eye movements, more regressions (right-to-left movements), and longer fixations (pauses) on each line of text. On the basis of this correlation, some educators hypothesized that deficient oculomotor skills cause reading problems, and many "eye movement training programs" were developed and administered to elementary school children. These programs were in use long before anyone established whether the correlation really meant that erratic eye movements cause poor reading.

It is now clear that the correlation reflects a causal relationship running in the opposite direction from the one assumed. Erratic eye movements do not cause reading disability (Rayner, 1998). Rather, slow word recognition and difficulty comprehending the text cause erratic eye movements. When children are taught to recognize words efficiently and to comprehend text better, their eye movements become smoother. Training children's eye movements does nothing to improve their reading.
In the past decade, research has made clear that the root causes of reading disability are language problems in word decoding and phonological processing (Rayner, Foorman, Perfetti, Pesetsky, & Seidenberg, 2001, 2002; Snowling & Hulme, 2005; Stanovich, 2000; Vellutino, Fletcher, Snowling, & Scanlon, 2004); reading disabilities are almost never caused by eye movement patterns. Yet a careful search of the storerooms of many schools would still turn up dusty eye movement trainers, a reminder of thousands of dollars wasted on equipment because a correlation was mistaken for evidence of causation.

A similar example was discussed in Chapter 1. A very popular belief in education and social services holds that academic problems, substance abuse, teenage pregnancy, and other problem behaviors are all caused by low self-esteem. This claim assumes that the direction of causation is clear: low self-esteem leads to problem behaviors, and high self-esteem leads to high academic and other achievement. This assumption about the direction of causation drives many programs designed to raise self-esteem, and the problem is the same as in the eye movement example: a direction of causation is assumed merely because a correlation exists. It turns out that, to the extent there is a causal relationship between self-esteem and academic achievement, it more likely runs in the opposite direction: high achievement (in school and in other areas of life) leads to high self-esteem (Baumeister, Campbell, Krueger, & Vohs, 2003; Stout, 2000).
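Because Pearson's r treats its two variables symmetrically, data generated under opposite causal directions can look identical. This hypothetical sketch generates one dataset in which achievement causes self-esteem and another in which self-esteem causes achievement, using the same invented path coefficient (0.6) and noise level in both. Both should yield an r of about 0.6, and swapping the arguments of `pearson_r` returns exactly the same value, so nothing in the coefficient points to the direction of causation.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, cause, seed):
    """Hypothetical data with a one-way causal path between two variables.

    cause="achievement": achievement -> self-esteem
    cause="self_esteem": self-esteem -> achievement
    Both models use path coefficient 0.6 and noise SD 0.8, so both
    imply a population correlation of 0.6.
    """
    rng = random.Random(seed)
    achievement, self_esteem = [], []
    for _ in range(n):
        driver = rng.gauss(0, 1)
        outcome = 0.6 * driver + rng.gauss(0, 0.8)
        if cause == "achievement":
            achievement.append(driver)
            self_esteem.append(outcome)
        else:
            self_esteem.append(driver)
            achievement.append(outcome)
    return achievement, self_esteem


ach_1, se_1 = simulate(20_000, "achievement", seed=1)
ach_2, se_2 = simulate(20_000, "self_esteem", seed=2)
r_ach_causes_se = pearson_r(ach_1, se_1)
r_se_causes_ach = pearson_r(ach_2, se_2)
print(f"achievement -> self-esteem: r = {r_ach_causes_se:.2f}")
print(f"self-esteem -> achievement: r = {r_se_causes_ach:.2f}")
# r is symmetric in its arguments, so it carries no information about direction.
print(pearson_r(ach_1, se_1) == pearson_r(se_1, ach_1))
```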
Our discussion so far has centered on two pitfalls in interpreting correlations between variables. The first is the directionality problem, illustrated by the eye movement and self-esteem examples. Before concluding that a correlation between variable A and variable B means that changes in A cause changes in B, we must recognize that the direction of causation may be the reverse, from B to A. The second is the third-variable problem, illustrated by pellagra (and by the oven and contraception and the private school and achievement examples). A correlation between two variables may not indicate causation in either direction, because the correlation may arise because both variables are related to a third variable.

In some situations spurious correlations arise especially easily, which is why selection bias is such a common problem. The term "selection bias" refers to relationships between particular characteristics of subjects and the environments they end up in. Selection bias may occur when people with different biological, behavioral, and psychological characteristics choose different types of environments, and it creates spurious correlations between environmental characteristics and behavioral or biological ones.

Let's see how selection bias can create a spurious correlation. Quickly name a state with a higher-than-average death rate from respiratory disease. One answer, of course, is Arizona. What? Wait! Doesn't Arizona have clean air? Has the Los Angeles smog spread that far? Has suburban Phoenix become that polluted? No, certainly not. Let's stop and think for a moment. Arizona may well have clean air, and perhaps that is exactly why people with respiratory problems move there, and then die there. That's right. If we are not careful, this situation can mislead us into thinking that Arizona's air is killing these people.
However, selection bias is not always easy to spot. It is especially easy to overlook when we expect in advance to see a causal link, as in the self-esteem example. Seductive correlational evidence combined with preexisting bias can fool even the brightest minds. Let's look at some examples.

The importance of selection factors is easy to see in the national debate about "the quality of American education" that has been going on for nearly two decades. In this debate the public has been inundated with educational statistics, but researchers have rarely warned the public against inferring causation from correlational data that are riddled with misleading selection bias. Throughout the debate, many politically motivated people have repeatedly presented evidence suggesting that educational quality is unrelated to teacher salaries or class size, despite many studies showing that both matter (Ehrenberg, Brewer, Gamoran, & Williams, 2001; Finn & Achilles, 1999). Among the evidence they cite are average SAT (Scholastic Assessment Test) scores for all 50 states. These scores, from a test taken by high school students who intend to go on to college, do seem to show that student achievement is unrelated to teacher salaries and educational spending. If anything, the relationship appears to run in the opposite direction from what one would expect: some states with high teacher salaries have low SAT scores, and some states with the lowest teacher salaries in the country have high SAT scores. A closer look at these data, however, teaches another lesson: how easily selection bias can produce spurious correlations.
For example, Mississippi students score higher on the SAT than California students (Powell & Steelman, 1996; Taube & Linden, 1989), and the difference is substantial: the Mississippi average is about 100 points higher. Mississippi also has some of the lowest teacher salaries in the country, which no doubt cheers those who advocate cutting teacher pay. But wait a minute! Are Mississippi's schools really better than California's? Is the level of education there really higher? Of course not. By almost every objective measure, California's schools are better (Powell & Steelman, 1996). Then what explains the SAT scores?

The answer lies in selection bias. Unlike the standardized tests that schools typically administer to all their students, the SAT is not taken by every high school student, so its averages are subject to selection bias (Hauser, 1998; Powell & Steelman, 1996; Taube & Linden, 1989; Wainer, 1989). Only students who wish to attend college take the test. This factor helps explain why average scores vary from state to state and why some states with the best education systems have low average SAT scores.

Selection factors distort SAT averages in two ways. First, some state universities require ACT (American College Test) scores rather than SAT scores. In those states, only students who plan to attend college out of state take the SAT, and these students are likely to come from more advantaged backgrounds or to have higher academic ability than the average student. This is what happened in the comparison of Mississippi and California: only 4% of Mississippi high school students take the SAT, compared with 47% in California (Powell & Steelman, 1996).
The second selection factor is more subtle. In states with high-quality education, more students plan to continue their education after high school, so a high proportion of students take the SAT, including some with weaker academic records. In states with high dropout rates and lower-quality education, few students plan to go on to college, and those who end up taking the SAT are the academically strongest students in the state. Their average scores are therefore naturally higher than those of states where most students take the test.

The SAT example also shows how hard it is for the public to see through misleading data without the basic methodological and statistical thinking skills taught in this book. In the first edition of this book, written in 1983, I included an example of the misuse of SAT scores arising from selection bias. More than a decade later, in the fourth edition in 1994, I discussed an article by Brian Powell (1993) of Indiana University analyzing a 1993 column by the writer George Will. You can guess what it was about: Will argued against public education spending because the states with high SAT scores did not have high education expenditures. Powell (1993) noted that in the states Will singled out for their particularly high SAT scores (Iowa, North Dakota, South Dakota, Utah, and Minnesota), only 5%, 6%, 7%, 4%, and 10% of students, respectively, took the SAT, whereas nationally more than 40% took it. The reason is that in these states the ACT is required for admission to public universities, so only students planning to attend prestigious out-of-state private colleges take the SAT (Powell, 1993, p. 352). In contrast, in New Jersey, which Will cited for its low SAT scores and high education spending, 76% of high school students take the test. Clearly, the students who take the SAT in North and South Dakota are a far more elite group than those who take it in New Jersey.

In the journal Educational Researcher, psychometrician Howard Wainer (1993) analyzed an article in the June 22, 1993, issue of The Wall Street Journal that was based on a study by the Heritage Foundation. This think tank, which has strong ideological leanings and has long opposed increased education spending, reported (you can guess what's coming) that states with high education spending tend to have low SAT scores. Wainer's article not only exposes this consequence of selection bias but also shows that when the analysis uses a test given to a representative sample of students (the National Assessment of Educational Progress, NAEP) rather than a self-selected one, the opposite relationship appears: states that spend more on education have higher scores.

Powell & Steelman (1996) confirmed this relationship using the partial correlation technique discussed earlier. They found that once differences among states in the proportion of students taking the test were statistically controlled, each additional $1,000 in per-pupil education spending was associated with a 15-point increase in the state's average SAT score. Despite overwhelming evidence that, without statistical correction, selection bias makes state-to-state comparisons of SAT averages meaningless, the media and politicians continue to use the uncorrected scores to further their political goals.
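The first selection factor can be illustrated with a deliberately simplified simulation. It assumes two hypothetical states: a "strong" state whose students are more able on average and a "weak" state, with participation rates of 47% and 4% borrowed from the California and Mississippi figures above. The ability scale and the state means are invented, and the sketch further assumes that only the top-ranked students in each state take the test. The weak state's test takers then outscore the strong state's even though its students are weaker overall. Comparing the two states at the same participation rate reverses the ordering; that comparison is only a crude stand-in for statistically controlling participation, not a reproduction of Powell and Steelman's analysis.

```python
import random
from statistics import fmean


def state_scores(n, ability_mean, take_rate, seed):
    """Return (mean of all students, mean of test takers) for a hypothetical state.

    Simplifying assumption: only the top `take_rate` fraction of students by
    ability sit the test, an extreme form of the self-selection described above.
    """
    rng = random.Random(seed)
    abilities = sorted((rng.gauss(ability_mean, 1) for _ in range(n)), reverse=True)
    takers = abilities[: max(1, int(n * take_rate))]
    return fmean(abilities), fmean(takers)


# Hypothetical states: participation rates echo the 47% and 4% figures above;
# the ability means are invented for illustration.
strong_all, strong_takers = state_scores(100_000, ability_mean=0.5, take_rate=0.47, seed=1)
weak_all, weak_takers = state_scores(100_000, ability_mean=0.0, take_rate=0.04, seed=2)
# Crude correction: score the weak state as if it had the same 47% participation.
_, weak_takers_same_rate = state_scores(100_000, ability_mean=0.0, take_rate=0.47, seed=2)

print(f"all students:  strong {strong_all:.2f}  weak {weak_all:.2f}")
print(f"test takers:   strong {strong_takers:.2f}  weak {weak_takers:.2f}")
print(f"same 47% rate: strong {strong_takers:.2f}  weak {weak_takers_same_rate:.2f}")
```

Under these assumptions the test takers' averages should come out roughly 1.35 for the strong state and 2.15 for the weak one, while the weak state at a 47% rate drops to roughly 0.85.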
Examples from clinical psychology show how deceptive and counterintuitive selection bias can be. Research data sometimes show that people who receive psychotherapy have lower cure rates for various addictions, such as obesity, drug use, and smoking, than people who do not (Rzewnicki & Forgays, 1987; Schachter, 1982). Why? Not because psychotherapy makes addictive behavior harder to change, but because people who seek psychotherapy tend to have more severe and complicated addiction problems, the kind that rarely resolve on their own.

Wainer (1999) tells a story from World War II that shows the counterintuitive side of selection bias. He describes an aircraft analyst who was trying to decide where to place extra armor on airplanes by studying the distribution of bullet holes on planes that returned from missions. His conclusion: put the armor where the returning planes had no bullet holes. His reasoning was that bullets were equally likely to hit any part of a plane, so if a plane made it back, the places where it had been hit must not have caused fatal damage. The places without bullet holes were the critical ones, because planes hit there did not return. Therefore, the extra armor should go on the parts of the returning planes that had not been hit!

In summary, the rules for the reader in this chapter are simple: be alert to selection bias, and avoid causal inferences when only correlational evidence is available. Admittedly, complex correlational data do permit limited causal inferences, and correlational evidence can help establish the convergent validity of a hypothesis (see Chapter 8). Still, consumers of psychological knowledge are better off being skeptical than being fooled by correlations that falsely imply causation.
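Wainer's aircraft story can be sketched the same way. Assume, purely for illustration, five plane sections that bullets hit with equal probability, and that any hit on the engine or cockpit brings the plane down. Counting bullet holes only on the planes that return shows holes everywhere except the critical sections, which is exactly why those unmarked sections are the ones that need armor. The section names, the number of hits per plane, and the rule for what downs a plane are all hypothetical.

```python
import random
from collections import Counter

SECTIONS = ["wings", "fuselage", "tail", "engine", "cockpit"]
# Assumption for illustration: any hit on these sections brings the plane down.
CRITICAL = {"engine", "cockpit"}


def fly_missions(n_planes, hits_per_plane, seed=0):
    """Return (number of planes that returned, bullet-hole counts seen on them).

    Each hit lands on a section chosen uniformly at random, matching the
    analyst's assumption that all parts of a plane are equally likely to be hit.
    """
    rng = random.Random(seed)
    returned = 0
    holes_seen = Counter()
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        if any(section in CRITICAL for section in hits):
            continue  # plane lost: its bullet holes are never observed
        returned += 1
        holes_seen.update(hits)
    return returned, holes_seen


returned, holes_seen = fly_missions(1_000, hits_per_plane=3)
print(f"{returned} of 1000 planes returned")
for section in SECTIONS:
    print(f"{section:9s} {holes_seen[section]}")
```

Only the surviving sample is ever inspected, so the engine and cockpit rows are zero by construction, not because those sections are rarely hit.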
The purpose of this chapter has been to convey the idea that the mere existence of a correlation between two variables does not guarantee that changes in one variable cause changes in the other; that is, correlation does not imply causation. In the third-variable problem, a correlation between two variables does not indicate a direct causal path between them, because the correlation may arise because both variables are related to a third variable that has not been measured. If that potential third variable is measured, correlational statistics such as partial correlation (discussed in Chapter 8) can be used to assess whether it accounts for the relationship. Another difficulty in interpreting correlational statistics is the directionality problem: even if two variables are directly causally related, the direction of causation cannot be determined from the correlation alone.

Selection bias is responsible for many spurious correlations in the behavioral sciences. Because people choose their environments to some extent, correlations arise between behavioral characteristics and environmental variables. As Goldberger's example illustrates (and as the next two chapters discuss further), the only way to ensure that selection bias does not confound the results is to conduct true experiments in which the variables are manipulated.