The Chronicle of Statistics: Reading “The Lady Tasting Tea”

Original link: https://blog.devtang.com/2022/06/08/the-lady-tasting-tea-book-summary/

I recently finished reading “The Lady Tasting Tea: How Statistics Changed Science and Life”. It is a popular-science chronicle of the development of statistics. Besides explaining how the field developed, the book also recounts anecdotes from the lives of statisticians such as Fisher, Neyman, and Pearson.

Here are some notes and reflections.

1. The story of the lady tasting tea

Fittingly, the book opens with the anecdote that gives it its title. The story takes place in Cambridge, England, in the late 1920s. A group of university faculty members and their wives were having afternoon tea, and one of the women insisted that tea poured into milk tastes different from milk poured into tea.

Unsurprisingly, the professors in the room thought this was ridiculous, because once mixed, the two are chemically identical. However, a slightly built man who was present took the claim seriously.

After bustling about in the kitchen to set up the test, he began his experiment. He handed the lady a first cup; she tasted it for a minute and declared that it had been made by pouring milk into the tea. He noted her answer without comment and handed her a second cup. In the end, everyone was surprised to find that she had identified every cup correctly.

The book does not dwell on how the lady managed this, and quickly shifts to the perspective of its protagonist.

The man in the story is that protagonist, Ronald Fisher. Fisher is regarded as a founder of modern statistics, and he published the influential books “Statistical Methods for Research Workers” and “The Design of Experiments”.

2. P value and significance test

Product development today often goes hand in hand with user research and testing. In applied research we form a hypothesis and then use a significance test to judge whether the data give significant evidence against it.

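To make this judgment we use the P value: the probability, assuming the hypothesis (the "null hypothesis") is true, of obtaining a result at least as extreme as the one actually observed. The smaller the P value, the stronger the evidence against the hypothesis. In the tea story, the null hypothesis is that the lady is simply guessing.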

Take the tea-tasting story as an example. If we test only one cup, a lady who is merely guessing still has a 50% chance of being right, so a correct answer gives a P value of 0.5. Obviously, this is nowhere near significant.

But if we test 10 cups and she gets all 10 right, the chance of that happening by pure guessing is (1/2)^10 = 1/1024, so the P value is only about 0.001. That is highly significant.
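The arithmetic above can be checked with a short sketch. It assumes, as the text does, that under the null hypothesis each cup is an independent 50/50 guess; `tea_p_value` is a hypothetical helper name, and it computes the one-sided binomial tail, i.e. the chance of getting at least `correct` cups right out of `cups` by guessing alone.

```python
from math import comb


def tea_p_value(correct: int, cups: int, p_guess: float = 0.5) -> float:
    """One-sided P value: P(X >= correct) where X ~ Binomial(cups, p_guess).

    Assumes (hypothetically) that a guessing taster gets each cup right
    independently with probability p_guess.
    """
    return sum(
        comb(cups, k) * p_guess**k * (1 - p_guess) ** (cups - k)
        for k in range(correct, cups + 1)
    )


for cups in (1, 10):
    # All cups correct: the most extreme possible outcome.
    print(f"{cups} of {cups} correct: P = {tea_p_value(cups, cups):.4f}")
```

Because "all correct" is the most extreme outcome, the tail sum reduces to 0.5 ** cups in that case; the general form also covers results such as 9 out of 10.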

3. Real data is very important

Statistics can easily mislead, so you should not trust the numbers blindly. The best approach is to go to the source: talk to the users and examine the raw data.

There are many examples of this. The book cites a statistical study of reoffending and sentence length.

The study examined the relationship between the length of time adult male prisoners spent in prison and their rate of reoffending. The results showed that prisoners with shorter sentences reoffended at a very high rate, and people took this as grounds for giving such offenders longer sentences.

Cunliffe, a statistician featured in the book, reviewed the research. Not content merely to re-check the calculations in the statistical tables, she wanted to talk to the people behind the data: the prisoners themselves. She soon discovered that almost all of these prisoners were “poor, pitiful old men who had nowhere else to go, and who committed crimes in order to get back into prison”. When compiling the table, the researchers had counted each of their repeated incarcerations as a different prisoner.

Once these duplicate records were removed, there was no clear relationship between time served and reoffending.
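To illustrate how this kind of double counting distorts a rate, here is a minimal sketch on invented toy records (they are not data from the book). Prisoner "A" stands for one elderly man jailed four times on short sentences. The functions `rate` and `dedupe` are hypothetical helpers, and we assume each person's sentence category is the same on every record.

```python
# Hypothetical toy records: (prisoner_id, short_sentence, reoffended).
# The numbers are invented purely to illustrate the counting error.
RECORDS = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, True),
    ("B", True, False), ("C", True, False),
    ("D", False, False), ("E", False, True), ("F", False, False),
]


def rate(rows, short):
    """Share of rows in the given sentence category that reoffended."""
    group = [r for r in rows if r[1] == short]
    return sum(r[2] for r in group) / len(group)


def dedupe(rows):
    """Collapse repeated incarcerations into one row per person.

    A person counts as a reoffender if any of their records says so.
    """
    by_person = {}
    for pid, short, reoffended in rows:
        _, _, seen = by_person.get(pid, (pid, short, False))
        by_person[pid] = (pid, short, seen or reoffended)
    return list(by_person.values())


people = dedupe(RECORDS)
print(f"per record: short={rate(RECORDS, True):.2f}, long={rate(RECORDS, False):.2f}")
print(f"per person: short={rate(people, True):.2f}, long={rate(people, False):.2f}")
```

Counted per record, the short-sentence group looks twice as likely to reoffend; counted per person, the two groups are the same.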

4. The complexity of statistical work

Statistical work is complex, and careless analysis can lead to misleading conclusions. The book describes many such pitfalls and some ways to avoid them.

4.1 Case 1: Crop Harvest Research

When Fisher was studying crop yields, he found it difficult to make sure that every plot of land was an equivalent sample. Some plots might have been treated with fertilizer in the past, and that would distort the experimental data.

Fisher's answer was randomization. He divided the farmland into blocks, and in each experiment he used a random procedure to decide which plots formed the experimental group and which formed the control group.

Randomization spreads the differences between individual plots evenly across the groups. Once there are enough plots, it becomes very unlikely that those differences will all pile up in one group.
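The "dilution" effect can be seen in a small simulation. This is a minimal sketch under invented assumptions, not Fisher's actual procedure. Each plot has a hidden "past fertilizer" effect (1 with probability 0.2, else 0). Within every block, half of the plots are randomly assigned to treatment and the rest to control. We then measure how far apart the two groups' average hidden fertility ends up. The names `assign_within_blocks` and `mean_imbalance`, the four-plot blocks, and the 0.2 probability are all hypothetical.

```python
import random


def assign_within_blocks(blocks, rng):
    """Randomly split the plots of each block into treatment and control.

    `blocks` maps a block name to its list of plot ids (a hypothetical layout).
    Every block contributes half of its plots (rounded down) to treatment.
    """
    groups = {"treatment": [], "control": []}
    for plots in blocks.values():
        shuffled = list(plots)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        groups["treatment"].extend(shuffled[:half])
        groups["control"].extend(shuffled[half:])
    return groups


def mean_imbalance(n_blocks, plots_per_block=4, trials=200, seed=0):
    """Average |treatment - control| gap in hidden fertility over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        blocks = {b: [(b, i) for i in range(plots_per_block)] for b in range(n_blocks)}
        # Hidden, unrecorded past-fertilizer effect for each plot.
        fertility = {
            plot: 1.0 if rng.random() < 0.2 else 0.0
            for plots in blocks.values()
            for plot in plots
        }
        groups = assign_within_blocks(blocks, rng)
        t = sum(fertility[p] for p in groups["treatment"]) / len(groups["treatment"])
        c = sum(fertility[p] for p in groups["control"]) / len(groups["control"])
        total += abs(t - c)
    return total / trials


for n_blocks in (2, 25, 250):
    print(n_blocks * 4, "plots: average imbalance", round(mean_imbalance(n_blocks), 3))
```

With only a handful of plots, the hidden fertility can easily end up concentrated in one group. With hundreds of plots, the average gap shrinks toward zero, roughly in proportion to one over the square root of the number of plots.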

4.2 Case 2: The relationship between smoking and lung cancer

Today it is widely accepted that smoking is strongly associated with lung cancer. But Fisher thought the research was not rigorous enough, and he put forward the following hypothesis:

Suppose there is a gene A such that people who carry it are generally more inclined to smoke than people who don't, and the same gene also makes them naturally prone to lung cancer.

Then smoking and lung cancer would appear together in the data, yet the real cause could be the gene, which makes carriers prone to lung cancer even if they never smoke.

To meet Fisher's challenge, you would have to randomly divide people into two groups, force one group to smoke and forbid the other from smoking. Only then would the influence of the "smoking gene" be excluded. But such an experiment is morally and ethically impossible to carry out.
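The following minimal simulation sketches Fisher's argument. All the probabilities are invented, and, by assumption, smoking has no effect on cancer at all: cancer depends only on the hypothetical gene. When people choose whether to smoke (observational data), smokers still show a much higher cancer rate. When a coin flip decides who smokes (the randomized experiment that cannot ethically be run), the gap disappears. `cancer_rates` and the constants are hypothetical names.

```python
import random

# Invented numbers for Fisher's hypothetical "gene A" scenario.
P_GENE = 0.3
P_SMOKE = {True: 0.7, False: 0.2}    # carriers are more inclined to smoke
P_CANCER = {True: 0.2, False: 0.02}  # cancer depends ONLY on the gene


def cancer_rates(n, randomized, seed=0):
    """Return (cancer rate among smokers, cancer rate among non-smokers)."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # smokes -> [people, cancers]
    for _ in range(n):
        gene = rng.random() < P_GENE
        if randomized:
            smokes = rng.random() < 0.5  # coin flip, independent of the gene
        else:
            smokes = rng.random() < P_SMOKE[gene]  # people choose for themselves
        cancer = rng.random() < P_CANCER[gene]
        counts[smokes][0] += 1
        counts[smokes][1] += cancer
    return tuple(counts[s][1] / counts[s][0] for s in (True, False))


obs = cancer_rates(200_000, randomized=False)
rct = cancer_rates(200_000, randomized=True)
print("observational: smokers %.3f vs non-smokers %.3f" % obs)
print("randomized:    smokers %.3f vs non-smokers %.3f" % rct)
```

The observational comparison shows a strong association even though, by construction, smoking causes nothing here. This is exactly why an observed correlation alone cannot settle the causal question.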

This shows how hard it actually is to establish cause and effect statistically.

4.3 Case 3: Effectiveness testing of cancer drugs

The book describes a dilemma in comparing a cancer drug's treatment group with a control group. Patients react to how well they feel the treatment is working. If a patient in the control group feels it is not working, they may abandon it and switch to another treatment. As a result, the people left in the placebo group may be those with strong immune systems, who feel that the treatment "works".

Ultimately, the trial may even show the placebo doing better than the drug.
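A small expected-value calculation shows how this can happen. The numbers are invented, and we make two simplifying assumptions: dropout in the placebo arm depends only on the patient's (hypothetical) immune status, and nobody leaves the drug arm. By construction, the drug is better than the placebo for every kind of patient. Even so, among the patients who remain, the placebo arm ends up with the higher good-outcome rate. `good_outcome_among_completers` is a hypothetical helper.

```python
# Invented probabilities keyed by immune status (True = strong, False = weak).
P_STRONG = 0.3
P_GOOD = {
    "placebo": {True: 0.6, False: 0.1},
    "drug": {True: 0.7, False: 0.3},  # the drug helps both kinds of patient
}
P_STAY = {
    "placebo": {True: 0.9, False: 0.1},  # weak patients feel nothing and leave
    "drug": {True: 1.0, False: 1.0},     # assume nobody leaves the drug arm
}


def good_outcome_among_completers(arm):
    """Expected good-outcome rate among patients who stay in the given arm."""
    weight = {
        s: (P_STRONG if s else 1 - P_STRONG) * P_STAY[arm][s] for s in (True, False)
    }
    return sum(weight[s] * P_GOOD[arm][s] for s in weight) / sum(weight.values())


for arm in ("drug", "placebo"):
    print(arm, round(good_outcome_among_completers(arm), 3))
```

The placebo arm looks better only because selective dropout has filtered it down to the patients who were likely to do well anyway.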

Yet, both ethically and legally, you cannot force a cancer patient to stay on a placebo at the risk of their life. Much further research grew out of this dilemma. In the end, modern medicine adopted, as a criterion for analyzing trials, the goal of minimizing the number of patients who receive the "suboptimal treatment" of placebo.

4.4 Case 4: Decision Paradox

Suppose we accept the logic of hypothesis testing and significance testing: if, under some hypothesis, an event has only a one-in-ten-thousand chance of happening, we should reject that hypothesis.

Now consider a scenario: we organize a raffle with 10,000 tickets, exactly one of which will win, and every ticket has the same probability of winning.

The probability that ticket number 1 wins is 0.0001, so we reject the hypothesis that it will win.

The probability that ticket number 2 wins is also 0.0001, so we reject that hypothesis too.

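The same holds for every ticket, so we reject the hypothesis that any given ticket will win, even though we know for certain that one of them must win.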
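The paradox fits in a few lines of exact arithmetic. The rejection rule below ("reject if the probability is at most one in ten thousand") is our reading of the text rather than a standard procedure. We use `fractions.Fraction` so that the probabilities add up exactly, with no floating-point error.

```python
from fractions import Fraction

N_TICKETS = 10_000
THRESHOLD = Fraction(1, 10_000)  # "one in ten thousand" rejection rule from the text

# Each ticket has the same chance of being the single winner.
p_win = {ticket: Fraction(1, N_TICKETS) for ticket in range(1, N_TICKETS + 1)}

# Apply the rule ticket by ticket: reject "ticket t wins" if its probability is tiny.
rejected = [t for t, p in p_win.items() if p <= THRESHOLD]

print(len(rejected))        # 10000: every single ticket is rejected
print(sum(p_win.values()))  # 1: yet some ticket is certain to win
```

Each rejection is reasonable on its own, but taken together they contradict a certainty. That is the tension the raffle example points to.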

There are many similar decision paradoxes.

5. Small-probability decisions

We often face small probabilities, yet we should still stay hopeful and work hard toward them.

For example, the probability of your child being admitted to Tsinghua University or Peking University is less than 0.01, but parents still do their best to educate their children.

Similarly, if we fall ill with a disease whose survival rate is only 0.01, we should still hope for a miracle and cooperate with treatment.

One way to see it: when facing a small-probability event, imagine the world splitting into many parallel universes. We may well be living in the one where the lucky, unlikely event happens.

Stay hopeful, do your best, and accept what fate brings.
