Thursday, November 19, 2009

Propagate Errors, not Bullshit!

Thank you, Eric.  I never fully understood the significance, or lack of significance, of p-values in clinical drug trials before your two posts.  It makes many expensive drugs much less impressive, now that I can see how marginal an effect they offer for $$,$$$ per year per patient.

I have to admit that I also read the chapter about p-values in the back of "the best detective story ever"*, Statistics by David Freedman, Robert Pisani, and Roger Purves. (You can pick up used older editions at very reasonable prices.  It is a fantastic introduction to how to apply statistics intelligently.)

I think that science education should include more rigorous statistical training.  The only training I ever received in statistics was in Honors Freshman Chemistry at Berkeley.  We were instructed to read the first chapter (36 pages) in our laboratory textbook, Chemical Separations and Measurements: Theory and Practice of Analytical Chemistry by Dennis G. Peters, John M. Hayes and Gary M Hieftje.  Then we did a problem set to make sure that we understood the normal distribution, how to propagate errors and how to report our average values and 95% confidence intervals. As meager as that was, that was infinitely more than the nonexistent instruction that I received from the Physics department.

We could have used more training earlier in our careers.

When I taught physical chemistry lab, I read An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements by John R. Taylor, one of the assigned textbooks.  That's another highly-recommended classic book.

* Actual quip from the back cover of the 3rd edition.  Isn't that the most lovely quip you can imagine for a statistics book?

How did you learn statistics?  This is a question for everyone, not just trained scientists.

Tuesday, November 17, 2009

What is the P-Value of Bullshit? Part Two

Another guest post from Eric

Recall that in part one of "What is the P-Value of Bullshit?" we did a thought experiment, called "Study #3", in which we encountered a measurement, four heads in a row, with a p-value as low as 0.06, which was nonetheless almost certainly due to random statistics.

Study #4: In Which the Naive Interpretation of P-value Is Partially Redeemed

Surely there must be at least some scenarios in which we can interpret the p-value as "probability of our result being bullshit"? Well, yes, and here's one: Suppose our jar of 1000 pennies now contains 500 two-headed pennies and 500 ordinary pennies. So now a two-headed penny becomes a mundane hypothesis, no longer a "way out there" sort of thing. Suppose we pick a penny at random, and do our four flips and get four heads, a p=0.06 result.

Q4: What's the probability now that we are not holding a two-headed penny?
       The arithmetic goes [(1/2)/16]/[(1/2+(1/2)/16)] = 0.0588

A4: Almost 6%, about the same number as our p-value!

So in Study #4, in which we test a "mundane" hypothesis, your old thesis adviser's naive interpretation of p-value works pretty well.
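
For readers who want to check the Study #4 arithmetic, here is a minimal sketch of the same Bayes' rule calculation using exact fractions. The function name and its parameters are made up for illustration; the jar contents and the four flips come straight from the thought experiment.

```python
from fractions import Fraction

def prob_ordinary_given_heads(n_two_headed, n_ordinary, n_flips):
    """Exact Bayes' rule: P(ordinary penny | n_flips heads in a row)."""
    total = n_two_headed + n_ordinary
    prior_two_headed = Fraction(n_two_headed, total)
    prior_ordinary = Fraction(n_ordinary, total)
    # A two-headed penny always lands heads; an ordinary penny
    # lands heads with probability 1/2 on each flip.
    like_two_headed = Fraction(1)
    like_ordinary = Fraction(1, 2) ** n_flips
    evidence = prior_two_headed * like_two_headed + prior_ordinary * like_ordinary
    return prior_ordinary * like_ordinary / evidence

# Study #4: 500 two-headed pennies, 500 ordinary pennies, four heads in a row.
study4 = prob_ordinary_given_heads(500, 500, 4)
print(study4, float(study4))  # 1/17, about 0.0588
```

The exact answer is 1/17, which is where the 0.0588 in A4 comes from.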

As a general rule, in order to believe a measurement is real, you should look for a p-value that is small compared to how "out there" the result is you're trying to confirm. If you're testing a mundane hypothesis that is as likely to be true as not, then p=0.05 is probably good enough for you. But if you are trying to confirm that you have hit some one-in-a-thousand two-headed jackpot, then you'd better wait until you get a p value of safely less than 1/1000 (e.g., better flip 13 heads in a row, not just four!) Incidentally, the philosophy of this post, and of this approach to hypothesis confirmation, is based on Bayesian statistics.
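
To see how the needed run of heads grows with how "out there" the hypothesis is, here is a rough sketch that tabulates the chance of still holding an ordinary penny for a few run lengths. It assumes the one-in-a-thousand jar from part one; the function name and the particular run lengths are only illustrative choices.

```python
def prob_ordinary(prior_two_headed, n_heads):
    """P(ordinary penny | n_heads heads in a row), by Bayes' rule."""
    p_value = 0.5 ** n_heads  # chance an ordinary penny produces this run
    ordinary = (1.0 - prior_two_headed) * p_value
    # A two-headed penny produces the run with probability 1.
    return ordinary / (prior_two_headed + ordinary)

prior = 1 / 1000  # one two-headed penny in a jar of 1000
for n in (4, 10, 13, 20):
    print(f"{n:2d} heads: p-value = {0.5 ** n:.6f}, "
          f"P(ordinary penny) = {prob_ordinary(prior, n):.3f}")
```

Under these assumptions, four heads leaves about a 98% chance that the penny is ordinary, while 13 heads brings that down to roughly 11%.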

You might complain that a sliding criterion for adequate p-value makes the believability of a statistical measurement a matter of subjective judgment. After all, usually we don't know ahead of time that we are fishing for a precisely one-in-a-thousand payoff, and we can only estimate how far "out there" our original hypothesis is.

My response to your complaint: tough tootsie roll. No one ever said doing science was going to be easy. You can blindly apply p-value analysis, and be a hack, or you can bring some careful thought to a problem, and be a real scientist. And speaking of hacks...

For a real-life example, let's go back to that NYT story, which had to do with two candidate AIDS vaccines. Each vaccine had been previously tested and shown quite decisively to be ineffective. The US Army and the NIH jointly decided to sponsor a placebo-controlled, Phase III human subjects trial on the use of the two vaccines in combination. Has the idea of using two vaccines in combination, when each is shown to be ineffective on its own, ever worked?

Dozens of AIDS scientists protested that this hypothesis was such a long shot that testing it amounted to a huge waste of AIDS-battle resources. Was it a one-in-a-thousand long shot, like our two-headed penny hypothesis? Who knows? But in any case it was surely one-in-25 or worse. The NIH and the Army pushed ahead, lined up 16,000 volunteers and spent $100 million, and in the end published a p = 0.04 result* claiming that the combined vaccine worked a wee little bit, providing immunity to only one in every three who got the full combined dose.

Does p = 0.04 mean that the probability that the published result is due to statistical noise is only 4%? The scientists interviewed in the NYT seemed to think so, but alas the study is most likely a 100 million dollars worth of statistical noise: bullshit.

At this very moment, somewhere in the world a scientist is testing a long-shot hypothesis: does eating a diet of only artichokes cure breast cancer? Are red-headed children more responsive to acetaminophen? There are thousands of such investigations going on. They are long shots, but every once in a while a seemingly bizarre hypothesis turns out to be true, so what the hell, no harm in checking it out?

Problem is, with many thousands of long-shot studies going on at any one time, by random chance you will get hundreds of "p = .04" results supporting hypotheses that are in fact incorrect. If you're from the naive school of p-value interpretation, you'll celebrate your p = 0.04 result by publishing a paper, or better, holding a press conference!
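
The "hundreds" figure is just multiplication, as this small sketch shows. It assumes every hypothesis being tested is in fact incorrect, so each study has a 4% chance of reaching p <= 0.04 by luck alone; the study counts are hypothetical numbers picked only to match "many thousands".

```python
def expected_false_positives(n_false_hypotheses, alpha):
    """Expected number of studies that clear the p <= alpha bar by chance alone."""
    return n_false_hypotheses * alpha

# Hypothetical study counts, chosen only to illustrate "many thousands".
for n_studies in (2_000, 5_000, 10_000):
    hits = expected_false_positives(n_studies, 0.04)
    print(f"{n_studies} long-shot studies -> about {hits:.0f} spurious p <= 0.04 results")
```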

And if you are a stats-challenged science journalist, you'll write the bullshit up for the New York Times.

*The "p=0.04" number actually comes from a fairly "aggressive" analysis. Playing more strictly by the rules, the study's authors got a still-less-impressive p=0.15.

**Thanks to Jonathan Golub of Slog for providing the point-of-departure for this two-part post. Always lively, readable and informative, Golub is, along with BMGM herself, one of my favorite bloggers on science and science policy. Like all prolific science writers, he has on rare occasions oversimplified and on very rare occasions, totally screwed up.

Monday, November 16, 2009

What is the P-Value of Bullshit? Part One

Eric here, sporadic guest poster on Bad Mom, Good Mom. I am a laboratory scientist working in Colorado.

Last week, BMGM aired one of her pet peeves, confusing correlation with causality.

My own statistical pet peeve? The oft-abused concept of p-value. Probably a majority of practicing laboratory scientists routinely misinterpret p-values. I'm not talking mere bit players either: last month the NYT reported on a Phase III human-subjects AIDS vaccine trial, run in Thailand by the US Army and the NIH. Naive thinking about statistics led to the publicized conclusions of the study being almost surely crap.

We'll come back to how we know the conclusions are crap, but first let's do a thought experiment. Imagine taking one two-headed penny and mixing it in with a jar of 999 ordinary pennies. Shake the jar up and pull out one penny. Don't look at it yet! Let's do some scientific studies.

Study #1: In Which We Collect No Data At All

Q1: Before you've looked at the penny you took out, what is the probability that the coin you are holding has two heads?
A1: You got it -- one in a thousand, or 0.1%.

Study #2: In Which We Collect Deterministic Data

Now look at both sides of the penny. Suppose you notice that on both sides, there is a head!
Q2: Now, what is the probability you are holding a two-headed penny?
A2: Yep -- unity, or 100%. OK, we're ready for something more difficult!

Study #3: In Which We Collect Some Odd-Seeming Statistical Data

Throw your two-headed coin back in, shake the jar, and again reach in and grab a single penny. Don't look at it yet! Now suppose you flip the coin four times and get a slightly unusual result: four heads in a row.
Q3: OK, given you flipped four heads in a row, now what would you say is the probability that your penny has two heads?

Well, if we were doing biomedical research, the first thing we do when we encounter statistical data is calculate the p-value, right? Turns out that if you took an ordinary (not two-headed) penny and flipped it four times, then the probability you will get heads four times in a row is one in 16, or about 6%. So now we can (correctly!) define p-value by example: four heads in a row is a p-value 0.06 measurement.

Can we turn this idea on its head and say, "If we flip four heads in a row, then there is only a 6% chance the coin is not a two-headed coin"? Many practicing scientists would say "yes", but the correct answer is no, NO, Goddammit, NOOOOOO!

In our Study #3, picking a two-headed coin out of the jar is a very rare thing to do, one in 1000, whereas picking an ordinary coin out and flipping four heads in a row is only a slightly odd thing to do, (999/1000)(1/16), or about one in 16. Thus we get:

A3: the probability you are holding a two-headed coin is very small, (0.001)/(0.001+(999/1000)/16), or about 16 over 1000, only 1.6%. You are 98% likely not to be holding a two-headed coin!

Bottom line: your seemingly significant, p = 0.06 measurement of four heads in a row was not strong evidence of a two-headed coin, and saying otherwise would be, in the technical jargon of the trained laboratory scientist, "bullshit".
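
The Study #3 arithmetic can be double-checked with exact fractions. This is only a sketch of the same Bayes' rule calculation as in A3; the variable names are invented for illustration.

```python
from fractions import Fraction

# Priors: one two-headed penny mixed in with 999 ordinary ones.
prior_two_headed = Fraction(1, 1000)
prior_ordinary = Fraction(999, 1000)

# Chance of four heads in a row for each kind of penny.
p_four_heads_two_headed = Fraction(1)
p_four_heads_ordinary = Fraction(1, 16)

# Bayes' rule: weight each kind of penny by how well it explains the data.
joint_two_headed = prior_two_headed * p_four_heads_two_headed
joint_ordinary = prior_ordinary * p_four_heads_ordinary
posterior_two_headed = joint_two_headed / (joint_two_headed + joint_ordinary)

print(posterior_two_headed, float(posterior_two_headed))  # 16/1015, about 0.0158
```

The exact result, 16/1015, is the "about 16 over 1000" quoted in A3.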

Perhaps your research adviser told you the p-value meant "probability your measurement is merely random noise". Is s/he always wrong about that?

Nah, the old geezer got it right once in a while, if only by accident. To find out about that exception, stay tuned for part two of this post!

**Thanks to Jonathan Golub of Slog for providing the point-of-departure for this two-part post. Always lively, readable and informative, Golub is, along with BMGM herself, one of my favorite bloggers on science and science policy. Like all prolific science writers, he has on rare occasions oversimplified and on very rare occasions, totally screwed up.

Friday, November 13, 2009

Leaf Yoke Detour


Because I will do anything to avoid seaming up Shadow October Frost.  Including knitting another sweater.  Angela Hahn's Leaf Yoke Top [Ravelry link] in Knit Picks Comfy worsted, a super-soft pima cotton and acrylic blend.  The catalog called the color "planetarium".  I was expecting an inky navy-black.  It is really a dark Prussian blue.  Now why don't they just say that in the catalog instead of giving it an artsy-fartsy name?

Proof that cell phone use impairs driving

I was driving east on crowded Manhattan Beach Boulevard, performing my soccer mom duties, when this humongous SUV kept veering into my lane.  I looked over to see if the driver was drunk or perhaps having a stroke.  Nope, he was driving while talking on his hand-held cell phone, which BTW is illegal in California.

I told Iris to quickly get out the camera and catch him in flagrante delicto. He looked at our car and the camera and put down the phone. A few hundred yards later, he picked up the phone again. Iris managed to get some good photos.  Notice the height differential between our Prius and his Porsche Cayenne S.  His bumpers are aimed squarely at our head and shoulder height.  He's not paying attention to where he is aiming that 5000 pound weapon.

If you see this car, steer clear. He's a menace to society.

How to become a home cook

I've been thinking, reading and writing a great deal about food  lately.  It ramped up after I read Animal, Vegetable, Miracle.  I meant to post something about Michael Pollan's screed, Out of the Kitchen, Onto the Couch, but there was plenty of ink and pixels spilled across the internet without my contribution.

It's easy to judge people for watching others cook, rather than getting into the kitchen and cooking themselves.  But, what if someone doesn't know how to cook?  Where do they start?   How does someone who doesn't know a clove of garlic from a head of garlic* get started?

Cookbooks by celebrity chefs that are familiar from TV may not be the best place to start.  Professional chefs cook on restaurant scale on professional equipment (50,000 BTUs?  No problem!) with rare ingredients.  Years ago, a NYT article claimed that ~20% of cookbook recipes don't even work when tested in a home kitchen.  Recipes from celebrity restaurant chefs were heavily over-represented in the bad recipes.

I am a huge fan of Marion Cunningham because she tests each recipe in a home kitchen, using basic home equipment.  Then she has others test the recipes in their home kitchens.  I respect that attention to detail.

Learning to Cook with Marion Cunningham is the best book I have ever seen for learning how to cook. She assumes no prior knowledge, explains every term and shows every step.  She developed the book while teaching rank beginners how to cook.  If you want to learn how to become a home cook, this is the place to start.

I've compiled a list of my most useful cookbooks (plus one example of the type of cookbook I hate). 

A bilingual compilation of Taiwanese recipes doesn't have an ISBN # and doesn't show up on that list.  But it's also highly recommended. It was put together by the Northern California Chapter of the North America Taiwanese Women's Association.  My mom might still have more.  Email me if you want a copy.

* Don't laugh, but I once had a housemate who borrowed one of my cookbooks and made a garlic pasta with three heads of garlic.  He thought that was a lot of garlic, but the recipe said 3 cloves of garlic.  He didn't know what a clove was, but he assumed it was a unit of garlic.  The entire garlic bulb looked like the basic unit of garlic to that novice.  So do not assume prior knowledge.  Newbies are not necessarily dumb, but they don't know the jargon yet.

Tuesday, November 10, 2009

The catchy name school of science

Donald Knuth wrote in The Art of Computer Programming something to the effect of "The bubble sort has nothing to recommend it except for a catchy name."

What's catchier than "dandelion kids" and "orchid hypothesis"?  Read the Science of Success in the December issue of Atlantic.  Then read Genetic 'breakthroughs' in medicine are often nothing of the sort:
Don't believe everything you read about genes and disease in prestigious journals like Science and Nature, say Marcus Munafò and Jonathan Flint. A lot of it is simply wrong.
I don't have time for a longer post. I have looming deadlines at work and at home. But my money is on Enrico Fermi, Marcus Munafò and Jonathan Flint.    ;-)

Discuss among yourselves.

Sunday, November 08, 2009

Clothespin Extinction

I spent a week in Boulder, attending to IT issues at NCAR and a satellite data users' workshop. I had a few free hours* one afternoon and browsed the aisles of McGuckin Hardware.  I was looking for the clothespins I use, shown in How to Use a Clothesline.

I learned that Penley, the maker of my old sturdy wooden clothespins, has discontinued domestic production. They now make their clothespins in China. I have no idea if the quality is the same. McGuckin Hardware sells both the Penley wooden ones made in China, and a plastic variety.  Has anyone used them?  Do they hold wet, heavy laundry?

I have no use for the high-style ones mentioned in this story:
Nowadays plastic clothespins are available in endless variations, including a new one that has gone into widespread production, Zebra’s “sweet clip,” made with both hard and soft plastics, using a dual-injection manufacturing process. The hard plastic is in the long handles, while two softer cushions sit where the pin grips the clothes. Zebra developed a dual-plastic toothbrush 15 years ago, applied the principle to clothespins in Europe in the late 1990s, obtained a worldwide patent, and captured 8 percent of the global clothespin market. The pin is sold in North America under the name Urbana.
     “We love to target stupid products,” says Xavier Gibert of Zebra. “When you walk into a megastore, most of the time you see stupid products, boring products. You buy them because you need them. We target basic products to make them come alive, able to talk to people.” And what does the Urbana clothespin say? Something along the lines of “I’ll be gentle.”
     “The key of this peg is not to be able to hold very heavy clothes,” says Gibert. “It’s much more dedicated to sensitive clothes.” Response to the pin has been enthusiastic. “People were attracted by the design. They said, ‘Wow, we love the shape.’”
If we want to save carbon and achieve energy independence, we will need clothespins that can securely hold heavy and wet laundry.

* Ok, I squeezed in visits to Elfriede's Fine Fabrics and Shuttles, Spindles and Skeins.  There was stash enhancement.  Photos after I come up for air.  This is the first weekend in a month in which both Bad Dad and Bad Mom were home at the same time.  Iris celebrated by coming down sick.  She was a trooper at soccer playoffs this weekend.  Her team was short three players, and she played two quarters, even though she was very weak and tired.  She says I revived her for the fourth quarter with my magic zucchini-chocolate chip muffins.

Monday, November 02, 2009

TSA Story

I was running late to catch a flight two Sundays ago.  When I went through the security checkpoint, everything came out except for my laptop.  What was wrong?

I saw two TSA employees in discussion and looking at my laptop. They signaled a third one to come over and take a look. Yipes! I asked them if there was something wrong.

One TSA employee said to another, "Why don't we ask her?"

So she turned to me and asked, "Did you make that case?"

Gulp.  "Yes."

She smiled and turned to the other one, "I told you that you can't buy a case like that."

Remember my not-so-minimalist laptop case?

I survived the heaviest October snowstorm in Colorado since 1997 and got home to LAX safely.  I will be in Pasadena (Caltech) later this week.  In two weeks, I will be in Boston.  In four weeks, I will be in DC.  It looks like that laptop case will get around. 

Sadly, I lost the minimalist camera case in Twain Harte, CA; the one I made to replace it was lost inside Hoover Dam.  I am sad about the second one.  I used a vintage button and the last bits of two fabrics.  I didn't even have time to blog about it and Iris dropped it during the Hoover Dam tour.  She did take some nice photos, though.  Perhaps she can help me make the next case.

Sunday, October 25, 2009

Vegas, baby!



Our family went to Vegas and the first thing we did after checking into our hotel was search for Thai food in a strip mall. If Jonathan Gold says Lotus of Siam is the best Thai food in the US, then we have to try it.

Mark and Iris walked around the strip one evening, to watch the fountains and such, but I stayed in the room to rest.  I haven't been feeling well lately and smoke was the last thing I needed.


What else did we do?  Bad Dad was there to work, but we made a mommy and me trip to the Hoover Dam.

We took Belle along on the trip; she is in almost every photo.  We took the long tour inside the dam.  They measure the distance between the metal pins to gauge shifting.

The obligatory photo of the generators.

Look at the bicycle inside Hoover Dam!

Photo op at the air vent.
The new visitor center has a very nice exhibit showing how they built the dam, and how they generate electricity from the potential energy of the water.  More on that later.

We also visited the Liberace museum.

Many think he was a kitsch joke.  But he was so much more than that.  He invited my sister, our mom and me to one of his concerts and put us up in the VIP box.  Our violin teacher played in his orchestra.  By all accounts, he was a very nice man.  Besides, he loved clothes and kept a whole army of couturiers employed.  More on that later.