Thursday, September 27, 2012

I am the Particle Man: Observer effect on family probability? (Part 3 of 3)

Since Monday and Tuesday we've been trying to answer what seems like a very simple question: What are the odds of having different sex ratios in a five-kid family? Like, what are the odds that Ann and Mitt Romney had those five boys?
We started investigating this question because I was viscerally annoyed with the simple calculation that 1/32 is the probability that a family of five will be all girls (or, separately, all boys). Those odds imply that such a family is rare, when it seemed to me it should be just as likely as any other family of five.

If you haven't read them yet, please see Monday's and Tuesday's posts before starting here. They're the start of this journey that I'm chronicling, ending with today.

We stopped on Tuesday with a change of strategy in estimating the odds of different family compositions. See my long list of all 32 possible series of boy/girl in a five-kid family and add up the ways to achieve the six different family compositions. Here are our results:

What are the odds that you'll get...
5 girls, 0 boys? 1/32
5 boys, 0 girls? 1/32
4 girls, 1 boy?  5/32 (there are 5 possible series out of 32 that make up this boy/girl ratio in a family)
4 boys, 1 girl? 5/32
3 girls, 2 boys? 10/32 (there are 10 possible series out of 32 that make up this boy/girl ratio in a family)
3 boys, 2 girls? 10/32
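To double-check that tally without writing the list out by hand, here is a minimal Python sketch (my addition, not from Tuesday's post). It generates all 2^5 = 32 boy/girl series with itertools.product and counts how many girls each one contains. Like the post, it assumes each birth is an independent 50/50 event.

```python
from collections import Counter
from itertools import product

def composition_counts(n_kids):
    """Count how many of the 2**n_kids boy/girl series give each number of girls."""
    counts = Counter()
    for series in product("bg", repeat=n_kids):  # every ordered series, e.g. ('g', 'b', 'g', 'g', 'b')
        counts[series.count("g")] += 1
    return counts

counts = composition_counts(5)
total = 2 ** 5
for girls in range(5, -1, -1):
    print(f"{girls} girls, {5 - girls} boys: {counts[girls]}/{total}")
```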

(Psst. I googled how to calculate probabilities and found this website and DINGALING! they're actually using my example. And here's a nice site showing how to work with a binomial equation rather than list all the possible 32 outcomes like I did Tuesday.)
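For reference, here is that binomial shortcut written out, using g for a girl and b for a boy, each with probability 1/2 (a standard identity, not taken from either linked site). In the expansion of (g + b)^5, each coefficient is the number of series giving that composition, and the coefficients add up to 32:

```latex
(g + b)^5 = g^5 + 5g^4b + 10g^3b^2 + 10g^2b^3 + 5gb^4 + b^5

P(k \text{ girls among } n \text{ kids}) = \binom{n}{k}\left(\tfrac{1}{2}\right)^{n},
\qquad \text{e.g. } \binom{5}{3}\left(\tfrac{1}{2}\right)^{5} = \frac{10}{32}
```

Setting n = 5 and k = 0, ..., 5 reproduces the six fractions in the list above.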

This sort of thinking about probabilities should remind you of how the odds of the outcomes of rolling the dice are not uniform across all numbers. Your best bet is a 6, 7, or 8 because there are more ways to get those three numbers than the others.

(The following list was edited thanks to a very nice comment, February 5, 2015) 

to roll a ...
2 ... there is 1 way: 1 + 1
3 ... there are 2 ways: 2 + 1; 1 + 2
4 ... there are 3 ways: 3 + 1; 1 + 3; 2 + 2
5 ... there are 4 ways: 3 + 2; 2 + 3; 4 + 1; 1 + 4
6 ... there are 5 ways: 3 + 3; 2 + 4; 4 + 2; 5 + 1; 1 + 5
7 ... there are 6 ways: 6 + 1; 1 + 6; 5 + 2; 2 + 5; 4 + 3; 3 + 4
8 ... there are 5 ways: 4 + 4; 5 + 3; 3 + 5; 6 + 2; 2 + 6
9 ... there are 4 ways: 3 + 6; 6 + 3; 5 + 4; 4 + 5
10 ... there are 3 ways: 5 + 5; 6 + 4; 4 + 6
11 ... there are 2 ways: 5 + 6; 6 + 5
12 ... there is 1 way: 6 + 6
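The same list can be generated by brute force. This short sketch (my addition) tallies every ordered roll of two six-sided dice and includes the sanity check suggested in the comments below: the ways must add up to 36, because 6 × 6 = 36 ordered rolls are possible.

```python
from collections import Counter
from itertools import product

# Tally every ordered roll (die1, die2) of two six-sided dice by its total.
ways = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(f"{total:2d}: {ways[total]} way(s) out of 36")

# The counts must add up to 36; a larger sum means some roll was counted twice.
assert sum(ways.values()) == 36
```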

(Psst. If you still think 7 is lucky for rolling the dice, then you should have more of a think about probability.)

And just like 6, 7, and 8 from rolling the dice, having three boys and two girls (or three girls and two boys) is the "luckier" sex ratio in a family of five children: the more probable, more likely one.

How do we know which of the two sets of probabilities that I calculated--Tuesday's or today's--is correct?
All girls, no boys:        1/6 or 1/32?    (17% or 3%)
All boys, no girls:        1/6 or 1/32?    (17% or 3%)
Four girls, one boy:       1/6 or 5/32?    (17% or 16%)
Four boys, one girl:       1/6 or 5/32?    (17% or 16%)
Three girls, two boys:     1/6 or 10/32?   (17% or 31%)
Three boys, two girls:     1/6 or 10/32?   (17% or 31%)

I see very clearly why our second method (the second fraction in each row) is superior to our first, which was to incorrectly divvy up the odds in sixths. That is, I can see clearly why the odds of having five girls are still 1/32 and not 1/6. There are so many more ways to make a family of five with four girls, or with three girls, or with two girls, than to make one with five girls, so you can't possibly have evenly distributed 1/6 odds for all those types of families of five children.

Initiate mind-blowing sequence.
But when you take the long view, 1/6 (or at least higher odds than 1/32) for a streak of five girls still seems not so crazy.

After all, the odds of having five children of all the same sex are only the lowest, the rarest, because we've arbitrarily decided that our family in question maxes out at five!

Would we find those same low odds of 1/32 for five girls in a row if the family had six kids--having more opportunities to have streaks of five girls during that span?

That's (a + b)^6, but if you scratch it out on a piece of paper you don't need to expand the binomial. The odds of having six straight girls are 1/64, the same as for any other particular series you can make out of six births (there are 64 different boy/girl series in all for a family of six kids).

And then by just sketching or scribbling (but if you're fancy, you can also just use the binomial) you can see that only three series (gggggg; gggggb; bggggg) have five girls in a row in a family with six births.

That means the odds of having a streak of five girls in a six child family is 3/64 which is 4.6875% (compared to 1/32 or 3.125% in a five child family).

So the odds are slightly larger in a bigger family.

Wait. Did I just do that right?

Let's try a family of seven to make sure I did.

Here are all possible streaks of five girls in a family of seven...

The odds of having five girls in a row in a family of seven =  8/128 = 6.25%

Okay, with a bigger family, the odds are even larger.

What about a family of eight? The odds of having five girls in a row in a family of eight = 20/256 = 7.8%
(Trust me... I scratched it out. My contacts fogged up after I'd found 19 series, but an exhaustive count turns up 20.)

Okay, yes. The odds of having a streak of five girls increase as the size of the family increases.
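Scratching out series by hand gets error-prone fast, so here is a brute-force sketch (my addition) that lists every series for a given family size and counts the ones containing at least five girls in a row. It reads "five girls in a row" as "a run of five or more", which is how gggggg was counted above, and it assumes independent 50/50 births. It finds 1, 3, 8 and 20 such series for five, six, seven and eight kids, so the eight-kid odds are 20/256, about 7.8%.

```python
from itertools import product

def has_streak(series, k):
    """True if the series contains k or more girls ('g') in a row."""
    return "g" * k in "".join(series)

def count_streak_series(n_kids, k=5):
    """Return (series with a run of at least k girls, total number of series)."""
    hits = sum(has_streak(s, k) for s in product("bg", repeat=n_kids))
    return hits, 2 ** n_kids

for n in range(5, 9):
    hits, total = count_streak_series(n)
    print(f"{n} kids: {hits}/{total} = {hits / total:.2%}")
```

Passing k=4 gives the four-girl streak odds worked out next: count_streak_series(4, k=4) returns (1, 16) and count_streak_series(5, k=4) returns (3, 32).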

Wait. What?! How do odds change? Odds are odds?

Instead of going up in scale again to check, which makes the calculations even harder, let's go down in scale to check our math. We already know from Tuesday that the odds of a five-kid family having a streak of four girls are 3/32 (ggggg; ggggb; bgggg), about 9%.
Okay, now what about in a four-kid family? The odds of having four girls in a four-kid family are 1/16 = 6%.

WHAT?! Just by making a fifth baby, you've seriously upped your chances of having a streak of four girls. Your odds go from 6% if you max out at four kids to 9% if you max out at five kids. That sounds reasonable, but...

This means your odds of having a streak of four girls or five girls (or anything!) depend on what HASN'T HAPPENED YET.

I'm sorry. Hold on. Time out for a sec. My brain is literally inside out right now.

Am I seriously figuring out now--Today. This minute.--that probability is vulnerable to what hasn't yet happened in the future? And that the present can change past probabilities?

That sounds so familiar. That idea. But I don't think I've ever come to it by myself before.

Until now I think it was always just a sentiment that Deepak Chopra hugged into Oprah, who gifted it to Martha Stewart, who baked it into a lemon zest fortune cookie.*

So predicting or estimating frequencies can change by the very nature of the present? Very interesting.

Doesn't it sound like we're crossing streams with the whole quantum mechanics pickle about changing a particle's state the moment it's observed? (and here)

Are people just particles?!?!


Am I on psychedelic drugs and where can you get some too? 

This shouldn't be so bleeping mind-blowing should it?

Unless... unless... As my repulsed reaction to a snappy "1/32" indicated at the outset back on Monday:  Small scale probabilities are different from large scale ones. Probabilities become different the bigger and bigger that you get.

And it's no secret that people who think evolutionarily think big. We're transcending space and time constantly. What? We are. You're welcome to join us. It's fun here. No vomit comets necessary either.

So if we approach 100, 1,000, or say... um... just to pull a random number from the air... SEVEN BILLION births, we should expect a much higher than 1/32 chance of finding a streak of five girls.

True. Nobody's making a family of seven billion children. So the question is, do we treat each family as defined by a finite probability or do we see births and families in our species as part of one big series with vastly different probabilities at that level than at the level of the family?

If it's the latter, we should expect what, exactly? Greater odds than 1/32 for having five girls in a row, that's for sure... Greater than the odds for a family of eight, that's for sure... The odds are x (where x = the number of boy/girl series of 7 billion births that contain 5+ girls in a row) divided by 2^(7 billion), the total number of possible series, just like 3/64 for six births and 8/128 for seven. So they're going to be greater than 1/32 by a long shot! They may even be close to our earlier, totally gauche calculation on Tuesday of 1/6, or they could be even higher!**

So why do we even calculate odds at the family unit level? Just to practice our algebra? Are they really as meaningless as my gut was screaming out in Monday's post?

No no no. I know why we calculate them in our math workbooks and our homework. It's not just algebra practice; these are hypotheses we can test. We can use these expectations to see whether any factor is skewing the outcomes of some families. Perhaps there is something biochemical in the babymaking process that results in one kind of offspring for some parents. We'd have to look at families (to account for genes, etc.) or at clusters of people living in the same environment (to account for bio-enviro interactions). If we find that within those sorts of sample populations people are having an unexpectedly high number of all-girl families (i.e. significantly more than 1/32 of the five-child families are all girls), for example, then we might suspect that these folks don't have a 50/50 boy/girl probability each time they make a baby, and that might entice us to investigate further into their genes or into their ground water, etc.
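That kind of test can be sketched as an exact one-sided binomial test (my choice of test; the post doesn't name one). Under the 50/50 assumption each five-kid family is all girls with probability 1/32, so the number of all-girl families among F sampled families follows a binomial distribution. The sample size and count below are made up purely for illustration; they are not real data.

```python
from math import comb

def prob_at_least(m, n_families, p=1 / 32):
    """Exact one-sided binomial tail: P(X >= m) when X ~ Binomial(n_families, p)."""
    return sum(comb(n_families, j) * p**j * (1 - p) ** (n_families - j)
               for j in range(m, n_families + 1))

# Hypothetical numbers, not real data: 200 five-kid families, 12 of them all girls.
n_families, all_girl = 200, 12
print(f"expected all-girl families under 50/50: {n_families / 32:.2f}")
print(f"P(at least {all_girl} all-girl families by chance): {prob_at_least(all_girl, n_families):.4f}")
```

A small tail probability would be the "unexpectedly high number" that makes the genes or the ground water worth a closer look.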

But back to these issues about small versus large perspectives that we've uncovered here...

In general, we might find that 1/32 of the five-kid families in our species are all girls, but if you look at a hospital register, for example, you'll find streaks of five girls much, much more often than 1/32 (3%) of the time.

There is something misleading about the way we calculate probability in a closed and narrow view of the world. And there is something subtly different about thinking probabilistically about a series of independent events and thinking probabilistically about their outcomes, instead, especially when many separate series can have the same outcomes (e.g.  rolling a 7 with the dice or having 3 boys and 2 girls).

I think I've located my trouble with probability. It's just a small one with having a large denominator. You know, something pretty easily surmounted--it's just grasping silly little old infinity's all.

O! Maybe later I'll see if I can dig up what the demographic data say. I can ask: How do the frequencies of sex ratios in human families fit these tight little closed and narrow probabilities/hypotheses? And do birth registers in hospitals show something much different, probably larger? I shall hope to find out. Not because it's a mystery; I already believe I know the answer. But because I can't simply believe I know an answer when there is a real way to see it, and there is a way in this case, so I should go and see in order to believe.

Thanks for reading. Hope our little journey back in time to the fundamentals of statistics blew your mind even a fraction of the way it blew mine!

Further humbling questions and related thoughts are increasingly probable to appear in future posts....


*Which reminds me to share this video of the furry little Buddha who lives in our house:

**Anybody know how to calculate this? Is a super computer necessary? Are there shortcuts for working with such large numbers--something like Pascal's Triangle perhaps?
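One shortcut, sketched here as my own addition rather than anything from the thread, is to track only how long the current run of girls is. That gives a small Markov chain: states 0 to 4 for the current run length, plus an absorbing state for "a run of five has happened". Raising its 6x6 transition matrix to the n-th power by repeated squaring takes only about log2(n) matrix multiplications, so 7 billion births needs no supercomputer. It assumes independent 50/50 births and gives the probability of at least one run of five or more girls, i.e. the x / 2^n from above.

```python
from fractions import Fraction

def mat_mult(a, b):
    """Multiply two square matrices stored as lists of lists."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(matrix, power):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    size = len(matrix)
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]  # identity
    while power > 0:
        if power & 1:
            result = mat_mult(result, matrix)
        matrix = mat_mult(matrix, matrix)
        power >>= 1
    return result

def prob_streak(n_births, k=5, p_girl=Fraction(1, 2)):
    """P(at least one run of k or more girls among n_births independent births)."""
    size = k + 1  # states 0..k-1: current run of girls; state k: a streak has occurred
    trans = [[0 * p_girl] * size for _ in range(size)]
    for run in range(k):
        trans[run][0] += 1 - p_girl        # a boy resets the run to zero
        trans[run][run + 1] += p_girl      # a girl lengthens it; run + 1 == k is a streak
    trans[k][k] = 1 + 0 * p_girl           # once a streak has happened, it stays happened
    return mat_pow(trans, n_births)[0][k]  # start from a run length of 0

for n in (5, 6, 7, 8):
    print(n, prob_streak(n))  # exact fractions, printed in lowest terms
print(prob_streak(7_000_000_000, p_girl=0.5))  # floating point for the huge case
```

The exact results for five to eight births match 1/32, 3/64, 8/128 and 20/256 (Python reduces the last two to 1/16 and 5/64). For 7 billion births the result is 1 to within floating-point precision, so a run of five girls is essentially certain, which agrees with Ken Weiss's answer in the comments.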


Ken Weiss said...

I am at a meeting and haven't got time to solve your supercomputer question. If I understand what you are asking there, it is not about families but about a run of so many girls or boys if you enumerate all the births of our 7 billion species in their birth-moment order and ask what is the chance that there will be a run of 5 somewhere among this generation's births. It will be approximately 1.00 (certainty).

It certainly is important to understand, at least in principle, the probability of an outcome on a given test or a given number of tests, or the probability of the outcome in any possible subset of a particular number of tests, and so on.

And there is a difference between specifying the probability (of a boy, say) in advance and computing what would be expected, and estimating the probability that was at work, from a set of realized outcomes. One can test whether the data fit some prior assumption, or how consistent they are with what is estimated from the data rather than some other value.

But trying to wrap one's head around these concepts is a challenge. For students, not doing so can cost a grade. For Henry VIII's wife, it cost her head!*

There are tests for 'runs' of various outcomes in sequences of repeated events. With sexes of children, or dice, we have good ideas of what is happening. But with other seemingly similar situations, it has taken some extensive analysis to show whether the run was just luck or had some other explanation. Hot shooting streaks in basketball, or hitting streaks in baseball, or momentum in many sports, are examples.

*because (as the common stories go) he blamed her for not having any sons, only daughters. In fact, of course, since sons need a Y chromosome, one might surmise that the fault, if any, was his.

Holly Dunsworth said...

I heard (from my husband) that it's been argued that there is only one true streak in all of recorded American sports history. Apparently, Joe DiMaggio's hitting streak is significantly different from expectation. I'll see if I can get the reference on that.

Holly Dunsworth said...

The super computer footnote is asking how can we get help calculating how many streaks of 5 exist in seven billion (the same computation that I did with families of 6 and 7).

Holly Dunsworth said...

The Streak of Streaks by SJ Gould

Ken Weiss said...

Gould was a Yankee fan, and of course he'd find a way to show that the only real streak was DiMaggio's. I can't remember how formally Gould worked out streakiness, so maybe he did the following kind of thing.

There are undoubtedly other, more correct ways to ask about streaks-of-5, but one way might be this: How many 5-in-a-rows are there in 5 billion (for simplicity) nucleotides? That would be a billion adjacent (tiled) 5-sets. What is the chance that a 5-set is a streak? 1/32. So you'd expect on average over 31+ million such streaks (of any specific 5-in-a-row sequence) in a billion 5-sets: 1,000,000,000/32.

This isn't completely accurate, because instead of discrete 5-spot tiles as above, you could slide a 5-place window down the 5 billion sites one site at a time, and at each position ask if you have a streak in the window. Then, you move up one spot and you already have the previous 4 to look at plus the new spot to see if it, too, is a 5-mer streak (if your desired streak is all of the same, like boyboyboyboyboyboy, this is easiest to think about, because the first 5 spots are all boys, but so are spots 2-6).

If this counts as two, then (if overlaps can count in this way) your chance of a new streak one position over from a current streak is a whopping 1/2!

This is likely a baby approximation (I mean baby statistics, the only kind I can guess at, not baby boy).

Depending on the complexity of your sliding window (or tiling) rules, or on whether you count, say, 10 boys in a row as one long streak, two adjacent streaks, or 6 overlapping 5-place streaks, you start to go where no non-statistician should dare to go.
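To make those counting rules concrete, here is a small simulation sketch (my addition, using a made-up register of one million independent 50/50 births drawn with Python's random module, not real hospital data). It counts all-girl streaks of five three ways: non-overlapping tiles, a sliding window where overlaps count, and unbroken runs counted once however long they are. On ten in a row, the three rules give 2, 6 and 1 streaks, exactly the three options listed above.

```python
import random

def simulate_register(n_births, seed=1):
    """Hypothetical birth register: True = girl, each birth an independent 50/50 draw."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n_births)]

def count_tiled(births, k=5):
    """Non-overlapping k-birth tiles (0-4, 5-9, ...) that are all girls."""
    return sum(all(births[i:i + k]) for i in range(0, len(births) - k + 1, k))

def count_sliding(births, k=5):
    """All-girl windows of width k, sliding one birth at a time; overlaps all count."""
    count = run = 0
    for girl in births:
        run = run + 1 if girl else 0
        if run >= k:  # the window ending at this birth is all girls
            count += 1
    return count

def count_runs(births, k=5):
    """Unbroken runs of k or more girls, each counted once no matter how long."""
    count = run = 0
    for girl in births:
        run = run + 1 if girl else 0
        if run == k:  # count a run only when it first reaches length k
            count += 1
    return count

ten_girls = [True] * 10
print(count_tiled(ten_girls), count_sliding(ten_girls), count_runs(ten_girls))  # 2 6 1

n = 1_000_000
births = simulate_register(n)
print("tiled:  ", count_tiled(births), "expected about", n // 5 / 32)
print("sliding:", count_sliding(births), "expected about", (n - 4) / 32)
print("runs:   ", count_runs(births), "expected about", n / 64)
```

The tiled expectation is the 1/32-per-tile arithmetic above; the sliding count is about five times larger because every birth position starts a new window, and each unbroken run is counted only once.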

And for more complex 'streaks' like BBGGB or BGBBG, the tiling vs window difference becomes more complicated.

Holly Dunsworth said...

6 in a row contains a streak of five, all the way up to five billion in a row that contains a streak of five. So you have to count all those too. Or is this about Gould? Confusion, here. Maybe you're trying to say my post is wrong without saying so? Well if it is, then say it's so Joe ;)

Holly Dunsworth said...

I should say, "all the way up to five billion in a row that contains MANY STREAKS of five" ... and I'm not sure how this is different from your tiling/window perspective but maybe I shouldn't read your comment on my phone where it's not getting across to me so well on the improbably small screen ;).

Holly Dunsworth said...

I think that we're talking about calculating in similar ways. In my post I have examples of calculating streaks of 5 girls in 6-kid families, in 7-kid families, and then wonder what those odds would be in a 7-billion-kid family.

And I'm going to go out on a crazy limb and predict that the odds of 5 in a row occurring in a set of 7 billion births (my question in the footnotes) are very close to 1/2.

That's a long way from 1/32---and it's thanks to defining the unit as the living species rather than the family.

Holly Dunsworth said...

unit = set

Francis Davey said...

I'm afraid your sums for two dice are wrong (you can tell this immediately by adding up the number of "ways" and finding it is more than 36).

For example, out of the 36 possible falls of two dice, only one gives 2 (both dice are 1) but two give 3 (1 + 2 and 2 + 1). Your mistake is to treat 1 + 1 as two possibilities.

Try rolling two dice (or simulating it on a computer) a bit and you'll see that 2 is rarer than 3 and 7 more common than all the others.

Holly Dunsworth said...

Hmmm. I'm not sure why I doubled up on the pairs. Can't recreate my thinking at this point either. Weird!

Thanks for your input. Luckily none of that little bit about rolling dice impacts the journey I was on except that there are more ways to roll certain numbers than others, regardless of the errors I made.