Saturday, March 23, 2013

Random Expectations

A mathematician, a physicist, and an engineer were traveling through Scotland when they saw a black sheep through the window of the train. "Aha," says the engineer, "I see that Scottish sheep are black." "Hmm," says the physicist, "You mean that some Scottish sheep are black." "No," says the mathematician, "All we know is that there is at least one sheep in Scotland, and at least one side of that sheep is black."

It makes me a little sad that the engineer is usually the stupid guy in the joke.

Mathematicians often seem like they have a little trouble interfacing with the rest of the world, you know, because they're odd. I have been known to not get a joke because I over-analyzed it, or to just miss something because I am being too technical, and I'm not even a real mathematician. When I was living in Brazil, I had a tiny umbrella that I carried with me until the first storm completely destroyed it. One day, someone I knew saw it and started teasing me, calling my undersized umbrella "unisex." That probably connotes femininity in Brazilian culture, as opposed to masculinity, but I was confused. I thought that most umbrellas could be used by anyone, the exception being those with more feminine prints, like flowers. So, unisex must be better than the alternative, which would be girly. By the time I had deduced this incorrect interpretation of "unisex", the moment had passed and I had missed the opportunity to be ridiculed in good humor.

Some words have an everyday meaning and a scientific meaning, like work or power. Other words are simply misunderstood. We say random to mean unexpected. It is funny, then, that we have some definite expectations about what randomness looks like. If I asked someone to put random dots on paper, the result would probably look like the chart below.

Figure 1: "Random" Data

The thing is, that chart is not random. With a grid, you can see that the dots are spaced fairly evenly, about one dot per grid cell.

Figure 2: "Not Actually Random" Data

Here is a plot of uniform random data I generated in Excel. It doesn't look very much like the "random" data at all. Like my old boss was fond of saying, "Randomness tends to be clumpy."

Figure 3: True Random Data (as random as the number generator in Excel is, anyway)
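
If you want to see the clumpiness without Excel, here is a minimal Python sketch. It scatters as many uniform random dots as there are grid cells and counts the cells that end up empty. The grid size and the seed are arbitrary choices of mine, not values from the charts above. With one dot per cell on average, the expected share of empty cells is about 1/e, or roughly 37%, which is a long way from the tidy "one dot per cell" picture.

```python
import random

def count_empty_cells(n_side, seed=0):
    """Scatter n_side * n_side uniform random dots on a unit square
    and count how many cells of an n_side x n_side grid stay empty."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    occupied = set()
    for _ in range(n_side * n_side):
        x, y = rng.random(), rng.random()
        # Map each dot to the (column, row) of the grid cell it lands in
        occupied.add((int(x * n_side), int(y * n_side)))
    return n_side * n_side - len(occupied)

n_side = 30  # assumed grid size, chosen only for illustration
empty = count_empty_cells(n_side)
print(f"{empty} of {n_side * n_side} cells are empty "
      f"({empty / n_side ** 2:.0%})")
```

Every empty cell means some other cell is holding two or more dots, and that is the clumping.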

We have expectations about lists of numbers too. How often would you expect entries on a list of random data to begin with 1? About 10% of the time? In many cases, that is pretty far off.

If the random data spans a few orders of magnitude (powers of 10) or more, it often follows Benford's Law. The digit 1 shows up as a first digit about 30% of the time, and each higher digit shows up less frequently, down to 9, which appears less than 5% of the time, as seen in Figure 4. The greater the range of the data, the more closely the data follows the Law. One of the classic examples is the length of rivers. Interestingly, it doesn't matter what units are used in the measurements. You could measure rivers in miles, inches, centimeters, or furlongs. The same basic pattern shows up.

Figure 4: Probability of a data point starting with each digit
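
The percentages in that figure come from the usual statement of Benford's Law: the probability of a leading digit d is log10(1 + 1/d). A short Python sketch to tabulate it (the function name is mine):

```python
import math

def benford_probability(digit):
    """Probability that the first digit is `digit` (1 through 9)
    under Benford's Law: log10(1 + 1/digit)."""
    return math.log10(1 + 1 / digit)

# Print the first-digit probabilities as percentages
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

The nine probabilities add up to 1, as they should, because the sum telescopes to log10(10).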

The fact that the units don't matter is a clue to an explanation. Converting from one unit to another just multiplies the whole data set by a constant, so multiplying the data by any number, say 2, will result in a different data set that also follows Benford's Law. About 30% of the numbers will start with 1, even though it is not the same group of rivers whose lengths started with 1 before.

That seems kind of strange. We might expect that the 30% pattern would shift around as we multiply. But if we think about it, only some of the rivers that start with 1 will start with 2 after multiplying by 2. The ones that start with a 1 followed by a 5 or higher will now start with 3. Now think about the numbers in the new group that will start with 1. Everything that started with a 5, 6, 7, 8, or 9 before multiplying by 2 will start with 1 after multiplying by 2. If we add up the probabilities of starting with 5, 6, 7, 8, and 9, we get the same 30%.
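
The doubling argument can be checked numerically. The sketch below is only an illustration: the "river lengths" are made-up numbers drawn log-uniformly over four orders of magnitude (my assumption, not real river data), and the helper names are mine. It first adds up the Benford probabilities for 5 through 9, then compares the share of leading 1s before and after multiplying everything by 2.

```python
import math
import random

def first_digit(x):
    """Leading decimal digit of a positive number.
    Scientific notation puts that digit first in the string."""
    return int(f"{x:.16e}"[0])

def digit_share(values, digit):
    """Fraction of values whose first digit equals `digit`."""
    return sum(first_digit(v) == digit for v in values) / len(values)

# Probabilities of starting with 5, 6, 7, 8, or 9 under Benford's Law
p_5_to_9 = sum(math.log10(1 + 1 / d) for d in range(5, 10))
print(p_5_to_9, math.log10(2))  # both are the probability of a leading 1

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical data: log-uniform between 1 and 10,000
lengths = [10 ** rng.uniform(0, 4) for _ in range(100_000)]
doubled = [2 * v for v in lengths]

print("share starting with 1, original:", digit_share(lengths, 1))
print("share starting with 1, doubled: ", digit_share(doubled, 1))
```

Both shares should land close to 30%, even though different entries start with 1 in each list.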

The data set needs to cover a few orders of magnitude so that there are data points starting with every digit. It wouldn't work with people's heights because they don't vary enough. Most heights measured in inches would fall in the 60s and 70s, and there wouldn't be any that start with 1. But if the larger data points are 100, 1000, or better 10000 times larger than the smallest data points, there should be enough that start with each digit to make the pattern work out.

It may strike you as funny that we have expectations about the "unexpected", and that those expectations are often wrong. Randomness is clumpier than we feel it should be, and random things are surprisingly predictable when considered in groups. That is actually what random means, technically: occurring with a certain probability. Predictable patterns emerge when random things are grouped. Kind of like people.

For more on randomness or Benford's Law, check out:
Scishow on randomness: http://www.youtube.com/watch?v=LElyagQ0n_g
Numberphile on Benford's Law: http://www.youtube.com/watch?annotation_id=annotation_143101&feature=iv&src_vid=VbtNy54ya9A&v=XXjlR2OK1kM
More Benford's Law: http://www.youtube.com/watch?v=vIsDjbhbADY

Thursday, March 14, 2013

Happy π Day!

A mathematician, a physicist, and an engineer are all given identical rubber balls and told to find the volume. The mathematician pulls out a measuring tape and records the circumference. He then divides by 2π to get the radius, cubes that, multiplies by π again, and multiplies by 4/3 to arrive at the volume. The physicist gets a bucket of water, places 1.00000 liters of water in it, drops in the ball, and measures the displacement to six significant figures. And the engineer? He writes down the serial number of the ball and looks it up.
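
For the record, the mathematician's method boils down to V = (4/3)π(C/2π)³. A minimal Python sketch, where the function name and the sample circumference are made up for illustration:

```python
import math

def ball_volume_from_circumference(circumference):
    """Volume of a sphere from its measured circumference."""
    radius = circumference / (2 * math.pi)   # C = 2πr, so r = C / (2π)
    return (4 / 3) * math.pi * radius ** 3   # V = (4/3)πr³

# Example: a ball measuring 20 units around (an assumed value)
print(ball_volume_from_circumference(20.0))
```

The answer comes out in cubic units of whatever the tape measure uses.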

I had to tell a joke that involves π because it is π Day! Every year on 3/14, nerds around the world get excited because the date resembles the first few digits of π. That is, unless they structure the date in a consistent way, day/month/year, in increasing time increments. Then π Day doesn't work because it would be the 31st of April, and April only has 30 days.

π is the ratio between the circumference and the diameter of a circle. That is, a circle that is 1 unit across is π units, or about 3.14 units, around. It may be surprising that π shows up all over the place in mathematics and science, not just in geometry. It might be my favorite number, and not just because I memorized 150 digits of it in middle school. It is kind of the rock star of numbers. Everyone is familiar with π, even if they don't really know how to use it.

So it seems odd that some mathematicians and scientists want to replace π. They propose that we should use τ. That little t is lowercase Greek tau, their letter t. (π is lowercase Greek pi, their letter p.) τ is equal to 2π, or about 6.28. Sometimes supporters of tau use inflammatory language like "π is wrong" or things like that, but what they really claim is that since 2π shows up so much, we should just replace 2π with its own symbol and use π less often. Mathematicians don't really use the diameter of a circle much, but they use the radius constantly. τ works nicely in that sense, because it is the ratio between the circumference and the radius.

Another way in which τ trumps π is measuring angles. Scientists and mathematicians don't really use degrees much; often they use radians. There are 2π radians in a full circle, so that comes out to exactly τ. So a full revolution is one τ, whereas half of a revolution is one π. A whole corresponding to a whole is better than a half corresponding to a whole.
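
Python's standard math module happens to ship both constants (math.tau was added in Python 3.6), which makes the comparison easy to sketch. The two helper functions are my own illustration, not anything official.

```python
import math

def turns_to_radians(turns):
    """Convert a fraction of a full revolution to radians using τ."""
    return turns * math.tau

def circumference_from_radius(radius):
    """Circumference written the τ way: C = τ * r."""
    return math.tau * radius

# A full turn is τ radians, a half turn is π radians
print(turns_to_radians(1), 2 * math.pi)
print(turns_to_radians(0.5), math.pi)

# The τ form with a radius and the π form with a diameter agree
radius = 3.0  # assumed value for illustration
print(circumference_from_radius(radius), math.pi * (2 * radius))
```

With τ, a quarter turn is τ/4 and a third of a turn is τ/3, so the fraction of the circle can be read straight off the angle.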

There are more reasons that are cited in support of τ, but I don't think the case is strong enough. π has plenty of uses without the 2, and one applies especially to me. You may have noticed that I haven't mentioned engineers in this discussion. That is because I think they generally fall into the π camp. The biggest reason probably is making physical measurements. If I need to know the size of a pipe, I measure across it with some calipers or measuring tape. In the real world, we use diameters much more often because they are natural. How do you measure the radius of a pipe? You would need some kind of specialized tool. 

There are debatable advantages to switching to τ, but the cost of re-education and converting would be much greater. It is claimed that τ is more natural to use, and in the theoretical world it may be. We live in a physical world, however, and we naturally started using π thousands of years ago because it is more natural in a physical world. And that is why I won't be celebrating on June 28th.

For more:
numberphile on π: http://www.youtube.com/watch?v=yJ-HwrOpIps
numberphile on τ vs. π: http://www.youtube.com/watch?v=83ofi_L6eAo
                                    http://www.youtube.com/watch?v=ZPv1UV0rD8U