Fortunately, there is rather more to this numerical insight than the example of the inebriated stroller suggests. The picture of a succession of steps in random directions is an excellent model of what is going on when molecules spread from one place to another because they are hotter than those nearby, and so move faster on the average. They scatter off other molecules one after another in a random way and spread out from the starting point just like the shambling drunkard, taking about N² scatterings to go a distance equal to N of the distances that the molecules fly in between scatterings. This is why it takes a significant time to feel the effects of turning on a radiator across a room. The more energetic (‘hotter’) molecules are ‘drunkenly’ meandering across the room, whereas the sound waves from the clunking of the air locks in the pipes fly straight across at the speed of sound.
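A minimal simulation sketch (in Python, with the walk length and number of trials chosen arbitrarily) confirms the square-root rule: after N² unit steps in random directions, the root-mean-square distance from the start comes out close to N.

```python
import math
import random

def random_walk_distance(steps):
    """Distance from the start after taking `steps` unit steps in random directions."""
    x = y = 0.0
    for _ in range(steps):
        angle = random.uniform(0, 2 * math.pi)
        x += math.cos(angle)
        y += math.sin(angle)
    return math.hypot(x, y)

N, trials = 20, 500
mean_square = sum(random_walk_distance(N * N) ** 2 for _ in range(trials)) / trials
# After N*N random steps the root-mean-square distance is about N step lengths.
print(f"r.m.s. distance after {N*N} steps: {math.sqrt(mean_square):.1f} (expected about {N})")
```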
24
Faking It
Blackadder: It’s the same plan that we used last time, and the seventeen times before that.
Melchett: E-E-Exactly! And that is what’s so brilliant about it! We will catch the watchful Hun totally off guard! Doing precisely what we have done eighteen times before is exactly the last thing they’ll expect us to do this time!
Richard Curtis and Ben Elton
One of the most interesting general statistical misconceptions is what a random distribution is like. Suppose that you had to tell whether a particular sequence of events was random or not. You might hope to tell if it was not random by finding a pattern or some other predictable feature. Let’s try to invent some lists of ‘heads’ (H) and ‘tails’ (T) that are supposed to be the results of a sequence of tosses of a coin so that no one would be able to distinguish them from real coin tossings. Here are three possible fake sequences of 32 coin tosses:
Do they look right? Would you regard them as likely to be real random sequences of ‘heads’ and ‘tails’ taken from true coin tosses, or are they merely poor fakes? For comparison, here are three more sequences of ‘heads’ and ‘tails’ to choose from:
If you asked the average person whether these second three lists were real random sequences, most would probably say no. The first three sequences look much more like their idea of being random. There is much more alternation between heads and tails, and they don’t have the long runs of heads and of tails that each of the second trio displays. If you just used your computer keyboard to type a ‘random’ string of Hs and Ts, you would tend to alternate a lot and avoid long strings, because otherwise it ‘feels’ as if you are deliberately adding a correlated pattern.
Surprisingly, it is the second set of three sequences that are the results of a true random process. The first three, with their staccato patterns and absence of long runs of heads or tails, are the fakes. We just don’t think that random sequences can have all those long runs of heads or tails, but their presence is one of the acid tests for the genuineness of a random sequence of heads and tails. The coin-tossing process has no memory. The chance of a head or a tail from a fair toss is 1/2 each time, regardless of the outcome of the last toss. They are all independent events. Therefore, the chance of a run of r heads or r tails coming up in sequence is just given by the multiplication ½ × ½ × ½ × ½ × . . . × ½, r times. This is (½)^r. But if we toss our coin N times so that there are N different possible starting points for a run of heads or tails, our chance of a run of length r is increased to N × (½)^r. A run of length r is going to become likely when N × (½)^r is roughly equal to 1 – that is, when N = 2^r. This has a very simple meaning. If you look at a list of about N random coin tosses, then you expect to find runs of length r where N = 2^r.
All our six sequences were of length N = 32 = 2⁵, so if they are randomly generated we expect there is a good chance that they will contain a run of 5 heads or tails, and they will almost surely contain runs of length 4. For instance, with 32 tosses there are 28 starting points that allow for a run of 5 heads or tails, and on average about two such runs (of heads or of tails) are to be expected. When the number of tosses gets large, we can forget about the difference between the number of tosses and the number of starting points and use N = 2^r as the handy rule of thumb.fn1 The absence of these runs of heads or tails is what should make you suspicious about the first three sequences and happy about the likely randomness of the second three. The lesson we learn is that our intuitions about randomness are biased towards thinking it is a good deal more ordered than it really is. This bias is manifested by our expectation that extremes, like long runs of the same outcome, should not occur – that somehow those runs are orderly because they are creating the same outcome each time.
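You can test the N = 2^r rule of thumb for yourself with a short simulation; the sketch below (illustrative Python, with an arbitrary number of trials) generates many sequences of 32 fair tosses and records the longest run in each.

```python
import random

def longest_run(tosses):
    """Length of the longest unbroken run of identical outcomes."""
    best = current = 1
    for previous, outcome in zip(tosses, tosses[1:]):
        current = current + 1 if outcome == previous else 1
        best = max(best, current)
    return best

N, trials = 32, 10_000
runs = [longest_run([random.choice("HT") for _ in range(N)]) for _ in range(trials)]
print(f"average longest run in {N} tosses: {sum(runs) / trials:.2f}")
print(f"fraction of sequences with a run of 5 or more: {sum(r >= 5 for r in runs) / trials:.2f}")
```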
These results are also interesting to bear in mind when you look at long runs of sporting results between teams that regularly play one another: Army vs Navy, Oxford vs Cambridge, Arsenal vs Spurs, AC Milan vs Inter, Lancashire vs Yorkshire. There are often winning streaks where one team wins for many years in a row, although this is not usually a random effect: the same players form the core of the team for several years and then retire or leave, and a new team is created.
fn1 The result is easily generalised to deal with random sequences where the equally likely outcomes are more than two (H and T here). For the throws of a fair die the probability of any single outcome is 1/6, and to get a run of the same outcome r times we would expect to have to throw it about 6^r times; even for small values of r, this is a very large number.
25
The Flaw of Averages
Statistics can be made to prove anything – even the truth.
Noël Moynihan
Averages are funny things. Ask the statistician who drowned in a lake of average depth 3 centimetres. Yet, they are so familiar and seemingly so straightforward that we trust them completely. But should we? Let’s imagine two cricketers. We’ll call them, purely hypothetically, Flintoff and Warne. They are playing in a crucial test match that will decide the outcome of the match series. The sponsors have put up big cash prizes for the best bowling and batting performances in the match. Flintoff and Warne don’t care about batting performances – except in the sense that they want to make sure there aren’t any good ones at all on the opposing side – and are going all out to win the big bowling prize.
In the first innings Flintoff gets some early wickets but is taken off after a long spell of very economical bowling and ends up with figures of 3 wickets for 17 runs, an average of 5.67. Flintoff’s side then has to bat, and Warne is on top form, taking a succession of wickets for final figures of 7 for 40, an average of 5.71 runs per wicket taken. But Flintoff has the better (i.e. lower) bowling average in the first innings, 5.67 to 5.71.
In the second innings Flintoff is expensive at first, but then proves to be unplayable for the lower-order batsmen, taking 7 wickets for 110 runs, an average of 15.71 for the second innings. Warne then bowls at Flintoff’s team during the last innings of the match. He is not as successful as in the first innings but still takes 3 wickets for 48 runs, for an average of 16.0. So, Flintoff has the better average bowling performance in the second innings as well, this time by 15.71 to 16.0.
Who should win the bowling man-of-the-match prize for the best figures? Flintoff had the better average in the first innings and the better average in the second innings. Surely, there is only one winner? But the sponsor takes a different view and looks at the overall match figures. Over the two innings Flintoff took 10 wickets for 127 runs for an average of 12.7 runs per wicket. Warne, on the other hand, took 10 wickets for 88 runs and an average of 8.8. Warne clearly has the better average and wins the bowling award, despite Flintoff having a superior average in the first innings and in the second innings!
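The arithmetic behind this reversal, an instance of what statisticians call Simpson's paradox, is easy to check with a few lines of code using the figures quoted above.

```python
# Bowling figures quoted above, as (wickets, runs) for each innings.
flintoff = [(3, 17), (7, 110)]
warne = [(7, 40), (3, 48)]

def innings_averages(figures):
    return [round(runs / wickets, 2) for wickets, runs in figures]

def overall_average(figures):
    total_wickets = sum(w for w, _ in figures)
    total_runs = sum(r for _, r in figures)
    return round(total_runs / total_wickets, 2)

print("Flintoff:", innings_averages(flintoff), "overall", overall_average(flintoff))
print("Warne:   ", innings_averages(warne), "overall", overall_average(warne))
# Flintoff wins both innings (5.67 < 5.71 and 15.71 < 16.0),
# yet Warne has the better combined average (8.8 < 12.7).
```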
All sorts of similar examples spring to mind. Imagine two schools being rated by the average GCSE score per student. One school could have an average score higher than another school in every single subject when they are compared one by one, but then have a lower average score than the second school when all the scores are averaged together. The first school could correctly tell parents that it was superior to the other school in every subject, but the other school could (also legitimately) tell parents that its pupils scored more highly on the average than those at the first school.
There is truly something funny about averages. Caveat emptor.
26
The Origami of the Universe
Any universe simple enough to be understood is too simple to produce a mind able to understand it.
Barrow’s Uncertainty Principle
If you want to win bets against over-confident teenagers, then challenge them to fold a piece of A4 paper in half more than seven times. They’ll never do it. Doubling and halving are processes that go so much faster than we imagine. Let’s suppose that we take our sheet of A4 paper and slice it in half, again and again, using a laser beam so that we don’t get caught up with the problems of folding. After just 30 cuts we are down to 10⁻⁸ centimetres, close to the size of a single atom of hydrogen. Carry on halving and after 47 cuts we are down to 10⁻¹³ centimetres, the diameter of a single proton forming the nucleus of an atom of hydrogen. Keep on cutting and after 114 cuts we reach a remarkable size, about 10⁻³³ of a centimetre, unimaginable in our anthropocentric metric units, but not so hard to imagine when we think of it as cutting the paper in half just 114 times – a lot to be sure, but far from unimaginable. What is so remarkable about this scale is that for physicists it marks the point at which the very notions of space and time start to dissolve. We have no theories of physics, no descriptions of space, time and matter that are able to tell us what happens to that fragment of paper when it is cut in half just 114 times. It is likely that space as we know it ceases to exist and is replaced by some form of chaotic quantum ‘foam’, where gravity plays a new role in fashioning the forms of energy that can exist.4 It is the smallest length on which we can presently contemplate physical reality to ‘exist’. This tiny scale is the threshold that all the current contenders to be the new ‘theory of everything’ are pushing towards. Strings, M theory, non-commutative geometry, loop quantum gravity, twistors . . . all are seeking a new way of describing what really happens to our piece of paper when it is cut in half 114 times.
What happens if we double the size of our sheet of A4 paper, going to A3, to A2 and so on? After just 90 doublings we have passed all the stars and the visible galaxies, and reached the edge of the entire visible Universe, 14 billion light years away. There are no doubt lots more universe farther away than this, but this is the greatest distance from which light has had time to reach us since the expansion of the Universe began 14 billion years ago. It is our cosmic horizon.
Drawing together the large and the small, we have discovered that just 204 halvings and doublings of a piece of paper take us from the smallest to the largest dimensions of physical reality, from the quantum origins of space to the edge of the visible Universe.
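The numbers are easy to verify; the sketch below assumes the starting length is the long side of an A4 sheet, 29.7 centimetres, and takes a light year to be about 9.46 × 10¹⁷ centimetres.

```python
A4_LONG_SIDE_CM = 29.7      # long side of an A4 sheet (an assumed choice; the text just says 'A4 paper')
LIGHT_YEAR_CM = 9.46e17     # one light year in centimetres

# Halving: 30 cuts reach atomic size, 47 the proton, 114 the Planck length.
for halvings in (30, 47, 114):
    print(f"{halvings:3d} halvings: {A4_LONG_SIDE_CM / 2**halvings:.1e} cm")

# Doubling: 90 doublings carry the sheet past the roughly 1.4e10 light-year
# distance to the edge of the visible Universe.  114 + 90 = 204 steps in all.
doublings = 90
print(f"{doublings} doublings: {A4_LONG_SIDE_CM * 2**doublings / LIGHT_YEAR_CM:.1e} light years")
```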
27
Easy and Hard Problems
Finding a hard instance of this famously hard problem can be a hard problem.
Brian Hayes
It takes a long while to complete a large jigsaw puzzle, but just an instant to check that the puzzle is solved. It takes a fraction of a second for your computer to multiply two large numbers together, but it would take you (and your computer) a long time to find the two factors that have been multiplied together to make a big number. It has long been suspected, but never proved or disproved (and there is a one-million-dollar prize for doing either), that there is a real division between ‘hard’ and ‘easy’ problems that reflects the amount of calculating time that needs to be used to solve them.
Most of the calculations, or information-gathering tasks, that we have to do by hand, like completing our tax returns, have the feature that the amount of calculating to be done grows in proportion to the number of pieces we have to handle. If we have three sources of income we have to do three times as much work. Similarly, on our computer it takes ten times longer to download a file that is ten times bigger. Ten books will generally take ten times as long to read as one. This pattern is characteristic of ‘easy’ problems. They may not be easy in the usual sense, but when you add lots of them together the amount of work required doesn’t grow very quickly. Computers can easily cope with these problems.
Unfortunately, we often encounter another type of problem that is far less easy to control. Each time we add an extra piece to the calculation, we find that the calculation time required to solve it doubles. Very soon the total time required becomes stupendously large, and even the fastest computers on Earth can be easily defeated. These are what we mean by ‘hard’ problems.5
Surprisingly, ‘hard’ problems are not necessarily horribly complicated or mind-bogglingly difficult. They just involve a lot of possibilities. Multiplying together two large prime numbers is a computationally ‘easy’ task. You can do it in your head, with pencil and paper or on a calculator, as you wish. But if you give the answer to someone else and ask them to find the two prime numbers that were used in the multiplication, then they might be facing a lifetime of searching with the world’s fastest computers.
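A rough illustration of the asymmetry, using deliberately modest primes and the most naive factoring method imaginable (a sketch only; real cryptographic keys use primes hundreds of digits long and far more sophisticated algorithms on both sides):

```python
import time

def factor_by_trial_division(n):
    """Find a factor of n by checking every candidate divisor in turn."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return n, 1   # n is prime

p, q = 1_000_003, 1_000_033   # two modest primes
n = p * q                     # multiplying them takes no measurable time at all

start = time.perf_counter()
print("factors found:", factor_by_trial_division(n))
print(f"factoring took {time.perf_counter() - start:.2f} s, versus microseconds to multiply")
```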
If you want to try one of these ‘hard’ problems for yourself, one that sounds deceptively easy, then find the two prime numbers that add up to give 389965026819938.fn1
These ‘trapdoor’ operations – so called because, like falling through a trapdoor, it is so much easier to go in one direction than in the reverse – are not altogether bad things. They make life difficult for us, but they also make life difficult for people whose lives we are trying to make difficult for a very good reason. All the world’s principal security codes exploit trapdoor operations. Every time you shop online or extract cash from an ATM you are using them. Your PIN is combined with large prime numbers in such a way that any hacker or computer criminal wanting to steal your account details would have to factor a very large number into the two big prime numbers that were multiplied together to get at it. This is not impossible in principle, but it is impossible in practice in a sensible period of time. A criminal with the world’s fastest computer at his disposal might crack the encryption in several years, but by then the codes and account numbers would have been changed.
For this reason very large prime numbers are very valuable things, and some have been patented when written in a certain form. There is no limit to the number of prime numbers – they go on forever – but there is a largest one that we have been able to check for primeness by ensuring that it has no factors. There is no magic formula that can generate all prime numbers, and it is suspected that no such formula exists. If it did, and it was found, there would be a major crisis in the world. Any government agency that found it would undoubtedly keep it top secret. Any academic who found it and made it public without warning would bring the world tumbling down. All military, diplomatic and banking codes would become easily breakable by fast computers overnight. The world of online commerce would face a serious threat to its continued existence. We would have to move to iris, or fingerprint, or DNA-based recognition systems that relied on unique features of our biochemistry rather than numbers stored in our memories. But these new indicators would still need to be stored in a secure way.
Factoring a large number into its prime factors is a ‘hard’ problem. Even if it is cracked and shown to be an ‘easy’ problem by means of a magic formula, you might think that we could just use some other ‘hard’ problem to encrypt sensitive information so that it still takes ages to reverse the operation and extract it. However, it is known that if one of the problems we believe to be ‘hard’ could be shown to be ‘easy’ by means of some new discovery, then that discovery could be used to turn all the other computationally ‘hard’ problems into ‘easy’ ones. It really would be a magic bullet.
fn1 The answer is 5569 + 389965026814369.
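If you would rather check that answer than take it on trust, a small sketch with a simple trial-division primality test will do it, although it takes a few seconds because the larger number has fifteen digits.

```python
from math import isqrt

def is_prime(n):
    """Primality test by trial division; slow, but adequate for a 15-digit number."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

target = 389965026819938
p, q = 5569, 389965026814369          # the pair given in the footnote
print(p + q == target, is_prime(p), is_prime(q))   # should print: True True True
```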
28
Is This a Record?
I always thought that record would stand until it was broken.
Yogi Berra
Records – remember those little discs of black vinyl that your parents owned that made a noise when spun on a turntable at high speed? Well, mathematicians are more interested in the other sort of records: the biggest, the smallest and the hottest. Are they predictable in any way?
At first you might think not. True, they follow a trend of getting ‘better’ – they wouldn’t be records if they didn’t – but how could you predict whether a Michael Johnson or an Ian Thorpe was going to come along and break record after record? Amazingly, the women’s world record in the pole vault was broken on eight separate occasions by Yelena Isinbayeva in one year alone. Records like this are not random in a very important sense. Each new record is the result of a competitive effort that is not independent of all the previous attempts at the same feat. Pole vaulters learn new points of technique and continually train to improve their weaknesses and refine their technique. All you can predict about records like this is that they will be set again, eventually, although there might be a very long wait for the next one.
However, there are different sorts of records that arise in sequences of events that are assumed to be independent of each other. Good examples are record monthly rainfalls, record high or low temperatures in one place over hundreds of years, or the heights of the record highest tides. The assumption that each event is independent of its predecessors is a very powerful one that allows us to make a striking prediction about how likely it is that there will be a record – irrespective of what the record is for. It could be rain, snow, fall of leaves, water levels, wind speeds or temperature.
Let’s pick on the annual rainfall in the UK as our example. In the first year that we keep records the rainfall must be a record. In year two, if the rainfall is independent of what it was in year 1, then it has a chance of 1/2 of being a record by beating the year-1 rainfall and a chance of 1/2 of not beating the year-1 rainfall. So the number of record years we expect in the first two years is 1 + ½. In year 3, only two of the six possible rankings of the rainfall in years 1, 2 and 3 (i.e. a 1 in 3 chance) produce a record in year 3. So the expected number of record years after 3 years of record keeping is 1 + ½ + ⅓. If you keep on going, applying the same reasoning to each new year, you will find that after n independent years of gathering data, the expected number of record years is the sum of a series of n fractions: 1 + ½ + ⅓ + ¼ + . . . + 1/n.
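A quick simulation (illustrative Python, with annual rainfalls replaced by independent random numbers) shows that the number of records seen really does track this sum.

```python
import random

def count_records(values):
    """Count how many times a new running maximum (a 'record year') appears."""
    records, best = 0, float("-inf")
    for value in values:
        if value > best:
            records, best = records + 1, value
    return records

n_years, trials = 100, 10_000
simulated = sum(count_records([random.random() for _ in range(n_years)])
                for _ in range(trials)) / trials
predicted = sum(1 / k for k in range(1, n_years + 1))   # 1 + 1/2 + 1/3 + ... + 1/n
print(f"average records in {n_years} independent years: {simulated:.2f}")
print(f"harmonic sum prediction:                        {predicted:.2f}")
```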