Normally we recognise a vector as having a raised (component) index: , whereas a covector has a lowered index: . Similarly the dual to , using the metric, is the covector with components denoted say; while the dual to is the vector with components .
Recall what this notation means. It presupposes a vector basis say, where in this case the labels entire vectors — the different vectors in the basis — rather than components. Hence we have the decomposition: . Similarly, the component notation implies a basis of covectors , so: . These bases are taken to be dual to one another (in the sense of bases), meaning: . We also have and , where as usual and . (The “sharp” and “flat” symbols are just a fancy way to denote the dual. This is called the musical isomorphism.)
However since , we may instead take the dual of both sides of this expression to get: . This gives a different decomposition than in the previous paragraph. Curiously, this expression contains a raised component index, even though it describes a covector. For each index value , the component is the same number as usual. But here we have paired them with different basis elements. Similarly is a different decomposition of the vector . It describes a vector, despite using a lowered component index. Using the metric, the two vector bases are related by: .
A good portion of the content here is just reviewing notation. However this article does not seem as accessible as I envisioned. The comparison with covectors is better suited to readers who already know the standard approach. (And I feel a pressure to demonstrate I understand the standard approach before challenging it a little, lest some readers dismiss this exposition.) However for newer students, it would seem better to start afresh by defining a vector basis to satisfy: . (While is identical notation to covectors, there need not be confusion, if no covectors are present anywhere.) This relation is intuitive, as the below diagram shows.
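The idea of a second, "reciprocal" basis made of vectors can be checked numerically. Below is a minimal Python sketch, assuming an ordinary Euclidean dot product in 2D as the metric. The example basis, the test vector, and all function names are my own choices for illustration, not taken from any reference.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean scalar product (the assumed metric)."""
    return sum(a * b for a, b in zip(u, v))

def reciprocal_basis_2d(e1, e2):
    """Return vectors (E1, E2) satisfying Ei . ej = 1 if i == j, else 0."""
    g11, g12, g22 = dot(e1, e1), dot(e1, e2), dot(e2, e2)
    det = g11 * g22 - g12 * g12
    # Inverse of the Gram matrix [[g11, g12], [g12, g22]]
    inv = [[g22 / det, -g12 / det], [-g12 / det, g11 / det]]
    E1 = tuple(inv[0][0] * a + inv[0][1] * b for a, b in zip(e1, e2))
    E2 = tuple(inv[1][0] * a + inv[1][1] * b for a, b in zip(e1, e2))
    return E1, E2

# A deliberately non-orthogonal basis (hypothetical example values)
e1 = (Fraction(1), Fraction(0))
e2 = (Fraction(1), Fraction(1))
E1, E2 = reciprocal_basis_2d(e1, e2)

# One vector, two decompositions
v = (Fraction(3), Fraction(2))
raised = (dot(v, E1), dot(v, E2))    # components paired with e1, e2
lowered = (dot(v, e1), dot(v, e2))   # components paired with E1, E2
from_raised = tuple(raised[0] * a + raised[1] * b for a, b in zip(e1, e2))
from_lowered = tuple(lowered[0] * a + lowered[1] * b for a, b in zip(E1, E2))
```

Both `from_raised` and `from_lowered` rebuild the same arrow `v`, which is the point of the text above: a lowered component index can describe a vector, provided it is paired with the reciprocal basis.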
I learned this approach (of defining a second, “reciprocal basis” of vectors) from geometric algebra references. However, it is really just a revival of the traditional view. I used to think old physics textbooks which taught this way were unsophisticated, and unaware of the modern machinery (covectors). I no longer think this way. The alternate approach does require a metric, so is less general. However all topics I have worked on personally, in relativity, do have a metric present. But even for contexts with no metric, this approach could still serve as a concrete intuition, to motivate the introduction of covectors, which are more abstract. The alternate approach also challenges the usual distinction offered between contravariant and covariant transformation of (co)vectors and higher-rank tensors. It shows this is not about vectors vs covectors at all, but more generally about the basis chosen. I write about these topics at significant length in my MPhil thesis (2024, forthcoming, §2.3), and intend to write more later.
The discovery of spinors is most often credited to quantum physicists in the 1920s, and to Élie Cartan in the prior decade for an abstract mathematical approach. But it turns out the legendary mathematician Leonhard Euler discovered certain algebraic properties for the usual (2-component, Pauli) spinors, back in the 1700s! He gave a parametrisation for rotations in 3D, using essentially what were later known as Cayley-Klein parameters. There was not even the insight that each set of parameter values forms an interesting object in its own right. But we can recognise one key conceptual aspect of spinors in this accidental discovery: the association with rotations.
In 1771, Euler published a paper on orthogonal transformations, whose title translates to: “An algebraic problem that is notable for some quite extraordinary relations”. Euler scholars index it as “E407”, and the Latin original is available from the Euler Archive website. I found an English translation online, which also transcribes the original language.
Euler commences with the aim to find “nine numbers… arranged… in a square” which satisfy certain conditions. In modern notation this is a matrix M say, satisfying , which describes an orthogonal matrix. While admittedly the paper is mostly abstract algebra, he is also motivated by geometry. In §3 he mentions that the equation for a surface is “transformed” under a change of [Cartesian] coordinates, including the case where the coordinate origins coincide. We recognise this (today, at least) as a rotation, possibly combined with a reflection. Euler also mentions “angles” (§4 and later), which is clearly geometric language.
He goes on to analyse orthogonal transformations in various dimensions. [I was impressed with the description of rotations about n(n – 1)/2 planes, in n dimensions, because I only first learned this in the technical context of higher-dimensional rotating black holes. It is only in 3D that rotations are specified by axis vectors.] Then near the end of the paper, Euler seeks orthogonal matrices containing only rational entries, a “Diophantine” problem. Recall rotation matrices typically contain many trigonometric terms like sin(θ) and cos(θ), which are irrational numbers for most values of the parameter θ. But using some free parameters “p, q, r, s”, Euler presents:
where . (I have copied the style which Euler uses in some subsequent examples.) By choosing rational values of the parameters, the matrix entries will also be rational; however, this is not our concern here. The matrix has determinant +1, so we know it represents a rotation. It turns out the parameters form the components of a spinor!! are the real components of a normalised spinor. We allow all real values, but will ignore some trivial cases. One aspect of spinors is clear from inspection: in the matrix the parameters occur only in pairs, hence the sets of values (p, q, r, s) and (–p, –q, –r, –s) give rise to the same rotation matrix. (Those familiar with spinors will recall the spin group is the “double cover” of the rotation group.)
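The properties just described can be checked with exact rational arithmetic. Here is a minimal Python sketch, assuming the standard Euler-Rodrigues form of the rotation matrix. Euler's own layout in E407 may differ from this one by sign or ordering conventions, so treat the matrix below as an illustration rather than a transcription. The function names are hypothetical.

```python
from fractions import Fraction

def rotation_from_parameters(p, q, r, s):
    """Rational rotation matrix from four integer parameters
    (standard Euler-Rodrigues form, assumed here)."""
    u = p*p + q*q + r*r + s*s
    rows = [
        [p*p + q*q - r*r - s*s, 2*(q*r - p*s),         2*(q*s + p*r)],
        [2*(q*r + p*s),         p*p - q*q + r*r - s*s, 2*(r*s - p*q)],
        [2*(q*s - p*r),         2*(r*s + p*q),         p*p - q*q - r*r + s*s],
    ]
    return [[Fraction(x, u) for x in row] for row in rows]

def is_orthogonal(M):
    """Check M times its transpose equals the identity, exactly."""
    return all(
        sum(M[i][k] * M[j][k] for k in range(3)) == (1 if i == j else 0)
        for i in range(3) for j in range(3)
    )

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

M = rotation_from_parameters(1, 2, 3, 4)
M_flipped = rotation_from_parameters(-1, -2, -3, -4)
```

Every entry of `M` is a `Fraction`, and flipping the sign of all four parameters returns the identical matrix, which is the "double cover" in miniature.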
The standard approach is to combine the parameters into two complex numbers. But in the geometric algebra (or Clifford algebra) interpretation, a spinor is a rotation of sorts, or we might say a “half-rotation”. It is about the following plane:
(For those who haven’t seen the wedge product or bivectors, you can visualise for example, as the parallelogram or plane spanned by those vectors. It also has a magnitude and handedness/orientation.) The sum is itself a plane, because we are in 3D. Dividing by gives a unit bivector. For the spinor, the angle of rotation θ/2 say, is given by (cf. Doran & Lasenby 2003 §2.7.1):
This determines θ/2 to within a range of 2π (if we include also the orientation of the plane). In contrast, the matrix given earlier effects a rotation by θ — twice the angle — about the same plane. This is because geometric algebra formulates rotations using two copies of the spinor. The matrix loses information about the sign of the spinor, and hence also any distinction between one or two full revolutions.
Euler extends the challenge of finding orthogonal matrices with rational entries to 4D. In §34 he parametrises matrices using “eight numbers at will a, b, c, d, p, q, r, s”. However the determinant of this matrix is -1, so it is not a rotation, and the parameters cannot form a spinor. Two of its eigenvalues are -1 and +1. Now the eigenvectors corresponding to distinct eigenvalues are orthogonal (a property most familiar for symmetric matrices, but it holds for orthogonal matrices also). It follows the matrix causes reflection along one axis, fixes an orthogonal axis, and rotates about the remaining plane. So it does not “include every possible solution” (§36). But I guess the parameters might form a subgroup of the Pin(4) group, the double cover of the 4-dimensional orthogonal group O(4).
Euler provides another 4×4 orthogonal matrix satisfying additional properties, in §36. This one has determinant +1, hence represents a rotation. It would appear no eigenvalues are +1 in general, so it may represent an arbitrary rotation. I guess the parameters , where I label by u the quantity mentioned by Euler, might form spinors of 4D space (not 3+1-dimensional spacetime). If so, these are members of Spin(4), the double cover of the 4D rotation group SO(4).
Euler was certainly unaware of his implicit discovery of spinors. His motive was to represent rotations using rational numbers, asserting these are “most suitable for use” (§11). Probably more significant today is that rotations are described by rational functions of spinor components. But the fact spinors would be rediscovered repeatedly in different applications suggests there is something very natural or Platonic about them. Euler says his 4D “solution deserves the more attention”, and that with a general procedure for this and higher dimensions, “Algebra… would be seen to grow very much.” (§36) He could not have anticipated how deserving of attention spinors are, nor their importance in algebra and elsewhere!
Imagine a light ray reflecting off a mirror. If the mirror is rotating, the direction of the reflected beam will also rotate, but at twice the rate of the mirror! This follows from the way the angles work, if you recall for example “the angle of incidence equals the angle of reflection” and think about it carefully… Or, just play around with an animation until it looks right 😉 . In quantum physics this angle-doubling property turns up in the description of electrons for example, where it seems very mysterious and exotic (keywords: “spin-1/2” and “spinors”). So its appearance in an “ordinary” and intuitive setting is reassuring.
For simplicity let’s use a two-dimensional plane, as shown in Figure 1. We measure angles from the positive direction of the x-axis, as usual in polar coordinates. I choose to measure from the centre outwards, so for the incoming ray the angle assigned is the opposite of what the arrow might suggest. Label the incoming ray angle b, and mirror rotation angle m. Now if you increase m by some given amount, the outgoing ray angle increases by twice as much. But if you increase b instead, the outgoing ray angle decreases by the same amount. We also need an “initial condition” of sorts: when the mirror is horizontal (m = 0°), and the ray arrives from directly above (b = 90°), the reflected beam is also at 90°. It follows:
reflected angle = 2m – b + 180°.
Now if the mirror rotates by 180°, the reflected ray completes a full 360° rotation, so is back to its original position. (We suppose the mirror is 2-sided.) If you hadn’t watched the rotation, you wouldn’t know anything had changed. But now suppose we make one side of the mirror red and the other blue, so the reflected ray takes on the colour of the closest side. Now the ray must make two complete revolutions, 720°, to get back to its original state! After one revolution it is back to the same position, but has a different colour, as the opposite side of the mirror faces the beam. Similarly, if the reflected ray is rotated by 180° in one direction, this is not the same as rotating by 180° in the opposite direction, as the colour is different. “Spinors” have these same features, except in place of red/blue their mathematical description picks up a factor of ±1.
You might try animating this yourself. If you draw the rays with unit length, then the arrow for the incoming beam points from (cos b, sin b) to (0,0). The outgoing arrow points from (0,0) to -(cos(2m–b), sin(2m–b)), where a minus sign replaces the 180° term from earlier. The colour depends on whether the incoming ray is from 0 to 180° ahead of the mirror, or 0 to 180° behind. This is determined by the sign of sin(m–b). It is convenient to allow angle parameters beyond 360°, which makes no physical difference, or at most only a change in colour, as we have learned 😀 . Below is Mathematica code I wrote, which uses slider controls for the angle parameters. The result is fun to play around with, and it helps make the angle-doubling more intuitive.
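For readers without Mathematica, the geometry above can also be sketched in a few lines of Python. This is not the Mathematica code mentioned above, only a minimal stand-in with no sliders or drawing. The function names, and the assignment of "red" to a positive sign of sin(m – b), are my own arbitrary choices.

```python
import math

def reflected_angle(m, b):
    """Outgoing ray angle in degrees, for mirror angle m and incoming ray angle b."""
    return (2 * m - b + 180) % 360

def outgoing_tip(m, b):
    """Tip of the unit-length outgoing arrow, which starts at (0, 0)."""
    t = math.radians(2 * m - b)
    return (-math.cos(t), -math.sin(t))

def facing_side(m, b):
    """Which side of the two-coloured mirror faces the beam.
    Decided by the sign of sin(m - b); the colour names are arbitrary."""
    return "red" if math.sin(math.radians(m - b)) > 0 else "blue"

# Mirror horizontal, ray from directly above: reflected straight back up
start = (reflected_angle(0, 90), facing_side(0, 90))
# Half a turn of the mirror: same direction, other colour
half_turn = (reflected_angle(180, 90), facing_side(180, 90))
# A full turn of the mirror (two revolutions of the ray): fully restored
full_turn = (reflected_angle(360, 90), facing_side(360, 90))
```

Comparing `start`, `half_turn` and `full_turn` shows the ray direction repeating after 180° of mirror rotation, while the colour only repeats after 360°.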
I have written on the topic of “covariant” and “contravariant” vectors (and higher-rank tensors) previously, and have been intending to write an update for a number of years. It must be noted some authors recommend avoiding these terms completely, including Schutz 2009 §3.3:
Most of these names are old-fashioned; ‘vectors’ and ‘dual vectors’ or ‘one-forms’ are the modern names.
Let’s return to Schutz’ reason soon. I have followed this naming practice myself, except I prefer to say “covector” or “1-form”, rather than “dual vector” which can be clumsy. (What would you call the vector which is the (metric) dual to a given 1-form: the “dual of a dual vector”, or a “dual-dual-vector”!?) People also talk about “up[stairs] indices” and “down[stairs] indices”, which seems alright.
But if you want to be cheeky, you might say a vector is covariant, while a 1-form is contravariant — the exact opposite of usual terminology! I remember a maths graduate student at some online school stating this. Similar sentiments are expressed by Spivak 1999 vol. 1 §4:
Nowadays such situations are always distinguished by calling the things which go in the same direction “covariant” and the things which go in the opposite direction “contravariant”. Classical terminology used these same words, and it just happens to have reversed this: a vector field is called a contravariant vector field, while a section of T*M is called a covariant vector field. And no one has had the gall or authority to reverse terminology so sanctified by years of usage. So it’s very easy to remember which kind of vector field is covariant, and which is contravariant — it’s just the opposite of what it logically ought to be.
While I love material which challenges my conceptual understanding, and Spivak’s humorous prose is fun, trying to be too “clever” with terms can hamper clear communication. Back to Schutz, who clarifies:
The reason that ‘co’ and ‘contra’ have been abandoned is that they mix up two very different things: the transformation of a basis is the expression of new vectors in terms of old ones; the transformation of components is the expression of the same object in terms of the new basis.
If you take the components of a fixed vector in a given basis, they transform contravariantly when the basis changes. But if you consider the vector as a whole (a single geometric object) and ask how a basis vector (specifically) is mapped to a new basis vector, the change is “covariant”. (Recall, as Schutz explains: “The property of transforming with basis vectors gives rise to the co in ‘covariant vector’ and its shorter form ‘covector’.”) In general, if you fix a set of components, by which I mean fixing an ordered set of numbers like (0,1,0,½) say, and then change the basis vectors these numbers refer to, then the change of a vector (as a whole entity) is “covariant”, so-called. For 1-forms, the converse of these statements applies. Some diagrams would make this paragraph clearer, but I leave this as an exercise, sorry.
However, it seems to me the most accurate description is that vectors don’t change at all, when you change a basis! Picture a vector as an arrow in space, then the arrow does not move. In this sense vectors are neither contravariant nor covariant, but invariant! (We could also say generally covariant, since they are geometric entities independent of any coordinate system. This is the usual modern meaning of the word “covariant”, but it’s a bit different to the covariant–contravariant distinction, so for clarity I avoid this language here.) In conclusion, one of the clearest descriptions is to simply say: vector, or 1-form / covector. Or, given historical usage, it is especially clear to say a vector’s “components transform contravariantly”. See the Table.
Table: transformation under basis change

| object | clarification | vector | 1-form |
|---|---|---|---|
| components | same (co)vector, but components in a new basis | contravariant | covariant |
| (co)vector | same (co)vector, treated as a whole | invariant | invariant |
| basis (co)vector | transform to different (co)vector | covariant | contravariant |
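The first two rows of the Table can be illustrated with a small numerical example. The following Python sketch assumes a 2D space and a hypothetical change-of-basis matrix `A`, where column j lists the old-basis components of the new basis vector j. All names and numbers are mine, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical basis change: e'_j = sum_i A[i][j] * e_i
A = [[Fraction(2), Fraction(1)],
     [Fraction(0), Fraction(1)]]

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def vector_components_new(v_old, A):
    """Contravariant: the components pick up the INVERSE of the basis change."""
    Ainv = inverse_2x2(A)
    return [sum(Ainv[j][i] * v_old[i] for i in range(2)) for j in range(2)]

def oneform_components_new(w_old, A):
    """Covariant: the components change with the same matrix as the basis."""
    return [sum(w_old[i] * A[i][j] for i in range(2)) for j in range(2)]

v_old = [Fraction(3), Fraction(2)]   # vector components in the old basis
w_old = [Fraction(1), Fraction(1)]   # 1-form components in the old cobasis
v_new = vector_components_new(v_old, A)
w_new = oneform_components_new(w_old, A)

# Rebuild the vector in the old basis from its new components: same arrow
v_rebuilt = [sum(A[i][j] * v_new[j] for j in range(2)) for i in range(2)]
# The pairing w(v) is a number that does not depend on the basis
pairing_old = sum(w * v for w, v in zip(w_old, v_old))
pairing_new = sum(w * v for w, v in zip(w_new, v_new))
```

The components change in opposite ways, yet `v_rebuilt` matches `v_old` and the two pairings agree, which is the sense in which the objects themselves are invariant.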
Addendum: I recently learned a dual basis need not be made up of 1-forms, as in the usual formulation in differential geometry, but of vectors instead! Recall the defining relation between a coordinate basis and cobasis: , or for an arbitrary frame. In particular, each cobasis element is orthogonal to the “other” 3 vectors. But we can take the duals , which are vectors but obey the same orthogonality relations, via the metric scalar product: . (By relating to the standard approach I may have made things look complicated, but this should be visualised as simply finding new vectors orthogonal to existing vectors.) (On a separate note, “dual” here means as an individual vector, not dual as a basis.) It seems this vector approach to a dual basis was the original one. In 1820 an Italian mathematician Giorgini distinguished between projezione oblique (parallel projections) and projezioni ortogonali (orthogonal projections) of line segments, now termed contravariant and covariant. According to one historian (Caparrini 2003), this was “one of the first clear-cut distinctions between the two types of projections in analytic geometry.” However priority goes to Hachette 1809. Today in geometric algebra, also known as Clifford algebra, a dual basis is also defined as vectors not 1-forms, via (Doran & Lasenby 2003 §4.3).
Earlier we analysed the probabilities for the bridge-crossing scenario in the Squid Game episode “VIPs”, which has “deadly high stakes” according to the Netflix blurb for the series. 🙂 So far, we made the assumption of no foreknowledge. This means our results for the players’ progress describe their chances as they stand before the game begins. Equivalently, if the game has started, our results assume the analyst knows nothing about prior contestants, and cannot view the state of the bridge.
But now, suppose we are told only that a specific player numbered i died on step number n. (That is, they stood safely on column n – 1, but chose wrongly amongst the next pair of glass panels on column n, breaking a pane and plummeting downwards.) Then the next player is definitely safe on step n, but has no information about later steps, so the game is essentially reset from that point. Hence the “conditional probability” that player I > i is still alive on step N > n is simply:
Recall is the chance player i′ is alive on step n′ (given no information nor conditions). We labelled as the chance they died on step n′ specifically, so analogously:
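Now, suppose we are told only that a specific player I will die on step N. What is the probability for an earlier player’s progress? Bayes’ theorem says that given two events A and B, the conditional probabilities are related by $P(A|B) = P(B|A)\,P(A)/P(B)$, which in our case is:

$$P(i \text{ died on } n \mid I \text{ dies on } N) = \frac{\binom{N-n-1}{I-i-1}\,2^{-(N-n)} \cdot \binom{n-1}{i-1}\,2^{-n}}{\binom{N-1}{I-1}\,2^{-N}} = \frac{\binom{n-1}{i-1}\binom{N-n-1}{I-i-1}}{\binom{N-1}{I-1}}$$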
The powers of 2 cancelled. The Table below shows some example numbers.
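Table: Probability player i died on step n, given player I = 5 will die on step N = 8

| player \ step | n = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| i = 1 | 4/7 | 2/7 | 4/35 | 1/35 | 0 | 0 | 0 | 0 |
| 2 | 0 | 2/7 | 12/35 | 9/35 | 4/35 | 0 | 0 | 0 |
| 3 | 0 | 0 | 4/35 | 9/35 | 12/35 | 2/7 | 0 | 0 |
| 4 | 0 | 0 | 0 | 1/35 | 4/35 | 2/7 | 4/7 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |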
In general, on any given row (fixed player i) the entries are nonzero only for n between i and N – I + i inclusive. This forms a diamond shape. For the row sum computer algebra returns a hypergeometric function times two binomial coefficients, which appears to simplify to 1 (for integer parameters) as expected, since player i must die somewhere. On any given column (fixed step n < N) the sum is (I – 1)/(N – 1), which is independent of n, meaning each step has equal chance that some player will die there. In particular the first entry (i = 1, n = 1) itself takes this value, which is 4/7 for the Table above.
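The Table and the sums just described can be reproduced with exact fractions. The sketch below assumes the conditional probability is the ratio of binomial coefficients C(n–1, i–1) C(N–n–1, I–i–1) / C(N–1, I–1), which is what Bayes' theorem gives when the unconditional chance of dying on a step is C(n–1, i–1)/2^n. I checked this form against the Table entries. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def died_given(i, n, I, N):
    """Chance player i died on step n, given player I dies on step N."""
    if not (1 <= i <= I and 1 <= n <= N):
        return Fraction(0)
    if i == I or n == N:
        # Last row and column: the general formula does not apply here
        return Fraction(1 if (i == I and n == N) else 0)
    return Fraction(comb(n - 1, i - 1) * comb(N - n - 1, I - i - 1),
                    comb(N - 1, I - 1))

I, N = 5, 8
table = [[died_given(i, n, I, N) for n in range(1, N + 1)]
         for i in range(1, I + 1)]
row_sums = [sum(row) for row in table]
column_sums = [sum(table[i][n] for i in range(I)) for n in range(N - 1)]
```

With these parameters every entry of `row_sums` is 1 and every entry of `column_sums` (steps before N) is 4/7.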
We examine other properties and special cases. By construction the last row and column are zeroes apart from ; our general formula does not apply for n = N. If we are told where the second player I = 2 died, then player i = 1 has an equal chance 1/(N – 1) of dying on any earlier step. Also from the definition it is clear:
so the table is symmetric about its central point. The ratio of adjacent entries follows from the binomial coefficients:
It follows that at step n, player i = (I – 1)n/N and the subsequent player have the same “fail” chance. Presumably the maximum lies within this range. Physically we require the indices i and n to be integers. For the chosen Table parameters above, the relation just given is simply i = n/2, so every second column contains an adjacent pair of equal values. For the steps (columns) on the other hand, on n = (i – 1)(N – 1) / (I – 2) and the following step the “elimination” chance is equal. Note these special index values are linear functions of the other index (i or n respectively), where we regard I and N as fixed.
By rearranging terms we can write equivalent expressions for the chance to be eliminated, such as:
For suitably large parameters, the probability resembles a gaussian curve. We can apply the de Moivre-Laplace approximation (with parameter p := ½ say) to the binomial coefficients. This gives a gaussian for a fixed step number n, as a function of the player number. I omit the height, but its centre and width are determined from the exponent which is:
The spread is maximum at n = N/2, in this approximation. Now to obtain a gaussian approximation for a fixed player i, apply the results of the previous blog post using the substitutions , , , and . The centre is . One option for the height of the gaussians — when looking for a simple expression — is to use the sums 1 and (I – 1)/(N – 1) determined before. Recall for a normalised gaussian, the height is inversely proportional to the standard deviation.
There are other conditional probability questions one could pose. Suppose we are given a window, bounded by the events that player J died on column L, and later player K dies on column M? Inside this window, the probabilities reduce to our above analysis: the chance i dies on n is just , where we also substitute and . As another possible scenario to analyse, we might be informed that player I is alive on step N. Then we would not know how far they progressed, just that it was at least that far. Or, we might be told player I died on or before step N.
A concluding thought: Bayes’ theorem is deceptively simple-looking. I tried harder ways beforehand, trying to puzzle through the subtlety of conditional probability on my own. But with Bayes, the main result followed easily from our previous work.
Consider the following function, which is the product of a certain pair of binomial coefficients:
We take a, b, X >> 1 to be constants, and x to have domain [a – 1, X – b + 1] which implies X > a + b – 2 at least. As usual , and this is extended beyond integer values by replacing each factorial with a Gamma function. Note the independent variable x appears in the upper entries of the binomial coefficients. Curiously, from inspection f is well-approximated by a gaussian curve. To gain some insight, for integer values of the parameters f is the polynomial:
This has many zeroes, and sometimes oscillates wildly in between them, hence the domain of x specified earlier.
Now the usual approximations to a single binomial coefficient (actually, binomial distribution) are not helpful here. For example the de Moivre–Laplace approximation is a gaussian in terms of the lower entry in the binomial coefficient, whereas our x is in the upper entries. More promising is the approximation as a Poisson distribution, which leads to a polynomial which is itself gaussian-like, and motivated the previous post incidentally. However we proceed from first principles, by estimating the centre point and the second derivative there.
At the (central) maximum of f, the slope is zero. In general the derivative is , where the H’s are called harmonic numbers. There may not exist any simple explicit expression for the turning points. Instead, the ratio of nearby points is comparatively simple:
using the properties of the binomial coefficient. The derivative is approximately zero where this ratio is unity, which occurs at:
This should be a close estimate for the central turning point. [To do better, substitute specific numbers for the parameters, and solve numerically.] It is typically not an integer. Our sought-for gaussian has form . We set the height . Only the width remains to be determined. The gaussian’s second derivative evaluated at its centre point is . On the other hand:
which uses the so-called harmonic numbers of order 2, and I incorporate the function and its derivative (both given earlier) for brevity of the expression. Matching the results at yields the variance parameter :
using . (At large values the series may give insight into the above.) But alternatively, we can approximate the second derivative using elementary operations. By sampling the function at , , and say, a “finite differences” approach gives approximate derivatives. We can use the simple ratio formula obtained earlier to reduce the sampling to one or two points only, which might gain some insight along the way (though I currently wonder if this is a dead end…).
Now , which becomes:
after using the ratio formula to obtain in terms of C. Similarly it turns out is the negative of the above expression, but with a and b interchanged. Then a second derivative is: , but the combined expression does not simplify further so I won’t write it out. The last step is to set , which is different to the earlier choice.
A slightly different approach uses , which may be expressed in terms of another sampled point . Similarly . The estimate for the second derivative follows, then later:
The expression is a little simpler in this approach, but at the cost of a second sample point. The use of and instead leads to the same result.
where the independent variable x ranges between 0 and X, and the exponents are large: . [We could call it a “polynomial”, though the exponents need not be integers. Specifically it is the product of “monomials” in x and X – x, so might possibly be called a “sparse” polynomial in this sense.] Surprisingly, it closely resembles a gaussian curve, over our specified domain .
The turning point is where the derivative equals zero. This occurs when x is the surprisingly simple expression:
at which the function has value:
An arbitrary gaussian, not necessarily normalised, has form: . This has centre D which we equate with , and maximum height C which we set to the above expression. We can fix the final parameter, the standard deviation, by matching the second derivatives at the turning point. Hence the variance is:
Hence our gaussian approximation may be expressed:
The integral of the original curve turns out to be:
This uses the binomial coefficient , which is extended to non-integer values by replacing the factorials with Gamma functions. We could then apply Stirling’s approximation to each factorial, to obtain:
though this is more messy to write out. On the other hand, the integral of the gaussian approximation is:
We evaluated this integral over all real numbers, because the expression is simpler and still approximately the same. The ratio of the above two expressions is .
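A quick numerical check can make the quality of the fit concrete. The sketch below assumes the function has the form f(x) = x^a (X – x)^b, which is my reading of "the product of monomials in x and X – x". The exponent labels a and b are my own. Under that assumption the turning point is aX/(a + b), and matching second derivatives there gives variance abX²/(a + b)³.

```python
import math

def f(x, a, b, X):
    """Assumed form of the function: x^a * (X - x)^b."""
    return x**a * (X - x)**b

def gaussian_fit(a, b, X):
    """Centre, height and variance of the matching gaussian (assumed formulas)."""
    centre = a * X / (a + b)
    height = f(centre, a, b, X)
    variance = a * b * X**2 / (a + b)**3
    return centre, height, variance

def gaussian(x, centre, height, variance):
    return height * math.exp(-(x - centre)**2 / (2 * variance))

# Hypothetical parameter values, chosen large but safe from float overflow
a, b, X = 50, 50, 100
centre, height, variance = gaussian_fit(a, b, X)
sigma = math.sqrt(variance)
# Ratio of the true function to the gaussian, one standard deviation out
ratio_at_one_sigma = f(centre + sigma, a, b, X) / gaussian(centre + sigma, centre, height, variance)
```

For these values the ratio one standard deviation from the centre is within about half a percent of 1. The agreement degrades further out in the tails.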
We continue with the bridge-crossing scenario from Squid Game called “Glass Stepping Stones”. Here I analyse the probabilities for late contestants on a very long bridge, and the expectation value for a player’s progress. Last time I found an exact expression for the probability P(i,n) that player number i is still alive on the nth step. Now seems a good place to mention there are equivalent expressions, such as:
where is called the hypergeometric function, and the other term is the binomial coefficient which is read as “n choose i”. The factor seen previously has been absorbed. We listed several special cases of the probabilities last time. Another is:
So remarkably, the chance player i will be alive on step 2i – 1 is precisely 50%! For fixed large i, if we plot the probability distribution as a function of n it looks smooth, remaining near 1 for early steps before rolling down to near 0. Qualitatively this looks like a , tanh, or erf (“error function”). We reflect these curves, centre them on the value 1/2 at n = 2i – 1, and scale them linearly: so they have the appropriate bounds and match the slope at the centre point. See the Figure below.
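The 50% result is easy to confirm exactly. This Python sketch assumes the survival probability can be written as a partial sum of binomial coefficients: player i is alive on step n when fewer than i of the first n steps have claimed a life. The function name is hypothetical. The comparison value –1/(2√(πi)) for the one-step difference is my own, obtained from Stirling's approximation to the central binomial coefficient.

```python
import math
from fractions import Fraction
from math import comb

def alive(i, n):
    """Chance player i is still alive on step n (no foreknowledge)."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2**n)

# Exactly one half at n = 2i - 1, for every player checked
midpoints = [alive(i, 2 * i - 1) for i in range(1, 21)]

# One-step difference at the centre, as a stand-in for the slope
i = 100
difference = alive(i, 2 * i) - alive(i, 2 * i - 1)
estimate = -1 / (2 * math.sqrt(math.pi * i))
```

For i = 100 the exact `difference` and the `estimate` agree to within a fraction of a percent.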
In fact the slope used in the Figure is only an approximation as described next, but this is a deliberate choice to show it still gives a good fit. The exact slope evaluated at n = 2i – 1 seems a little too complicated to be useful. It contains a derivative of the hypergeometric function, which appears to approach -1/2 in the limit of large i, hence the slope at the centre point is asymptotic to . Another approach is to consider the subsequent bridge step, for which:
which uses the Gamma function. The difference P(i,2i) – P(i,2i-1) approximates the slope, and is also asymptotic to as . Hence our approximation for late players is:
Now the error function by definition is the integral of a gaussian curve. The derivative with respect to n of our approximation is precisely the righthand side below, which itself approximates the chance a late player dies on that step:
For fixed i this is a gaussian distribution with centre n = 2i – 1 and standard deviation . It is normalised in the sense its integral over is exactly 1, but physically we want the discrete sum over . For the 10th player this is approx. 0.9991, which is already close. The exact chance for dying on a given step was determined in the previous article to be . The Figure below shows some early values. As before, we can extend the function beyond integer parameters.
The ratio precisely. Hence for a given player, the adjacent steps n = 2i – 2 and 2i – 1 are equally likely locations their game will be “discontinued”. This is surely the maximum assuming integer parameters, apart from the first player for whom step 0 is safe but step 1 is their most likely “resting place”. Hence the reader might prefer to translate our gaussian approximation by half a step or so; apparently there are various approximations to a binomial coefficient. The subsequent step n = 2i is a more likely endpoint than the earlier step 2i – 3.
The expectation value for a given player’s death is:
This function is very close to 2i, apart from a small oscillatory wiggle. At integer i it is singular, but from inspection of its plot it may be extended to a continuous function with value precisely 2i on physical parameter values (that is, integer i). Finally, for a given step n, the probability that some player breaks a tile is:
precisely, which is unsurprising. (Better terminology would be the nth “column”, as Henle+ use.) This assumes that n or more players have finished their run, otherwise the step is less likely to be broken.
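Both closing results can be checked numerically. The sketch below assumes the chance that player i dies on step n is C(n–1, i–1)/2^n, as derived by the combinatorial argument in the previous article. The function names and the truncation point `cutoff` are my own choices. The expectation is a truncated sum, so it is only approximate.

```python
from fractions import Fraction
from math import comb

def dies_on(i, n):
    """Chance player i dies on step n itself (zero when i > n)."""
    return Fraction(comb(n - 1, i - 1), 2**n)

def break_chance(n):
    """Chance that some player breaks a tile on step n."""
    return sum(dies_on(i, n) for i in range(1, n + 1))

def expected_death_step(i, cutoff=400):
    """Expectation of the step on which player i dies, truncated at `cutoff`."""
    return float(sum(n * dies_on(i, n) for n in range(i, cutoff + 1)))

step_chances = [break_chance(n) for n in range(1, 31)]
expectations = [expected_death_step(i) for i in range(1, 6)]
```

Each entry of `step_chances` is exactly 1/2, and the entries of `expectations` sit at 2, 4, 6, 8, 10 to within floating-point precision.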
Update, May 19: The death chance is ½ times a binomial distribution in n. We previously found a gaussian curve for a given player i. Now, for a fixed step n, the de Moivre-Laplace approximation is a gaussian over the player number i:
Last time I analysed the bridge-crossing scenario in the series Squid Game. In this fictional challenge called “Glass Stepping Stones”, the front contestant must leap forward along glass panels, choosing left or right each time, knowing only that one side is strengthened glass while the other will shatter. At least later players may learn from the choices of their forerunners. Here I use combinatorial arguments, derive a recurrence relation for the chance to die on a given step, and obtain an analytic solution with a hypergeometric function.
Again, write or equivalently P(i,n) for the probability the ith player is still alive on the nth step. We showed these probabilities satisfy the recurrence relation , along with initial conditions , and for all players after the first. Equivalently, we can start from , and for . This is a bit like Pascal’s triangle. Rather than adding the previous two terms, we take their average — which of course is the sum divided by two. And rather than 1’s at the sides, we have 0’s and 1’s.
Let’s write $D(i,n)$ for the likelihood the ith player will die upon landing on the nth step. Then $D(i,n) = P(i,n-1) - P(i,n)$. These values satisfy the same recurrence relation as before: $D(i,n) = \tfrac{1}{2}\left[D(i-1,n-1) + D(i,n-1)\right]$.
Only the initial conditions are different: $D(1,n) = 1/2^n$, and $D(i,1) = 0$ for all players after the first. It is aesthetic to begin a step earlier: $D(i,0) = 0$, except for $D(0,0) = 1$. The Table below shows a few early entries.
Table: Probability player i will die on step n itself, given no foreknowledge
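| player \ step | n = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| i = 1 | 1/2 | 1/4 | 1/8 | 1/16 | 1/32 | 1/64 | 1/128 | 1/256 |
| 2 | 0 | 1/4 | 1/4 | 3/16 | 1/8 | 5/64 | 3/64 | 7/256 |
| 3 | 0 | 0 | 1/8 | 3/16 | 3/16 | 5/32 | 15/128 | 21/256 |
| 4 | 0 | 0 | 0 | 1/16 | 1/8 | 5/32 | 5/32 | 35/256 |
| 5 | 0 | 0 | 0 | 0 | 1/32 | 5/64 | 15/128 | 35/256 |
| 6 | 0 | 0 | 0 | 0 | 0 | 1/64 | 3/64 | 21/256 |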
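The table can be regenerated from the averaging recurrence alone. This is a minimal sketch, assuming a single seed value of 1 for “player 0, step 0” and zeros elsewhere on the edges (my choice of starting values, made so that the first player's chances come out as 1/2^n). The name `death_table` is illustrative.

```python
from fractions import Fraction

def death_table(players, steps):
    """death[i][n]: chance player i dies on step n, by averaging two earlier entries."""
    death = [[Fraction(0)] * (steps + 1) for _ in range(players + 1)]
    death[0][0] = Fraction(1)  # assumed seed; all other edge entries stay 0
    for n in range(1, steps + 1):
        for i in range(1, players + 1):
            death[i][n] = (death[i - 1][n - 1] + death[i][n - 1]) / 2
    return death

death = death_table(6, 8)
for i in range(1, 7):
    print(i, [str(death[i][n]) for n in range(1, 9)])
```

Using `Fraction` keeps every entry exact, which makes comparison with the table straightforward.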
Alternatively, there are elegant combinatorial arguments, for which I was initially inspired by another blog. For player i to die on step n, the previous i – 1 players must have died somewhere amongst the n – 1 prior steps. There are “n – 1 choose i – 1” ways to arrange these mistaken steps, out of $2^{n-1}$ total combinations of equal probability. Given any such arrangement, the next player has a 50% chance their following leap is a misstep, hence the chance player i dies on step n is: $\binom{n-1}{i-1} / 2^n$.
(I originally found this simple formula in a much more roundabout way, as often happens!) If i > n, the probability is zero. By similar reasoning, the chance that precisely i players have died by step n (inclusive) is: $\binom{n}{i} / 2^n$.
A draft paper (Henle+ 2021) gives this result. It may also be obtained by summing over the previous formula: $\sum_{n'=i}^{n} \binom{n'-1}{i-1} 2^{-n'} \cdot 2^{-(n-n')} = \binom{n}{i} / 2^n$. Note if player i died on n′, the next player must make n – n′ correct guesses in a row, so that no-one else dies by the nth step.
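The two counting arguments can be checked against each other numerically. The following is a small sketch with illustrative function names, assuming the closed forms C(n – 1, i – 1)/2^n (player i dies on step n) and C(n, i)/2^n (exactly i players dead by step n).

```python
from fractions import Fraction
from math import comb

def dies_on(i, n):
    """Chance player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n) if 1 <= i <= n else Fraction(0)

def exactly_dead_by(i, n):
    """Chance that precisely i players have died by step n (inclusive)."""
    return Fraction(comb(n, i), 2 ** n)

def exactly_dead_by_sum(i, n):
    """Player i dies on some step m <= n, then the next player guesses n - m steps correctly."""
    return sum(dies_on(i, m) * Fraction(1, 2 ** (n - m)) for m in range(i, n + 1))

print(exactly_dead_by(3, 8), exactly_dead_by_sum(3, 8))
```

The agreement is the “hockey stick” identity for binomial coefficients in disguise.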
Now the probability the ith player is alive at the nth step or further, is the probability any number of previous players died by step n or before. (So what is ruled out is i or more dying by this stage.) This is just a sum over the previous displayed formula: $P(i,n) = 2^{-n} \sum_{k=0}^{i-1} \binom{n}{k}$, which computer algebra simplifies to: $P(i,n) = 1 - 2^{-n} \binom{n}{i}\, {}_2F_1(1, i-n; i+1; -1)$.
Here ${}_2F_1$ is called the “(ordinary) hypergeometric function”. I gave a limited table of these probability values in the previous blog. For fixed integer i, the entire expression reduces to a polynomial in n of order i – 1 with rational coefficients, all times $2^{-n}$. For example the likelihood the 5th player is alive at step n is: $P(5,n) = 2^{-n}\,(n^4 - 2n^3 + 11n^2 + 14n + 24)/24$.
In general, for n < i we have $P(i,n) = 1$, so players get some steps for free. The diagonal terms are $P(i,i) = 1 - 2^{-i}$. This makes sense because for player i to not be alive on the ith step, every leap by previous players must also have been a misstep. Some results like these may also be shown using induction and the recurrence relation. I give more special cases in the next blog post. Yet the general formula works even for non-integers, though this is not physical, as the Figure below shows. For negative parameters (not shown) it has a rich structure, with singularities, and some probability values negative or exceeding 1.
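These special cases are easy to test with exact arithmetic. The sketch below assumes the survival chance is the binomial sum 2^(–n) Σ C(n, k) over k from 0 to i – 1. The quartic for the 5th player is my own expansion of that sum by hand, so treat it as an assumption to be tested rather than a quoted result.

```python
from fractions import Fraction
from math import comb

def alive(i, n):
    """Chance player i is still alive on step n: fewer than i deaths in n steps."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2 ** n)

def alive_player5(n):
    """Polynomial form for the 5th player (hand-expanded, assumed)."""
    poly = Fraction(n**4 - 2 * n**3 + 11 * n**2 + 14 * n + 24, 24)
    return poly / 2 ** n

print(all(alive(5, n) == alive_player5(n) for n in range(30)))
print(alive(4, 3), alive(4, 4))  # a free step, then the diagonal term
```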
An alternative derivation of the probabilities is based on where the previous player died (if at all). If that player i – 1 died on step n′ < n, their follower must make n – n′ correct guesses in a row to reach step n safely. Now sum the result from n′ = i – 1, which is the earliest step upon which they may conceivably die, up to n′ = n – 1. Add to this the chance the player was still alive at step n – 1 (which is one minus the sum of chances they died on step n′) as this guarantees the following player i is alive at n. Numerical testing shows the result is indeed equivalent. Hence rather than summing over players for a fixed step, one may instead sum over steps for a fixed player.
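The numerical test mentioned above might look something like the following sketch. It assumes the closed forms for the death and survival chances, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def dies_on(i, n):
    """Chance player i dies on step n."""
    return Fraction(comb(n - 1, i - 1), 2 ** n) if 1 <= i <= n else Fraction(0)

def alive(i, n):
    """Chance player i is alive on step n, summing over players for a fixed step."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2 ** n)

def alive_via_previous_player(i, n):
    """Same chance, summing over the step m on which player i - 1 died (needs i >= 2, n >= 1)."""
    follower_ok = sum(
        dies_on(i - 1, m) * Fraction(1, 2 ** (n - m)) for m in range(i - 1, n)
    )
    return follower_ok + alive(i - 1, n - 1)  # plus: predecessor alive at n - 1

print(alive(4, 8), alive_via_previous_player(4, 8))
```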
In the popular Korean series Squid Game, one episode features a bridge-crossing game, whose probabilities are a fun challenge to calculate. (Warning: partial spoilers ahead.) In this cruel fictional scenario, called “Glass Stepping Stones” in the English subtitles, glass panels are suspended above a long fall. Contestants must leap between them. At each step the leading player is forced to choose left or right, knowing only that one panel is made of ordinary glass which will shatter, and the other is strengthened (“tempered”) glass which will hold. Later contestants cross the same bridge, and watch all previous attempts, so can learn from the successes and failures.
The odds are simple for the first player. On each leap forward, there is a 50% chance they will fall to their death. Hence the chance of surviving N steps is $1/2^N$, an exponential decrease. In the show (~30 minute mark), one player actually calculates this: 15 untested steps remain ahead of him, for a horrifyingly low 1/32768 chance of survival from that point. (Actually this is the third player, but more on that later.)
But suppose we do not know the outcome of earlier players. At the start, before anyone has moved, what is the probability, $P(i,n)$ say, that player number i will still be alive on step number n? We showed $P(1,n) = 1/2^n$. For player 2, it is certain they will survive step 1, by copying the first player if they were successful, or switching to the opposite pane if not. By extension player i is certain to survive the first i – 1 steps, hence $P(i,n) = 1$ for all $n < i$.
In general, we set up a recurrence relation. But consider firstly the case i = 2. What is the chance they are alive at step n? If the first player died on step 1 (I mean, they leaped from the starting platform to an ordinary glass panel at step 1), then their successor must guess n – 1 tiles to reach step n successfully (I mean, to still be alive on panel n, and not fall through it). The probability of this combination of events is $\tfrac{1}{2} \cdot 1/2^{n-1} = 1/2^n$. Similar reasoning applies to any step up to n – 1. However if the first player is still alive on n – 1, their follower is guaranteed to reach step n successfully. (Any later performance of the first player is irrelevant to their successors at step n.) The overall probability is the sum over these possibilities, which for an arbitrary player is: $P(i,n) = \sum_{n'=1}^{n-1} \left(P(i-1,n'-1) - P(i-1,n')\right) 2^{-(n-n')} + P(i-1,n-1)$.
This gives the probability in terms of the previous player. (Note the term in parentheses is the chance the previous player will die on step n′ precisely.) Hence starting from the initial conditions given earlier, we may build up an array of values using a spreadsheet, computer program, or computer algebra system. The latter choice preserves exact fractions, which feels very satisfying. Also we define $P(i,0) = 1$ for convenience, where “step 0” may be interpreted as the ledge contestants safely start from. The Table below gives the first few values.
Table: Probability player i is still alive by step n, given no foreknowledge
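| player \ step | n = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| i = 1 | 1/2 | 1/4 | 1/8 | 1/16 | 1/32 | 1/64 | 1/128 | 1/256 |
| 2 | 1 | 3/4 | 1/2 | 5/16 | 3/16 | 7/64 | 1/16 | 9/256 |
| 3 | 1 | 1 | 7/8 | 11/16 | 1/2 | 11/32 | 29/128 | 37/256 |
| 4 | 1 | 1 | 1 | 15/16 | 13/16 | 21/32 | 1/2 | 93/256 |
| 5 | 1 | 1 | 1 | 1 | 31/32 | 57/64 | 99/128 | 163/256 |
| 6 | 1 | 1 | 1 | 1 | 1 | 63/64 | 15/16 | 219/256 |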
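For readers who prefer a program to a spreadsheet, here is one way the array might be built up, row by row from the previous player. It is a sketch only: it assumes the first player's row is 1/2^n and that every player is safe on “step 0”, and `alive_table` is an illustrative name.

```python
from fractions import Fraction

def alive_table(players, steps):
    """P[i][n]: chance player i is alive on step n, built from player i - 1's row."""
    # P[i][0] = 1 (the starting ledge); row 0 is unused padding.
    P = [[Fraction(1)] * (steps + 1) for _ in range(players + 1)]
    for n in range(1, steps + 1):
        P[1][n] = Fraction(1, 2 ** n)
    for i in range(2, players + 1):
        for n in range(1, steps + 1):
            total = P[i - 1][n - 1]  # predecessor still alive at step n - 1
            for m in range(1, n):
                died_on_m = P[i - 1][m - 1] - P[i - 1][m]
                total += died_on_m * Fraction(1, 2 ** (n - m))  # then n - m guesses
            P[i][n] = total
    return P

P = alive_table(6, 8)
for i in range(1, 7):
    print(i, [str(P[i][n]) for n in range(1, 9)])
```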
In Squid Game the bridge has 18 (pairs of) steps. The probability of crossing the entire bridge is the probability of being alive on step 18, as the next leap is to safety. In theory the 9th player has nearly even odds of making it: $P(9,18) \approx 0.41$, and the next player likely will: $P(10,18) \approx 0.59$. In the show, 16 players compete in this challenge, so the last player has excellent odds, supposedly: $P(16,18) \approx 0.9993$. However our analysis does not account for human behaviour! In the show, time pressure, rivalries, and imperfect memory compete with logical decision making and the interests of the group as a whole. On the other hand, some players claim to distinguish the glass types by sight or sound, which would give an advantage. These make interesting plot elements, but would spoil the simplicity and purity of a mathematical analysis.
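The idealised odds can also be estimated by brute force. Below is a rough Monte Carlo sketch of the bridge under the same assumptions as the analysis (perfect memory, pure 50/50 guesses, no human behaviour). The function names, trial count and seed are arbitrary choices of mine.

```python
import random

def deaths_in_one_run(players, steps, rng):
    """Simulate one crossing; each untested step kills the leader with chance 1/2."""
    deaths = 0
    for _ in range(steps):
        if deaths == players:
            break  # nobody left to test further panels
        if rng.random() < 0.5:
            deaths += 1  # leader falls; the next player now knows this step
    return deaths

def estimate_crossing_chance(i, players=16, steps=18, trials=20000, seed=1):
    """Fraction of runs in which player i crosses, i.e. fewer than i deaths."""
    rng = random.Random(seed)
    crossed = sum(deaths_in_one_run(players, steps, rng) < i for _ in range(trials))
    return crossed / trials

for i in (9, 10, 16):
    print(i, estimate_crossing_chance(i))
```

The estimates fluctuate by a few parts in a thousand at this trial count, so they only confirm the exact values to about two decimal places.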
Returning to the recurrence relation, it simplifies to: $P(i,n) = \tfrac{1}{2}\left[P(i-1,n-1) + P(i,n-1)\right]$.
Hence each term is just the average of two previous terms. However I wanted to derive this via direct physical interpretation, not algebraic manipulation alone. This is intuitively satisfying. With the end result in mind, we relate $P(i,n)$ to the previous step and previous player. [Update, 21st April: A simpler way is to consider step 1. If player 1 guesses wrongly and breaks it, there are i – 1 remaining players for the next n – 1 steps. If player 1 instead guesses correctly, there are i players for the next n – 1 steps. This gives the recurrence relation.] Consider the three cases for player i – 1: they (A) died before step n – 1, (B) died on step n – 1, or (C) made it safely to step n – 1 or further. The total probability is the sum of these cases: $P(i,n) = P(i,n|A)\,P(A) + P(i,n|B)\,P(B) + P(i,n|C)\,P(C)$.
The first term for example is the “conditional probability” that i is alive at step n, given that case A occurred; times the probability of case A itself occurring. There is a similar decomposition to the above for P(i,n-1). Now most parts of the expression are straightforward. If i – 1 died at step n – 1, then the next player is definitely safe at that step, but may only guess at the following step, so P(i,n-1|B) = 1 and P(i,n|B) = 1/2. If i – 1 was safe at step n – 1 or further, then the next player is safe for an extra step: P(i,n-1|C) = 1 = P(i,n|C). For case A the conditional probabilities are more difficult, but we do not need to calculate them. Observe that if the previous player died before n – 1, then steps n – 1 and n are uncharted territory. Hence the chance the following player makes it to n safely, is half of whatever it was for them to reach n – 1 safely: $P(i,n|A) = \tfrac{1}{2} P(i,n-1|A)$. Hence the decomposition becomes: $P(i,n) = \tfrac{1}{2} P(i,n-1|A)\,P(A) + \tfrac{1}{2} P(B) + P(C)$.
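But this is just $\tfrac{1}{2} P(i,n-1)$ apart from the C term, as seen from expanding out the conditional cases. Now $P(C) = P(i-1,n-1)$. It follows $P(i,n) = \tfrac{1}{2}\left[P(i-1,n-1) + P(i,n-1)\right]$ as before. We did not need to evaluate P(A) or P(B), though this is straightforward.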
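The averaging rule translates almost directly into a recursive function. This is a minimal sketch following the “consider step 1” reading: with no players left the chance is 0, and with no steps left the last remaining player is safe. The boundary values are my interpretation of the starting conditions.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def alive(i, n):
    """Chance the i-th of i players is alive after n untested steps."""
    if i == 0:
        return Fraction(0)  # assumed boundary: nobody left to survive
    if n == 0:
        return Fraction(1)  # assumed boundary: no steps left to risk
    # The leader breaks the first step (i - 1 players, n - 1 steps remain),
    # or guesses it correctly (i players, n - 1 steps remain).
    return (alive(i - 1, n - 1) + alive(i, n - 1)) / 2

print(alive(9, 18), float(alive(9, 18)))
print(alive(16, 18), float(alive(16, 18)))
```

Memoisation with `lru_cache` keeps the recursion cheap, since each (i, n) pair is evaluated only once.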
Now that the reader (and author!) have more experience with conditional probability, let’s return to the third player in the Squid Game episode. Before anyone moved, he had $P(3,18) = 43/65536 \approx 0.00066$ chance of surviving the bridge. This would seem to contradict the earlier calculation, which gave a lower chance by a factor of 21½, a surprising contrast! The black-masked “Front Man” said to the VIP observers, “I believe this next game will exceed your expectations” (~12:30 mark), but in this sense it did not 😆. The distinction is the information learned. Conditional probability is a subtle and beautiful thing. If we know nothing about the previous attempts, nor the state of the bridge, the probabilities are our variables $P(i,n)$. But if we are given the information player I died on step N for instance, then the following player has no information about later steps, and the bridge scenario is essentially reset from that point onwards. Hence $P(i,n \mid \text{player } I \text{ died on step } N) = P(i-I,\,n-N)$.
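The factor of 21½ can be reproduced exactly. The sketch below assumes the binomial-sum form of the survival chance, and reads the show's numbers as the second player having died on step 3 of 18 (so that 15 untested steps remain). That reading, and the function names, are my own assumptions.

```python
from fractions import Fraction
from math import comb

def alive(i, n):
    """Unconditional chance player i is alive on step n."""
    return Fraction(sum(comb(n, k) for k in range(i)), 2 ** n)

def alive_given_death(i, n, dead_player, dead_step):
    """Chance for player i once it is known that dead_player died on dead_step."""
    return alive(i - dead_player, n - dead_step)  # the bridge "resets"

before_start = alive(3, 18)
after_two_deaths = alive_given_death(3, 18, dead_player=2, dead_step=3)
print(before_start, after_two_deaths, before_start / after_two_deaths)
```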
This scenario has been a valuable learning experience, as I had not worked with conditional probabilities before. Probability is very important in physics, particularly quantum physics where it is intrinsic (it is usually assumed). I originally came up with an incorrect recurrence relation, but realised this upon comparison with an article in Medium, which uses an elegant combinatorial argument. The scenario had already captured my attention, but realising my flaw further drove my need to understand. A related article is also helpful; I recommend these if you find my discussion hard to follow. There is even a draft paper on the Squid Game bridge probabilities! Presumably this is all little more than a specific application of textbook combinatorics. Still, it is fun to rediscover things for oneself.