Tag: probability
Array shuffling
by selene on May.27, 2012, under Blog
Array shuffling or randomization is one of those things that’s easy to get wrong in subtle ways.
The naive way looks something like this (Actionscript):

for (var i = 0; i < myArray.length; i++)

{

var swap = Math.random() * (myArray.length1);

var temp = myArray[i];

myArray[i] = myArray[swap];

myArray[swap] = temp;

}
Understanding what’s wrong with it requires a little bit of basic probability. Let’s say you have a deck of 3 cards, ABC. How many ways can you arrange them? If we write them out, we find a total of 6 arrangements:
 ABC
 ACB
 BAC
 BCA
 CAB
 CBA
Writing them all out is fine for 3 cards, but what if you have a full deck of 52? Even 5 cards will be a headache. So let’s use a bit of math. Here are the 5 card slots we have:
__ __ __ __ __
For the first slot, any of the 5 cards, ABCDE, can go in it. There are 5 choices for the first slot:
5 __ __ __ __
Once we’ve put a card in the first slot, there are 4 cards left. Any of them can go in the second slot. If we drew A first, then the second slot could be any of BCDE. If we drew B first, then ACDE are all possible, and so on:
5 x 4 __ __ __
Similarly, the third slot has 3 possible cards that can go in it:
5 x 4 x 3 __ __
And we can fill in the rest:
5 x 4 x 3 x 2 x 1 = 120
You may recognize this as the factorial function, written as n!, where n is the first number. An ideal shuffle would have an equal chance of producing any of the n! possible arrangements. How does the naive method I showed above stack up?
For a size 5 array, there’s an equal chance of any of the 5 elements being assigned to the first index:
5 __ __ __ __
For the second index, we pick another element from the whole array, meaning there’s 5 possibilities:
5 x 5 __ __ __
In fact, for every index in the array, every element has an equal chance of being chosen:
5 x 5 x 5 x 5 x 5 = 3,125
3,125 is a lot more than 120, and since 3,125 / 120 = 26.041666…, there’s no way to evenly divide the results. What happens is that multiple sequences of swaps will lead to the same arrangements, with some more common than others. e.g. if we start from ABCDE, both of these will arrive at BCDEA:
 Swap 0 and 1 — BACDE
 Swap 1 and 2 — BCADE
 Swap 2 and 3 — BCDAE
 Swap 3 and 4 — BCDEA
 Swap 4 and 4 — BCDEA
and
 Swap 0 and 4 — EBCDA
 Swap 1 and 0 — BECDA
 Swap 2 and 1 — BCEDA
 Swap 3 and 2 — BCDEA
 Swap 4 and 4 — BCDEA
What we really want is something that gives each arrangement exactly 1 chance of appearing. Here’s some code that does the trick:

for (var i = 0; i < myArray.length; i++)

{

var swap = i + Math.random() * (myArray.length – i);

var temp = myArray[i];

myArray[i] = myArray[swap];

myArray[swap] = temp;

}
It goes through the array in order from 0 up to myArray.length, picking an element that hasn’t been swapped each time. So the first element has n choices to swap with, the second element has n1, and so on, just like we calculated. If we go through the array in reverse, we can simplify the calculation for what index to swap with, and get something called the FisherYates shuffle:

for (var i = myArray.length–1; i >= 0; i–)

{

var swap = Math.random() * i;

var temp = myArray[i];

myArray[i] = myArray[swap];

myArray[swap] = temp;

}
If you need to shuffle an array, use the FisherYates shuffle!
This blog post was prompted because I recently came across some shuffling code that produces very skewed results, even worse than the naive method:

for (var i = 0; i < myArray.length; i++)

{

var swapIndex = Math.random() * (myArray.length1);

var temp = myArray[swapIndex];

myArray.splice(swapIndex, 1);

myArray.push(temp);

}
Don’t use this. Seriously. Out of curiosity, I wrote a Python script to test how badly skewed the results are. It runs this shuffle, the naive, and the FisherYates 500,000 times on a 4element array, and outputs what fraction of the time each arrangement comes up. I put up the results in a Google spreadsheet. You can see that FisherYates’ results are pretty much the ideal. The naive is somewhat distributed, but arrangements starting with 2 will come up more than others. And the last shuffle is quite terrible, with 1 having a 34% chance of being the first element. It also takes longer, since it has to rearrange the elements of the array with every swap.
So in conclusion, use the FisherYates shuffle. It’s fast and easy. Better yet, see if your language has a builtin array shuffle. Both PHP and Python do.