# Weighted random shuffling

## Table of Contents

## 1. Weighted random shuffling

For some application I needed a random shuffle \(r\) of a list of numbers
\(a\). The shuffle, however, should be weighted, such that, given a list
of weights \(w\), the element \(a_i\) with weight \(w_i\) should be the first
element of the random shuffle \(r\) with probability \(w_i/\sum_k w_k\). I
searched for a python implementation of such an algorithm, but I
couldn't find one. Also, `numpy`

doesnt' provide it, and
`numpy.random.shuffle`

only offers shuffling with uniform weights. Thus,
this code.

## 2. Weighted shuffling in pure python

Weighted random shuffling is the same as weighted random sampling from a list \(a\) without replacement. In steps:

- Choose with probability \(w_{i}/\sum_{i} w_{i}\) element \(a_{i}\) from \(a\).
- Add this element to a list \(r\).
- Remove element \(a_{i}\) from \(a\) and \(w_{i}\) from \(w\).
- Stop if \(a\) is empty, else,
- Return to step 1 with the updated lists \(a\) \(w\).

However, I found the code below for weighted random sampling on this page.

```
from random import random
def weighted_choice(weights):
rnd = random() * sum(weights)
for i, w in enumerate(weights):
rnd -= w
if rnd < 0:
return i
```

Thus, `weighted_choice()`

gives an index to the chosen element of an
array. With this index I can fill a random shuffle \(r\) with elements of
\(a\).

As removing elements from lists is a bit expensive, I chose to set the weight \(w_{i}\) of the chosen element to zero. This ensures that the element \(i\) cannot be selected the next time.

```
def weighted_shuffle(a,w):
w = list(w) # make a copy of w
if len(a) != len(w):
print("weighted_shuffle: Lenghts of lists don't match.")
return
r = [0]*len(a) # contains the random shuffle
for i in range(len(a)):
j = weighted_choice(w)
r[i]=a[j]
w[j] = 0
return r
```

## 3. Fast shuffling

Once I had the above code, I went to Gerlach van der Heide. He managed to speed it up by several orders of magnitude.

```
from random import random
from bisect import bisect_right
import numpy as np
def weighted_shuffle(a,w):
r = np.empty_like(a)
cumWeights = np.cumsum(w)
for i in range(len(a)):
rnd = random() * cumWeights[-1]
j = bisect_right(cumWeights,rnd)
#j = np.searchsorted(cumWeights, rnd, side='right')
r[i]=a[j]
cumWeights[j:] -= w[j]
return r
a = np.arange(1,1000)
w = np.arange(1,1000)
r = weighted_shuffle(a,w)
```

Here is the result

```
print(r[:2])
```

Some testing shows, for a reason unknown to me, that `np.searchsorted`

is slower than `bisect_right`

.

## 4. A wrong idea

While searching the web I came across the one-liner below. It is elegant, but wrong. The weights according to which the elements of \(a\) are chosen are not in proportion to \(w\). Some basic probability theory shows why this is the case. It comes down to proving that \[P\{w_1 U_1 \leq w_2 U_2\} \neq \frac{w_1}{w_1+w_1},\] where \(w_i\) are the weights and \(U_i\) uniform random deviates on \([0,1]\).

```
def wrong(a,w):
aa = range(len(a))
aa.sort(key=lambda i: w[i]*random()*beta[i])
return [a[i] for i in aa]
```