# Weighted Random Shuffling

For some application I needed a random shuffle $r$ of a list of numbers $a$. The shuffle, however, should be weighted, such that, given a list of weights $w$, the element $a_i$ with weight $w_i$ should be the first element of the random shuffle $r$ with probability $w_i/\sum_k w_k$. I searched for a python implementation of such an algorithm, but I couldn’t find one. Also, numpy doesnt’ provide it, and numpy.random.shuffle only offers shuffling with uniform weights. Thus, this code.

## Weighted shuffling in pure python

Weighted random shuffling is the same as weighted random sampling from a list $a$ without replacement. That is, choose with probability $w[i]/sum(w)$ element $a[i]$ from $a$. Store this element in a list $r$. Then, remove element $a[i]$ from $a$ and $w[i]$ from $w$, and select a new element of the modified list $a$, and so on until $a$ is empty.

From Eli Bendersky’s site I got the code below for weighted random sampling.

from random import random

def weighted_choice(weights):
rnd = random() * sum(weights)
for i, w in enumerate(weights):
rnd -= w
if rnd < 0:
return i


Thus, weighted_choice gives an index to the chosen element of an array. With this index I can fill a random shuffle $r$ with elements of $a$.

As removing elements from lists is a bit expensive, I chose to set the weight $w[i]$ of the chosen element to zero. This ensures that the element $i$ cannot be selected the next time.

def weighted_shuffle(a,w):
w = list(w) # make a copy of w
if len(a) != len(w):
print("weighted_shuffle: Lenghts of lists don't match.")
return

r = [0]*len(a) # contains the random shuffle
for i in range(len(a)):
j = weighted_choice(w)
r[i]=a[j]
w[j] = 0
return r


## Fast shuffling

Once I had the above code, I went to Gerlach van der Heide. He managed to speed it up by several orders of magnitude.

from random import random
from bisect import bisect_right
import numpy as np

def weighted_shuffle(a,w):
r = np.empty_like(a)
cumWeights = np.cumsum(w)
for i in range(len(a)):
rnd = random() * cumWeights[-1]
j = bisect_right(cumWeights,rnd)
#j = np.searchsorted(cumWeights, rnd, side='right')
r[i]=a[j]
cumWeights[j:] -= w[j]
return r

a = np.arange(1,1000)
w = np.arange(1,1000)
r = weighted_shuffle(a,w)


Here is the result

print(r[:2])

[933 624]


Some testing shows, for a reason unknown to me, that np.searchsorted is slower than bisect_right.

## A Wrong idea

While searching the web I came across the one-liner below. It is elegant, but wrong. The weights according to which the elements of $a$ are chosen are not in proportion to $w$. Some basic probability theory shows why this is the case. It comes down to proving that

where $w_i$ are the weights and $U_i$ uniform random deviates on $[0,1]$.

def wrong(a,w):
aa = range(len(a))
aa.sort(key=lambda i: w[i]*random()*beta[i])
return [a[i] for i in aa]
`