Weighted Random Shuffling
For some application I needed a random shuffle $r$ of a list of numbers $a$.
The shuffle, however, should be weighted, such that, given a list of weights $w$, the element $a_i$ with weight $w_i$ should be the first element of the random shuffle $r$ with probability $w_i/\sum_k w_k$.
I searched for a python implementation of such an algorithm, but I couldn’t find one.
numpy doesnt’ provide it, and
numpy.random.shuffle only offers shuffling with uniform weights.
Thus, this code.
Weighted shuffling in pure python
Weighted random shuffling is the same as weighted random sampling from a list $a$ without replacement. That is, choose with probability $w[i]/sum(w)$ element $a[i]$ from $a$. Store this element in a list $r$. Then, remove element $a[i]$ from $a$ and $w[i]$ from $w$, and select a new element of the modified list $a$, and so on until $a$ is empty.
From Eli Bendersky’s `site I got the code below for weighted random sampling.
from random import random def weighted_choice(weights): rnd = random() * sum(weights) for i, w in enumerate(weights): rnd -= w if rnd < 0: return i
weighted_choice gives an index to the chosen element of an
array. With this index I can fill a random shuffle $r$ with elements
As removing elements from lists is a bit expensive, I chose to set the weight $w[i]$ of the chosen element to zero. This ensures that the element $i$ cannot be selected the next time.
def weighted_shuffle(a,w): w = list(w) # make a copy of w if len(a) != len(w): print("weighted_shuffle: Lenghts of lists don't match.") return r = *len(a) # contains the random shuffle for i in range(len(a)): j = weighted_choice(w) r[i]=a[j] w[j] = 0 return r
Once I had the above code, I went to Gerlach van der Heide. He managed to speed it up by several orders of magnitude.
from random import random from bisect import bisect_right import numpy as np def weighted_shuffle(a,w): r = np.empty_like(a) cumWeights = np.cumsum(w) for i in range(len(a)): rnd = random() * cumWeights[-1] j = bisect_right(cumWeights,rnd) #j = np.searchsorted(cumWeights, rnd, side='right') r[i]=a[j] cumWeights[j:] -= w[j] return r a = np.arange(1,1000) w = np.arange(1,1000) r = weighted_shuffle(a,w)
Here is the result
Some testing shows, for a reason unknown to me, that
np.searchsorted is slower than
A Wrong idea
While searching the web I came across the one-liner below. It is elegant, but wrong. The weights according to which the elements of $a$ are chosen are not in proportion to $w$. Some basic probability theory shows why this is the case. It comes down to proving that
where $w_i$ are the weights and $U_i$ uniform random deviates on $[0,1]$.
def wrong(a,w): aa = range(len(a)) aa.sort(key=lambda i: w[i]*random()*beta[i]) return [a[i] for i in aa]