Randomness in the Python set data structure


I recently investigated our analysis code for the source of unwanted randomness. The culprit turned out to be the set data type in Python. For example, running the following code prints the items in a different order from one run to the next:

idList = [ "ID" + str(i) for i in range(10) ]
idSet = set(idList)
print(idSet)

I asked the question on Stack Overflow, and within a few minutes jonrsharpe kindly pointed out the source of the randomness. However, because the question was soon closed (and rightly so) as a duplicate of a similar question, I decided to write a short post for the benefit of other people wondering about this strange behaviour.

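  1. Sets are implemented using a hash table, so the iteration order depends on the hash values of the items.
  2. For security purposes, since Python 3.3 string hashes include a random seed that is chosen anew each time the interpreter starts.
  3. Disabling the random hash seed (export PYTHONHASHSEED=0) before starting Python produces reproducible results between runs.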
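
The effect of the third point can be checked with a minimal sketch like the one below. It runs the snippet from above in two fresh interpreters with the same PYTHONHASHSEED and compares the printed order. The helper name run_with_seed is made up for this illustration, and the sketch assumes Python 3.7 or newer (for the capture_output argument).

```python
import os
import subprocess
import sys

# The example from the post, condensed so it can be passed to "python -c".
SNIPPET = 'print(list(set("ID" + str(i) for i in range(10))))'


def run_with_seed(seed):
    """Run SNIPPET in a child interpreter with PYTHONHASHSEED set to `seed`.

    Hypothetical helper for this illustration only.
    """
    # The seed is read at interpreter start-up, so it must be set in the
    # environment of a new process rather than in the running one.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


first = run_with_seed(0)
second = run_with_seed(0)
print(first == second)  # True: same seed, same iteration order
```

Setting the variable inside an already running script has no effect on that script, which is why the sketch starts child processes. If only a stable order is needed at one point in the code, sorted(idSet) gives the same result on every run without touching the hash seed.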