I recently investigated our analysis code for the source of unwanted randomness. The culprit turned out to be Python's set data type. For example, running the following code prints the items in a different order from one run to the next:
idList = [ "ID" + str(i) for i in range(10) ]
idSet = set(idList)
print(idSet)
I asked about this on Stack Overflow, and within a few minutes jonrsharpe kindly pointed out the source of the randomness. However, because the question was soon closed (and rightly so) as a duplicate of a similar question, I decided to write a short post for the benefit of other people wondering about this strange behaviour.
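- Sets are implemented using a hash table, so the order in which the items come out depends on their hash values.
- For security purposes, since Python 3.3 the hashes of strings have by default been salted with a random seed that is chosen anew each time the interpreter starts.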
- Disabling the random hash seed (export PYTHONHASHSEED=0) produces reproducible results between runs. The variable has to be set before the Python interpreter starts.
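The last point can be checked from within Python itself. The following is a minimal sketch, assuming Python 3.7 or later; the helper name run_with_seed and the SNIPPET constant are made up for this illustration. It runs the example from the beginning of the post in two fresh interpreters with the same PYTHONHASHSEED value and compares what they print.

```python
import os
import subprocess
import sys

# The example from the beginning of the post, as a one-liner for "python -c".
SNIPPET = 'print(list(set("ID" + str(i) for i in range(10))))'


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED set to seed.

    run_with_seed is a hypothetical helper for this post, not a library call.
    """
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


first = run_with_seed(0)
second = run_with_seed(0)

# With the seed fixed, both interpreters print the items in the same order.
print(first == second)
```

Leaving PYTHONHASHSEED unset (or setting it to random) in the child environment should bring back the varying order described above.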