CodeSOD: Ordering the Hash

Last week, we took a look at a hash array anti-pattern in JSON. This week, we get to see a Python version of that idea, with extra bonus quirks, from an anonymous submitter.

In this specific case, the code needed to handle CSV files. The order of the columns absolutely matters, and thus the developer needed to make sure that they always handled columns in the correct order. This led to code like this:

FIELD_NAME_ORDER = collections.OrderedDict({
    1: 'Field1',
    2: 'Field2',
    # etc. There are over a hundred fields.
})

# Elsewhere in the code, the only usage of FIELD_NAME_ORDER...
for field_name in FIELD_NAME_ORDER.values():
    AddField(field_name)

Now, the first thing you notice is that this is, once again, a hash array. The keys are the indexes. It doesn't look like that much of a WTF, and you'll note the use of OrderedDict which ensures that the dictionary retains insertion order. So this is just a silly little block of code…

Except, there are a few problems. First, starting with Python 3.7, plain dicts are guaranteed to preserve insertion order (CPython 3.6 already behaved this way as an implementation detail), so you don't really need the OrderedDict constructor in there. That's no big deal, except that prior to that, a dictionary literal like {1: 'Field1', 2: 'Field2'} made no ordering promise at all: it was just a hash, which means the order of the keys is arbitrary. And since the literal is evaluated before OrderedDict ever sees it, the OrderedDict can only preserve whatever order the plain dict happened to produce.
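To make that concrete, here is a minimal sketch of the difference between handing OrderedDict a dict literal and handing it a sequence of pairs. The keys are deliberately written out of numeric order for illustration; the variable names are not from the submitted code.

```python
from collections import OrderedDict

# The literal is built as a plain dict first; OrderedDict only ever sees
# the finished dict and copies whatever iteration order it has.
literal = {2: 'Field2', 1: 'Field1'}
from_literal = OrderedDict(literal)

# A sequence of (key, value) pairs carries its own order, so this form
# is ordered as written on any Python version that has OrderedDict.
from_pairs = OrderedDict([(2, 'Field2'), (1, 'Field1')])

print(list(from_literal.values()))  # order depends on the plain dict
print(list(from_pairs.values()))    # always as written
```

On Python 3.7 and later both lines print the same thing, because the plain dict already keeps insertion order. On older versions only the second form is guaranteed.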

From the docs:

Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.

Now, this code targets Python 2.7, which is old and out of support, and clearly TRWTF. But in 2.7, this absolutely was how dictionaries worked, so this code, on the surface, shouldn't work. But it does, and the reason isn't surprising once you think about it: what would you expect the hash of the number 1 to be?

CPython, the main implementation of Python, quite reasonably hashes small ints to their value: hash(1) == 1. A plain dict doesn't actually sort anything, but in CPython 2.7, iterating one walks the underlying hash table slot by slot, and small consecutive integer keys each land in the slot matching their value. So the dict literal will iterate in the order of the numeric keys, and when we insert that into an OrderedDict it will preserve the insertion order, which is the numeric order.
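The accident can be illustrated with a toy model. This is a simplified sketch, not CPython's real dict: it assumes a power-of-two table larger than the key count, ignores collision probing and resizing, and the names `initial_slot` and `TABLE_SIZE` are made up for the example.

```python
import random

def initial_slot(key, table_size):
    """First slot tried for `key` in a table whose size is a power of two."""
    return hash(key) & (table_size - 1)

TABLE_SIZE = 256               # hypothetical; just needs to exceed the key count
keys = list(range(1, 121))     # "over a hundred fields"
random.shuffle(keys)           # insertion order should not matter

table = [None] * TABLE_SIZE
for key in keys:
    slot = initial_slot(key, TABLE_SIZE)
    assert table[slot] is None  # these keys never collide with each other
    table[slot] = key

# Iterating the table slot by slot gives the keys back in numeric order.
iteration_order = [key for key in table if key is not None]
print(iteration_order[:5])
```

Swap the integer keys for strings and the slots scatter, which is exactly the arbitrary order the docs warn about.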

The developer who wrote this blundered into a working solution by what appears to be an accident.

Our anonymous submitter took the extra few seconds to replace the OrderedDict with a list, which, y'know, is already going to guarantee order without you needing to blunder into how hashes work.
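The submitter's actual replacement isn't shown, but a list-based version might look something like the sketch below. `FIELD_NAMES` is an assumed name, and `AddField` here is a stand-in that just records its argument, since the real function isn't part of the submission.

```python
# Order is simply the order the names are written in.
FIELD_NAMES = [
    'Field1',
    'Field2',
    # etc.
]

added = []

def AddField(field_name):
    """Hypothetical stand-in for the real AddField; records each call."""
    added.append(field_name)

for field_name in FIELD_NAMES:
    AddField(field_name)

print(added)
```

No keys, no hashing, and the ordering holds on every Python version and implementation.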


This post originally appeared on The Daily WTF.
