In previous posts, I’ve talked about the importance of naming to make it easier for whoever maintains the code. One thing I’ve left out is the importance of not misusing standard names.
Naming Fail
In a previous position, I was manipulating a collection of data, when I hit a bug. After running some new code, the collection was a couple of items shorter than it had been. After quite a bit of troubleshooting, I tracked the problem to a method on the collection object named sort
. I was somewhat surprised to find that this method removed items from the container while sorting. In fact, this was such a surprising result, that I didn’t even check for it during most of the troubleshooting. What I discovered was that sort
didn’t just sort the collection, it also removed duplicates.
Unfortunately, this violates the Principle of Least Surprise. Every (other) sort algorithm I have ever seen has taken a list or array and returned another with the following properties:
- Contained only elements from the original
- Contains all elements from the original
- Are ordered based on the supplied comparison
Since generating a unique sorted list is useful, many libraries have some form of uniq
method. In a few cases, you might find a combined method or an option that can be supplied to the sort function. (An example would be the -u
options for the Unix sort
utility.)
Better Name
In the example, the method did exactly what it was designed to do. The original programmer even argued that we were never going to want the container to have duplicates in this system. So having a sort
that didn’t remove duplicates was useless. (I also argued that we should have avoided inserting duplicates into the container, if there should never be duplicates.) Although he had a point, the name is still wrong. The method should have been called sort_and_uniq()
or unique_sort()
or even sort_u()
. Any other name would have served as a warning to any maintainer that the method does more than just sort.
Using the name sort
in that case is a bad name, because it violates the expectations of the maintenance programmers.