Data Structure Identifiers

May 13, 2023

Any time you have a long-term data structure, it’s important to have some form of unique identifier used to look up or recognize the structure. It doesn’t matter if the long-term data is a database record, a document in some form of storage, an object, or whatever. If the data lives for a while, you’ll need to be able to find it and tell it from other similar data. (To simplify things, I’m going to use the term entry for an instance of one of these data structures going forward.)

Primary Identifier (ID)

If the identifier is not unique, then you’ll never be able to tell which of the entries with the same ID is the real/most recent one. Most developers are aware of this concept if they’ve done any development that accesses a database. In database contexts, it is usually called a primary key.

There are a number of ways to generate these unique IDs. A constantly increasing integer is not a bad start, but it needs a single source handing out the numbers, and the next ID is easy to guess. A real GUID avoids both problems. If you can’t guarantee that all of the places that create the entries can use the same system to generate your IDs, or if you don’t want people to guess IDs easily, the GUID is the better solution.
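As a rough sketch of the two approaches, here is what they might look like in Python. The function names are made up for illustration, and the in-process counter stands in for whatever shared sequence (typically a database sequence) a real system would use.

```python
import itertools
import uuid

# Option 1: a constantly increasing integer. Simple, but every place that
# creates entries has to draw from this one counter. (Hypothetical stand-in
# for a shared database sequence.)
_next_id = itertools.count(1)

def new_sequential_id() -> int:
    return next(_next_id)

# Option 2: a random (version 4) UUID. Any process can generate one without
# coordinating with the others, and the values are hard to guess.
def new_guid() -> str:
    return str(uuid.uuid4())

first = new_sequential_id()
second = new_sequential_id()
guid = new_guid()
```

The sequential IDs come out as 1, 2, 3, and so on, which is exactly what makes them easy to guess. The GUID is a 36-character string that is different on every call.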

In some cases, people use a hash of the contents of the entry. That only works if the entry is immutable. Said another way, if changing the data in any way results in a new entry, then the hash is a pretty good solution. (Think git commit hashes.) If you expect to update the entry and still have it refer to the same thing, then it’s not immutable, and a hash would be inappropriate.
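A minimal sketch of a content-derived ID, assuming the entry can be serialized to bytes. SHA-256 is my choice for the example, not something the approach requires, and content_id is a hypothetical helper name.

```python
import hashlib

def content_id(content: bytes) -> str:
    # The ID is derived entirely from the bytes of the entry, so the same
    # content always produces the same ID.
    return hashlib.sha256(content).hexdigest()

original = content_id(b"first draft")
edited = content_id(b"first draft, edited")
```

Here original and edited are different IDs, which is the point: any edit produces what is effectively a new entry. That is fine for immutable data and wrong for anything you expect to update in place.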

One thing you don’t want to do is use part of the data of the entry, especially anything that would be meaningful to a user (e.g. the title of an article).

Display Name

At one of my early programming jobs, we had a database of articles that was the core of an on-line system. (This was pre-web, but close enough that most people will see the issues.) These articles were indexed by title, and the system had been in use for a few years before we noticed that changing the title of an article made a new copy instead of changing the original. This caused quite a mess for a little while. We quickly separated the concept of ID from Title. (Over time, I’ve come to prefer DisplayName for this concept.)

The problem with using a displayable name as the unique key is that people will find a reason to want to change it. Once they can change it, this fake ID can stop being unique, or it can be changed and cause references to it to be lost. Separating the Primary ID from the DisplayName solves this problem.
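To make the separation concrete, here is a small sketch in which entries are stored by ID and the display name is just another field. The Article class, the in-memory articles dictionary, and the helper functions are all hypothetical; this is not the system described above.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Article:
    id: str            # primary ID: assigned once, never changes
    display_name: str  # what users see: free to change
    body: str

# In-memory stand-in for the real storage, keyed by primary ID.
articles: dict[str, Article] = {}

def create_article(display_name: str, body: str) -> Article:
    article = Article(id=str(uuid.uuid4()), display_name=display_name, body=body)
    articles[article.id] = article
    return article

def rename_article(article_id: str, new_name: str) -> None:
    # Only the display name changes. The key stays the same, so no copy is
    # made and existing references to the ID keep working.
    articles[article_id].display_name = new_name

article = create_article("Data Structure IDs", "Some text.")
rename_article(article.id, "Data Structure Identifiers")
```

After the rename there is still exactly one article in storage, and anything that saved article.id earlier still finds it.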

External ID (slug)

Because the IDs used internally are not necessarily easy for humans to work with (and are likely to be long), it’s often useful to have a separate ID available for the outside world to request your entry. A CMS will often refer to this kind of external ID as a slug. This must also be unique, but the slugs would only be used in something like URLs or other external APIs. Separating these two IDs is not always necessary, but can sometimes be useful if the primary ID should not be exposed for some reason.
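One common way to get a slug is to derive it from the display name when the entry is created and then store it as its own field. The sketch below assumes that approach; slugify and unique_slug are hypothetical helpers, and the numeric suffix is just one possible way to keep slugs unique.

```python
import re

def slugify(display_name: str) -> str:
    # Lowercase the name and collapse every run of characters other than
    # a-z and 0-9 into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower())
    return slug.strip("-")

def unique_slug(display_name: str, taken: set[str]) -> str:
    # Slugs must be unique, so add a numeric suffix on collision.
    base = slugify(display_name) or "entry"
    slug = base
    suffix = 2
    while slug in taken:
        slug = f"{base}-{suffix}"
        suffix += 1
    taken.add(slug)
    return slug

taken: set[str] = set()
first_slug = unique_slug("External ID (slug)", taken)
second_slug = unique_slug("External ID: Slug!", taken)
```

The first call gives "external-id-slug" and the second, which collides, gives "external-id-slug-2". Once assigned, the slug should be stored rather than recomputed, so that a later change to the display name does not silently change the URL.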

Type

Sometimes it’s useful to be able to group entries for some kinds of operations. In that case, it helps to have a type associated with your entries. This kind of ID is deliberately not unique, since grouping entries is the whole point of a type.

If you need more than one way of grouping, multiple tags would be more useful.
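The difference between a single type and multiple tags can be sketched like this. The entries and their field names are invented for the example.

```python
from collections import defaultdict

# Hypothetical entries: each has exactly one type, but any number of tags.
entries = [
    {"id": 1, "type": "article", "tags": {"databases", "design"}},
    {"id": 2, "type": "article", "tags": {"design"}},
    {"id": 3, "type": "image", "tags": {"databases"}},
]

by_type = defaultdict(list)
by_tag = defaultdict(list)
for entry in entries:
    # One type per entry: each entry lands in exactly one group.
    by_type[entry["type"]].append(entry["id"])
    # Many tags per entry: the same entry can land in several groups.
    for tag in entry["tags"]:
        by_tag[tag].append(entry["id"])
```

Entry 1 appears once in by_type (under "article") but twice in by_tag (under both "databases" and "design"), which is what a second way of grouping needs.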

Revision

If you want to be able to change an entry but still want stored entries to be immutable, create a new entry for each change and give it a revision value. Keeping every revision gives you a history of changes. The older versions could be stored in a separate table or archive storage.
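A minimal sketch of immutable entries with revisions, assuming the primary ID stays the same across revisions and only the revision number increases. The Entry class, the in-memory history dictionary, and the save and update helpers are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # frozen: a stored revision can never be modified
class Entry:
    id: str
    revision: int
    body: str

# Every revision of every entry, oldest first. A real system might keep only
# the latest revision here and move older ones to archive storage.
history: dict[str, list[Entry]] = {}

def save(entry: Entry) -> None:
    history.setdefault(entry.id, []).append(entry)

def update(entry_id: str, new_body: str) -> Entry:
    current = history[entry_id][-1]
    # Build a new entry instead of changing the current one.
    revised = replace(current, revision=current.revision + 1, body=new_body)
    save(revised)
    return revised

save(Entry(id="a1", revision=1, body="first"))
revised = update("a1", "second")
```

After the update, history["a1"] holds both revisions in order, and the first one is untouched.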

Conclusion

These are some of the different ways that IDs can be used for entries, and it’s worth knowing which of them your entries actually need. You may not need all of these uses, but for the ones you do need, you should really consider using a separate ID for each:

  • Primary ID
  • DisplayName
  • External ID (or slug)
  • Type (or Tags)
  • Revision
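Put together, an entry that needs all five might look something like the sketch below. The field names are only suggestions, and most real entries will need just a subset.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Entry:
    display_name: str  # shown to users, free to change
    slug: str          # external ID for URLs and other external APIs
    entry_type: str    # non-unique grouping
    tags: set[str] = field(default_factory=set)  # additional groupings
    revision: int = 1  # increases with each change
    # Primary ID: generated once at creation, never shown or edited.
    primary_id: str = field(default_factory=lambda: str(uuid.uuid4()))

entry = Entry(
    display_name="Data Structure Identifiers",
    slug="data-structure-identifiers",
    entry_type="article",
)
original_id = entry.primary_id
entry.display_name = "Identifiers for Data Structures"
```

Changing the display name leaves the primary ID and the slug alone, so neither internal references nor external links break.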

You wouldn’t believe how many times I’ve gone back to fix this.
