Published on

Argument For Using Atoms

Authors
  • avatar
    Name
    Luke Wood
    Twitter

In erlang atoms are a primitive data structure. These data structures are highly underrated and will (hopefully) become more common across programming languages in the near future. While similar to strings in structure, they are imbued with a few additional properties:

  • immutability
  • identical memory addresses between identical atoms

Atoms have first class support in Elixir or erlang. Atom construction is as simple as foo = :bar.

Many other languages, such as Javascript, have a somewhat similar type, the Symbol. Unfortunately in the most cases proper language level support is missing, rendering their use less readable.

This article deep dives into the importance of the first class atom type in erlang.

Explanation Through Example

It is a bit easier to explain the use of erlang's atom data type through example. The example in this blog post is written in Elixir, a functional programming language which compiles to erlang.

Names

Proper string usage in elixir is for containing dynamic data. Atoms on the other hand are names.

Examine the following Javascript code:

someObject = {
  "hello": "world"
}

So Why Atoms?

For the average user, string comparison takes up an absurd 22% of all compute cycles. This is due to the O(n) cost to string comparison. Operations on primitive types, such as integer addition, integer multiplication, etc, tend to be constant time. Many programmers are no away of the fact that string comparison is actually O(n).

This has led to strings being used carelessly all over the place. For example, in Javascript strings are used to index into just about everything.

How do atoms help with this? Their immutability and shared memory addressing allow us to perform constant time comparison, simply by comparing their memory address. This makes them an outstanding candidate to use as a key in a map.

Atoms in Action

String Indexing

Examine the structure for the Player type on the client side of bulletz.io:

interface Player {
  "position": Point;
  // ...
  "spritesheet": SpritesheetURL;
}

So how exactly would the program go about accessing the spritesheet property?

While object lookups are amortized O(1) with respect to the number of items in the map, they are O(n) with respect to the maximum key size. First, the program performs a lookup. In v8, the actual key lookup algorithm differs a bit depending on map size. For the sake of simplicity, we will assume are map to have more than 32 keys. This means that the underlying data structure of the object is a HashMap.

The engine will first hash the key, an O(keysize) operation. Then, the engine will perform a hash table look up to find the appropriate bucket in which the key would be contained. Following this, the engine must compare the string to each value in the bucket. Each of these operations is an O(keysize) operation.

This is not so bad in practice, but does add up over time.

Atom Indexing

Consider the same case, but with atom keys.

// the corresponding Elixir code on the server side
defstruct Player {
  position: Point,
  # ...
  spritesheet: string
}

Let's look into the corresponding access operation with atoms, accessing the key :spritesheet. In Elixir, structs are simply Maps under the hood. First, the Elixir runtime hashes the atom :spritesheet. Hashing an atom is as simple as hashing it's memory address, a constant time operation.

Then, the runtime compares each key to each value in the corresponding bucket: also a constant time operation. Fetching the result is O(1) with respect to map size, as well as with respect to key size.

Over time this adds up. Many attribute first class atom support to be one of the driving factors of the strong performance of erlang and Elixir systems.

So, What to Do With Your New Love for Atoms?

If you're using Elixir or erlang, use atoms whenever you logically can. The :syntax is simple and expressive.

If you're using a language without first class support, it's unfortunately not worth the readability tradeoff to use atoms. The one exception to this rule would be in writing performance critical software.

Next time you're hoping to squeeze out a few clock cycles from your system, consider using atoms!