How Keras Users Write Code and How to Design APIs Accordingly

How does a Keras user begin writing code?

While I cannot speak for all users, I can speak to the journey of an advanced Keras user. I have been using Keras for almost 6 years. Despite this, when I begin a new project my first stop is always keras.io! Need to override train_step? I go to keras.io for a boilerplate. Need something specific that isn’t on keras.io? I search public GitHub repositories.

The takeaway here is that in roughly 90% of cases, Keras users copy and paste their code. Most of the novel code they write glues together building blocks, modifies hyperparameters, loads custom data, or tweaks the organization of components. In some domains (data structures, fetch APIs on the web, data querying) users write code from a memorized API spec, but this is the minority case for Keras. Most Keras users do not think to themselves, “I need a convolutional layer”, go to the API documentation, look at the API, and write the layer themselves based on the documentation.

This is fairly unusual, and I believe it is driven by the fact that many Keras users do not have a deep machine learning background. I also attribute much of the library's success to how “copy-pasteable” it is.

A Related Anecdote

Recently, in KerasCV, we were designing an API to manage bounding box formats. We narrowed the API down to two options.

1.) Annotate Tensors with a format using a utility function:

images, bounding_boxes = load_data()
bounding_boxes = keras_cv.data.bounding_box.annotate_format(
    bounding_boxes, format="coco"
)

data = {"images": images, "bounding_boxes": bounding_boxes}

preprocessing_model = keras.Sequential([
  keras_cv.CutMix(),
  keras_cv.MixUp(),
  keras_cv.RandAugment(value_range=(0, 255))
])

augmented_data = preprocessing_model(data)

and

2.) Explicitly receive a bounding box format in every layer that requires a format:

images, bounding_boxes = load_data()
data = {"images": images, "bounding_boxes": bounding_boxes}

preprocessing_model = keras.Sequential([
  keras_cv.CutMix(box_format="coco"),
  keras_cv.MixUp(box_format="coco"),
  keras_cv.RandAugment(
      value_range=(0, 255),
      box_format="coco"
  )
])

augmented_data = preprocessing_model(data)

Let's Compare the Options

On the surface, these two options do not look all that different. However, let’s think about this from the view of a hypothetical Keras user. This user has an existing pipeline fitting a RetinaNet on some images with some PascalVOC boxes:

images, bounding_boxes = load_data()
retinanet.fit(images, bounding_boxes)

They stumble upon some code somewhere, perhaps in a repo that Keras does not control:

preprocessing_model = keras.Sequential([
  keras_cv.CutMix(),
  keras_cv.MixUp(),
  keras_cv.RandAugment(
      value_range=(0, 255),
  )
])

def augment_data(images, boxes):
  return preprocessing_model({"images": images, "bounding_boxes": boxes})

“Perfect!”, our unsuspecting user thinks. “All I have to do is copy and paste the preprocessing model into my pipeline and I’ll have an augmented dataset”. They press on, and are greeted with an error:

ValueError: `augment_bounding_box()` received a bounding box tensor without a "bounding_box_format" annotation.  Please annotate your bounding boxes with `keras_cv.data.bounding_box.annotate_format()`.

They Google keras_cv.data.bounding_box.annotate_format(), read three articles on the topic, and eventually get it working by annotating their boxes as PascalVOC. They are slightly frustrated, the pipeline is somewhat opaque, and all in all we have wasted about 30 minutes of their time. This isn’t the end of the world, but let’s look at the alternative.

preprocessing_model = keras.Sequential([
  keras_cv.CutMix(box_format="xywh"),
  keras_cv.MixUp(box_format="xywh"),
  keras_cv.RandAugment(
      value_range=(0, 255),
      box_format="xywh"
  )
])

def augment_data(images, boxes):
  return preprocessing_model({"images": images, "bounding_boxes": boxes})

Our user finds the code, presumably in a repo that Keras does not own. They notice: “Oh! CutMix, MixUp and RandAugment take a bounding box format. We are using PascalVOC.”

They change the format string to the one that matches their PascalVOC boxes, paste the code into their pipeline, and it works. That is all: no digging through the docs, and no error after running a pipeline that may take hours to reach the augmentation step (depending on resource allocation and the data loading process). They land on the result.
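To see why the format string matters, here is a minimal, framework-free sketch. It assumes the common conventions that "xywh" means [x, y, width, height] and "xyxy" means [x_min, y_min, x_max, y_max]; the helper names are hypothetical and are not part of KerasCV. The same four numbers describe two different boxes depending on which format is assumed.

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """Convert [x_min, y_min, x_max, y_max] to [x, y, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def box_area(box, box_format):
    """Area of a box interpreted under an explicitly stated format."""
    if box_format == "xywh":
        _, _, w, h = box
    elif box_format == "xyxy":
        x_min, y_min, x_max, y_max = box
        w, h = x_max - x_min, y_max - y_min
    else:
        raise ValueError(f"Unknown box_format: {box_format!r}")
    return w * h


box = [10, 20, 30, 40]
print(box_area(box, "xywh"))  # 1200
print(box_area(box, "xyxy"))  # 400
```

A layer that silently assumed the wrong format would still run; it would just augment the wrong region of the image.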

So, which option is better?

There are two key points that I’d like to highlight here. First, Option 2 is more copy-pasteable and relies only on changes to the “glue code”. Wherever the user finds the code, it is clear that glue code exists and that the format must match their own data format.

Second, Option 2 is self-documenting. The augmentation layers depend on the bounding box format, so you must provide one, and this is obvious to anyone reading the code.

For these two reasons, Option 2 is a better option for the Keras ecosystem. This is the option we have decided to proceed with.
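The difference in when the failure surfaces can be sketched without any framework. `ExplicitFormatLayer`, `SUPPORTED_FORMATS`, `build_pipeline`, and `run_pipeline` below are hypothetical stand-ins, not the KerasCV implementation; the sketch only illustrates that a required constructor argument can be validated when the pipeline is built, before any data is loaded.

```python
# Hypothetical stand-in for an augmentation layer; not KerasCV code.
SUPPORTED_FORMATS = {"xywh", "xyxy"}  # assumed format names for illustration


class ExplicitFormatLayer:
    """Toy layer that requires a bounding box format at construction time."""

    def __init__(self, box_format):
        # Validate eagerly so a wrong format fails when the pipeline is
        # built, not when the first batch finally reaches the layer.
        if box_format not in SUPPORTED_FORMATS:
            raise ValueError(
                f"Unsupported box_format {box_format!r}; "
                f"expected one of {sorted(SUPPORTED_FORMATS)}."
            )
        self.box_format = box_format

    def __call__(self, data):
        # A real layer would augment here. This sketch passes the data
        # through and records which format the layer assumed.
        return {**data, "box_format": self.box_format}


def build_pipeline(box_format):
    """Build a two-layer pipeline; raises immediately on a bad format."""
    return [ExplicitFormatLayer(box_format), ExplicitFormatLayer(box_format)]


def run_pipeline(layers, data):
    """Apply each layer in order, like a tiny Sequential model."""
    for layer in layers:
        data = layer(data)
    return data
```

With this shape, a typo such as "xywy" raises a ValueError at `build_pipeline("xywy")`, and omitting the argument entirely raises a TypeError from Python itself, both before any training data is touched.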

Takeaways For an API Owner

The first takeaway is: know your user base! In Keras, most users aren’t writing new Keras code. They’re writing new glue code and modifying structure, but they typically aren’t writing keras.layers.Conv2D(filters=filters, …) themselves. They find these blocks in keras.io tutorials and various samples around the internet. Knowing this, we can design APIs for these types of users.

The second takeaway is to prefer clarity over terseness. There is a temptation to make APIs as terse as possible. This is often done by providing defaults for all sorts of values, even at the cost of hiding expectations about the input format of the data.

The tf.image API is a prime example of this. If you have never used tf.image, you have no idea what the expected input format is. The input format is unclear from the code itself:

image = tf.image.random_hue(image, max_delta=0.1)

What is the format of the image? Do you know? If you do, it is probably because you have used tf.image, or been bitten by the fact that you got this wrong once and paid the price of tons of debugging time.

The answer depends on the dtype. If the image is a float Tensor, its values are expected to be in the range [0, 1]. If the image is an int Tensor, the value range is [0, 255]. How does this compare to an explicit API?

random_hue = keras_cv.layers.RandomHue(value_range=(0, 255))
image = random_hue(image)

What is the image format? Even an image processing beginner will be able to tell you that the values are in the range [0, 255]. It is written explicitly in the code, and better yet, the configuration is carried forward every time the code is copied and pasted into a new repo.
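As a rough illustration of what an explicit value range buys, consider the framework-free sketch below. `to_unit_range` is a hypothetical helper, not part of KerasCV or tf.image; it rescales pixel values using only the range the caller states, rather than inferring a range from the data type.

```python
def to_unit_range(pixels, value_range):
    """Rescale pixel values from an explicitly stated range to [0, 1]."""
    low, high = value_range
    if high <= low:
        raise ValueError(f"Invalid value_range: {value_range!r}")
    scale = high - low
    return [(p - low) / scale for p in pixels]


# The stated range travels with the call, so the reader never has to guess.
print(to_unit_range([0, 255], value_range=(0, 255)))       # [0.0, 1.0]
print(to_unit_range([0.0, 0.5, 1.0], value_range=(0, 1)))  # [0.0, 0.5, 1.0]
```

If the stated range does not match the data, the mismatch is at least visible at the call site where the code was pasted, instead of being hidden behind a dtype convention.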

The final takeaway: as an API owner, write explicit, self-correcting, self-documenting, copy-pasteable APIs. Don’t force defaults on your users when there are legitimate choices at play.
