A scalable data structure for your configuration database

Most software of reasonable complexity is configurable: it lets you change its functionality through flags or settings. While you could store configuration information in code or text files, web apps more commonly store it in a database. This allows you to change configs at runtime, i.e., without having to restart or redeploy your app.

In this article, I’ll be exploring the ways I thought about improving the config storage mechanism in a large enterprise system I worked on, which made use of thousands of configs. The issues I identified were:

  • Configs scattered across multiple tables/databases and in different formats, making it difficult to provide a centralized view of the system’s configuration state: Want to know if feature X is enabled? Easy-peasy! You only need to check that these 5 rows in these 3 different tables have these specific values. And there’s a different mechanism for determining feature Y’s state. This was made worse by the fact that not everyone had access to all the databases where configurations were stored. The solution here is creating a standard format for storing your configs and storing them in a single location to make access, query and analysis easier.
  • Lack of schemas for configs, leading to type errors: One time, someone updated a config with invalid data which brought a business-critical part of the system to its knees. I also had my fair share of breaking production because of silly mistakes like typos. Having a schema to validate changes goes a long way in preventing human error like this.
  • Lack of documentation on what each config does: You had to read and understand the code to know how the configs worked. This takes time and is not feasible for non-coders (such as a non-technical product owner), who should be able to change the configs without the aid of a developer. While detailed documentation on how to use certain configs would be invaluable to your team, a simple description would do a lot of good.
  • Lack of audit logging: Not all config modifications were logged, so it was sometimes hard to trace issues back to problematic configurations by correlating changes with incidents. An audit log that shows what changed, how it changed and why goes a long way in helping an incident response team troubleshoot and resolve production issues. They could, for example, query for all the changes made in the lead-up to an incident, and pull in the config owners/changers to help out.
  • Lack of access for developers/product owners: As a developer, I should be able to make changes to a config I created, or at least someone on my team should. Having to open a change request ticket kind of defeats the purpose because it might even be faster to make the change in code and get it deployed. Also, a decision maker (such as a product owner or operations staff) should not have to pull in a developer when they want to change something. If they are allowed to decide whether or not something should be changed, they should also be able to click a button or invoke an API endpoint to do so. Because only admin users could make changes to the config databases, it was unnecessarily stressful and time-consuming to get things done.
  • Lack of automation for config state changes: While we had circuit breakers for temporarily switching off or diverting traffic from failing services, we lacked the ability to temporarily toggle the switches ourselves. It was very common for someone to forget to switch something back on after switching it off. For instance, an on-call engineer would forget to switch a service that had experienced downtime back on once the issue had been resolved. Functionality that lets the engineer say “turn off service X for Y hours”, for example, could solve this.

Functional configs

The big takeaway from all these issues is that one should centralize their system’s configurations and organize them by function rather than product, service or feature. Of course, there are downsides to centralization such as introducing a bottleneck and a single point of failure, but for a system such as this, it’s a good trade-off in my opinion. Instead of having a config row, table or database for each product, service or feature, you could break down your configs by the following functions:

Feature flags

A feature flag is a config used for releasing new changes. There’s always some risk involved in deploying new code. You can manage that risk by putting new changes behind a feature flag. If something goes wrong, you can turn off the flag to mitigate the incident and look into it later, instead of scrambling to deploy a fix or revert the change. Also, instead of releasing a feature to all your users, you could limit it to a small subset as a way to pilot the feature, and then gradually increase the traffic as you gain confidence that things are working as expected.

It’s important to factor in granularity when designing feature flags. The options are:

  • Binary flags: feature is enabled if flag is true and disabled when false.
  • Key-based flags: feature is only enabled for entities (such as users or merchants) whose keys (like user ID and merchant name) have been specified.
  • Percentage-based flags: feature is only enabled for a specified percentage of invocations, e.g., 10% of traffic.

You could create a features table for your feature flags like so:

| id | feature_name | description | enabled | percentage | enabled_for_entities | entity_key | cache_for_secs | expires_at | created_at | updated_at | public (available to clients?) | deleted_at |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | GOD_CLASS_REFACTOR_SHADOW_MODE | The class that does everything and single-handedly runs this business has been broken down into 100+ smaller classes for readability and extensibility. We are running the new changes in shadow mode to ensure the outputs are correct. Enabling this flag will result in increased latency because we’ll be running both old and new code. | true | 20 | NULL | NULL | 60 | 2023-01-01 17:06:09 | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | false | NULL |
| 2 | SHOW_NEW_PRODUCT_PAGE | We’ve redesigned the product page using our fancy new design system. We want to pilot it with a few users to get feedback before rolling it out to everyone. | true | 100 | [1, 2, 69, 419, 420] | users.id | 180 | 2023-01-01 17:06:09 | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | true | NULL |
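To make the three levels of granularity concrete, here is a minimal Python sketch of how a row from a table like this might be evaluated. The `FeatureFlag` class and the `bucket` and `is_enabled` functions are hypothetical names of mine, not part of the system described. I am also assuming that a NULL `enabled_for_entities` means "no key restriction", and that when an entity ID is available the percentage check hashes it so the same entity always gets the same answer; without an ID, each invocation is sampled at random.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureFlag:
    # Mirrors a subset of the columns in the features table (hypothetical class).
    feature_name: str
    enabled: bool
    percentage: int = 100                         # 0 to 100
    enabled_for_entities: Optional[list] = None   # None stands in for NULL


def bucket(feature_name: str, entity_id) -> int:
    """Map a (flag, entity) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature_name}:{entity_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: FeatureFlag, entity_id=None) -> bool:
    # Binary check: a disabled flag is off for everyone.
    if not flag.enabled:
        return False
    # Key-based check: if keys are listed, the entity must be one of them.
    if flag.enabled_for_entities is not None and entity_id not in flag.enabled_for_entities:
        return False
    # Percentage-based check (assumption: sticky per entity when an ID is given).
    if entity_id is None:
        # No entity to hash, so sample each invocation independently.
        return random.random() * 100 < flag.percentage
    return bucket(flag.feature_name, entity_id) < flag.percentage


new_page = FeatureFlag("SHOW_NEW_PRODUCT_PAGE", True, 100, [1, 2, 69, 419, 420])
shadow_mode = FeatureFlag("GOD_CLASS_REFACTOR_SHADOW_MODE", True, 20)

print(is_enabled(new_page, 69))   # user 69 is in the pilot list
print(is_enabled(new_page, 7))    # user 7 is not
```

Hashing the flag name together with the entity ID means a user who lands in the 20% for one flag is not automatically in the 20% for every other flag.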

Kill switches

A kill switch is just like a feature flag but it is permanent and meant for turning off components of your system when things go wrong or when you’re doing some maintenance. Say you’re an e-commerce store on Black Friday experiencing much more traffic than anticipated. You can use a kill switch to turn off non-business-critical parts of your system (like product recommendations) so that the more important components (like checkout and payments) can breathe.

You could create a switchboard table for your kill switches:

| id | switch_name | description | enabled | percentage | enabled_for_entities | entity_key | cache_for_secs | created_at | updated_at | deleted_at | public |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | PRODUCT_RECOMMENDATION_SERVICE | Recommendation system that suggests new products to users based on their purchase and browsing history. | false | 69 | NULL | NULL | 420 | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL | true |
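Both tables carry a `cache_for_secs` column. My reading is that it tells the application how long it may keep a row in memory before re-reading it from the database, which keeps the config store off the hot path. Here is a rough sketch under that assumption; `SwitchCache` and the `load_row` callback are hypothetical, and the loader stands in for whatever query your app uses.

```python
import time
from typing import Callable, Dict, Tuple


class SwitchCache:
    """Keeps each switch row in memory for that row's cache_for_secs."""

    def __init__(self, load_row: Callable[[str], dict],
                 clock: Callable[[], float] = time.monotonic):
        self._load_row = load_row   # hypothetical loader that queries the switchboard table
        self._clock = clock         # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, switch_name: str) -> dict:
        now = self._clock()
        cached = self._entries.get(switch_name)
        if cached is not None and now < cached[0]:
            return cached[1]        # still fresh, skip the database
        row = self._load_row(switch_name)
        self._entries[switch_name] = (now + row["cache_for_secs"], row)
        return row


def load_from_db(switch_name: str) -> dict:
    # Stand-in for a real query; returns the example row from the table.
    return {"switch_name": switch_name, "enabled": False, "cache_for_secs": 420}


cache = SwitchCache(load_from_db)
row = cache.get("PRODUCT_RECOMMENDATION_SERVICE")
print(row["enabled"])
```

The trade-off is that flipping a switch can take up to `cache_for_secs` seconds to reach every instance, so switches you need to act quickly deserve a short cache time.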

Application configs

These are configurations for variables in your application that you might want or need to change at runtime, e.g., the default rate limit for an endpoint or the API URL for your payment service provider.

Say you need to send your users notifications via SMS, so you decide to build an SMS notification service for your product. Perhaps no single SMS provider can send SMSes in all the countries you operate in. Moreover, local providers offer better delivery rates and pricing, and you want backups in case a provider experiences downtime. You decide that the best solution is to integrate multiple SMS providers. You could structure the configs for this application/service like this:

Table: data_types

| id | name | description | validator (JSON schema) | default_value | cardinality (max number of configs of a given type per table/account; -1 => infinite) | account_configurable (can an account holder modify this config?) | created_at | updated_at | deleted_at |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | SMS_PROVIDER | An SMS service provider. | {"$schema":"https://json-schema.org/draft/2020-12/schema","$id":"https://schemas.alphacoder.xyz/sms-provider.json","title":"SMS Provider","description":"An SMS provider.","type":"object","properties":{"name":{"description":"The name of the SMS provider.","type":"string"},"priority":{"description":"Priority of the SMS provider. We will prefer providers with higher priority. A lower value means a higher priority.","type":"integer","minimum":1},"available_countries":{"description":"Countries supported by SMS provider.","type":"array","items":{"type":"string","minLength":2,"maxLength":2},"minItems":1,"uniqueItems":true},"api_url":{"description":"SMS provider API URL.","type":"string"},"api_secret_alias":{"description":"SMS provider API secret alias.","type":"string"}},"required":["name","priority","api_url","api_secret_alias"]} | NULL | -1 | false | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |
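The point of the `validator` column is that a write can be rejected before it reaches production. In practice you would hand the stored schema to a full JSON Schema validator library. To keep this example dependency-free, the sketch below uses a deliberately tiny checker of my own (`check` is a hypothetical function) that understands only a handful of keywords, and a trimmed copy of the SMS_PROVIDER schema. It is an illustration of the idea, not a JSON Schema implementation.

```python
# A small checker covering only: type, required, properties, minimum,
# items, minItems, minLength and maxLength. Everything else is ignored.
PYTHON_TYPES = {"object": dict, "array": list, "string": str,
                "integer": int, "boolean": bool}


def check(value, schema, path="$"):
    """Return a list of problems; an empty list means the value passed."""
    expected = schema.get("type")
    if expected is not None:
        matches = isinstance(value, PYTHON_TYPES[expected])
        # bool is a subclass of int in Python, so exclude it explicitly.
        if expected == "integer" and isinstance(value, bool):
            matches = False
        if not matches:
            return [f"{path}: expected {expected}"]

    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required field '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], subschema, f"{path}.{key}")
    elif isinstance(value, list):
        if len(value) < schema.get("minItems", 0):
            errors.append(f"{path}: too few items")
        for index, item in enumerate(value):
            errors += check(item, schema.get("items", {}), f"{path}[{index}]")
    elif isinstance(value, str):
        if len(value) < schema.get("minLength", 0):
            errors.append(f"{path}: too short")
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            errors.append(f"{path}: too long")
    elif isinstance(value, int) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
    return errors


# Trimmed copy of the SMS_PROVIDER validator (descriptions and $id removed).
SMS_PROVIDER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "priority": {"type": "integer", "minimum": 1},
        "available_countries": {
            "type": "array",
            "items": {"type": "string", "minLength": 2, "maxLength": 2},
            "minItems": 1,
        },
        "api_url": {"type": "string"},
        "api_secret_alias": {"type": "string"},
    },
    "required": ["name", "priority", "api_url", "api_secret_alias"],
}

# A made-up bad update: priority typed as a string, a three-letter
# country code, and the secret alias left out.
bad_update = {"name": "SMS Bros", "priority": "1",
              "available_countries": ["NGA"],
              "api_url": "https://sms.bros.io/send"}

for problem in check(bad_update, SMS_PROVIDER_SCHEMA):
    print(problem)
```

Running the same check on every insert and update is what turns a typo into a rejected change instead of an incident.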

Table: app_configs

| id | name | data_type_id | value | created_at | updated_at | deleted_at |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | sms_bros | 1 | {"name":"SMS Bros","priority":1,"available_countries":["NL","NG","US"],"api_url":"https://sms.bros.io/send","api_secret_alias":"SMS_BROS_SMS_API_SECRET"} | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |
| 2 | fatsms | 1 | {"name":"Fat SMS","priority":3,"available_countries":["NG","GH"],"api_url":"https://fatsms.co/api/smses","api_secret_alias":"FAT_SMS_API_SECRET"} | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |
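With the providers stored this way, picking one at send time becomes a small query over the config values. The sketch below is one way the SMS service might do it, assuming the rows have already been loaded into memory. `providers_for` is a hypothetical helper, and I am assuming a provider with no `available_countries` covers no country, since the schema does not make that field required.

```python
APP_CONFIGS = [
    {"id": 1, "name": "sms_bros", "data_type_id": 1, "value": {
        "name": "SMS Bros", "priority": 1,
        "available_countries": ["NL", "NG", "US"],
        "api_url": "https://sms.bros.io/send",
        "api_secret_alias": "SMS_BROS_SMS_API_SECRET"}},
    {"id": 2, "name": "fatsms", "data_type_id": 1, "value": {
        "name": "Fat SMS", "priority": 3,
        "available_countries": ["NG", "GH"],
        "api_url": "https://fatsms.co/api/smses",
        "api_secret_alias": "FAT_SMS_API_SECRET"}},
]


def providers_for(country: str, configs: list) -> list:
    """Return providers that cover `country`, most preferred first.

    A lower priority number means a higher priority, as the schema states.
    """
    covering = [row["value"] for row in configs
                if country in row["value"].get("available_countries", [])]
    return sorted(covering, key=lambda provider: provider["priority"])


# Nigeria is covered by both, so the second entry acts as the backup.
print([p["name"] for p in providers_for("NG", APP_CONFIGS)])
print([p["name"] for p in providers_for("GH", APP_CONFIGS)])
```

Returning the whole ordered list rather than a single provider lets the caller fall through to the next one when a send fails.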

Account configs

These are just like app configs, but for accounts. Say your product has multiple pricing plans, each with its set of features. You could use account config tables to store data on what features each user has access to based on the plan they paid for.

Table: data_types

| id | name | description | validator (JSON schema) | default_value | cardinality (max number of configs of a given type per table/account; -1 => infinite) | account_configurable (can an account holder modify this config?) | created_at | updated_at | deleted_at |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | FEATURE_UNLIMITED_UPLOADS | Feature allowing users to upload as many files as they want. | {"$schema":"https://json-schema.org/draft/2020-12/schema","$id":"https://schemas.alphacoder.xyz/boolean.json","title":"Boolean","type":"boolean"} | true | 1 | false | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |
| 3 | FEATURE_UNLIMITED_COLLABORATORS | Feature allowing users to permit as many other users as they want to view and edit their files. | {"$schema":"https://json-schema.org/draft/2020-12/schema","$id":"https://schemas.alphacoder.xyz/boolean.json","title":"Boolean","type":"boolean"} | false | 1 | false | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |
| 4 | FEATURE_PLUGINS | Feature allowing users to install plugins to customize their setup. | {"$schema":"https://json-schema.org/draft/2020-12/schema","$id":"https://schemas.alphacoder.xyz/boolean.json","title":"Boolean","type":"boolean"} | false | 1 | false | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |

Table: account_configs

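| id | data_type_id | account_type | account_id | value | created_at | updated_at | deleted_at |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2 | user | 69 | true | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |
| 2 | 3 | user | 69 | true | 2022-12-01 09:41:09 | 2022-12-01 09:42:09 | NULL |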
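Here is a sketch of how a lookup against these two tables might work. I am assuming that when an account has no row for a data type, the data type's `default_value` applies. That is my interpretation of the column, and `account_value` is a hypothetical helper.

```python
# In-memory copies of the relevant columns from the two tables.
DATA_TYPES = {
    2: {"name": "FEATURE_UNLIMITED_UPLOADS", "default_value": True},
    3: {"name": "FEATURE_UNLIMITED_COLLABORATORS", "default_value": False},
    4: {"name": "FEATURE_PLUGINS", "default_value": False},
}

ACCOUNT_CONFIGS = [
    {"data_type_id": 2, "account_type": "user", "account_id": 69, "value": True},
    {"data_type_id": 3, "account_type": "user", "account_id": 69, "value": True},
]


def account_value(name: str, account_type: str, account_id: int):
    """Return the account's value for a config, or the type's default."""
    type_id = next(i for i, t in DATA_TYPES.items() if t["name"] == name)
    for row in ACCOUNT_CONFIGS:
        if (row["data_type_id"], row["account_type"], row["account_id"]) == \
                (type_id, account_type, account_id):
            return row["value"]
    # Assumption: no row for this account means the default applies.
    return DATA_TYPES[type_id]["default_value"]


print(account_value("FEATURE_UNLIMITED_COLLABORATORS", "user", 69))  # explicit row
print(account_value("FEATURE_PLUGINS", "user", 69))                  # falls back to default
```

With a fallback like this, you only need to store rows for accounts that differ from the default, which keeps the account_configs table small.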

Config system features

Creating a scalable data structure for your configuration system is just one side of the coin. You also need to provide a feature set that makes the system operationally efficient.

Access control

As I mentioned earlier, the people deciding what state your system should be in should also be able to make the changes necessary to achieve that state. At the same time, of course, you need to have proper controls in place. Your config database is the control panel for your app, so you don’t want people who have no business interacting with it to be able to gain access to it.

Audit logging

Since potentially lots of people would be making lots of changes to your config database, you want proper audit logging. The logs would be useful during incidents as described earlier, but also for security and documentation.
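As a rough illustration, an audit entry needs to capture who changed what, the before and after values, and why. The field names below are my own suggestion rather than a schema from the system described, and `changes_before` is a hypothetical helper for the "what changed in the lead-up to this incident?" query. The example entries are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEntry:
    changed_at: datetime
    changed_by: str
    table: str          # e.g. "switchboard" or "app_configs"
    config_name: str
    old_value: object
    new_value: object
    reason: str


def changes_before(log, incident_at: datetime, window: timedelta):
    """Return entries made within `window` before the incident, oldest first."""
    start = incident_at - window
    recent = [e for e in log if start <= e.changed_at <= incident_at]
    return sorted(recent, key=lambda e: e.changed_at)


log = [
    AuditEntry(datetime(2022, 12, 1, 9, 42), "ada", "switchboard",
               "PRODUCT_RECOMMENDATION_SERVICE", True, False,
               "Shedding load during the sale"),
    AuditEntry(datetime(2022, 11, 30, 16, 0), "tunde", "app_configs",
               "fatsms", {"priority": 2}, {"priority": 3},
               "Delivery rates dropped"),
]

for entry in changes_before(log, datetime(2022, 12, 1, 12, 0), timedelta(hours=3)):
    print(entry.changed_by, entry.config_name, entry.reason)
```

Making the reason a required field is cheap at write time and saves a lot of guessing during an incident.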

Change queue

A change queue is a way to queue changes to the config database. Its purpose is automation. For example, if you wanted to turn off a service temporarily or release a new feature at a specific time, you could push that to the change queue and a job that consumes the queue would execute it, so you don’t have to do it manually.
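The “turn off service X for Y hours” case then becomes two queued changes: one that switches the service off now and one that switches it back on later. Below is a minimal in-memory sketch. `ChangeQueue` and its methods are hypothetical, and a real system would persist the queue in the database and have a scheduled job call something like `run_due`.

```python
import heapq
import itertools
from datetime import datetime, timedelta


class ChangeQueue:
    """Holds pending switch changes ordered by the time they should run."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: equal times run in insertion order

    def schedule(self, run_at: datetime, switch_name: str, enabled: bool):
        heapq.heappush(self._heap, (run_at, next(self._order), switch_name, enabled))

    def turn_off_for(self, switch_name: str, now: datetime, duration: timedelta):
        # Queue the switch-off and the matching switch-on together,
        # so nobody has to remember the second step.
        self.schedule(now, switch_name, False)
        self.schedule(now + duration, switch_name, True)

    def run_due(self, now: datetime, switches: dict):
        """Apply every change whose time has come; return what was applied."""
        applied = []
        while self._heap and self._heap[0][0] <= now:
            _, _, name, enabled = heapq.heappop(self._heap)
            switches[name] = enabled
            applied.append((name, enabled))
        return applied


switches = {"PRODUCT_RECOMMENDATION_SERVICE": True}
queue = ChangeQueue()
start = datetime(2022, 12, 1, 9, 0)
queue.turn_off_for("PRODUCT_RECOMMENDATION_SERVICE", start, timedelta(hours=2))

print(queue.run_due(start, switches))                       # switch goes off
print(queue.run_due(start + timedelta(hours=2), switches))  # switch comes back on
```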

Tagging

Another important feature is tagging. Since configs would be organized by function, you need a way to group your configs by product, service, feature or any other category you fancy. This is purely for the benefit of the people interacting with the configuration database. You could create a tag for your SMS notification system, for example, which would let one pull up all the configs related to this service.
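One simple way to model this is a mapping table that links a tag to a row in any of the config tables. The `CONFIG_TAGS` rows and the `configs_tagged` helper below are hypothetical, including the tag names; the IDs refer to the example rows shown earlier.

```python
from collections import defaultdict

# Each row links a tag to one config: (tag, table, config_id).
CONFIG_TAGS = [
    ("sms-notifications", "app_configs", 1),   # sms_bros
    ("sms-notifications", "app_configs", 2),   # fatsms
    ("recommendations", "switchboard", 1),     # PRODUCT_RECOMMENDATION_SERVICE
    ("product-page", "features", 2),           # SHOW_NEW_PRODUCT_PAGE
]


def configs_tagged(tag: str, rows=CONFIG_TAGS) -> dict:
    """Group the IDs of every config carrying `tag` by the table they live in."""
    grouped = defaultdict(list)
    for row_tag, table, config_id in rows:
        if row_tag == tag:
            grouped[table].append(config_id)
    return dict(grouped)


print(configs_tagged("sms-notifications"))
```

Because the tags live in their own table, adding or renaming a category never touches the config rows themselves.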


Last modified on 2022-10-23