Constraints and Logic in Your Database

Most people aren't aware of the power of constraints in a database. They're there to enforce data rules, and you're probably familiar with a few of them, including:

The null constraint, which won't allow null values.
The unique constraint, which ensures a record is always unique.
The primary key constraint, which I don't need to explain.

Assuming you've used these before, let me ask you a question: is this business logic? I think it's fair to say that some logic is being used here in the form of a very basic conditional statement: if null then no. Some people hate this idea.

What about foreign key constraints? Articles like this one claim you shouldn't, claiming that "rules change, deployment becomes harder, unit testing, and application reuse".

I clearly have an opinion on this one, which is simple and direct: if your database can protect your data, do it. That's what constraints are there for, and protecting your data is far more important than reusing your application later on.

Let's test this notion.

A Real World Scenario: The Program Schedule

Back in 2021, I was tasked with creating a scheduler for Microsoft's LearnTV. It had to work like a guide on a regular television, which you can visualize as this:

Here are the requirements I had to meet:

The start and end time needed to be at the top or bottom of the hour (like 14:00 or 14:30).
The duration had to be in 30 minute blocks, up to 2 hours (30, 60, 90, or 120).
There can be no overlap between programs.

How would you implement this in a domain model? The start and end seem simple enough, as does the duration calculation. Checking for overlap, however, gets a bit tricky.

Or you could write up a few constraints. Here, I'm using Postgres:

If you're not used to SQL, this is going to be jarring. I suppose the same goes for learning any programming language or platform - it takes a little time, then you have a whole new toolset at the ready.

With 13 lines of code, we have defined our table and knocked out every single requirement above, including one more, which is to ensure that the start happens before the end.

Every program has a start and an end, so it makes sense to have a range as the data type (which is what tstzrange is - timestamp with a timezone range). You get at the range values using lower and upper, which define the boundaries of the range.

I can define a constraint using the check keyword, and then pass in a boolean expression that can be any valid SQL expression.

The duration is a computed column using generate, which I'm then popping a constraint onto so that we can check the 30 minute block thing. We would need duration anyway in our code, so having it here, calculated by Postgres, makes good sense.

I had never used an exclusion constraint before I did this project and, if I'm honest, that syntax is a bit weird and is something you just have to memorize. That one line is what keeps things from overlapping, using the && operator.

Is This a Better Solution?

If you don't know SQL, I don't blame you if you look at this and wonder if I've lost my mind. That doesn't make it wrong. There are 13 lines of code here and it would take 30 minutes to Google and figure out what it does, and the reward for learning it is tremendous in my opinion.

Having data rules as close to the data makes sense. You can get around rules defined in code easily by writing a SQL query when you're in a hurry (like trying to fix a bug or running some report).

Can you unit test this? Of course. I know that many programmers will utterly refuse to test their code if it requires a database, and I understand that. It can slow things down, and also cause failure for ... reasons.

Rails freed me from this mindset, and I don't mind involving a test database. But that's me. It works for what I do, and it feels good to let go and do what feels right.

Will this logic change? Maybe. Perhaps we'll allow programs to be 15 minutes long in the future. This means we need to change code somewhere, either in our project, or in our database in the form of drop constraint on the slot, making the change, then alter table add constraint with the update.

Of course, you would want to check both.

Here's the thing, however: when you rewrite that application, you won't need to rewrite this logic. In my experience, rewrites happen far more than database changes, at least for me.

It'a All About The Guarantee

In the end, what matters most to me is when my boss pings me and asks "are you sure about these numbers?" Reports are going to be based on these numbers (who viewed what, for how long, etc.) and whatever data is in the programs table needs to be as correct as possible.

As you think about your reply to this question, you might wonder whether your tests captured every case. You also might wonder if some other bit of code might have altered your data without you knowing it (observers, triggers, hooks, weird ORM relationships).

For me, having constraints like this are guarantees. Data cannot exist in the table without complying, full stop. When you're asked about numbers being right, you should feel comfortable replying "yes, they are".

It's all about the data, just be sure you're right.

Learning More

I made a video about this exact thing just over a year ago, and you can watch it up on YouTube.

I also created a full course on Postgres and working with data, which I just put on sale for $20. This is a fun one, as we work with real data from the Cassini mission, and its exploration of Enceladus.

Happy querying!

Rob

🥷🏽 Notes From an Imposter Programmer

Constraints and Logic in Your Database

A Real World Scenario: The Program Schedule

Is This a Better Solution?

It'a All About The Guarantee

Learning More

👹 The Wrath of the Junior Developer

Following your imagination

It wasn't what you said, but how you said it