One of the secret powers of a relational database system, is that the query describing the data you want and the execution path for fetching it have no direct connection. The database is free to find optimizations and alternate paths for fetching the data. Coupled with database configurations like indexes and partitions, you can frequently take huge quantities of data and run arbitrary queries against it without having to think too hard about performance when writing the query (while spending a lot of time thinking about performance as you manage indexes and statistics gathering).
That doesn't mean that we don't see some… unique choices in terms of how we organize our data "for performance". Chris needed to add a field to one of their data models. Said data model was generated from object oriented mapping, so it seemed like it should be easy to do. Just add a field to the object oriented model, generate a migration script, and then start the rollout process.
It was not, in fact, easy. At first it was because whatever they had done in their underlying configuration meant that the ORM tool they were using couldn't successfully generate migration scripts. Fine, Chris could generate it by hand- it was one field after all. But applying that change in a test database showed that just adding the field wasn't sufficient- a bunch of materialized views needed to be modified to also fetch the field. Upon opening the code for the views, Chris also saw that simply changing the views wasn't going to be sufficient, because they didn't work the way anyone expected.
Each view queried many hundreds of tables, all with the same schema. They were all named in the pattern MyDataset10OCT2023
. Each one had a foreign key back to the previous day's table- MyDataset10OCT2023
linked back to MyDataset09OCT2023
, which went back to MyDataset08OCT2023
, and so on. The views joined all of those tables together to aggregate the data across the entire history, essentially summarizing a daily snapshot of all the ways the data changed.
All of this was managed by a set of PL/SQL stored procedures, some of which generated the daily table, some of which regenerated the queries for the materialized views. All of this meant that Chris either had to add the field to every single daily table, or had to touch PL/SQL code that munged strings together to generate SQL queries that replaced materialized views.
Given the options, changing every single table seemed easier and at least something that could be automated and tested. But Chris has no idea why there's essentially a linked list of database tables with daily snapshots, and asking the people who had been on the team for awhile didn't yield better answers than, "I think they did that for performance."
This post originally appeared on The Daily WTF.