
Database Migrations

What are migrations?

As Integrates and the business evolve, it is natural for the structure of the data to change. In order to keep backwards compatibility, it is necessary to run data migrations that update all existing data so it complies with the latest data schema.

For example:

  1. We have a cars database and are storing two attributes, color and brand.
  2. At some point in time we decide to also store the price attribute.
  3. When this happens, we have to go through all the already-created cars and add the new price attribute accordingly, as sketched below.
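
For instance, here is a minimal sketch of such a migration, assuming the cars live in a DynamoDB table (the table name, key schema, and default price below are hypothetical, for illustration only):

import boto3

def add_default_price() -> None:
    """Backfill the new price attribute on every existing car."""
    table = boto3.resource("dynamodb").Table("cars")  # hypothetical table
    response = table.scan()
    items = response["Items"]
    while "LastEvaluatedKey" in response:  # paginate through the full table
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
    for item in items:
        if "price" not in item:
            table.update_item(
                Key={"id": item["id"]},  # assumes "id" is the partition key
                UpdateExpression="SET #price = :price",
                ExpressionAttributeNames={"#price": "price"},
                ExpressionAttributeValues={":price": 0},  # placeholder default
            )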

Writing migration scripts

You can find all the already-executed migrations here. The latest of them may be helpful as inspiration when creating your own migration.

Basic properties

All migration scripts have a comment including:

  1. A basic description of what they do.
  2. An Execution Time that specifies when the migration started running.
  3. A Finalization Time that specifies when it finished running.

The main function

Your migration script should contain a main function, which will be called when the migration runs.

from aioextensions import (
    run,
)
import logging
import logging.config
from settings import (
    LOGGING,
)
import time

logging.config.dictConfig(LOGGING)
LOGGER_CONSOLE = logging.getLogger("console")


async def main() -> None:
    """Your code goes here"""


if __name__ == "__main__":
    execution_time = time.strftime(
        "Execution Time: %Y-%m-%d at %H:%M:%S %Z"
    )
    run(main())
    finalization_time = time.strftime(
        "Finalization Time: %Y-%m-%d at %H:%M:%S %Z"
    )
    LOGGER_CONSOLE.info("\n%s\n%s", execution_time, finalization_time)

You can call dataloaders, domain functions, data model functions, or even make direct calls to the corresponding datastore module, depending on the level of abstraction best suited to achieve the intended change.
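
For illustration, here is a hedged sketch of what the body of main often looks like: fetch the affected items, then apply the change concurrently. update_item below is a placeholder for whatever dataloader or db_model call your migration actually needs, while collect comes from aioextensions:

from aioextensions import (
    collect,
)

async def update_item(item_id: str) -> None:
    """Placeholder: apply the intended change to a single item."""

async def main() -> None:
    # In a real migration, these ids would be fetched from the datastore
    item_ids = ["car-1", "car-2", "car-3"]
    # Run the updates concurrently, which matters when a migration
    # touches thousands of items
    await collect(
        (update_item(item_id) for item_id in item_ids),
        workers=16,
    )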

Running migrations

Dry runs

As migrations affect production data, it is very important that you take all necessary measures so they work as expected.

A very useful measure is the dry run, which allows you to execute a migration against your local environment.

To execute a dry run:

  1. Write your migration.
  2. Turn on your development environment.
  3. Run m . /integrates/db/migration dev name_of_script.py

This approach allows you to locally test your migration until you feel comfortable enough to run it on production.

Running locally

If you have the required role to modify the database, you can execute migrations from your machine by running:

m . /integrates/db/migration prod name_of_script.py

Quality check

Use lintPython to lint migration scripts:

m . /lintPython/dirOfModules/integrates/migrations

Also, run its architecture lint with:

m . /integrates/back/migrations/arch_lint

This architecture lint verifies that forbidden modules are not imported in migration scripts, which prevents direct usage of modules that modify the database.

Prefer using the already existing utils in db_model; only when strictly necessary, add exception rules for your migration script.

If you need to add a rule exception, modify the /integrates/back/migrations/arch_lint/test_arch.py::forbidden_allowlist function as follows:

def forbidden_allowlist() -> Dict[FullPathModule, FrozenSet[FullPathModule]]:
    _raw: Dict[str, FrozenSet[str]] = {
        "boto3": frozenset({"migrations._aaa_example_script"}),
        "dynamodb": frozenset({"migrations._bbb_example_script"}),
    }
    return {
        FullPathModule.assert_module(k): frozenset(
            FullPathModule.assert_module(i) for i in v
        )
        for k, v in _raw.items()
    }

This would allow _aaa_example_script to import the boto3 module and _bbb_example_script to import dynamodb.

Running on AWS Batch

Once you know that your migration does what it is supposed to do, it is recommended to execute it using a Batch schedule:

  1. Write your migration.
  2. Create a batch schedule that executes the migration.
  3. Deploy both changes to production.
  4. Wait until the schedule executes.
  5. Access the AWS console to review the logs of the migration.

This allows the migration to run in an environment external to your own machine, one that is faster and more reliable.

Restoring to a previous state

If something goes wrong, you have the option to restore data from a backup.

  1. Follow these instructions to restore a Point In Time into a new table.
  2. Restore the data by reading from the recovery table and writing into the main table (see the sketch after this list).
  3. Remove the recovery table.
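
Here is a minimal sketch of step 2, assuming both tables live in DynamoDB and share the same key schema (the table names below are hypothetical):

import boto3

def restore_from_recovery() -> None:
    """Copy every item from the recovery table back into the main table."""
    dynamodb = boto3.resource("dynamodb")
    recovery = dynamodb.Table("integrates_vms_recovery")  # hypothetical names
    main_table = dynamodb.Table("integrates_vms")
    response = recovery.scan()
    # batch_writer batches the writes and retries unprocessed items
    with main_table.batch_writer() as batch:
        while True:
            for item in response["Items"]:
                batch.put_item(Item=item)
            if "LastEvaluatedKey" not in response:
                break
            response = recovery.scan(
                ExclusiveStartKey=response["LastEvaluatedKey"]
            )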

Delete migrations

Migrations should be kept in the repository for at least one year. After that, they should be deleted to avoid compatibility issues.

There is no need to delete a migration immediately after one year has passed (there is no CI test enforcing this). The usual procedure is simply to delete them in bulk at least twice a year, once in January and again in July.