Database Migrations
What are migrations?
As Integrates and the business evolve, it is natural for the structure of the data to change. To keep backwards compatibility, it is necessary to run data migrations that update all existing data so it complies with the latest data schema.
For example:
- We have a cars database and are storing two attributes, color and brand.
- At some point in time, we decide to also store the price attribute.
- When this happens, we have to go through all the already-created cars and add the new price attribute accordingly, as sketched below.
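In code, such a migration could look like the following minimal sketch, assuming the cars live in a DynamoDB table accessed through boto3; the table name, key name (id), and default price are illustrative assumptions, not Integrates' actual schema:

```python
# Minimal sketch of the cars example. Table name, key schema and the
# default price are illustrative assumptions, not the real schema.
import boto3

DEFAULT_PRICE = 0  # placeholder value for already-created cars


def add_price_to_existing_cars() -> None:
    table = boto3.resource("dynamodb").Table("cars")
    response = table.scan()
    items = response["Items"]
    # Keep scanning until the whole table has been paginated through
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
    for item in items:
        if "price" not in item:
            table.update_item(
                Key={"id": item["id"]},
                UpdateExpression="SET #price = :price",
                ExpressionAttributeNames={"#price": "price"},
                ExpressionAttributeValues={":price": DEFAULT_PRICE},
            )


if __name__ == "__main__":
    add_price_to_existing_cars()
```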
Writing migration scripts
You can find all the already-executed migrations here. The latest of them may be helpful as inspiration when creating your own migration.
Basic properties
All migration scripts have a comment including:
- A basic description of what they do.
- An Execution time that specifies when it started running.
- A Finalization time that specifies when it finished running.
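For instance, the header comment of a script might look like this (the exact layout varies between scripts, and the dates below are only illustrative):

```python
"""
Add the price attribute to all already-created cars.

Execution time:    2023-01-02 at 10:00:00 UTC
Finalization time: 2023-01-02 at 10:45:00 UTC
"""
```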
The main function
Your migration script should contain a main function, which will be called when the migration runs.
You can call dataloaders, domain functions, data model functions, or even make direct calls to the corresponding datastore module, depending on the level of abstraction best suited to achieve the intended change.
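As a rough sketch, assuming the migration runner imports the script and awaits its main coroutine (the timing print is only illustrative):

```python
# Sketch of a migration script's entry point. How you load and write
# the data (dataloaders, domain, db_model or datastore calls) depends
# on the abstraction level your change needs.
import asyncio
import time


async def main() -> None:
    start = time.time()
    # Load the affected items and write the updated attribute back,
    # using the most appropriate abstraction for the change.
    ...
    print(f"Done in {time.time() - start:.1f} seconds")


if __name__ == "__main__":
    asyncio.run(main())
```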
Running migrations
Dry runs
As migrations affect production data, it is very important that you take all necessary measures so they work as expected.
Dry runs are a very useful measure; they allow you to run migrations on your local environment.
To execute a dry run:
- Write your migration.
- Turn on your development environment.
- Run m . /integrates/db/migration dev name_of_script.py
This approach allows you to test your migration locally until you feel comfortable enough to run it in production.
Running locally
If you have the required role to modify the database, migrations can be executed from your machine by running:
m . /integrates/db/migration prod name_of_script.py
Quality check
Use lintPython for linting migration scripts.
Also, run its architecture lint.
The architecture lint verifies that forbidden modules are not imported in migration scripts; this avoids direct usage of modules that modify the database.
Unless strictly necessary, prefer using the already existing utils in db_model; otherwise, add exception rules for your migration script.
If you need to add a rule exception, modify the
/integrates/back/migrations/arch_lint/test_arch.py::forbidden_allowlist
function as follows:
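A hypothetical shape for that function is shown below; the actual structure in test_arch.py may differ, but the idea is to map each script to the modules it is explicitly allowed to import:

```python
# Hypothetical allowlist shape: script name -> allowed modules.
def forbidden_allowlist() -> dict[str, frozenset[str]]:
    return {
        "_aaa_example_script": frozenset({"boto3"}),
        "_bbb_example_script": frozenset({"dynamodb"}),
    }
```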
This would allow _aaa_example_script to import the boto3 module and _bbb_example_script to import dynamodb.
Running on AWS Batch
Once you know that your migration does what it is supposed to do, it is recommended to execute it using a Batch schedule:
- Write your migration.
- Create a batch schedule that executes the migration.
- Deploy both changes to production.
- Wait until the schedule executes.
- Access the AWS console to review the logs of the migration.
This allows the migration to execute in an environment external to your own machine, one that is faster and more reliable.
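For reference, a one-off submission of such a job with boto3 looks roughly like the sketch below; the queue, job definition, and job name are placeholders, since in practice the repository's schedule machinery handles the submission:

```python
# Illustrative AWS Batch submission. Queue, job definition and command
# are placeholders, not Integrates' actual schedule configuration.
import boto3

batch = boto3.client("batch")
batch.submit_job(
    jobName="migration-name-of-script",
    jobQueue="example-queue",
    jobDefinition="example-job-definition",
    containerOverrides={
        "command": [
            "m", ".", "/integrates/db/migration", "prod", "name_of_script.py"
        ],
    },
)
```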
Restoring to a previous state
If something goes wrong, you have the option to restore data from a backup.
- Follow these instructions to restore a Point In Time into a new table.
- Restore the data by reading from the recovery table and writing into the main table (see the sketch below).
- Remove the recovery table.
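A minimal sketch of the second step, assuming both tables live in DynamoDB and share the same key schema (table names are illustrative):

```python
# Copy every item from the recovery table back into the main table.
import boto3


def restore(recovery_table: str, main_table: str) -> None:
    dynamodb = boto3.resource("dynamodb")
    source = dynamodb.Table(recovery_table)
    target = dynamodb.Table(main_table)
    response = source.scan()
    # batch_writer buffers and flushes writes in batches automatically
    with target.batch_writer() as batch:
        while True:
            for item in response["Items"]:
                batch.put_item(Item=item)
            if "LastEvaluatedKey" not in response:
                break
            response = source.scan(
                ExclusiveStartKey=response["LastEvaluatedKey"]
            )


restore("cars_recovery", "cars")
```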
Deleting migrations
Migrations should be kept in the repository for at least one year. After that, they should be deleted to avoid compatibility issues.
There is no need to delete a migration immediately after one year has passed (there is no CI test enforcing this). The usual procedure is simply to delete them in bulk at least twice a year, once in January and again in July.