Compute

Compute is the component of Common in charge of providing out-of-band processing. It can run jobs both on-demand and on-schedule.

Public Oath

Fluid Attacks will constantly look for out-of-band computing solutions that balance:

  • Cost
  • Security
  • Scalability
  • Speed
  • Traceability

Such solutions must also be:

  • Cloud based
  • Integrable with the rest of our stack

Architecture

  1. The module is managed as code using Terraform.
  2. Batch jobs use AWS EC2 Spot machines.
  3. Spot machines have Internet access.
  4. Spot machines are of aarch64-linux architecture.
  5. Batch jobs run only for as long as their EC2 Spot instance lasts, so design jobs with idempotency and retry mechanisms in mind.
  6. Jobs can be sent to batch in two ways:
    • Using curl, boto3, or any other tool that allows interacting with the AWS API.
    • Defining a schedule, which periodically submits a job to a queue.
  7. AWS EventBridge is used to trigger scheduled jobs.
  8. On failure, an email is sent to development@fluidattacks.com.
  9. Batch machines come in two sizes:
    • small, with 1 vCPU and 8 GiB of memory.
    • large, with 2 vCPUs and 16 GiB of memory.
  10. All runners have internal solid-state drives for maximum performance.
  11. A special compute environment called warp, meant for cloning repositories via Cloudflare WARP, uses machines with 2 vCPUs and 4 GiB of memory on an x86_64-linux architecture.
  12. Compute environments use subnets in all availability zones within us-east-1 for maximum Spot availability.
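
Because a Spot instance can be reclaimed mid-run, submissions through the AWS API usually pair an idempotent job with a retry strategy. The following is a minimal sketch of such a submission with boto3, not the project's actual tooling. The job name, queue, and job definition are hypothetical placeholders; `submit_job` and its `retryStrategy` parameter are part of the boto3 Batch client.

```python
from typing import Any


def build_submit_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    command: list[str],
    attempts: int = 3,
) -> dict[str, Any]:
    """Build the keyword arguments for the AWS Batch SubmitJob call."""
    if not 1 <= attempts <= 10:
        # AWS Batch accepts between 1 and 10 attempts per job.
        raise ValueError("attempts must be between 1 and 10")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {"command": command},
        # Let Batch retry the job, for example after a Spot interruption.
        "retryStrategy": {"attempts": attempts},
    }


def submit(client: Any, **request: Any) -> str:
    """Submit the job and return the job ID reported by AWS Batch."""
    response = client.submit_job(**request)
    return response["jobId"]


def submit_example_job() -> str:
    """Usage sketch; every name passed below is a hypothetical placeholder."""
    import boto3  # third-party dependency, imported only when really submitting

    request = build_submit_request(
        job_name="example-job",
        job_queue="example-queue",
        job_definition="example-definition",
        command=["echo", "hello"],
    )
    return submit(boto3.client("batch", region_name="us-east-1"), **request)
```

Retries only help if the job itself is safe to run more than once, so the command should tolerate restarting from the beginning or from its last checkpoint.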

Contributing

Please read the contributing page first.

General

  • You can access the Batch console after authenticating to AWS via Okta.
  • If a scheduled job takes longer than six hours, it should generally run in Batch; otherwise, you can use the CI.

Schedules

Schedules are a powerful way to run tasks periodically.

You can find all schedules here.

Creating a new schedule

We highly advise you to take a look at the currently existing schedules to get an idea of what is required.

Some special considerations are:

  1. The scheduleExpression option follows the AWS schedule expression syntax.
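
AWS schedule expressions for EventBridge rules come in two forms: `rate(value unit)` and `cron(...)` with six fields (minutes, hours, day-of-month, month, day-of-week, year). The sketch below is a rough plausibility check for both forms. It is illustrative only, does not validate the contents of individual cron fields, and is not the check used by the repository; the function name is hypothetical.

```python
import re

# rate(value unit): the unit is singular when the value is 1, plural otherwise.
RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
# cron(...) wraps six space-separated fields.
CRON = re.compile(r"^cron\((.+)\)$")


def is_plausible_schedule_expression(expression: str) -> bool:
    """Return True if the expression has the overall shape AWS expects."""
    rate = RATE.match(expression)
    if rate:
        value, unit = int(rate.group(1)), rate.group(2)
        if value < 1:
            return False
        # "rate(1 minute)" is valid, "rate(1 minutes)" is not.
        return unit.endswith("s") == (value != 1)
    cron = CRON.match(expression)
    if cron:
        fields = cron.group(1).split(" ")
        if len(fields) != 6 or "" in fields:
            return False
        day_of_month, day_of_week = fields[2], fields[4]
        # AWS does not allow both day fields to carry a value or "*":
        # at least one of them must be "?".
        return "?" in (day_of_month, day_of_week)
    return False
```

For example, `cron(0 12 * * ? *)` (every day at 12:00 UTC) and `rate(5 minutes)` pass this check, while a plain Unix crontab line such as `0 12 * * *` does not.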

Testing the schedules

Schedules are tested by two Makes jobs:

  1. m . /common/compute/schedule/test ensures that
    • all schedules comply with a given schema;
    • all schedules have at least one maintainer with access to the universe repository;
    • every schedule is reviewed by a maintainer on a monthly basis.
  2. m . /deployTerraform/commonCompute tests the infrastructure that will be deployed when new schedules are created.
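
To make the kind of checks performed by the first job concrete, here is a rough sketch of how such rules could be expressed. It is not the actual test: the real schema lives in the repository, and the `maintainers` and `last_reviewed` field names, as well as the 31-day review window, are assumptions made for illustration. Only `scheduleExpression` is an option named on this page.

```python
from datetime import date, timedelta
from typing import Any

# Assumed interpretation of "reviewed on a monthly basis".
REVIEW_WINDOW = timedelta(days=31)


def schedule_problems(
    name: str, schedule: dict[str, Any], today: date
) -> list[str]:
    """Return a list of human-readable problems found in one schedule."""
    problems = []
    if not schedule.get("scheduleExpression"):
        problems.append(f"{name}: missing scheduleExpression")
    # "maintainers" is a hypothetical field name.
    if not schedule.get("maintainers"):
        problems.append(f"{name}: needs at least one maintainer")
    # "last_reviewed" is a hypothetical field name holding a date.
    last_reviewed = schedule.get("last_reviewed")
    if last_reviewed is None or today - last_reviewed > REVIEW_WINDOW:
        problems.append(f"{name}: review is overdue")
    return problems
```

A schedule that returns an empty list satisfies all three rules in this simplified model.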

Deploying schedules to production

Once a schedule reaches production, the infrastructure required to run it is created.

Technical details can be found here.

Local reproducibility in schedules

Once a new schedule is declared, a Makes job is created with the format computeOnAwsBatch/schedule_<name> for local reproducibility.

Generally, to run any schedule, all that is necessary is to export the UNIVERSE_API_TOKEN variable. Bear in mind that data.nix becomes the single source of truth regarding schedules. Everything is defined there, albeit with a few exceptions.
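
As an illustration of the naming convention, the sketch below builds the Makes invocation for a schedule and reports whether the required variable is set. The helper names are hypothetical, and the schedule name used in the usage comment is taken from the test schedules mentioned on this page.

```python
import os
from collections.abc import Mapping


def schedule_command(name: str) -> list[str]:
    """Return the Makes command that runs the schedule called `name`."""
    return ["m", ".", f"/computeOnAwsBatch/schedule_{name}"]


def missing_variables(
    environ: Mapping[str, str],
    required: tuple[str, ...] = ("UNIVERSE_API_TOKEN",),
) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [variable for variable in required if not environ.get(variable)]


def describe(name: str) -> str:
    """Summarize what would be run and whether the environment is ready."""
    missing = missing_variables(os.environ)
    command = " ".join(schedule_command(name))
    if missing:
        return f"export {', '.join(missing)} before running: {command}"
    return f"ready to run: {command}"


# Usage sketch:
# print(describe("common_compute_test_environment_default"))
```

Some schedules may need additional variables, so treat the default `required` tuple as the common case rather than a complete list.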

Testing compute environments

Testing compute environments is hard for multiple reasons:

  1. Environments use init data that is critical for properly provisioning machines.
  2. Environments require AWS AMIs that are especially optimized for ECS.
  3. When upgrading an AMI, many things within the machines change, including the version of cloud-init (the software that initializes the machine using the init data provided) and the GLIBC version, among many others.
  4. There is no convenient way to test this locally or in CI, which forces us to rely on test environments deployed in production.

test is the compute environment for testing. It uses the common/compute/infra/init/test init data.

Below is a step-by-step guide to testing environments.

  1. Apply the changes you want to test to the test environment.

  2. Use direnv to assume the AWS prod_common role.

  3. Export CACHIX_AUTH_TOKEN in your environment. You can find this variable in GitLab’s CI/CD variables. If you do not have access to it, ask a maintainer.

  4. Deploy changes made to the environment you want to test with m . /deployTerraform/commonCompute.

  5. Queue compute test jobs with

    m . /computeOnAwsBatch/schedule_common_compute_test_environment_default

    or

    m . /computeOnAwsBatch/schedule_common_compute_test_environment_warp

  6. Verify that the jobs are running properly on the test environment.

  7. Extend your changes to the production environments.
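
Besides the Batch console, step 6 can be checked from a script. The sketch below counts the jobs in a queue by status using the boto3 Batch client's `list_jobs` call, following `nextToken` for pagination. The queue name is not documented here, so the one in the usage comment is a hypothetical placeholder.

```python
from typing import Any

# Job statuses defined by AWS Batch.
JOB_STATUSES = (
    "SUBMITTED",
    "PENDING",
    "RUNNABLE",
    "STARTING",
    "RUNNING",
    "SUCCEEDED",
    "FAILED",
)


def count_jobs_by_status(client: Any, job_queue: str) -> dict[str, int]:
    """Return how many jobs the queue holds in each status."""
    counts = {}
    for status in JOB_STATUSES:
        total = 0
        kwargs = {"jobQueue": job_queue, "jobStatus": status}
        while True:
            page = client.list_jobs(**kwargs)
            total += len(page.get("jobSummaryList", []))
            token = page.get("nextToken")
            if not token:
                break
            # Request the next page of results for the same status.
            kwargs["nextToken"] = token
        counts[status] = total
    return counts


# Usage sketch (requires boto3 and AWS credentials; queue name is hypothetical):
# import boto3
# client = boto3.client("batch", region_name="us-east-1")
# print(count_jobs_by_status(client, "example-test-queue"))
```

Jobs that stay in RUNNABLE for a long time usually point to a problem with the compute environment rather than with the job itself, which makes that status worth watching when testing changes to init data or AMIs.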