
Introduction

Matches correlates Fluid Attacks’ vulnerabilities with the custom vulnerabilities that clients provide via Integrates.

Public Oath

  1. Fluid Attacks handles client data with the utmost care and respect, including all custom criteria threats provided by clients.
  2. Any AI tool (external or self-hosted) will be used ethically and responsibly, ensuring client data is only used for its agreed-upon purpose.
  3. Fluid Attacks has opted out of any AI embedding provider’s data collection for training purposes, guaranteeing that client data is not used to improve external models.
  4. All data processing and storage related to custom threats is performed securely within Fluid Attacks’ infrastructure, with no personally identifiable information (PII) or sensitive data manipulated by Matches.
  5. Fluid Attacks is committed to maintaining transparency and upholding the highest standards of privacy and security for all client data processed by Matches.

Architecture

[Figure: Matches architecture diagram]
  1. Criteria embedding: Fluid Attacks’ criteria library is embedded with Voyage AI, and the resulting vector embeddings are kept in Chroma (chromadb) collections that are stored in AWS S3.
  2. Custom threat definition: The client uploads their custom threat criteria through the Integrates platform, and the files are then stored in AWS S3.
  3. Threat mapping: Once the upload completes, a task is triggered that extracts meaningful threats from the client’s documents with a pre-trained BERT classification model. The client’s custom threats are then mapped against Fluid Attacks’ criteria library to find the most correlated criteria, and the resulting mappings are written to the main Integrates DynamoDB table via SQS.
  4. See mapped threats in the platform: The client can query the mappings through the Integrates platform, where they are visible in the Matches tab of every group.
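
The threat-mapping step can be pictured as a nearest-neighbor search over embeddings. The sketch below is illustrative only: the vectors are hand-written toy values rather than Voyage AI output, the criterion IDs and the `top_matches` helper are hypothetical, and cosine similarity is assumed as the correlation measure, which this page does not confirm.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction, so treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(
    threat: list[float],
    criteria: dict[str, list[float]],
    limit: int = 2,
) -> list[tuple[str, float]]:
    """Rank criteria by similarity to one custom threat, best first."""
    scored = [
        (criterion_id, cosine_similarity(threat, vector))
        for criterion_id, vector in criteria.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy embeddings (assumption); real ones have many more dimensions.
CRITERIA = {
    "criterion-a": [1.0, 0.0, 0.0],
    "criterion-b": [0.0, 1.0, 0.0],
    "criterion-c": [0.7, 0.7, 0.0],
}

if __name__ == "__main__":
    custom_threat = [0.9, 0.1, 0.0]
    for criterion_id, score in top_matches(custom_threat, CRITERIA):
        print(f"{criterion_id}: {score:.3f}")
```

In a real deployment the vector database would perform this ranking; the point here is only that each custom threat ends up paired with the criteria whose embeddings lie closest to it.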

Data Security and Privacy

Matches does not handle personally identifiable information (PII) or other compromising or sensitive data, because it only processes the custom vulnerabilities provided by the client. These custom threats are stored in DynamoDB and S3, both inside Fluid Attacks’ AWS account.

Voyage AI

Voyage AI, by default, utilizes customer data for training and improving AI models. However, Fluid Attacks explicitly opts out of this default setting, ensuring that client data provided to Voyage AI is used solely for generating embeddings and is not leveraged for any model training or improvement purposes. (Voyage AI Privacy Policy)

Some additional points to consider:

  1. Voyage AI hosts its infrastructure in the USA.
  2. Since Fluid Attacks opted out of data collection, the data is not stored in Voyage AI’s servers (zero storage time).
  3. Data transmitted to Voyage AI is encrypted in transit with SSL in all its APIs.
  4. Voyage AI has GDPR, SOC 2 and HIPAA compliance certifications.

Chroma vector database

Chroma is an open-source vector database that can be self-managed locally, which means the data is not sent to any cloud provider.

However, Chroma includes an anonymous telemetry feature that is enabled by default. We have opted out of it by setting anonymized_telemetry to false in the Settings object.
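
As a configuration sketch, opting out of telemetry with the chromadb Python client could look like the following. The use of a persistent client and the storage path are assumptions made for illustration; they are not taken from the Matches codebase.

```python
import chromadb
from chromadb.config import Settings

# Disable Chroma's anonymous usage telemetry for this client.
settings = Settings(anonymized_telemetry=False)

# The storage path is a placeholder, not the project's real location.
client = chromadb.PersistentClient(path="./chroma-data", settings=settings)
```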

Contributing

Please read the contributing page first.

All Matches commands must be run from its root directory:

pushd matches

This will set up the development shell with the environment to run matches logged in as dev.

Running lint

We use ruff to lint the code and mypy to type check it. These quality checks can be run with the following command:

nix run .#matches-lint

Running unit tests

Tests are based on pytest and can be run with the following command:

nix run .#matches-test

These tests are expected to be pure with respect to third-party services: they should not call external services or otherwise communicate with the outside world. For this reason, the pytest socket plugin is configured to block all socket connections. File system IO is allowed, and using pytest’s tmp_path fixture is the recommended way to avoid persistent side effects.
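
A test in this style might look like the sketch below. The `save_mappings` helper and the file name are hypothetical, invented only to show file system IO confined to pytest's `tmp_path` fixture with no network access.

```python
import json
from pathlib import Path


def save_mappings(mappings: dict[str, list[str]], out_dir: Path) -> Path:
    """Hypothetical helper: write mappings to a JSON file in out_dir."""
    out_file = out_dir / "mappings.json"
    out_file.write_text(json.dumps(mappings, sort_keys=True), encoding="utf-8")
    return out_file


def test_save_mappings_round_trip(tmp_path: Path) -> None:
    # tmp_path is a per-test temporary directory provided by pytest.
    mappings = {"custom-threat-1": ["criterion-a", "criterion-c"]}

    out_file = save_mappings(mappings, tmp_path)

    # The file lands inside the temporary directory and round-trips cleanly.
    assert out_file.parent == tmp_path
    assert json.loads(out_file.read_text(encoding="utf-8")) == mappings
```

Because everything is written under `tmp_path`, nothing is left behind in the repository after the test run.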

To mock LangChain chat models, we use the GenericFakeChatModel class.

Debugging

If you use a VS Code-based IDE, you can run the Debug: Start Debugging (F5) command to start a debug session and choose any of the debug configurations available in the launch.json file. Set breakpoints in the code and start debugging!

Running matches

Matches can be run against the prod or dev environment with the following command:

nix run .#matches <environment> <command>

or, in an active development shell (where the environment defaults to dev):

matches <command>

Matches extract command

nix run .#matches <environment> extract <group_name>

This will upload the Matches results to S3 and push an SQS message to the Integrates platform so that the results are stored as records in the Integrates database. This async layer avoids coupling between the Matches processing CLI and the platform database DAL.
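
The decoupling described above can be illustrated with in-memory stand-ins. This is a conceptual sketch, not the real implementation: a dict plays the role of S3, a `queue.Queue` plays the role of SQS, and the function names, key layout, and message fields are all hypothetical.

```python
import json
import queue

# In-memory stand-ins (assumptions) for S3, SQS, and the platform database.
object_store: dict[str, str] = {}
message_queue: queue.Queue[str] = queue.Queue()
database: dict[str, list[dict[str, str]]] = {}


def extract(group_name: str, results: list[dict[str, str]]) -> None:
    """Producer side: store results, then enqueue a pointer to them."""
    key = f"{group_name}/results.json"  # hypothetical key layout
    object_store[key] = json.dumps(results)
    message_queue.put(json.dumps({"group": group_name, "key": key}))


def consume_one() -> None:
    """Consumer side: read one message and persist the referenced results."""
    message = json.loads(message_queue.get_nowait())
    results = json.loads(object_store[message["key"]])
    database[message["group"]] = results


if __name__ == "__main__":
    extract("demo-group", [{"threat": "custom-threat-1", "criterion": "criterion-a"}])
    consume_one()
    print(database["demo-group"])
```

The producer never touches `database` directly; it only knows about the object store and the queue, which is the property the async layer is meant to provide.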

Other commands

The main Matches CLI also includes other commands, which can be seen by running matches --help:

Usage: matches [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  augment           Augment training data.
  benchmark         Run end-to-end benchmark.
  embed-defines     Embed Fluid Attacks criteria in the vector database.
  eval-model        Evaluate a model with the given MODEL_NAME.
  eval-refinement   Evaluate refinement quality.
  eval-translation  Evaluate translation quality.
  extract           Extract matches for a group.
  gen-unlabeled     Generate unlabeled data.
  label             Label data for a target (train/test).
  list-profiles     List bedrock inference profiles.
  semantic-compare  Run semantic comparison experiment.
  train             Train a model with the given MODEL_NAME.
