Introduction
Matches correlates Fluid Attacks’ vulnerabilities with custom vulnerabilities that the client provides via Integrates.
Public Oath
- Fluid Attacks handles client data with the utmost care and respect, including all custom criteria threats provided by clients.
- Any AI tool (external or self-hosted) will be used ethically and responsibly, ensuring client data is only used for its agreed-upon purpose.
- Fluid Attacks has opted out of any AI embedding provider’s data collection for training purposes, guaranteeing that client data is not used to improve external models.
- All data processing and storage related to custom threats is performed securely within Fluid Attacks’ infrastructure, with no personally identifiable information (PII) or sensitive data manipulated by Matches.
- Fluid Attacks is committed to maintaining transparency and upholding the highest standards of privacy and security for all client data processed by Matches.
Architecture
- Criteria embedding: Fluid Attacks’ criteria library is processed with Voyage AI to produce vector embeddings. These embeddings are kept in Chroma (chromadb) vector database collections, which are stored in AWS S3.
- Custom threat definition: The client uploads their custom threat criteria through the Integrates platform, and these are then stored in AWS S3.
- Threat mapping: Once the upload completes, a task is triggered that extracts meaningful threats from the client’s documents with a pre-trained BERT classification model. The client’s custom threats are then mapped against Fluid Attacks’ criteria library to find the most correlated criteria, and the resulting mappings are stored in the Integrates DynamoDB main table via SQS.
- See mapped threats in platform: The client can query the mappings through the integrates platform. These are visible in the matches tab for every group.
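The threat mapping step can be pictured as a nearest-neighbor search over embeddings. The following is a minimal sketch only: it assumes cosine similarity as the correlation measure, which this page does not confirm, and the names `cosine_similarity`, `map_threat`, and the criterion identifiers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def map_threat(threat_vector, criteria_vectors, top_k=2):
    """Return the top_k (criterion_id, score) pairs, best match first."""
    scored = [
        (criterion_id, cosine_similarity(threat_vector, vector))
        for criterion_id, vector in criteria_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional embeddings (assumption: real embeddings are much larger).
criteria = {
    "criterion-a": [1.0, 0.0, 0.0],
    "criterion-b": [0.0, 1.0, 0.0],
    "criterion-c": [0.7, 0.7, 0.0],
}
best = map_threat([0.9, 0.1, 0.0], criteria, top_k=2)
```

Here `best` ranks `criterion-a` first and `criterion-c` second, because their vectors point in the direction closest to the custom threat's vector.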
Data Security and Privacy
Matches does not manipulate personally identifiable information (PII) or compromising or otherwise sensitive data, as it only processes custom vulnerabilities provided by the client. These custom threats are stored in DynamoDB and S3, both inside Fluid Attacks’ AWS account.
Voyage AI
Voyage AI, by default, utilizes customer data for training and improving AI models. However, Fluid Attacks explicitly opts out of this default setting, ensuring that client data provided to Voyage AI is used solely for generating embeddings and is not leveraged for any model training or improvement purposes. (Voyage AI Privacy Policy)
Some additional points to consider:
- Voyage AI hosts its infrastructure in the USA.
- Since Fluid Attacks opted out of data collection, the data is not stored on Voyage AI’s servers (zero storage time).
- Data transmitted to Voyage AI is encrypted in transit with SSL in all its APIs.
- Voyage AI has GDPR, SOC 2 and HIPAA compliance certifications.
Chroma vector database
Chroma is an open-source vector database that can be self-managed locally, which means that the data is not sent to any cloud provider. However, Chroma includes an anonymous telemetry feature, which is enabled by default. We have opted out of this feature by setting anonymized_telemetry to false in the Settings object.
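As an illustration, the opt-out described above could look like the sketch below. It uses chromadb's `Settings` class and `PersistentClient` constructor; the function name `make_chroma_client` and the `persist_dir` parameter are hypothetical, and how Matches actually builds its client is not shown on this page.

```python
def make_chroma_client(persist_dir):
    """Create a local, persistent Chroma client with telemetry disabled."""
    # Imported here so the sketch loads even where chromadb is not installed.
    import chromadb
    from chromadb.config import Settings

    return chromadb.PersistentClient(
        path=persist_dir,  # hypothetical local directory for the collections
        settings=Settings(anonymized_telemetry=False),
    )
```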
Contributing
Please read the contributing page first.
All matches commands must be run from its root directory:
pushd matches
This will set up the development shell with the environment to run matches logged in as dev.
Running lint
We use ruff to lint the code and mypy to type check it. These quality checks can be run with the following command:
nix run .#matches-lint
Running unit tests
Tests are based on pytest and can be run with the following command:
nix run .#matches-test
These tests are expected to be pure with respect to third-party services: they should not call external services or communicate with the outside world at all. For this reason, the pytest socket plugin is configured to block all socket connections. File system I/O is allowed, and using the pytest tmp_path fixture is the recommended way to avoid persistent side effects.
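A test in this style might look like the following minimal sketch. The helper `save_mappings` and the file name `mappings.json` are hypothetical and only serve to show file I/O confined to the directory that pytest provides through `tmp_path`.

```python
import json
from pathlib import Path


def save_mappings(mappings, output_dir):
    """Hypothetical helper: write mappings to a JSON file and return its path."""
    target = Path(output_dir) / "mappings.json"
    target.write_text(json.dumps(mappings, sort_keys=True), encoding="utf-8")
    return target


def test_save_mappings_round_trip(tmp_path):
    # pytest injects tmp_path, a temporary directory unique to this test.
    mappings = {"custom-threat-1": ["criterion-a"]}
    written = save_mappings(mappings, tmp_path)
    assert written.parent == tmp_path
    assert json.loads(written.read_text(encoding="utf-8")) == mappings
```

Because the test touches only the temporary directory, it leaves nothing behind and needs no network access.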
For mocking langchain chat models, we use the GenericFakeChatModel class.
Debugging
If you use a VS Code based IDE, you can use the Debug: Start Debugging (F5) command to start a debug session and choose any of the debug configurations available in the launch.json file. Set up breakpoints in the code and start debugging!
Running matches
Matches can be run against the prod or dev environment with the following command:
nix run .#matches <environment> <command>
or in an active development shell (which defaults the environment to dev):
matches <command>
Matches extract command
nix run .#matches <environment> extract <group_name>
This will upload the matches results to S3 and push an SQS message to the integrates platform so that the results are saved as records in the integrates database. This asynchronous layer avoids coupling the matches processing CLI to the platform’s database DAL.
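The decoupling can be illustrated with a small producer and consumer pair that share only a message format. This is a sketch under stated assumptions: the field names `group_name` and `results_key`, the example key, and both function names are hypothetical, and the real message schema is not documented here.

```python
import json


def build_update_message(group_name, results_key):
    """Producer side: serialize a hypothetical 'results ready' notification."""
    if not group_name:
        raise ValueError("group_name must not be empty")
    payload = {"group_name": group_name, "results_key": results_key}
    return json.dumps(payload, sort_keys=True)


def handle_update_message(body):
    """Consumer side: decode the message and return what should be loaded."""
    payload = json.loads(body)
    return payload["group_name"], payload["results_key"]


# The producer never touches the database; it only describes where the
# results live, and the consumer decides how to persist them.
message = build_update_message("example-group", "matches/example-group.json")
```

With this shape, the CLI only needs to know the queue and the message format, while the platform keeps full ownership of how records are written.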
Other commands
The matches main CLI also includes other commands, which can be seen by running matches --help:
Usage: matches [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  augment           Augment training data.
  benchmark         Run end-to-end benchmark.
  embed-defines     Embed Fluid Attacks criteria in the vector database.
  eval-model        Evaluate a model with the given MODEL_NAME.
  eval-refinement   Evaluate refinement quality.
  eval-translation  Evaluate translation quality.
  extract           Extract matches for a group.
  gen-unlabeled     Generate unlabeled data.
  label             Label data for a target (train/test).
  list-profiles     List bedrock inference profiles.
  semantic-compare  Run semantic comparison experiment.
  train             Train a model with the given MODEL_NAME.