Search Engine

To work around DynamoDB's limitations, we use OpenSearch as our primary search engine, providing fast and efficient search across the data in our system. OpenSearch is deployed in a highly available configuration for reliability and performance.

Key Features

High Availability: The OpenSearch cluster is configured with zone awareness enabled, distributing nodes across multiple availability zones for maximum uptime.
Scalability: The cluster uses r6gd.large.search instances, providing a good balance of memory and compute resources.
Monitoring: Comprehensive logging is implemented using OpenTelemetry and a custom implementation for OpenSearch, tracking application logs and index operations.

Technical Implementation

The system uses the official async OpenSearch Python client, which provides:

Asynchronous operations for better performance
Automatic retry mechanisms for handling transient failures
Telemetry integration for monitoring and debugging
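
As a minimal sketch of how such a client might be configured, assuming the opensearch-py package is installed with its async extra. The host, the retry count, and the helper names `client_options` and `build_client` are illustrative and not taken from the codebase; `max_retries` and `retry_on_timeout` are transport options accepted by opensearch-py.

```python
from typing import Any


def client_options(host: str = "localhost", port: int = 9200) -> dict[str, Any]:
    """Return keyword arguments for the async client (illustrative values)."""
    return {
        "hosts": [{"host": host, "port": port}],
        # Retry a failed request up to 3 times before giving up.
        "max_retries": 3,
        # Also retry when a request times out (disabled by default).
        "retry_on_timeout": True,
    }


def build_client() -> Any:
    """Create the async client; needs opensearch-py with async support."""
    # Imported lazily so the options above can be inspected without the package.
    from opensearchpy import AsyncOpenSearch

    return AsyncOpenSearch(**client_options())
```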

The search functionality is implemented with a focus on:

Full-text search capabilities
Field filtering
Listing unique values
Pagination and sorting
Aggregation support
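
To make these capabilities concrete, the sketch below assembles a single OpenSearch request body that exercises each of them. The function name `build_search_body` and the field names (`title`, `description`) are hypothetical; only the query DSL keys (`bool`, `multi_match`, `term`, `from`, `size`, `sort`, `aggs`, `terms`) come from OpenSearch.

```python
from typing import Any, Optional


def build_search_body(
    text: str,
    filters: dict[str, Any],
    sort_field: str,
    descending: bool = False,
    page: int = 1,
    page_size: int = 10,
    unique_values_of: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body covering search, filters, paging, sort and aggs."""
    body: dict[str, Any] = {
        "query": {
            "bool": {
                # Full-text search across several (hypothetical) text fields.
                "must": [
                    {"multi_match": {"query": text, "fields": ["title", "description"]}}
                ],
                # Exact-match field filters; these do not affect scoring.
                "filter": [{"term": {field: value}} for field, value in filters.items()],
            }
        },
        # Pagination: skip the previous pages and return one page of hits.
        "from": (page - 1) * page_size,
        "size": page_size,
        "sort": [{sort_field: {"order": "desc" if descending else "asc"}}],
    }
    if unique_values_of:
        # Listing unique values: a terms aggregation buckets hits by field value.
        body["aggs"] = {"unique_values": {"terms": {"field": unique_values_of}}}
    return body
```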

Development and Testing

For development and testing purposes, we maintain:

Local OpenSearch instance for development.
Mock implementations for testing using openmock.
Comprehensive logging and monitoring with Jaeger.

All test data is populated and updated automatically through Streams once the application is started. You can see this behavior locally with:

  integrates-local

A folder structure is used to organize the code for entities that have an index. For example, the vulnerabilities index:

vulnerabilities/
- index/
  - enums.py
  - filter.py
  - search.py
  - sort.py
  - types.py

types.py and enums.py contain important types for the search engine.

filter.py contains the methods to sanitize, parse and validate the filters received from the API.
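
A minimal sketch of what such a module could look like follows. It is not the project's implementation: the `ALLOWED_FIELDS` constant, the field names in it, and the choice of `term`/`terms` clauses are assumptions made for illustration.

```python
from typing import Any

# Hypothetical allow-list of filterable fields; the real list lives in the codebase.
ALLOWED_FIELDS = {"state", "severity", "group_name"}


def apply(raw_filters: dict[str, Any]) -> list[dict[str, Any]]:
    """Sanitize, validate and parse API filters into OpenSearch filter clauses."""
    clauses: list[dict[str, Any]] = []
    for field, value in raw_filters.items():
        # Validate: reject fields that are not exposed for filtering.
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"Unsupported filter field: {field}")
        # Sanitize: trim surrounding whitespace and skip empty values.
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue
        # Parse: lists become a `terms` clause, scalars a `term` clause.
        key = "terms" if isinstance(value, list) else "term"
        clauses.append({key: {field: value}})
    return clauses
```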

sort.py contains the methods to build valid OpenSearch sort parameters using the sort received from the API.
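
As an illustration only, a sort builder along these lines might look like the sketch below. The input shape (`{"field": ..., "order": ...}`), the `SORTABLE_FIELDS` constant, and the fallback to relevance ordering are assumptions, not details taken from the codebase.

```python
from typing import Any

# Hypothetical set of fields the API allows sorting on.
SORTABLE_FIELDS = {"severity", "report_date"}


def apply(raw_sort: dict[str, str]) -> list[dict[str, Any]]:
    """Build an OpenSearch `sort` parameter from the sort received from the API."""
    if not raw_sort:
        # No sort requested: order hits by relevance score.
        return [{"_score": {"order": "desc"}}]
    field = raw_sort.get("field", "")
    order = raw_sort.get("order", "asc").lower()
    if field not in SORTABLE_FIELDS:
        raise ValueError(f"Unsupported sort field: {field}")
    if order not in ("asc", "desc"):
        raise ValueError(f"Unsupported sort order: {order}")
    return [{field: {"order": order}}]
```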

search.py contains the methods to build and perform a valid OpenSearch query using the filters and sorting parameters.
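
The sketch below shows one way to combine both inputs into a query and run it. `run_search`, the `vulnerabilities` index name, and `StubClient` are assumptions introduced for the example; the only client API relied on is the `search(index=..., body=...)` coroutine, whose response carries the documents under `hits.hits[]._source`.

```python
import asyncio
from typing import Any


async def run_search(
    client: Any,
    filters: list[dict[str, Any]],
    sort: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Build the query from filter clauses and sort, run it, return the documents."""
    body = {"query": {"bool": {"filter": filters}}, "sort": sort}
    # `client` is expected to behave like opensearchpy.AsyncOpenSearch.
    # The index name is assumed for this example.
    response = await client.search(index="vulnerabilities", body=body)
    return [hit["_source"] for hit in response["hits"]["hits"]]


class StubClient:
    """Stand-in for the real client so the sketch runs without a cluster."""

    async def search(self, index: str, body: dict[str, Any]) -> dict[str, Any]:
        return {"hits": {"hits": [{"_source": {"id": "1", "state": "open"}}]}}


documents = asyncio.run(
    run_search(
        StubClient(),
        filters=[{"term": {"state": "open"}}],
        sort=[{"severity": {"order": "desc"}}],
    )
)
```

Passing the client as an argument is a choice made for this sketch; it keeps the query-building logic testable with a stub instead of a running cluster.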

The filter, sort and search modules each expose an apply() function that bundles their features in a single entry point, which simplifies usage like this:

from integrates.vulnerabilities.index import filter as vulns_filters
from integrates.vulnerabilities.index import sort as vulns_sort
from integrates.vulnerabilities.index import search as vulns_search

...

async def resolve(parent: Vulnerability, info: GraphQLResolveInfo, **kwargs: dict) -> list[Vulnerability]:
    formatted_filters = vulns_filters.apply(kwargs.get("filters", {}))
    formatted_sort = vulns_sort.apply(kwargs.get("sort", {}))

    results = await vulns_search.apply(formatted_filters, formatted_sort)

    return [format_vuln(result) for result in results]

Unit testing

Test files can be found beside the implementation files.

vulnerabilities/
- index/
  - filter_test.py
  - search_test.py
  - sort_test.py

Filter tests must cover the most important sanitization, parsing and authorization cases.

Sort tests must verify that sort parameters are built correctly.

Search tests must verify that OpenSearch queries are built correctly from the filters and sort parameters.

Functional testing

When the application is started locally, an OpenSearch instance runs at http://localhost:9200/. You can make requests to this endpoint using tools like Insomnia or Postman.
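
The same kind of request can also be scripted. The sketch below builds a `_search` request with the Python standard library; the helper names and the `vulnerabilities` index name used in the usage note are assumptions, so substitute an index that exists in your local instance.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"


def build_search_request(index: str, query: dict) -> urllib.request.Request:
    """Prepare a POST to <index>/_search with a JSON query body."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(index: str, query: dict) -> dict:
    """Send the request; requires the local OpenSearch instance to be running."""
    with urllib.request.urlopen(build_search_request(index, query)) as response:
        return json.load(response)
```

For example, `search("vulnerabilities", {"match_all": {}})` would return the raw response of a match-all query, assuming an index with that name exists locally.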

Refer to the official OpenSearch documentation for more information about queries.