Search Engine
To overcome the main tradeoff due to DynamoDB’s limitations, we use OpenSearch as our primary search engine to provide fast and efficient search capabilities across the data in our system. OpenSearch is deployed in a highly available configuration, ensuring reliability and performance.
Key Features
- High Availability: The OpenSearch cluster is configured with zone awareness enabled, distributing nodes across multiple availability zones for maximum uptime.
- Scalability: The cluster uses r6gd.large.search instances, providing a good balance of memory and compute resources.
- Monitoring: Comprehensive logging is implemented using OpenTelemetry and a custom implementation for OpenSearch, tracking application logs and index operations.
Technical Implementation
The system uses the official Async OpenSearch Python client providing:
- Asynchronous operations for better performance
- Automatic retry mechanisms for handling transient failures
- Telemetry integration for monitoring and debugging
The search functionality is implemented with a focus on:
- Full-text search capabilities
- Fields filtering
- Listing unique values
- Pagination and sorting
- Aggregation support
Development and Testing
For development and testing purposes, we maintain:
- Local OpenSearch instance for development.
- Mock implementations for testing using openmock.
- Comprehensive logging and monitoring with Jaeger.
All the test data is populated and updated automatically with Streams once application is started. You can see this behavior locally with:
m . /integrates
A folder structure is used to organize the code for entities that has an index. See an example with the vulnerabilities index:
Directoryvulnerabilities/
Directoryindex/
- enums.py
- filter.py
- search.py
- sort.py
- types.py
types.py
and enums.py
contains important types for the search engine.
filter.py
contains the methods to sanitize, parse and validate the filters
received from the API.
sort.py
contains the methods to build valid OpenSearch sort parameters using
the sort received from the API.
search.py
contains the methods to build and perform a valid OpenSearch query
using the filters and sorting parameters.
All filter, sort and search will have an apply() method to use all its features in a single place and facilitate implementation like this:
from integrates.vulnerabilities.index import filter as vulns_filtersfrom integrates.vulnerabilities.index import sort as vulns_sortfrom integrates.vulnerabilities.index import search as vulns_search
...
def resolve(parent: Vulnerability, info: GraphQLResolveInfo, **kwargs: dict) -> list[Vulnerability]: formatted_filters = vulns_filters.apply(kwargs.get("filters", {})) formatted_sort = vulns_sort.apply(kwargs.get("sort", {}))
results = await vulns_search.apply(formatted_filters, formatted_sort)
return [format_vuln(result) for result in results]
Unit testing
Test files can be found beside the implementation files.
Directoryvulnerabilities/
Directoryindex/
- filter_test.py
- search_test.py
- sort_test.py
filters tests must test most important sanitization, parsing and authorization cases.
sort tests must test valid construction of sort parameters.
search tests must test valid construction of OpenSearch queries using the filters and sort parameters.
Functional testing
When application is started locally, OpenSearch instance will be running in http://localhost:9200/. You can make requests to this endpoint using tools like Insomnia or Postman.
Refer to OpenSearch official documentation to get more information about queries.

