RTR Data Compare validation tool
Overview
The Data Compare tool is an optional RTR validation service that lets STLT users compare data processed by RTR against the classic ETL pipeline and identify differences. STLTs need to install it only if they require RTR validation capabilities.
The tool consists of two containerized services that communicate asynchronously via Kafka:
- Data Compare API — pulls and prepares data from designated tables, then uploads it to an S3 bucket
- Data Compare Processor — retrieves data from the S3 bucket and performs the comparison logic
Database changes are managed by Liquibase, which is integrated into the Data Compare API service; schema changes are applied automatically during deployment. The database objects in `NEDSS-DataCompare/DataCompareAPIs/…/db/data_internal` are for reference only.
Prerequisites
- Access to an AWS S3 bucket for data exchange
- Keycloak configured with the Data Compare API profile: `NEDSS-Helm/charts/keycloak/extra`
In your `values.yaml`, provide the Keycloak auth URI:

```yaml
authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"
```

This value only needs to change if the Keycloak pod's name or namespace is modified.
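The hostname follows the standard in-cluster Kubernetes DNS pattern, `<service>.<namespace>.svc.cluster.local`, so a renamed or relocated Keycloak deployment changes the URI accordingly. The values below are illustrative, not defaults:

```yaml
# Illustrative: Keycloak exposed via a service named "keycloak" in the "identity" namespace
authUri: "http://keycloak.identity.svc.cluster.local/auth/realms/NBS"
```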
Deploy via Helm
Data Compare API
Helm chart location: `charts/data-compare-api-service`
- Validate the image tag in `values.yaml`:

  ```yaml
  image:
    repository: "quay.io/us-cdcgov/cdc-nbs-modernization/data-compare-api-service"
    pullPolicy: IfNotPresent
    tag: <release-version-tag> # e.g. v1.0.1
  ```

- Update JDBC and other configuration values:

  ```yaml
  ingressHost: "data.EXAMPLE_DOMAIN"
  jdbc:
    dbserver: "EXAMPLE_DB_ENDPOINT"
    username: "EXAMPLE_ODSE_DB_USER"
    password: "EXAMPLE_ODSE_DB_USER_PASSWORD"
  authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"
  s3:
    region: "AWS REGION"
    bucketName: "S3 BucketName"
  ```

- Install the Helm chart:

  ```shell
  helm install data-compare-api-service -f ./data-compare-api-service/values.yaml data-compare-api-service
  ```

- Verify the pod is running:

  ```shell
  kubectl get pods
  ```

- Validate the service by opening the Swagger UI:

  `https://<data.EXAMPLE_DOMAIN>/comparison/swagger-ui/index.html`
Data Compare Processor
Helm chart location: `charts/data-compare-processor-service`
The Processor is a Kafka consumer microservice and does not expose any API endpoints.
- Validate the image tag in `values.yaml`:

  ```yaml
  image:
    repository: "quay.io/us-cdcgov/cdc-nbs-modernization/data-compare-processor-service"
    pullPolicy: IfNotPresent
    tag: <release-version-tag> # e.g. v1.0.1
  ```

- Update JDBC and other configuration values:

  ```yaml
  ingressHost: "data.EXAMPLE_DOMAIN"
  jdbc:
    dbserver: "EXAMPLE_DB_ENDPOINT"
    username: "EXAMPLE_ODSE_DB_USER"
    password: "EXAMPLE_ODSE_DB_USER_PASSWORD"
  authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"
  s3:
    region: "AWS REGION"
    bucketName: "S3 BucketName"
  ```

- Install the Helm chart:

  ```shell
  helm install data-compare-processor-service -f ./data-compare-processor-service/values.yaml data-compare-processor-service
  ```

- Verify the pod is running:

  ```shell
  kubectl get pods
  ```
Ingress
The Data Compare API uses the same ingress as the data ingestion service. Reuse the ingress configuration as needed: `dataingestion-service/templates/ingress.yaml`
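As a rough sketch, an ingress rule routing the comparison path to the API might look like the following. The service name, port, and path type here are assumptions; defer to `dataingestion-service/templates/ingress.yaml` for the authoritative template:

```yaml
# Illustrative only — the backend service name and port below are assumptions,
# not values taken from the chart.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: data-compare-api
spec:
  rules:
    - host: data.EXAMPLE_DOMAIN
      http:
        paths:
          - path: /comparison
            pathType: Prefix
            backend:
              service:
                name: data-compare-api-service  # assumed service name
                port:
                  number: 8080                  # assumed container port
```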
Verify the deployment
Confirm both services are running without crashes:

```shell
kubectl get pods
kubectl logs <pod-name>
```
The system is ready when both services are healthy and the Processor begins consuming from Kafka.
Use the Data Compare tool
The comparison process relies on the `Data_Compare_Config` table, which is created and populated by Liquibase when the Data Compare API is deployed. The table comes preloaded with records containing the table names and queries that determine what data to compare.
To start a comparison, call:
`POST /comparison/api/data-compare`
Pass the `runNowMode` header to control scope:

- `true` — runs only on records in the config table where `runNow = true`; resets `runNow` to `false` when complete
- `false` — runs on all records in the config table
This is an asynchronous endpoint. If authentication passes and there are no logical errors, it returns a success response immediately. The actual comparison runs in the background.
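The scoping rule above can be sketched in Python. This is an illustration only: the field names (`table_name`, `query`, `run_now`) are hypothetical stand-ins for the real `Data_Compare_Config` schema, which is defined by the Liquibase changelogs, and the function is not the service's actual code:

```python
# Hypothetical sketch of the runNowMode scoping rule — not the service's code.
def records_in_scope(config_records, run_now_mode):
    """Return the config records a comparison run covers.

    run_now_mode=True  -> only records flagged run_now; flags reset afterwards
    run_now_mode=False -> every record in the config table
    """
    if run_now_mode:
        selected = [r for r in config_records if r["run_now"]]
        for r in selected:
            r["run_now"] = False  # reset when the run completes
        return selected
    return list(config_records)

config = [
    {"table_name": "PERSON", "query": "SELECT ...", "run_now": True},
    {"table_name": "OBSERVATION", "query": "SELECT ...", "run_now": False},
]

print([r["table_name"] for r in records_in_scope(config, run_now_mode=True)])
# ['PERSON']
print([r["table_name"] for r in records_in_scope(config, run_now_mode=False)])
# ['PERSON', 'OBSERVATION']
```

Note that a `true` run consumes the flags: a second `true` run covers nothing until `runNow` is set again in the config table.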
Data flow:
API → Pull data from SQL table → Upload to S3 → Kafka → Processor → Pull from S3 → Perform comparison → Upload results to S3