Table of contents
- Deploy Data Compare (Optional Validation Tool for RTR)
- Deploy via Helm
- Spot Testing & Validation
- How to Use Data Compare
Deploy Data Compare (Optional Validation Tool for RTR)
Overview
The validation tool for RTR (Real-Time Reporting) allows STLT users to compare and identify differences between data processed by RTR and the classic ETL pipeline.
This service is optional. STLTs may choose to install it only if they require RTR validation capabilities.
Requirements
- Users must have access to an AWS S3 bucket for data exchange.
Components
1. Data Compare API
- A containerized service that:
- Pulls and prepares data from designated tables
- Uploads the data to a specified AWS S3 bucket
2. Data Compare Processor
- A containerized service that:
- Retrieves data from the S3 bucket
- Performs data comparison logic
The two services communicate asynchronously using Kafka.
Database Setup
- Database changes are managed by Liquibase, integrated within the
DataCompareAPI
service. - Schema changes are automatically applied during deployment.
- The database objects listed in the following directory are for reference only:
Keycloak Configuration
Provide the Keycloak Auth URI in your values.yaml
file:
authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"
This value only needs to change if the Keycloak pod’s name or namespace is modified.
- The Data Compare API requires a Keycloak profile:
https://github.com/CDCgov/NEDSS-Helm/tree/main/charts/keycloak/extra
Deploy via Helm
Data Compare API
Helm Chart Location: charts/data-compare-api-service
Validate the image tag
image:
repository: "quay.io/us-cdcgov/cdc-nbs-modernization/data-compare-api-service"
pullPolicy: IfNotPresent
tag: <release-version-tag> # e.g. v1.0.1
Update JDBC and other configurations
ingressHost: "data.EXAMPLE_DOMAIN"
jdbc:
dbserver: "EXAMPLE_DB_ENDPOINT"
username: "EXAMPLE_ODSE_DB_USER"
password: "EXAMPLE_ODSE_DB_USER_PASSWORD"
authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"
s3:
region: "AWS REGION"
bucketName: "S3 BucketName"
Install Pod
helm install data-compare-api-service -f ./data-compare-api-service/values.yaml data-compare-api-service
Verify Pod
kubectl get pods
Validate the service
Open:
https://<data.EXAMPLE_DOMAIN>/comparison/swagger-ui/index.html
Data Compare Processor
Helm Chart Location: charts/data-compare-processor-service
Validate the image tag
image:
repository: "quay.io/us-cdcgov/cdc-nbs-modernization/data-compare-processor-service"
pullPolicy: IfNotPresent
tag: <release-version-tag> # e.g. v1.0.1
Update JDBC and other configurations
ingressHost: "data.EXAMPLE_DOMAIN"
jdbc:
dbserver: "EXAMPLE_DB_ENDPOINT"
username: "EXAMPLE_ODSE_DB_USER"
password: "EXAMPLE_ODSE_DB_USER_PASSWORD"
authUri: "http://keycloak.default.svc.cluster.local/auth/realms/NBS"
s3:
region: "AWS REGION"
bucketName: "S3 BucketName"
Install Pod
helm install data-compare-processor-service -f ./data-compare-processor-service/values.yaml data-compare-processor-service
Verify Pod
kubectl get pods
Note: The Processor service is a Kafka consumer microservice and does not expose any API endpoints.
Ingress Setup
- The Data Compare API uses the same ingress as the data ingestion service.
- Reuse the following ingress config as needed:
Spot Testing & Validation
- Ensure both containers are running without crashes:
kubectl get pods
kubectl logs <pod-name>
✅ The system is ready when both services are healthy and the processor begins consuming from Kafka.
How to Use Data Compare
- The Data Compare process requires the
Data_Compare_Config
table, which is created and managed by Liquibase when the Data Compare API is deployed. - The
Data_Compare_Config
table comes preloaded with several records containing table names and queries. These records are crucial, as the comparison process relies on them to determine what data to compare.
Invoke Comparison Endpoint
Users must call the following endpoint to start the comparison process:
POST /comparison/api/data-compare
Required Headers:
runNowMode
:true
orfalse
- If
true
: The comparison will run only on records in the config table whererunNow = true
. Once completed,runNow
will automatically be reset tofalse
. - If
false
: The comparison will run on all records found in the configuration table.
- If
🛈 This is an asynchronous endpoint — if authentication passes and there are no logical errors, it will return a success response immediately. However, the actual comparison process may take time as it involves:
Data Flow
API → Pull data from SQL table → Upload to S3 → Kafka → Processor → Pull from S3 → Perform comparison → Upload results to S3