Documentation for DEX products
The Public Health Data Observability (PHDO) Processing Status (PS) API is one tool in the broader Centers for Disease Control and Prevention (CDC) Data Exchange (DEX) service offering. It was developed to support public health data senders in their efforts to share critical public health data with internal CDC programs. Data senders are CDC partners across the country, including:
The PHDO PS API is a self-hosted, containerized, and pre-packaged tool for data submission visibility. It is configurable and can be run both locally and in any cloud environment. PHDO PS API was developed to provide visibility into the status, performance, and downstream processing results of file uploads. It enables users to request reports about the status of uploads and to develop custom queries to learn detailed information about the data being uploaded and processed.
The PS API has three pillars of functionality: Reports, Queries, and Notifications.
Reports are an essential component of the data observability aspect of PHDO. In PHDO, data is typically ingested into the system through a file upload. As the upload progresses through the service line, several processing stages occur, including upload, routing, data validation, data transformations, etc. Within each of those stages, one or more actions may occur. As data moves through CDC systems, services both internal to PHDO and downstream of PHDO indicate the processing status of these stages through Reports.
For example, one action within the upload stage may be to first verify that all the required metadata associated with the uploaded file is provided and reject it if not. Other upload actions may include the file upload itself or the disposition of the upload for further downstream processing.
Queries provide a mechanism to ask questions about processing status using GraphQL, a flexible data query and manipulation language that allows a client to specify what data it wants. Queries can provide a wide range of insights, from the status of an upload to the number of uploads for a given data stream over a period of time. Also available are queries to provide data analysis, such as discovering duplicate file uploads, counts of error messages by type, counts of file uploads by data stream, and more.
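To make this concrete, the sketch below posts a GraphQL query to the service over HTTP using the Ktor client. It is a minimal illustration only: the query and field names (getUploadStatus, uploadId, status) are hypothetical placeholders, not the actual PS API schema. Use GraphQL introspection to discover the real queries and fields.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.request.post
import io.ktor.client.request.setBody
import io.ktor.client.statement.bodyAsText
import io.ktor.http.ContentType
import io.ktor.http.contentType
import kotlinx.coroutines.runBlocking
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.put

fun main() = runBlocking {
    val client = HttpClient(CIO)

    // Hypothetical query and field names; use GraphQL introspection for the real schema.
    val query = """
        query {
          getUploadStatus(uploadId: "example-upload-id") {
            status
          }
        }
    """.trimIndent()

    // GraphQL queries are sent as an HTTP POST with a JSON body containing a "query" field.
    val response = client.post("https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/graphql") {
        contentType(ContentType.Application.Json)
        setBody(buildJsonObject { put("query", query) }.toString())
    }

    println(response.bodyAsText())
    client.close()
}
```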
Notifications are a way for end users to automatically receive a variety of analytical products from the PS API. Some examples include sending an email whenever an upload fails, invoking a webhook to inform a server when downstream processing completes an upload, and sending a daily digest about the number of uploads received by jurisdiction for a given data stream. There are a host of options built into the PS API and an endless number of customized analytical products that can be provided through the dynamic business rules engine.
The system requirements for deployment and operation of the PS API will vary depending on the use case. The PS API is designed to be cloud agnostic and is capable of running in Azure and AWS, as well as locally on a single machine.
When running PS API in the cloud, the requirements of each service will depend on the load on the system. System load includes the rate at which reports arrive at the PS API and the number and complexity of concurrent queries.
When running locally, there is an all-in-one option for spinning up all of the services and dependencies of the PS API. This option can be run with Docker Compose or Podman Compose, which means the PS API can be run on Windows, macOS, and Linux. When all of the services are running, approximately 3.5 GB of memory is consumed under typical load and 2.5 GB when idle. The amount of disk space needed for local deployment and operation of the all-in-one image is around 2 GB.
The PHDO PS API is deployed to the cloud for CDC Enterprise use. The endpoints for the staging and production environments are provided below.
Component | Staging | Production |
---|---|---|
GraphQL Playground (Web App) | https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/graphiql | https://pstatusgraphql.ocio-eks-prd-ede.cdc.gov/graphiql |
GraphQL Service | https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/graphql | https://pstatusgraphql.ocio-eks-prd-ede.cdc.gov/graphql |
Report Message Service | <<insert ARN>> | <<insert ARN>> |
Please see the GraphQL introspection documentation for a detailed description of each of the responses for the GraphQL queries and mutations.
Error handling should be in place for all 4xx response codes.
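A minimal sketch of that client-side handling is shown below, assuming a Ktor HttpResponse such as the one returned in the earlier query example; the error messages are illustrative.

```kotlin
import io.ktor.client.statement.HttpResponse
import io.ktor.client.statement.bodyAsText
import io.ktor.http.HttpStatusCode

// Minimal sketch of 4xx handling; messages and logging strategy are illustrative.
suspend fun handleGraphQlResponse(response: HttpResponse): String =
    when (response.status) {
        HttpStatusCode.OK -> response.bodyAsText()
        HttpStatusCode.BadRequest -> error("Malformed query or variables: ${response.bodyAsText()}")
        HttpStatusCode.Unauthorized -> error("Missing or invalid credentials (401)")
        HttpStatusCode.Forbidden -> error("Authenticated but not authorized (403)")
        HttpStatusCode.NotFound -> error("Endpoint not found (404); check the environment URL")
        else -> error("Unexpected status ${response.status}")
    }
```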
Component | Staging | Production |
---|---|---|
GraphQL Service | https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/health | https://pstatusgraphql.ocio-eks-prd-ede.cdc.gov/health |
Report Sink Service | https://pstatusreport.ocio-eks-stg-ede.cdc.gov/health | https://pstatusreport.ocio-eks-prd-ede.cdc.gov/health |
Notifications Rules Engine Service | https://pstatusnotification.ocio-eks-stg-ede.cdc.gov/health | https://pstatusnotification.ocio-eks-prd-ede.cdc.gov/health |
Notifications Workflow Service | https://notifications-workflow-service.ocio-eks-stg-ede.cdc.gov/health | https://notifications-workflow-service.ocio-eks-prd-ede.cdc.gov/health |
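For scripted monitoring, each health endpoint above can be polled with a simple HTTP GET. The sketch below assumes the /health response body contains a status of "UP" when the service and its dependencies are healthy, as described in the troubleshooting guidance later in this document.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.request.get
import io.ktor.client.statement.bodyAsText
import kotlinx.coroutines.runBlocking

// Returns true when the service's /health endpoint reports "UP".
fun isHealthy(healthUrl: String): Boolean = runBlocking {
    HttpClient(CIO).use { client ->
        val body = client.get(healthUrl).bodyAsText()
        "UP" in body
    }
}

fun main() {
    val url = "https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/health"
    println("GraphQL service healthy: ${isHealthy(url)}")
}
```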
The PS API can be used independently or in tandem with the Upload API. Within each of these two scenarios, stand-alone or in concert with Upload API, there are a number of use cases.
Regardless of the use case, the PS API brings core capabilities, including:
When PS API is used independently of other PHDO services like the Upload API, the primary users have their own mechanism for ingesting data into their system. In this configuration, users incorporate PS API as a way to capture and observe the health of their data as it progresses through their service line.
For example, a data processing pipeline may be defined to run in Azure Data Factory (ADF), Databricks, function apps, lambda functions, services running in the cloud, or on premises. There is an endless number of environments in which a data pipeline may run. The PS API acts like a software sidecar that is informed of processing status from within whatever environment the user has. Calls are made into the PS API at various points along the processing timeline. Those calls can be made via GraphQL or through a messaging system for async or high-bandwidth situations.
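As one hedged illustration of the messaging path, the sketch below publishes a report message to a RabbitMQ queue using the RabbitMQ Java client, matching the local RabbitMQ setup described later in this document. The queue name and the report JSON fields are placeholders, not the PS API's actual report schema or queue configuration.

```kotlin
import com.rabbitmq.client.ConnectionFactory

fun main() {
    // Connects to the local RabbitMQ broker (default guest/guest credentials).
    val factory = ConnectionFactory().apply {
        host = "localhost"
        username = "guest"
        password = "guest"
    }

    // Queue name and report fields are illustrative placeholders.
    val queueName = "report-queue"
    val report = """{"uploadId":"example-upload-id","stage":"upload","status":"success"}"""

    factory.newConnection().use { connection ->
        connection.createChannel().use { channel ->
            channel.queueDeclare(queueName, true, false, false, null)
            channel.basicPublish("", queueName, null, report.toByteArray())
            println("Report published to $queueName")
        }
    }
}
```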
Below are some of the possible stand-alone use cases:
PS API can work seamlessly alongside the PHDO Upload API. The Upload API is aware of the PS API and attempts to provide status as an upload occurs. For example, if an upload is rejected because it is missing a manifest, fails to complete uploading, or can’t be delivered once uploaded, those events are automatically relayed to the PS API when the Upload API is used.
Below are some of the possible PHDO use cases:
PS API is made up of several microservices that, when deployed together, allow for data upload visibility, report generation, and personalized notifications.
GraphQL: a microservice that can be built as a docker container image to provide reports, dead letter reports, and notifications
Notifications: a microservice that can be configured to provide customized analytical products through the dynamic business rules engine
Notifications Workflow: a workflow orchestration microservice for processing and evaluating the active notification rules using Temporal as its workflow engine
Report-Sink: a microservice that listens for messages on Azure Service Bus queues and topics or RabbitMQ queues (for local runs), validates the messages, and persists them to Cosmos DB
Event-Reader-Sink: a microservice using Apache Camel to handle message ingestion from different cloud messaging systems and store the messages in the respective storage solutions
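To illustrate the Event-Reader-Sink's role, the sketch below shows the general shape of an Apache Camel route written in Kotlin: consume messages from a cloud queue and hand them to a storage target. The endpoint URIs are illustrative placeholders, not the service's actual configuration.

```kotlin
import org.apache.camel.builder.RouteBuilder
import org.apache.camel.impl.DefaultCamelContext

fun main() {
    val context = DefaultCamelContext()

    context.addRoutes(object : RouteBuilder() {
        override fun configure() {
            // Illustrative route: read messages from an AWS SQS queue and store them in S3.
            // Real endpoint URIs, credentials, and storage targets come from configuration.
            from("aws2-sqs://example-report-queue")
                .log("Received message: \${body}")
                .to("aws2-s3://example-report-bucket")
        }
    })

    context.start()
    Thread.sleep(60_000) // keep the route running briefly for this sketch
    context.stop()
}
```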
The build.gradle file of each microservice contains a code snippet that builds the image from the source code and pushes it to ImageHub. The highlighted GitHub Actions workflows build and push the image with the following command:

./gradlew jib

Once the GitHub Actions job completes, the images are available in ImageHub.
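For reference, the jib task comes from the Gradle Jib plugin. A minimal build.gradle.kts sketch is shown below; the plugin version, base image, and registry path are placeholders rather than the project's actual values.

```kotlin
// build.gradle.kts (sketch); versions, base image, and registry path are illustrative.
plugins {
    kotlin("jvm") version "1.9.24"
    id("com.google.cloud.tools.jib") version "3.4.0"
}

jib {
    from {
        image = "eclipse-temurin:17-jre" // base image (illustrative)
    }
    to {
        image = "imagehub.example.cdc.gov/phdo/pstatus-report-sink" // placeholder registry path
    }
    container {
        ports = listOf("8080")
    }
}
```

Because Jib builds and pushes the image directly from the Gradle build, no local Docker daemon is required for this step.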
For instructions about deploying and managing PS API microservices in Kubernetes using Helm charts, visit the following links:
For information about building and deploying PS API’s microservices to EKS clusters, visit the following GitHub repositories:
Users can find general information about PS API and application code in the CDCGov repository.
GitHubENT is a private CDC repository where users can request permission to view folders with relevant configuration files.
.github Folder: Users can find workflow files (.yml) to:
AKS/EKS Folders: Users can find the helm values.yml file, which contains information about:
The following docker-compose.yml file sets up a RabbitMQ server running inside a Docker container. The rabbitmq:management image includes the management plugin, which provides access to RabbitMQ's web-based management UI, where users can create exchanges and queues and bind queues to exchanges with a unique routing key. Port 15672 enables access to RabbitMQ's web-based UI at localhost:15672. If RABBITMQ_USERNAME or RABBITMQ_PASSWORD is not provided, the default guest is used for both.

To start the RabbitMQ container in detached mode, run the following command:
docker-compose up -d
To view the logs and ensure the server is running successfully, run the following command:
docker-compose logs rabbitmq
To stop and remove the container, run the following command:
docker-compose down
Users can configure AWS resources via terraform. The following GitHub repositories and relevant folders provide more detailed information and resources about this configuration.
Terraform-coe: this repo includes Terraform modules for cloud resources, including Azure and AWS. These module resources are shareable across projects.
Infra repo: this repo houses the infra code. Under the Terraform folder are environment-specific folders, and under those are application-specific folders (e.g., processing-status).
App: a folder for AWS applications (e.g., Lambda and FunctionApps).
Data: a folder for AWS data resources (e.g., SQS, SNS, DynamoDB, and RDS-Postgres).
Network: a folder for network resources (e.g., VPC, Subnets, SecurityGroup, and ASG).
The code snippet below shows the Terraform backend configuration that stores the state file for processing-status resources such as RDS, DynamoDB, SNS, and SQS:

terraform {
  backend "s3" {
    bucket = "cdc-dex-tf-state-dev"
    key    = "psapi/data.tfstate"
    region = "us-east-1"
  }
}
To create the AWS resource, navigate to the respective folders and add the .tf (Terraform) file. Users can create the Terraform resource for PS API by using the Terraform module reference code.
Similar to AWS, users can configure Azure resources with Terraform. The following GitHub repositories and relevant folders provide more detailed information and resources about this configuration.
Terraform-coe: this repo includes Terraform modules for cloud resources, including Azure and AWS. These module resources are shareable across the projects.
Infra repo: this repo houses the infra code. Under the Terraform folder, you can find the environment folder name, and under that are application-specific folders (e.g., processing-status).
Apps: a folder for Azure applications (e.g., Function Apps, Application Insights, and App Service plans).
Data: a folder for Azure data resources (e.g., Cosmos DB, Azure Storage accounts, and Service Bus).
Network: a folder for Azure network resources (e.g., resource groups, subnets, and security groups).
To create the Azure resource, navigate to the respective folders and add the .tf (Terraform) file. “Source” refers to the GitHub Terraform module repo. Users can create the Terraform resource for PS API by using the tf module reference code.
The infrastructure, resources, and microservices created and deployed by PS API are supported on Linux-based systems only; Windows and macOS are not supported.
For authentication purposes, PS API supports OAuth 2.0 and JWT (JSON Web Tokens) for secure data transmission between parties. There are no further security measures available.
PS API supports standard OAuth 2.0 resource provider behaviors, offering flexibility and security based on configuration. When OAuth 2.0 authentication is configured, PS API follows established industry protocols for token validation.
Our system supports the following OAuth 2.0 authentication protocol:
When using JWT for authentication, PS API follows these behaviors:
The system will return appropriate HTTP response codes to the requesting party, providing clear feedback on the success or failure of the authentication process.
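From the client's perspective, the JWT is presented as a standard OAuth 2.0 bearer token on each request. The sketch below attaches the token as an Authorization header using the Ktor client; how the token is obtained from your identity provider is configuration-specific and not shown.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.request.header
import io.ktor.client.request.post
import io.ktor.client.request.setBody
import io.ktor.http.ContentType
import io.ktor.http.HttpHeaders
import io.ktor.http.contentType
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val client = HttpClient(CIO)
    // Token acquisition is not shown; this sketch reads a previously obtained JWT.
    val jwt = System.getenv("PS_API_TOKEN") ?: error("Set PS_API_TOKEN to a valid JWT")

    val response = client.post("https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/graphql") {
        header(HttpHeaders.Authorization, "Bearer $jwt") // standard bearer-token header
        contentType(ContentType.Application.Json)
        setBody("""{"query":"{ __typename }"}""")
    }

    // 401/403 responses indicate failed authentication or authorization.
    println("Status: ${response.status}")
    client.close()
}
```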
The following diagram explores PS API system and network configurations:
As data is ingested by PS API through a file upload, data processing occurs in stages. Each processing stage generates a processing status report, an essential part of PS API's data observability model.
The processing status report-sink listens for messages, validates them, and persists them to one of the supported databases (Azure Cosmos DB, AWS DynamoDB, Couchbase, or MongoDB). If validation succeeds, the message is persisted to the Reports container. If validation fails due to missing fields or malformed data, the message is persisted to a dead-letter container.
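In outline, the report-sink's decision is: validate the message, then write it to the Reports container or to the dead-letter container. The sketch below is a schematic of that flow with placeholder types and validation rules, not the report-sink's actual implementation.

```kotlin
// Schematic of the report-sink's validate-then-persist flow (placeholder types and fields).
data class IncomingReport(val uploadId: String?, val stage: String?, val payload: String)

interface Container { fun persist(report: IncomingReport) }

class ReportSink(
    private val reports: Container,    // e.g., the Reports container
    private val deadLetter: Container  // e.g., the dead-letter container
) {
    // Validation here is illustrative: required fields must be present.
    private fun validate(report: IncomingReport): List<String> = buildList {
        if (report.uploadId.isNullOrBlank()) add("missing uploadId")
        if (report.stage.isNullOrBlank()) add("missing stage")
    }

    fun onMessage(report: IncomingReport) {
        val problems = validate(report)
        if (problems.isEmpty()) {
            reports.persist(report)     // valid reports go to the Reports container
        } else {
            deadLetter.persist(report)  // malformed reports go to the dead-letter container
            println("Dead-lettered report: $problems")
        }
    }
}
```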
Review detailed information about PS API environment variables for databases and messaging systems.
Review detailed information about PS API environment variables for GraphQL mutations.
The combination of these tools helps to monitor the Kubernetes infrastructure. The PS API team uses these tools to monitor metrics and create dashboards that visually represent compute and memory usage.
PS API can provide users with both passive and active notifications. To send passive notifications, PS API looks at the content of the individual reports and determines whether a notification is sent based on predetermined rules.
Active notifications are scheduled to examine a subset of data to determine whether a notification gets sent based on predetermined rules (e.g., ADF jobs that mine the Reports database for information).
Notify data senders using intermediaries like IZGW that an upload occurred.
Notify technical assistants that the system denied an end-user data access.
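As a simplified illustration of the passive model described above, the sketch below evaluates a predetermined rule against an individual report and triggers an action when it matches. The field names and the rule itself are hypothetical stand-ins for the dynamic business rules engine.

```kotlin
// Simplified stand-in for a passive notification rule (hypothetical fields and actions).
data class Report(val dataStreamId: String, val stage: String, val status: String, val jurisdiction: String)

data class NotificationRule(
    val name: String,
    val matches: (Report) -> Boolean,
    val action: (Report) -> Unit
)

val uploadFailedRule = NotificationRule(
    name = "email-on-upload-failure",
    matches = { it.stage == "upload" && it.status == "failure" },
    action = { report -> println("Would send email: upload failed for ${report.dataStreamId}") }
)

fun evaluate(report: Report, rules: List<NotificationRule>) {
    // Passive evaluation: each incoming report is checked against the predetermined rules.
    rules.filter { it.matches(report) }.forEach { it.action(report) }
}

fun main() {
    val report = Report("example-stream", "upload", "failure", "EX")
    evaluate(report, listOf(uploadFailedRule))
}
```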
Temporal is cloud-agnostic, highly scalable, and designed to manage complex, long-running workflows. It can run on any cloud provider or on-premises infrastructure, making it an excellent choice for cloud-agnostic applications.
Review the official Temporal documentation for more information about installing Temporal in your preferred environment (self-hosted, cloud, etc.).
Prerequisites
Temporal provides a pre-configured Docker Compose setup that simplifies running Temporal locally.
Once inside the cloned docker-compose directory, you can start the Temporal services.
The docker-compose up command will pull and start several containers for different components:

The services will start on the following default ports:

localhost:7233 (Temporal server gRPC frontend)
For individuals using Elasticsearch, the Temporal Web UI is available at http://localhost:8080. This UI allows you to monitor and manage your workflows, view task queues, and see the status of running and completed workflows.
Gradle helps with managing dependencies, building, and configuring your application; use it to add the necessary dependencies for both Ktor and Temporal.
Step 4.1: Set up a Gradle project
Generate a Kotlin-based Gradle project using IntelliJ IDEA, or manually create the build.gradle.kts file in your project.
Step 4.2: Add dependencies
dependencies {
    implementation("io.temporal:temporal-sdk:1.9.0")
    // Ktor server artifacts are also required; the version below is illustrative.
    implementation("io.ktor:ktor-server-core:2.3.12")
    implementation("io.ktor:ktor-server-netty:2.3.12")
}
Workflows in Temporal are designed to be durable, meaning they can survive process restarts, outages, and other failures. The workflow methods you define in your application will be run within Temporal’s managed environment.
Step 5.1: Define the workflow interfaces for each scenario.
Step 5.2: Create each workflow method with corresponding implementation
Step 5.3: Set up a Temporal worker that will poll the Temporal server (started earlier with the docker-compose up command) for tasks and execute workflows.

Temporal activities allow users to separate application responsibilities, enabling asynchronous execution and providing built-in features like automatic retries and timeouts. By using activity stubs in Temporal workflows, you can define tasks that Temporal manages and executes and specify how they should behave.
To set up activities, you must define the tasks that your workflow will execute, write the logic for these tasks, and implement them within your workflow definition. This requires defining the activity stub and passing in the activity class, the activity method, and any timeout and retry options. An example of an activity stub can be seen below.
Note: Activities must be implemented as Java classes that define the methods invoked by the workflow, following the activity interface.
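The sketch below ties Steps 5.1 through 5.3 and the activity stub together using the Temporal Java SDK from Kotlin. The workflow, activity, and task queue names are illustrative, not the PS API's actual notification workflows, and the service-stub factory method may differ slightly between SDK versions.

```kotlin
import io.temporal.activity.ActivityInterface
import io.temporal.activity.ActivityOptions
import io.temporal.client.WorkflowClient
import io.temporal.common.RetryOptions
import io.temporal.serviceclient.WorkflowServiceStubs
import io.temporal.worker.WorkerFactory
import io.temporal.workflow.Workflow
import io.temporal.workflow.WorkflowInterface
import io.temporal.workflow.WorkflowMethod
import java.time.Duration

// Step 5.1: a workflow interface (names are illustrative).
@WorkflowInterface
interface NotificationWorkflow {
    @WorkflowMethod
    fun evaluate(reportId: String)
}

// An activity interface whose methods the workflow will invoke.
@ActivityInterface
interface NotificationActivities {
    fun sendEmail(reportId: String)
}

// Step 5.2: the workflow implementation, using an activity stub with timeout and retry options.
class NotificationWorkflowImpl : NotificationWorkflow {
    private val activities = Workflow.newActivityStub(
        NotificationActivities::class.java,
        ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofSeconds(30))
            .setRetryOptions(RetryOptions.newBuilder().setMaximumAttempts(3).build())
            .build()
    )

    override fun evaluate(reportId: String) {
        activities.sendEmail(reportId)
    }
}

// The activity implementation: a plain class implementing the activity interface.
class NotificationActivitiesImpl : NotificationActivities {
    override fun sendEmail(reportId: String) {
        println("Would send an email notification for report $reportId")
    }
}

// Step 5.3: a worker that polls the local Temporal server for tasks.
fun main() {
    // Connects to localhost:7233 by default; older SDK versions expose WorkflowServiceStubs.newInstance() instead.
    val service = WorkflowServiceStubs.newLocalServiceStubs()
    val client = WorkflowClient.newInstance(service)
    val factory = WorkerFactory.newInstance(client)

    val worker = factory.newWorker("notification-task-queue") // illustrative task queue name
    worker.registerWorkflowImplementationTypes(NotificationWorkflowImpl::class.java)
    worker.registerActivitiesImplementations(NotificationActivitiesImpl())

    factory.start() // begins polling and executing workflows and activities
}
```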
The pstatus-graphql-ktor microservice lets users query the PS API to get reports, dead letter reports, and notifications. More information can be found in the GitHub ReadMe.

The Environmental Variable Setup section of the ReadMe provides a detailed list of environment variables for local development. When running locally, the GraphiQL interface is available at localhost:8080/graphiql.
Using Apache Camel to route messages, the event-reader-sink microservice is designed to handle message ingestion from different cloud messaging systems and store the messages in respective storage solutions. The event-reader-sink can be integrated with both AWS and Azure services.
Follow these instructions to set up and configure the PS API event-reader-sink.
The GraphQL service is down.
Verify the status of the service at http://localhost:8080/graphiql/getHealth
If the response status is “UP,” the service and the Cosmos database it is connected to are healthy and available.
The environment variables used in the application are incorrect. This usually happens during initial application setup or when the PS API team introduces a new variable.
Confirm that the environment variables are set up following the PS API guidelines.
The following configuration variable values are incorrect.
Ensure these variables are properly configured based on the target environment. All variables must come from the Azure portal. At minimum, users need the Contributor role on the resource group to access the service bus and its settings.
The notification service is down, indicated by a “DOWN” response for Azure Service Bus.
Users should run a health check to ensure that the service is healthy and the components used in the service are up and running. If a “DOWN” response is received, confirm that Azure Service Bus is running and that users have access to the service.
The notifications service is not sending email notifications due to an invalid SMTP server and port.
Confirm when running and testing the service that the SMTP server and port are valid and that the application can send email notifications.
The webhook URL is configured incorrectly, and the notifications service cannot send real-time data between systems.
Ensure the URL (the endpoint where the server sends the data or payload) is healthy and reachable and that it is properly configured under the application configuration.
Temporal cannot be set up due to missing admin credentials.
After requesting an admin SU account, users can install and set up Docker, GitHub, and Temporal under the SU account.
Docker is unavailable due to incorrect installation.
Run the following command to ensure Docker Desktop is installed correctly: docker-compose version
Docker is unavailable due to missing admin credentials.
Confirm that you are using an admin SU account to run Docker Desktop.
The Docker engine is not working and shows the following error message:
After installing Docker Desktop, be sure to restart your machine. If the problem persists, uninstall and reinstall Docker and restart the machine.
Containers related to Temporal do not work as expected, due to incorrect installation.
Install and deploy Temporal to Docker containers using Git Bash. Using an admin SU account, install Git Bash and run the following commands to install the Temporal prerequisites and the Temporal container images in Docker:
The ports needed for Temporal are in use (e.g., 7233 and 8080).
Ensure these ports are open and not in use, especially by any API or Ktor microservices.
The Git Bash window running docker-compose, which runs in the background and keeps track of the lifecycle of the workflow, was closed.
Keep the Git Bash docker-compose session up and running while Temporal is running inside the Docker container. If the window was closed, run the docker-compose up command again to start the Temporal DB server for workflow tracking and management.
Review detailed information about supported report-sink plugins.
The MSG_SYSTEM environment variable config has an unsupported value.
Check for MSG_SYSTEM errors at application startup, and set the MSG_SYSTEM environment variable to AWS, AZURE_SERVICE_BUS, or RABBITMQ.
The Cosmos DB database is not reachable due to an incorrect COSMOS_DB_CLIENT_ENDPOINT.
Check for incorrect environment variable settings for the Cosmos DB endpoint: the /health endpoint status shows "DOWN". Correct the COSMOS_DB_CLIENT_ENDPOINT and rerun the application.
There is an incorrect or missing COSMOS_DB_CLIENT_KEY. The /health endpoint status for Cosmos DB shows "DOWN" due to incorrect config settings or key regeneration.
Correct the COSMOS_DB_CLIENT_KEY. The most recent keys can be found on the Azure Cosmos DB portal.
The RabbitMQ server inside the Docker container is not running.
Check for server errors via the /health endpoint. An incorrect or missing RabbitMQ queue may also have been provided.
The SERVICE_BUS_CONNECTION_STRING is incorrect.
Check for connection string errors via the /health endpoint, whose status will show "DOWN".
There is an incorrect or missing SERVICE_BUS_REPORT_QUEUE_NAME.
Check for queue name errors via the /health endpoint, whose status will show "DOWN" for Azure Service Bus.
There is an incorrect or missing SERVICE_BUS_REPORT_TOPIC_NAME.
The logs will show an error for an invalid AWS_REGION.
Log in to the AWS console and verify the correct option from the region drop-down.
There is an incorrect or missing AWS_ACCESS_KEY.
Check that the config setting includes the correct AWS_ACCESS_KEY. If the error persists, log in to the AWS console for the most recent AWS_ACCESS_KEY associated with the IAM user.
There is an incorrect or missing AWS_SECRET_ACCESS_KEY for the IAM role.
Check that the config setting includes the correct AWS_SECRET_ACCESS_KEY. If the error persists, log in to the AWS console for the most recent AWS_SECRET_ACCESS_KEY associated with the IAM user.
The appropriate environment variables are not set for the AWS or the Azure environments.
Update the respective environment variable values specific to the environment you are trying to connect to.
The AWS Access Key or Secret Key are invalid.
Verify that the specified AWS Access Key or Secret Key are set to the correct values.
A connection to the AWS queue or topic cannot be established.
Verify that the specified AWS resources exist and are valid. Set the correct values for the respective environment variables.
The AWS region values are unavailable or set up incorrectly.
Verify that the specified AWS resources exist and are valid.
The Azure Service Bus queue/topics are unavailable or invalid.
Verify that the specified Azure resources exist, have the respective environment configuration values, and are valid.
The Azure Service Bus connection could not be established due to an invalid service bus environment config value.
Verify and update the respective local environment variables for the Azure resources and provide valid values.
If you need additional information, have questions, or need to report an incident, please contact the PHDO PS API Team at dexuploadapi@cdc.gov.