DEX (Data Exchange) Product Documentation

Documentation for DEX products


Complete Product Guide for PHDO PS API

Introduction

Product Overview

The Public Health Data Observability (PHDO) Processing Status (PS) API is one tool in the broader Centers for Disease Control and Prevention (CDC) Data Exchange (DEX) service offering. It was developed to support public health data senders in their efforts to share critical public health data with internal CDC Programs. Data senders are CDC partners across the country, including:

The PHDO PS API is a self-hosted, containerized, and pre-packaged tool for data submission visibility. It is configurable and can be run both locally and in any cloud environment. PHDO PS API was developed to provide visibility into the status, performance, and downstream processing results of file uploads. It enables users to request reports about the status of uploads and to develop custom queries to learn detailed information about the data being uploaded and processed.

Key Features

The PS API has three pillars of functionality: Reports, Queries, and Notifications.

image02

Reports

Reports are an essential component of the data observability aspect of PHDO. In PHDO, data is typically ingested into the system through a file upload. As the upload progresses through the service line, several processing stages occur, including upload, routing, data validation, data transformation, etc. Within each of those stages, one or more actions may occur. As data moves through CDC systems, services both internal to PHDO and downstream of PHDO indicate the processing status of these stages through Reports.

For example, one action within the upload stage may be to first verify that all the required metadata associated with the uploaded file is provided and reject it if not. Other upload actions may include the file upload itself or the disposition of the upload for further downstream processing.

Queries

Queries provide a mechanism to ask questions about processing status using GraphQL, a flexible data query and manipulation language that allows a client to specify what data it wants. Queries can provide a wide range of insights, from the status of an upload to the number of uploads for a given data stream over a period of time. Also available are queries to provide data analysis, such as discovering duplicate file uploads, counts of error messages by type, counts of file uploads by data stream, and more.
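
As an illustration only, a query for the status of an upload might look like the sketch below; the field and argument names are hypothetical, and the actual schema can be confirmed through the GraphQL introspection documentation.

# Hypothetical query shape; field and argument names are illustrative only.
query GetUploadStatus {
  uploadStatus(uploadId: "example-upload-id", dataStreamId: "example-stream") {
    status
    stage
    lastUpdated
  }
}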

Notifications

Notifications are a way for end users to automatically receive a variety of analytical products from the PS API. Some examples include sending an email whenever an upload fails, invoking a webhook to inform a server when downstream processing completes an upload, and sending a daily digest about the number of uploads received by jurisdiction for a given data stream. There are a host of options built into the PS API and an endless number of customized analytical products that can be provided through the dynamic business rules engine.

System Requirements

The system requirements for deployment and operation of the PS API will vary depending on the use case. The PS API is designed to be cloud agnostic and is capable of running in Azure and AWS, as well as locally on a single machine.

When running PS API in the cloud, the requirements of each service will depend on the load on the system. System load includes the rate at which reports arrive at the PS API and the number and complexity of concurrent queries.

When running locally, there is an all-in-one option for spinning up all of the services and dependencies of the PS API. This option can be run with docker-compose or podman compose, which means the PS API can be run on Windows, Mac, and Linux. When all of the services are running, approximately 3.5 GB of memory is consumed under typical load and 2.5 GB when idle. The disk space required for local deployment and operation of the all-in-one image is around 2 GB.

Endpoint Documentation

The PHDO PS API is deployed to the cloud for CDC Enterprise use. The endpoints for the staging and production environments are provided below.

Detailed Endpoint Descriptions
Component Staging Production
GraphQL Playground (Web App) https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/graphiql https://pstatusgraphql.ocio-eks-prd-ede.cdc.gov/graphiql
GraphQL Service https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/graphql https://pstatusgraphql.ocio-eks-prd-ede.cdc.gov/graphql
Report Message Service    
AWS ARNs for Messaging
Staging Production
<<insert ARN>> <<insert ARN>>
Request and Response Formats (JSON, XML)

Please see the GraphQL introspection documentation for a detailed description of each of the responses for the GraphQL queries and mutations.

Status Codes and Error Handling

Error handling should be in place for all 4xx response codes returned by the API.

Health Checks
Component Staging Production
GraphQL Service https://pstatusgraphql.ocio-eks-stg-ede.cdc.gov/health https://pstatusgraphql.ocio-eks-prd-ede.cdc.gov/health
Report Sink Service https://pstatusreport.ocio-eks-stg-ede.cdc.gov/health https://pstatusreport.ocio-eks-prd-ede.cdc.gov/health
Notifications Rules Engine Service https://pstatusnotification.ocio-eks-stg-ede.cdc.gov/health https://pstatusnotification.ocio-eks-prd-ede.cdc.gov/health
Notifications Workflow Service https://notifications-workflow-service.ocio-eks-stg-ede.cdc.gov/health https://notifications-workflow-service.ocio-eks-prd-ede.cdc.gov/health

Use Cases

The PS API can be used independently or in tandem with the Upload API. Within each of these two scenarios, stand-alone or in concert with Upload API, there are a number of use cases.

Regardless of the use case, the PS API brings core capabilities, including:

Stand-Alone Use Cases

When PS API is used independently of other PHDO services like the Upload API, the primary users have their own mechanism for ingesting data into their system. In this configuration, users incorporate PS API as a way to capture and observe the health of their data as it progresses through their service line.

For example, a data processing pipeline may be defined to run in Azure Data Factory (ADF), Databricks, function apps, lambda functions, services running in the cloud, or on premises. There are countless environments in which a data pipeline may run. The PS API acts like a software sidecar that is informed of processing status from within whatever environment the user has. Calls are made into the PS API at various points along the processing timeline. Those calls can be made via GraphQL or through a messaging system for asynchronous or high-bandwidth situations.

Below are some of the possible stand-alone use cases:

PHDO Use Cases

PS API can work seamlessly alongside the PHDO Upload API. The Upload API is aware of the PS API and attempts to provide status as an upload occurs. For example, if an upload is rejected because it is missing a manifest, fails to complete uploading, or can’t be delivered once uploaded, those events are automatically relayed to the PS API when the Upload API is used.

Below are some of the possible PHDO use cases:

Installation and Setup

General Installation Guidelines

Microservices

PS API is made up of several microservices that, when deployed together, allow for data upload visibility, report generation, and personalized notifications.

GraphQL: a microservice that can be built as a Docker container image to serve queries for reports, dead-letter reports, and notifications

Notifications: a microservice that can be configured to provide customized analytical products through the dynamic business rules engine

Notifications Workflow: a workflow orchestration microservice for processing and evaluating the active notification rules using Temporal as its workflow engine

Report-Sink: a microservice that listens for messages on Azure Service Bus queues and topics or RabbitMQ queues (for local runs), validates the messages, and persists them to Cosmos DB

Event-Reader-Sink: a microservice using Apache Camel to handle message ingestion from different cloud messaging systems and store the messages in the respective storage solutions

Image Creation

The build.gradle file of each microservice contains a code snippet that builds the container image from the source code and pushes the image to ImageHub.

image03
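
As an illustrative sketch only, a jib configuration in a Gradle Kotlin DSL build file might look like the following; the plugin version, base image, and registry path are placeholders, not the actual PS API values.

plugins {
    id("com.google.cloud.tools.jib") version "3.4.0"   // version is an example
}

jib {
    from {
        image = "eclipse-temurin:17-jre"                // base image (example)
    }
    to {
        // Placeholder registry/image name; the real configuration points to ImageHub.
        image = "imagehub.example.cdc.gov/phdo/pstatus-graphql"
        tags = setOf(project.version.toString(), "latest")
    }
}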

The highlighted GitHub Actions workflows contain the code to build the image and push it with the following command:
./gradlew jib

image04

Once the GitHub Actions job is complete, the images are available in ImageHub.

image05

image06

Helm Deployment

For instructions about deploying and managing PS API microservices in Kubernetes using Helm charts, visit the following links:

GitHub Repositories

For information about building and deploying PS API’s microservices to EKS clusters, visit the following GitHub repositories:

CDCgov

Users can find general information about PS API and application code in the CDCgov repository.

GitHubENT

GitHubENT is a private CDC repository where users can request permission to view folders with relevant configuration files.

.github Folder: Users can find workflow files (.yml) to:

AKS/EKS Folders: Users can find the helm values.yml file, which contains information about:

Local Installation and Deployment

Docker Container Installation and Deployment

Prerequisites
Step 1: Set Up the RabbitMQ Server

The following docker-compose.yml file sets up a RabbitMQ server running inside a docker container.
image07
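
A minimal sketch of such a docker-compose.yml is shown below; the image tag and credentials are examples and should be adjusted to your environment.

version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3-management        # includes the management UI
    container_name: rabbitmq
    ports:
      - "5672:5672"                     # AMQP port
      - "15672:15672"                   # management UI at http://localhost:15672/
    environment:
      RABBITMQ_DEFAULT_USER: guest      # example credentials
      RABBITMQ_DEFAULT_PASS: guest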

Step 2: Start the Container

To start the RabbitMQ container in detached mode, run the following command:
docker-compose up -d

Step 3: Check the Logs

To view the logs and ensure the server is running successfully, run the following command:
docker-compose logs rabbitmq

Step 4: Stop the Container

To stop and remove the container, run the following command:
docker-compose down

Cloud Installation and Deployment

AWS Installation and Deployment

Users can configure AWS resources via Terraform. The following GitHub repositories and relevant folders provide more detailed information and resources about this configuration.

GitHub Repositories

Terraform-coe: this repo includes Terraform modules for cloud resources, including Azure and AWS. These modules are shareable across projects.
Infra repo: this repo houses the infrastructure code. Under the Terraform folder, you can find a folder for each environment, and under that are application-specific folders (e.g., processing-status).

Application-Specific Folders

App: a folder for AWS applications (e.g., Lambda and FunctionApps).
Data: a folder for AWS data resources (e.g., SQS, SNS, DynamoDB, and RDS-Postgres).
Network: a folder for network resources (e.g., VPC, Subnets, SecurityGroup, and ASG).

The code snippet below shows where the Terraform state file for processing-status resources such as RDS, DynamoDB, SNS, and SQS is stored:

backend "s3" { 
    bucket = "cdc-dex-tf-state-dev" 
    key = "psapi/data.tfstate" 
    region = "us-east-1" 
} 

To create the AWS resource, navigate to the respective folders and add the .tf (Terraform) file. Users can create the Terraform resource for PS API by using the tf module reference code.

image08

Azure Installation and Deployment

Similar to AWS, users can configure Azure resources with Terraform. The following GitHub repositories and relevant folders provide more detailed information and resources about this configuration.

GitHub Repositories

Terraform-coe: this repo includes Terraform modules for cloud resources, including Azure and AWS. These modules are shareable across projects.
Infra repo: this repo houses the infrastructure code. Under the Terraform folder, you can find a folder for each environment, and under that are application-specific folders (e.g., processing-status).

Application-Specific Folders

Apps: a folder for Azure applications (e.g., Function App, App Insights, and App Service Plan).
Data: a folder for Azure data resources (e.g., Cosmos DB, Azure Storage Account, and Service Bus).
Network: a folder for Azure network resources (e.g., RG, subnets, and SecurityGroups).

To create the Azure resource, navigate to the respective folders and add the .tf (Terraform) file. “Source” refers to the GitHub Terraform module repo. Users can create the Terraform resource for PS API by using the tf module reference code.

image09

Operating Systems

Infrastructure, resources, and microservices created and deployed by PS API are supported on Linux-based systems. Windows and macOS are not supported.

Security

For authentication purposes, PS API supports OAuth 2.0 and JWT (JSON Web Tokens) for secure data transmission between parties. There are no further security measures available.

OAuth 2.0 Authentication

PS API supports standard OAuth 2.0 resource provider behaviors, offering flexibility and security based on configuration. When OAuth 2.0 authentication is configured, PS API follows established industry protocols for token validation.

Supported Protocols and Standards

Our system supports the following OAuth 2.0 authentication protocol:

OAuth 2.0 Behaviors for JWT

When using JWT for authentication, PS API follows these behaviors:

The system will return appropriate HTTP response codes to the requesting party, providing clear feedback on the success or failure of the authentication process.
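
As a minimal sketch only (assuming a Ktor-based service and the Ktor JWT authentication plugin), token validation and the corresponding HTTP response might be wired up as follows; the issuer, audience, and JWKS URL are placeholders, not PS API's actual configuration.

import com.auth0.jwk.JwkProviderBuilder
import io.ktor.http.*
import io.ktor.server.application.*
import io.ktor.server.auth.*
import io.ktor.server.auth.jwt.*
import io.ktor.server.response.*
import java.net.URL
import java.util.concurrent.TimeUnit

fun Application.configureJwtAuth() {
    // Placeholder identity provider values.
    val issuer = "https://example-idp.example.gov/"
    val audience = "ps-api"
    val jwkProvider = JwkProviderBuilder(URL("${issuer}.well-known/jwks.json"))
        .cached(10, 24, TimeUnit.HOURS)
        .build()

    install(Authentication) {
        jwt("auth-jwt") {
            // Verify the token signature against the issuer's published keys.
            verifier(jwkProvider, issuer)
            validate { credential ->
                // Accept only tokens issued for this API's audience.
                if (credential.payload.audience.contains(audience)) JWTPrincipal(credential.payload) else null
            }
            challenge { _, _ ->
                // Clear feedback on authentication failure.
                call.respond(HttpStatusCode.Unauthorized, "Token is not valid or has expired")
            }
        }
    }
}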

Configuration and Usage

PS API System Overview

The following diagram explores PS API system and network configurations:

image10

Report-Sinking

As data is ingested by PS API through a file upload, data processing occurs in stages. Each processing stage generates a processing status report, an essential part of PS API’s data observability model.

The processing status report-sink listens for messages, validates them, and persists them to one of the supported databases (Azure Cosmos DB, AWS DynamoDB, Couchbase, or MongoDB). If validation is successful, the message is persisted to a Reports container. If validation fails due to missing fields or malformed data, the message is persisted to a dead-letter container.

Setting Up Databases and Messaging Systems

Review detailed information about PS API environment variables for databases and messaging systems.

Setting Up GraphQL Mutations

Review detailed information about PS API environment variables for GraphQL mutations.

Terraform Configuration

Supported Databases

AWS Databases

Azure Databases

Supported Messaging Tools

AWS Messaging Tools
Azure Messaging Tools

Supported Monitoring Tools

Prometheus, Grafana, and Loki

The combination of these tools helps to monitor the Kubernetes infrastructure. The PS API team uses these tools to monitor metrics and create dashboards that visually represent compute and memory usage.

Notifications

Notification Types

PS API can provide users with both passive and active notifications. To send passive notifications, PS API looks at the content of the individual reports and determines whether a notification is sent based on predetermined rules.

Active notifications are scheduled to examine a subset of data to determine whether a notification gets sent based on predetermined rules (e.g., ADF jobs that mine the Reports database for information).

Passive Notifications

Active Notifications

Creating and Configuring Notifications with Temporal

Temporal is cloud-agnostic, highly scalable, and designed to manage complex, long-running workflows. It can run on any cloud provider or on-premises infrastructure, making it an excellent choice for cloud-agnostic applications.

Setting Up and Configuring Temporal

Review the official Temporal documentation for more information about installing Temporal in your preferred environment (self-hosted, cloud, etc.).

Prerequisites

Step 1: Clone the Temporal Docker Repository

Temporal provides a pre-configured Docker Compose setup that simplifies running Temporal locally.
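
For example, assuming the public temporalio/docker-compose repository, the setup can be cloned locally as follows:

git clone https://github.com/temporalio/docker-compose.git
cd docker-compose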

Step 2: Start Temporal Using Docker Compose

Once inside the cloned docker-compose directory, you can start the Temporal services.

The docker-compose up command will pull and start several containers for different components:

The services will start on the following default ports:

Step 3: Access the Temporal Web UI

For individuals using Elasticsearch, the Temporal Web UI is available at http://localhost:8080. This UI allows you to monitor and manage your workflows, view task queues, and see the status of running and completed workflows.

Step 4: Set Up a Ktor Temporal Client in Your Application

Gradle helps with managing dependencies, building, and configuring your application; use it to add the necessary dependencies for both Ktor and Temporal.

Step 4.1: Set up a Gradle project
Generate a Kotlin-based Gradle project using IntelliJ IDEA, or manually create the build.gradle.kts file in your project.

Step 4.2: Add dependencies

dependencies {
    implementation("io.temporal:temporal-sdk:1.9.0")
    // Ktor server dependencies (version is an example; align with your project)
    implementation("io.ktor:ktor-server-core:2.3.12")
    implementation("io.ktor:ktor-server-netty:2.3.12")
}

Step 5: Define Workflow Parameters

Workflows in Temporal are designed to be durable, meaning they can survive process restarts, outages, and other failures. The workflow methods you define in your application will be run within Temporal’s managed environment.

Step 5.1: Define the workflow interfaces for each scenario
image11
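
As an illustration only, a workflow interface for a hypothetical daily-digest notification scenario (the names are not PS API's actual interfaces) might look like this:

import io.temporal.workflow.WorkflowInterface
import io.temporal.workflow.WorkflowMethod

@WorkflowInterface
interface UploadDigestWorkflow {
    // Entry point that Temporal invokes when the workflow starts.
    @WorkflowMethod
    fun sendDailyDigest(dataStreamId: String, jurisdiction: String): String
}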

Step 5.2: Create each workflow method with corresponding implementation
image12
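
Continuing the hypothetical example above, a corresponding implementation might look like this:

class UploadDigestWorkflowImpl : UploadDigestWorkflow {
    override fun sendDailyDigest(dataStreamId: String, jurisdiction: String): String {
        // Workflow code must stay deterministic; I/O such as sending email
        // belongs in activities (see the activities section below).
        return "Digest sent for $dataStreamId/$jurisdiction"
    }
}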

Step 5.3: Set up a Temporal worker that will poll the Temporal server for tasks and execute workflows
image13
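
A minimal worker-registration sketch, again with placeholder names (the task queue must match the one used when starting workflows), might look like this:

import io.temporal.client.WorkflowClient
import io.temporal.serviceclient.WorkflowServiceStubs
import io.temporal.worker.WorkerFactory

fun main() {
    // Connects to a local Temporal server (localhost:7233 by default);
    // older SDK versions use WorkflowServiceStubs.newInstance() instead.
    val service = WorkflowServiceStubs.newLocalServiceStubs()
    val client = WorkflowClient.newInstance(service)
    val factory = WorkerFactory.newInstance(client)

    val worker = factory.newWorker("NOTIFICATIONS_TASK_QUEUE")   // example queue name
    worker.registerWorkflowImplementationTypes(UploadDigestWorkflowImpl::class.java)

    factory.start()   // begins polling the Temporal server for tasks
}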

Infrastructure Setup on Azure
Infrastructure Setup on AWS
Using Kubernetes for Cloud-Agnostic Deployments

Setting Up Activities Using Temporal Workflows

Temporal activities allow users to separate application responsibilities, enabling asynchronous execution and providing built-in features like automatic retries and timeouts. By using activity stubs in Temporal workflows, you can define tasks that Temporal manages and executes and specify how they should behave.

Activity Types
How to Set Up Activities

To set up activities, you must define the tasks that your workflow will execute, write the logic for these tasks, and implement them within your workflow definition. This requires defining the activity stub, passing in the activity interface along with any timeout and retry options, and then invoking the activity methods through that stub. An example of an activity stub can be seen below.

Note: Activities must be implemented as Java classes that define the methods invoked by the workflow, following the activity interface.

image14
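
Building on the hypothetical workflow sketched earlier, an activity stub with timeout and retry options might be defined as follows; the interface and method names are illustrative only.

import io.temporal.activity.ActivityInterface
import io.temporal.activity.ActivityOptions
import io.temporal.common.RetryOptions
import io.temporal.workflow.Workflow
import java.time.Duration

// Hypothetical activity interface; Temporal invokes these methods outside the workflow.
@ActivityInterface
interface NotificationActivities {
    fun sendEmail(recipient: String, body: String)
}

class DigestWorkflowImpl : UploadDigestWorkflow {
    // Activity stub: Temporal manages execution, applying the timeout and retries below.
    private val activities: NotificationActivities = Workflow.newActivityStub(
        NotificationActivities::class.java,
        ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofMinutes(1))
            .setRetryOptions(RetryOptions.newBuilder().setMaximumAttempts(3).build())
            .build()
    )

    override fun sendDailyDigest(dataStreamId: String, jurisdiction: String): String {
        activities.sendEmail("recipient@example.org", "Digest for $dataStreamId")
        return "Digest sent for $dataStreamId/$jurisdiction"
    }
}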

GraphQL

Overview

The pstatus-graphql-ktor microservice lets users query the PS API to get reports, dead-letter reports, and notifications. More information can be found in the GitHub README.

Setting Up and Configuring GraphQL

The Environmental Variable Setup section of the README provides a detailed list of environment variables for local development.

Step 1: Load the Gradle dependencies for the initial setup and launch the application.
Step 2: The application and the respective API can now be accessed for querying in two ways: through the GraphiQL playground web app (e.g., http://localhost:8080/graphiql when running locally) or by sending queries directly to the /graphql service endpoint.

Event-Reader-Sink

Overview

Using Apache Camel to route messages, the event-reader-sink microservice is designed to handle message ingestion from different cloud messaging systems and store the messages in respective storage solutions. The event-reader-sink can be integrated with both AWS and Azure services.

Setting Up and Configuring Event-Reader-Sink

Follow these instructions to set up and configure the PS API event-reader-sink.

Troubleshooting and Debugging

GraphQL

Common Issues and Solutions

Issue:

The GraphQL service is down.

Solution:

Verify the status of the service at http://localhost:8080/graphiql/getHealth
image15

Issue:

The environment variables used in the application are incorrect. This usually happens during initial application setup or when the PS API team introduces a new variable.

Solution:

Confirm that the environment variables are set up following the PS API guidelines.

Notifications

Common Issues and Solutions

Issue:

The following configuration variable values are incorrect.
image16

Solution:

Ensure these variables are properly configured based on the target environment. All variables must come from the Azure portal. At minimum, users need a Contributor role on the resource group to access the Service Bus and its settings.

Issue:

The notification service is down, indicated by a “DOWN” response for Azure Service Bus.

Solution:

Users should run a health check to ensure that the service is healthy and the components used in the service are up and running. If a “DOWN” response is received, confirm that Azure Service Bus is running and that users have access to the service.
image17

Issue:

The notifications service is not sending email notifications due to an invalid SMTP server and port.

Solution:

When running and testing the service, confirm that the SMTP server and port are valid and that the application can send email notifications.

Issue:

The webhook URL is configured incorrectly, and the notifications service cannot send real-time data between systems.

Solution:

Ensure the URL (the endpoint where the server sends the data or payload) is healthy and reachable and that it is properly configured under the application configuration.

Temporal Issues and Solutions

Issue:

Temporal cannot be set up due to missing admin credentials.

Solution:

After requesting an admin SU account, users can install and set up Docker, GitHub, and Temporal under the SU account.

Issue:

Docker is unavailable due to incorrect installation.

Solution:

Run the following command to ensure Docker Desktop is installed correctly: docker-compose version

Issue:

Docker is unavailable due to missing admin credentials.

Solution:

Confirm that you are using an admin SU account to run Docker Desktop.

Issue:

The Docker engine is not working and shows the following error message:
image18

Solution:

After installing Docker Desktop, be sure to restart your machine. If the problem persists, uninstall and reinstall Docker and restart the machine.

Issue:

Containers related to Temporal do not work as expected, due to incorrect installation.

Solution:

Install and deploy Temporal to Docker containers using Git Bash. Using an admin SU account, install Git Bash and run the following commands to install Temporal as container images in Docker and to set up the Temporal prerequisites:
image19

Issue:

The ports needed for Temporal are in use (e.g., 7233 and 8080).

Solution:

Ensure these ports are open and not in use, especially by any API or Ktor microservices.

Issue:

The Git Bash window running docker-compose, which runs in the background and keeps track of the lifecycle of the workflow, was closed.

Solution:

Keep the Git Bash docker-compose session up and running while Temporal is running inside the Docker container. If the window was closed, run the docker-compose up command to start the Temporal DB server again for workflow tracking and management.

Report-Sink

Review detailed information about supported report-sink plugins.

Issues and Solutions for All Report-Sink Modules

Issue:

The MSG_SYSTEM environment variable config has an unsupported value.

Solution:

Check for MSG_SYSTEM errors in the following ways:

  1. Check the logs for MSG_SYSTEM errors at application startup
    image20
  2. Review the /health endpoint
    image21
  3. Change the MSG_SYSTEM environment variable to AWS, AZURE_SERVICE_BUS, or RABBITMQ.
Issue:

The Cosmos DB database is not reachable due to incorrect COSMOS_DB_CLIENT_ENDPOINT.

Solution:

Check for incorrect environment variable settings for the Cosmos DB endpoint:

  1. The /health endpoint status shows “DOWN”
    image22
  2. The logs will show the following error:
    image23
  3. Correct the COSMOS_DB_CLIENT_ENDPOINT and rerun the application.
Issue:

There is an incorrect or missing COSMOS_DB_CLIENT_KEY. The /health endpoint status for Cosmos DB shows “DOWN” due to incorrect config settings or key regeneration.
image24

Solution:

Correct the COSMOS_DB_CLIENT_KEY. The most recent keys can be found on the Azure Cosmos DB portal.

Issues and Solutions for RabbitMQ

Issue:

The RabbitMQ server inside the Docker container is not running.

Solution:

Check for server errors in two ways:

  1. Review the /health endpoint:
    image25
  2. Check the logs for the following message:
    image26
  3. Start the RabbitMQ server.
Issue:

An incorrect or missing RabbitMQ queue was provided.
image27

Solution:
  1. Access the RabbitMQ Management UI by navigating to http://localhost:15672/.
  2. Log in and navigate to the “Queues and Streams” tab to see a list of all queues.
  3. Verify your queue by checking that the configured queue is listed. If the queue is not present, you should recreate it.

Issues and Solutions for Azure Service Bus

Issue:

The SERVICE_BUS_CONNECTION_STRING is incorrect.

Solution:

Check for connection string errors in two ways:

  1. The /health endpoint status shows “DOWN”
    image28
  2. The logs will show the following error:
    image29
Correct the connection string. The most recent Azure Service Bus connection string can be found in the Azure portal.
Issue:

There is an incorrect or missing SERVICE_BUS_REPORT_QUEUE_NAME.

Solution:

Check for queue name errors in two ways:

  1. The /health endpoint status shows “DOWN” for Azure Service Bus
    image30
  2. The logs will show the following error
    image31
  3. Verify the correct queue name from Service Bus Data Explorer.
Issue:

There is an incorrect or missing SERVICE_BUS_REPORT_TOPIC_NAME.

Solution:

Verify the correct topic name from Service Bus Data Explorer.

Issue:

There is an invalid AWS_REGION. The logs will show the following error:
image32

Solution:

Log in to the AWS console and verify the correct option from the region drop-down.
image33

Issue:

There is an incorrect or missing AWS_ACCESS_KEY.

Solution:

Check that the config setting includes the correct AWS_ACCESS_KEY. If the error persists, log in to the AWS console for the most recent AWS_ACCESS_KEY associated with the IAM user.

Issue:

There is an incorrect or missing AWS_SECRET_ACCESS_KEY for the IAM role.

Solution:

Check that the config setting includes the correct AWS_SECRET_ACCESS_KEY. If the error persists, log in to the AWS console for the most recent AWS_SECRET_ACCESS_KEY associated with the IAM user.

Event-Reader-Sink

Common Issues and Solutions

Issue:

The appropriate environment variables are not set for the AWS or the Azure environments.

Solution:

Update the respective environment variable values specific to the environment you are trying to connect to.

Issue:

The AWS Access Key or Secret Key are invalid.
image34

Solution:

Verify that the specified AWS Access Key or Secret Key are set to the correct values.

Issue:

Unable to establish a connection to the AWS queue/topic.
image35

Solution:

Verify that the specified AWS resources exist and are valid. Set the correct values for the respective environment variables.

Issue:

The AWS region values are unavailable or set up incorrectly.
image36

Solution:

Verify that the specified AWS resources exist and are valid.

Issue:

The Azure Service Bus queue/topics are unavailable or invalid.
image37

Solution:

Verify that the specified Azure resources exist, are valid, and have the corresponding environment configuration values set.

Issue:

The Azure Service Bus connection could not be established due to an invalid service bus environment config value.

image37

Solution:

Verify the respective Azure resources, update the corresponding local environment variables, and provide valid values.

Product Support and Resources

If you need additional information, have questions, or need to report an incident, please contact the PHDO PS API Team at dexuploadapi@cdc.gov.