Our products

Our products create a data architecture that is:

  • Flexible and modular
  • Open source
  • Easy to understand
  • Easy to integrate with existing workflows
  • Easy to test and implement

DIBBs Pipeline

A cloud-based data ingestion and processing pipeline that validates, cleans, standardizes, links, and stores public health data leveraging a core set of five Building Blocks. Public health departments can integrate our pipeline into their existing workflows to ingest and process multiple data streams (including eCR, ELR, ADT, and VXU) to create a single source of truth.

Building Blocks

Below, you will find a description of how the five core Building Blocks work to clean and transform data as part of the DIBBs pipeline. To see the full suite of containerized services, check out our containers repository.

Validation

Reads and validates all eCR fields of interest according to custom, user-specified preferences; ensures that the message's XML structure is valid, that the required fields are present and correctly formatted, and that the data is trustworthy
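
As a rough illustration of the kind of check the Validation block describes, the sketch below tests that a message is well-formed XML and that a configurable list of required fields is present and non-empty. The element paths, the `REQUIRED_FIELDS` list, and the `validate_message` function are hypothetical names chosen for this example, not the actual DIBBs interface or the real eCR schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields, given as paths relative to the root element.
REQUIRED_FIELDS = ["patient/name", "patient/birthDate"]


def validate_message(xml_text, required_fields=REQUIRED_FIELDS):
    """Return a list of problems; an empty list means the message passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The message is not well-formed XML, so no field checks are possible.
        return [f"invalid XML: {exc}"]

    problems = []
    for path in required_fields:
        node = root.find(path)
        if node is None or not (node.text or "").strip():
            problems.append(f"missing required field: {path}")
    return problems


good = (
    "<record><patient><name>Jane Doe</name>"
    "<birthDate>1980-01-31</birthDate></patient></record>"
)
bad = "<record><patient><name>Jane Doe</name></patient></record>"

print(validate_message(good))  # []
print(validate_message(bad))   # ['missing required field: patient/birthDate']
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a message at once.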

FHIR Converter

Converts incoming messages into the FHIR (Fast Healthcare Interoperability Resources) standard, which acts as a common language across data streams, so that records from different streams can be linked and compared one-to-one

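
To give a sense of what a common FHIR representation looks like, the sketch below maps a flat input record to a minimal FHIR `Patient` resource. The input keys (`first_name`, `last_name`, `birth_date`, `phone`) and the `to_fhir_patient` function are assumptions made for this example; real conversion of eCR, ELR, ADT, or VXU messages involves far more fields and is not shown here.

```python
import json


def to_fhir_patient(record):
    """Map a flat, hypothetical input record to a minimal FHIR Patient resource."""
    patient = {
        "resourceType": "Patient",
        "name": [{"family": record["last_name"], "given": [record["first_name"]]}],
        "birthDate": record["birth_date"],  # FHIR dates use the YYYY-MM-DD form
    }
    # Only add contact details when the source record actually has them.
    if record.get("phone"):
        patient["telecom"] = [{"system": "phone", "value": record["phone"]}]
    return patient


example = {
    "first_name": "Jane",
    "last_name": "Doe",
    "birth_date": "1980-01-31",
    "phone": "+15551234567",
}
print(json.dumps(to_fhir_patient(example), indent=2))
```

Once every stream is expressed in the same resource shape, downstream steps can compare fields such as `birthDate` directly instead of handling each source format separately.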
Ingestion

Consists of two separate steps: (1) Standardization; (2) Geocoding

Standardization: Standardizes data fields (including record name, date of birth, phone number, and geolocation) based on preset defaults to ensure consistency

Geocoding: Enriches data by providing precise geographic locations based on patient street addresses from input data
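
As a rough illustration of the Standardization step of Ingestion, the sketch below normalizes a name, a date of birth, and a phone number to one consistent form each. The function names, the accepted date formats, and the default country code are assumptions for this example and are not the preset defaults used by DIBBs.

```python
import re
from datetime import datetime


def standardize_name(name):
    """Collapse repeated whitespace and upper-case the name."""
    return " ".join(name.split()).upper()


def standardize_dob(text, formats=("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_phone(text, default_country_code="1"):
    """Return the number as '+' followed by 11 digits, or None if it cannot be normalized."""
    digits = re.sub(r"\D", "", text)
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits if len(digits) == 11 else None


print(standardize_name("  jane   doe "))    # JANE DOE
print(standardize_dob("01/31/1980"))        # 1980-01-31
print(standardize_phone("(555) 123-4567"))  # +15551234567
```

Returning `None` for values that cannot be normalized keeps bad input visible to later steps instead of silently passing it through.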

Record Linkage

Identifies multiple records referring to the same individual and combines them into a single, more complete patient record
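
The idea behind record linkage can be sketched with a deliberately simple rule: treat records with the same normalized name and date of birth as the same person, then merge them field by field. This exact-match rule, the field names, and the `link_key` and `link_records` functions are assumptions for this example; the matching logic actually used by the Record Linkage block is not described here and may well be more sophisticated.

```python
def link_key(record):
    """Records with the same normalized name and birth date are treated as one person."""
    return (" ".join(record["name"].split()).upper(), record["birth_date"])


def link_records(records):
    """Group records by link key and merge each group into one patient record."""
    merged = {}
    for record in records:
        patient = merged.setdefault(link_key(record), {})
        for field, value in record.items():
            # Keep the first non-empty value seen for each field.
            if value and not patient.get(field):
                patient[field] = value
    return list(merged.values())


records = [
    {"name": "Jane Doe", "birth_date": "1980-01-31", "phone": ""},
    {"name": "jane  doe", "birth_date": "1980-01-31", "phone": "+15551234567"},
    {"name": "John Roe", "birth_date": "1975-06-02", "phone": ""},
]
patients = link_records(records)
print(len(patients))         # 2
print(patients[0]["phone"])  # +15551234567
```

The merged record for Jane Doe is more complete than either input: the name comes from the first record and the phone number from the second.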

Message Parser

Extracts relevant data from an eCR into a tabular format (i.e., a spreadsheet); customizable to suit user needs
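
A minimal sketch of this kind of extraction, assuming a simplified XML message rather than a real eCR document: a user-supplied mapping from column names to element paths decides which values are pulled out, and the rows are written as CSV. The `COLUMNS` mapping, the element paths, and both functions are hypothetical names for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, user-editable mapping from output column to a path in the message.
COLUMNS = {
    "name": "patient/name",
    "birth_date": "patient/birthDate",
    "condition": "condition/code",
}


def parse_to_row(xml_text, columns=COLUMNS):
    """Extract one table row from a message; missing fields become empty strings."""
    root = ET.fromstring(xml_text)
    return {col: (root.findtext(path) or "").strip() for col, path in columns.items()}


def rows_to_csv(rows, columns=COLUMNS):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


message = (
    "<record><patient><name>Jane Doe</name>"
    "<birthDate>1980-01-31</birthDate></patient></record>"
)
row = parse_to_row(message)
print(rows_to_csv([row]))
```

Changing which data ends up in the table only requires editing the mapping, which is one simple way to make a parser customizable.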

Orchestration

Enables coordinated execution of DIBBs Building Blocks in any order, allowing for fully automated workflows
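
One simple way to picture orchestration is as a list of steps applied in a configurable order, each receiving the output of the previous one. The sketch below assumes each step is a plain Python function that takes and returns a record; `run_pipeline` and the two example steps are hypothetical and do not reflect how the DIBBs services are actually invoked.

```python
def run_pipeline(record, steps):
    """Apply each step to the output of the previous one, in the order given."""
    for step in steps:
        record = step(record)
    return record


def check_required(record):
    """Hypothetical validation step: fail fast if the name is missing."""
    if not record.get("name"):
        raise ValueError("name is required")
    return {**record, "validated": True}


def standardize(record):
    """Hypothetical standardization step: normalize the name field."""
    return {**record, "name": " ".join(record["name"].split()).upper()}


# The order of the steps is data, so a workflow can be reconfigured without code changes.
result = run_pipeline({"name": " jane  doe "}, [check_required, standardize])
print(result)  # {'name': 'JANE DOE', 'validated': True}
```

Because the step list is ordinary data, the same runner can execute any subset of steps in any order.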