Managing Multiple Catalogs¶
This guide explains how to work with multiple catalog repositories in the CFA DataOps system.
Overview¶
The DataOps system is designed around catalog repositories that you create using dataops_catalog_init
. Multiple catalogs can be installed in the same Python environment, providing unified access to all datasets and reports through datacat
and reportcat
.
Creating Your First Catalog¶
-
Create a catalog repository:
-
Install in development mode:
-
Verify installation:
Working with Multiple Catalogs¶
Installing Multiple Catalogs¶
You can install multiple catalog libraries in the same environment:
# Create and install different catalogs
dataops_catalog_init scenarios /path/to/scenarios-catalog
dataops_catalog_init surveillance /path/to/surveillance-catalog
dataops_catalog_init my_project /path/to/my-project-catalog
# Install each catalog
cd /path/to/scenarios-catalog && pip install -e .[dev]
cd /path/to/surveillance-catalog && pip install -e .[dev]
cd /path/to/my-project-catalog && pip install -e .[dev]
Unified Access¶
All datasets and reports become accessible through unified interfaces:
from cfa.dataops import datacat
from cfa.dataops.reporting import reportcat
# Access datasets from any installed catalog
datacat.private.scenarios.covid19vax_trends.load.get_dataframe()
datacat.private.surveillance.flu_trends.load.get_dataframe()
datacat.private.my_project.custom_dataset.load.get_dataframe()
# Access reports from any installed catalog
reportcat.private.scenarios.examples.basics_ipynb
reportcat.private.surveillance.weekly.summary_ipynb
reportcat.private.my_project.analysis.trend_report_ipynb
Listing Available Resources¶
# List all datasets across all catalogs
print("Available datasets:", datacat.__namespace_list__)
# List all reports across all catalogs
print("Available reports:", reportcat.__namespace_list__)
# Explore specific catalog namespaces
print("Scenarios datasets:", dir(datacat.scenarios))
print("Surveillance reports:", dir(reportcat.surveillance))
Catalog Repository Structure¶
Each catalog repository contains:
my-catalog/
├── cfa/
│ └── catalog/
│ └── my_catalog/
│ ├── __init__.py
│ ├── catalog_defaults.toml
│ ├── datasets/ # Dataset configurations (TOML files)
│ │ ├── dataset1.toml
│ │ └── dataset2.toml
│ ├── reports/ # Jupyter notebook templates
│ │ ├── examples/
│ │ └── analysis/
│ └── workflows/ # ETL and processing scripts
│ ├── etl/
│ ├── multistage/
│ └── reference_data/
├── pyproject.toml
├── MANIFEST.in
└── .gitignore
Best Practices¶
Organization by Domain¶
- scenarios: COVID-19 modeling and forecasting datasets
- surveillance: Disease surveillance and monitoring data
- reference: Static reference data used across projects
- my_project: Project-specific datasets and analyses
Naming Conventions¶
- Use descriptive catalog names that reflect their purpose
- Keep dataset names consistent within each catalog
- Use clear, hierarchical organization for reports
Development Workflow¶
- Create separate catalogs for different data domains
- Install all relevant catalogs in your development environment
- Use
datacat
andreportcat
for unified access - Develop datasets and reports within their appropriate catalog repositories
Sharing Catalogs¶
- Catalog repositories can be shared via Git repositories
- Teams can install each other's catalogs to access shared datasets
- Use proper versioning and documentation for shared catalogs
Common Patterns¶
Cross-Catalog Analysis¶
# Combine data from multiple catalogs
scenarios_data = datacat.scenarios.covid19vax_trends.load.get_dataframe()
surveillance_data = datacat.surveillance.flu_trends.load.get_dataframe()
# Create combined analysis
combined_analysis = analyze_trends(scenarios_data, surveillance_data)
Catalog-Specific Reports¶
# Generate reports using data from specific catalogs
report = reportcat.private.scenarios.analysis.trend_analysis_ipynb
report.nb_to_html_file(
html_out_path="trend_report.html",
dataset_namespace="scenarios.covid19vax_trends"
)
Troubleshooting¶
Catalog Not Found¶
- Ensure the catalog is properly installed:
pip list | grep cfa.catalog
- Check that you're in the correct Python environment
- Verify the catalog was created successfully
Import Errors¶
- Reinstall the catalog in development mode:
pip install -e .[dev]
- Check for naming conflicts between catalogs
- Ensure all dependencies are installed
Namespace Conflicts¶
- Use unique catalog names to avoid conflicts
- Check
datacat.__namespace_list__
for existing namespaces - Consider renaming conflicting catalogs