Architecture

Raw data arrives in whatever form the source provides.
Common pre-processing cleans it up into a common schema.
A "model" is everything between the common-schema input and the common-schema output.
Further post-processing may follow the models; it is still upstream of diagnostics, visualizations, and so forth.
An "evaluator" coordinates multiple score functions, each of which compares the clean outputs against the clean eval data.
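
To make the evaluator idea concrete, here is a minimal sketch, assuming a score function is a plain callable that takes predictions and observations and returns one number. The names `Evaluator`, `mean_absolute_error`, and `bias`, and the use of flat lists of floats as the common schema, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical signature: (clean outputs, clean eval data) -> one score.
ScoreFunction = Callable[[Sequence[float], Sequence[float]], float]


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def bias(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average signed difference; positive means over-prediction."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


@dataclass
class Evaluator:
    """Applies every registered score function to the same pair of inputs."""

    score_functions: Dict[str, ScoreFunction]

    def evaluate(
        self, outputs: Sequence[float], eval_data: Sequence[float]
    ) -> Dict[str, float]:
        # One entry per score function, keyed by its registered name.
        return {
            name: fn(outputs, eval_data)
            for name, fn in self.score_functions.items()
        }


evaluator = Evaluator({"mae": mean_absolute_error, "bias": bias})
scores = evaluator.evaluate([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

Keeping the score functions as independent callables means a new score can be registered without touching the evaluator itself.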

flowchart TD

%% Raw data and common pre-processing
raw_data[/Raw data/] --> preproc[Common pre-processing]
preproc --> clean_training[/Clean training data/]
preproc --> clean_eval[/Clean eval data/]

%% Individual models
subgraph model1[Model 1]
  preproc1[Model-specific pre-processing]
  comp1[Mathematical modeling]
  postproc1[Model-specific post-processing]
end

subgraph model2[Model 2]
  preproc2[Model-specific pre-processing]
  comp2[Mathematical modeling]
  postproc2[Model-specific post-processing]
end

output1[/Model 1 output/]
output2[/Model 2 output/]

%% Flow through the models
clean_training --> preproc1 --> comp1 --> postproc1 --> output1
clean_training --> preproc2 --> comp2 --> postproc2 --> output2

%% Evaluation
subgraph Evaluator
    score1[Score function 1]
    score2[Score function 2]
end

outputs --> score1
outputs --> score2
clean_eval --> score1
clean_eval --> score2
score1 --> scores
score2 --> scores
scores[/Scores/]

%% Common post-processing
output1 --> postproc
output2 --> postproc
postproc[Common post-processing] --> outputs[/Clean outputs/]

%% Diagnostics
outputs --> dx_scripts[Diagnostic scripts] --> dx[/Diagnostic files & figures/]

%% Visualizations
viz_scripts[Visualization scripts]
outputs --> viz_scripts
scores --> viz_scripts
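
The data flow in the flowchart, from raw data through the models to the clean outputs, could be wired up roughly as follows. This is a sketch under stated assumptions, not the project's actual code: the `"x,y"` raw format, the `Record` schema, and every function name here are hypothetical. The training/eval split and the scoring step are left out for brevity.

```python
from typing import Callable, Dict, List

# Hypothetical common schema: one record per observation.
Record = Dict[str, float]
ModelOutput = Dict[str, float]


def common_preprocess(raw_rows: List[str]) -> List[Record]:
    """Parse raw "x,y" strings into the common schema, skipping blank rows."""
    records = []
    for row in raw_rows:
        if not row.strip():
            continue
        x, y = row.split(",")
        records.append({"x": float(x), "y": float(y)})
    return records


def mean_model(training: List[Record]) -> ModelOutput:
    """A toy model: predicts the mean of the observed y values."""
    ys = [r["y"] for r in training]      # model-specific pre-processing
    estimate = sum(ys) / len(ys)         # mathematical modeling
    return {"prediction": estimate}      # model-specific post-processing


def last_value_model(training: List[Record]) -> ModelOutput:
    """A toy model: predicts the most recent y value."""
    return {"prediction": training[-1]["y"]}


def common_postprocess(model_outputs: Dict[str, ModelOutput]) -> List[dict]:
    """Combine per-model outputs into one table-like list of clean outputs."""
    return [
        {"model": name, "prediction": out["prediction"]}
        for name, out in model_outputs.items()
    ]


raw = ["1, 2.0", "", "2, 4.0", "3, 6.0"]
clean_training = common_preprocess(raw)

models: Dict[str, Callable[[List[Record]], ModelOutput]] = {
    "mean": mean_model,
    "last_value": last_value_model,
}
clean_outputs = common_postprocess(
    {name: model(clean_training) for name, model in models.items()}
)
```

Because every model takes the same clean training data and returns the same output shape, the common post-processing step does not need to know which model produced a given output.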