griddler

Griddles

A griddle is an intuitive file format for specifying lists of parameter sets. The syntax is inspired by the "matrix" strategy in GitHub Workflow files.

The griddle format assumes that parameters come in three flavors:

  • Baseline: You want these parameters in common across most iterations of your simulation.
  • Grid: You want to perform a Cartesian product over lists of values for some parameters.
  • Nested: For certain combinations of "gridded" parameters, you want to specify additional parameters, potentially overwriting the baseline parameters.

The griddle format is easy for humans to read and write. See "The griddle format" for a complete specification of the format along with examples.
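The three flavors can be pictured with a small plain-Python sketch. This is only an illustration of the idea, not griddle syntax and not how griddler parses files; the parameter names (`R0`, `gamma`, `state`, `vaccine`) and the `nested()` helper are hypothetical.

```python
from itertools import product

# Hypothetical inputs; the names are illustrative, not griddle syntax.
baseline = {"R0": 1.5, "gamma": 0.1}
grid = {"state": ["Virginia", "North Carolina"], "vaccine": [True, False]}


def nested(params):
    """Extra parameters for certain grid combinations (may overwrite baseline)."""
    if params["state"] == "Virginia":
        return {"R0": 2.0}
    return {}


parameter_sets = []
names = list(grid)
# Cartesian product over the gridded values
for values in product(*(grid[name] for name in names)):
    # start from the baseline, then add this grid point
    params = {**baseline, **dict(zip(names, values))}
    # nested parameters are applied last, so they can overwrite the baseline
    params.update(nested(params))
    parameter_sets.append(params)
```

Here the two-by-two grid yields four parameter sets, and the two with `state` equal to `"Virginia"` carry the overwritten `R0`.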

API overview

The main functions in this package are griddler.griddle.read(), griddler.run_squash(), and griddler.replicated(). Each is summarized below and documented in detail in the API reference.

flowchart TD
    griddle[/my_parameter_griddle.yaml/]
    read_griddle["**griddle.read()**"]
    pss[/list of ParameterSet objects/]

    griddle --> read_griddle --> pss

    fun[/"my_simulation()"/]
    ps[/ParameterSet/]
    result1[/"one simulation's output (pl.DataFrame)"/]
    fun --> result1
    ps --> result1

    run_squash["**run_squash()**"]
    results[/"multiple simulations' outputs (pl.DataFrame)"/]
    fun --> run_squash
    pss --> run_squash
    run_squash --> results

    replicated["**replicated()**"]
    replicated_fun[/"my_replicated_simulation()"/]
    fun --> replicated --> replicated_fun

Parsing griddles

The griddler.griddle.read() function takes the path to a griddle YAML file and returns a list of ParameterSet objects:

import yaml
import griddler.griddle

parameter_sets = griddler.griddle.read("griddle.yaml")

# write the expanded parameter sets out as YAML
with open("parameter_sets.yaml", "w") as f:
    yaml.dump(parameter_sets, f)

The package includes a console script so that you can use this functionality straight from the command line:

griddler parse < griddle.yaml > parameter_sets.yaml

This reads a griddle YAML from standard input and writes YAML to standard output. The output is a list of named lists. Each named list is a parameter set, one for each element of the grid.

Running a function and "squashing" results

Given

  1. a function, say simulate(), that takes a parameter set and returns a polars DataFrame, and
  2. a list of parameter sets, such as from griddler.griddle.read(),

then griddler.run_squash(simulate, parameter_sets) will return a "squashed" version of the results. This is a single DataFrame formed by vertically concatenating the DataFrames returned for the individual parameter sets.

If add_parameters=True and parameter_columns=None (which is the default), then every input parameter is added as a column to each simulation's DataFrame. This works only if every parameter has a scalar value that can be coerced with pl.lit(). To add only a subset of the parameters, use parameter_columns.

If add_hash=True (which is the default), then a column containing the hash of the parameter set is also added. The column's name is given by hash_column, which is "hash" by default.

Running a function with replicates

Given

  • a function, say simulate(), like in the example above, and
  • a list of parameter sets, as above, but with each parameter set having keys "n_replicates" and "seed",

then replicated(simulate) is itself a function, with the same signature as simulate(). It removes the "n_replicates" and "seed" parameters from each parameter set, sets the seed with random.seed(), and then runs simulate() on the remaining parameters n_replicates times. The results are squashed with a column "replicate" (or whatever name you specify).