This guide covers everything you need to work with existing benchmarks in DecisionFocusedLearningBenchmarks.jl: generating datasets, assembling DFL pipeline components, applying algorithms, and evaluating results.
A benchmark bundles a problem family (an instance generator, a combinatorial solver, and a statistical model architecture) into a single object. It provides everything needed to run a Decision-Focused Learning experiment out of the box, without having to create each component from scratch. Three abstract types cover the main settings:
- `AbstractBenchmark`: static problems (one instance, one decision)
- `AbstractStochasticBenchmark{exogenous}`: stochastic problems (the type parameter indicates whether uncertainty is exogenous)
- `AbstractDynamicBenchmark`: sequential / multi-stage problems
The sections below explain what changes between these settings. For most purposes, start with a static benchmark to understand the core workflow.
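To make the hierarchy concrete, here is a toy mirror of the three abstract types showing how generic code can dispatch on the setting. This is purely illustrative: the real types are defined by the package, and their actual supertype relationships may differ from this sketch.

```julia
# Toy stand-ins for the package's abstract types (illustrative only).
abstract type Benchmark end                       # plays the role of AbstractBenchmark
abstract type StochasticBenchmark{exogenous} end  # plays AbstractStochasticBenchmark{exogenous}
abstract type DynamicBenchmark end                # plays AbstractDynamicBenchmark

struct ToyStatic <: Benchmark end
struct ToyStochastic <: StochasticBenchmark{true} end
struct ToyDynamic <: DynamicBenchmark end

# Generic code can branch on the setting via multiple dispatch:
setting(::Benchmark) = :static
setting(::StochasticBenchmark{E}) where {E} = E ? :stochastic_exogenous : :stochastic_endogenous
setting(::DynamicBenchmark) = :dynamic

setting(ToyStatic())      # :static
setting(ToyStochastic())  # :stochastic_exogenous
setting(ToyDynamic())     # :dynamic
```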
Every benchmark exposes three key methods. For any static benchmark:
```julia
bench = ArgmaxBenchmark()
model = generate_statistical_model(bench; seed=0)  # Flux model
maximizer = generate_maximizer(bench)              # combinatorial oracle
dataset = generate_dataset(bench, 100; seed=0)     # Vector{DataSample}
```

- `generate_statistical_model`: returns an untrained neural network that maps input features `x` to cost parameters `θ`.
- `generate_maximizer`: returns a callable `(θ; context...) -> y` that solves the combinatorial problem given cost parameters.
- `generate_dataset`: returns labeled training data as a `Vector{DataSample}`.
At inference time these two pieces compose naturally as an end-to-end policy:
```julia
θ = model(sample.x)                  # predict cost parameters
y = maximizer(θ; sample.context...)  # solve the optimization problem
```

All data in the package is represented as `DataSample` objects.
| Field | Type | Description |
|---|---|---|
| `x` | any | Input features (fed to the statistical model) |
| `θ` | any | Intermediate cost parameters |
| `y` | any | Output decision / solution |
| `context` | `NamedTuple` | Solver kwargs spread into `maximizer(θ; sample.context...)` |
| `extra` | `NamedTuple` | Non-solver data (scenario, reward, step, …), never passed to the solver |
Depending on the setting, not all fields are populated in every sample. For convenience, named entries inside `context` and `extra` can be accessed directly on the sample via property forwarding:
```julia
sample.instance  # looks up :instance in context first, then in extra
sample.scenario  # looks up :scenario in context first, then in extra
```

For static benchmarks (`<:AbstractBenchmark`), `generate_dataset` may compute a default ground-truth label `y` if the benchmark implements it:
```julia
bench = ArgmaxBenchmark()
dataset = generate_dataset(bench, 100; seed=0)  # Vector{DataSample} with x, y, context
```

You can override the labels by providing a `target_policy`:
```julia
my_policy = sample -> DataSample(; sample.context..., x=sample.x, y=my_algorithm(sample.instance))
dataset = generate_dataset(bench, 100; seed=0, target_policy=my_policy)
```

For `AbstractStochasticBenchmark{true}` benchmarks, the default call returns unlabeled samples; each sample carries one scenario in `sample.extra.scenario`:
```julia
bench = StochasticVehicleSchedulingBenchmark()
dataset = generate_dataset(bench, 20; seed=0)  # y = nothing
```

Request multiple scenarios per instance with `nb_scenarios`:
```julia
dataset = generate_dataset(bench, 20; seed=0, nb_scenarios=5)
# returns 20 × 5 = 100 samples
```

To compute labels, wrap your algorithm as a `target_policy`:
```julia
anticipative = generate_anticipative_solver(bench)  # (scenario; kwargs...) -> y
policy = (sample, scenarios) -> [
    DataSample(; sample.context..., x=sample.x,
               y=anticipative(ξ; sample.context...))
    for ξ in scenarios
]
labeled = generate_dataset(bench, 20; seed=0, nb_scenarios=5, target_policy=policy)
```

Dynamic benchmarks use a two-step workflow:
```julia
bench = DynamicVehicleSchedulingBenchmark()

# Step 1: create environments (reusable across experiments)
envs = generate_environments(bench, 10; seed=0)

# Step 2: roll out a policy to collect training trajectories
policy = generate_baseline_policies(bench)[1]  # e.g. lazy policy
dataset = generate_dataset(bench, envs; target_policy=policy)
# dataset is a flat Vector{DataSample} of all steps across all trajectories
```

For dynamic benchmarks, `target_policy` is required to create datasets (there is no default label). It must be a callable `(env) -> Vector{DataSample}` that performs a full episode rollout and returns the resulting trajectory.
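The rollout contract can be illustrated with a fully self-contained toy. Everything here (`ToyEnv`, `observe`, `act!`, `isdone`, `ToySample`) is a made-up stand-in for illustration, not the package's environment API:

```julia
# Toy environment: counts steps up to a fixed horizon.
mutable struct ToyEnv
    step::Int
    horizon::Int
end
observe(env::ToyEnv) = [float(env.step)]
isdone(env::ToyEnv) = env.step >= env.horizon
function act!(env::ToyEnv, y)
    env.step += 1
    return -abs(y[1])  # toy reward
end

# Mimics the DataSample fields used in a trajectory.
struct ToySample
    x
    y
    extra::NamedTuple
end

# A policy in the (env) -> Vector of samples shape: run one full episode
# and return every step as a sample.
function my_rollout_policy(env::ToyEnv)
    traj = ToySample[]
    while !isdone(env)
        x = observe(env)
        y = x .+ 1.0  # placeholder decision rule
        r = act!(env, y)
        push!(traj, ToySample(x, y, (; reward=r, step=env.step)))
    end
    return traj
end

traj = my_rollout_policy(ToyEnv(0, 3))
length(traj)  # 3
```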
All `generate_dataset` and `generate_environments` calls accept either `seed`
(which creates an internal `MersenneTwister`) or `rng` for full control:

```julia
using Random
rng = MersenneTwister(42)
dataset = generate_dataset(bench, 50; rng=rng)
```

To evaluate a trained pipeline, two utilities are provided:

```julia
# average relative optimality gap across a dataset
gap = compute_gap(bench, dataset, model, maximizer)

# objective value of a decision y on a sample
obj = objective_value(bench, sample, y)
```

`generate_baseline_policies` returns a collection of named callables that can serve as reference points or as `target_policy` arguments:
```julia
policies = generate_baseline_policies(bench)
pol = policies[1]  # e.g. greedy, lazy, or anticipative policy
```

- Static / stochastic: `pol(sample) -> DataSample`
- Dynamic: `pol(env) -> Vector{DataSample}` (full episode trajectory)
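Baseline policies often supply the reference decisions against which a learned pipeline's optimality gap is measured. As a self-contained toy of what an average relative gap means (assuming a maximization objective; `relative_gap` here is an illustration, not the package's `compute_gap` implementation):

```julia
# Toy average relative optimality gap, assuming maximization:
# for each sample, (reference objective - model objective) / reference objective,
# averaged over the dataset.
function relative_gap(ref_objs, model_objs)
    gaps = [(r - m) / r for (r, m) in zip(ref_objs, model_objs)]
    return sum(gaps) / length(gaps)
end

relative_gap([10.0, 20.0], [9.0, 18.0])  # 0.1
```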
For dynamic benchmarks you can evaluate a policy over multiple episodes:
```julia
rewards, samples = evaluate_policy!(pol, envs, n_episodes)
```

Where implemented, benchmarks provide benchmark-specific plotting helpers:
```julia
plot_data(bench, sample)         # overview of a data sample
plot_instance(bench, instance)   # raw problem instance
plot_solution(bench, sample, y)  # overlay solution on instance
```