This guide covers everything you need to work with existing benchmarks in DecisionFocusedLearningBenchmarks.jl: generating datasets, assembling DFL pipeline components, applying algorithms, and evaluating results.
A benchmark bundles a problem family (an instance generator, a combinatorial solver, and a statistical model architecture) into a single object. It provides everything needed to run a Decision-Focused Learning experiment out of the box, without having to create each component from scratch. Three abstract types cover the main settings:
- `AbstractBenchmark`: static problems (one instance, one decision)
- `AbstractStochasticBenchmark{exogenous}`: stochastic problems (the type parameter indicates whether uncertainty is exogenous)
- `AbstractDynamicBenchmark`: sequential / multi-stage problems
The sections below explain what changes between these settings. For most purposes, start with a static benchmark to understand the core workflow.
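To make the hierarchy concrete, here is a toy mirror of the three abstract types showing how generic code can dispatch on the setting. This is purely illustrative: the real types are defined by the package, and their actual supertype relationships may differ from this sketch.

```julia
# Toy stand-ins for the package's abstract types (illustrative only).
abstract type Benchmark end                       # plays the role of AbstractBenchmark
abstract type StochasticBenchmark{exogenous} end  # plays AbstractStochasticBenchmark{exogenous}
abstract type DynamicBenchmark end                # plays AbstractDynamicBenchmark

struct ToyStatic <: Benchmark end
struct ToyStochastic <: StochasticBenchmark{true} end
struct ToyDynamic <: DynamicBenchmark end

# Generic code can branch on the setting via multiple dispatch:
setting(::Benchmark) = :static
setting(::StochasticBenchmark{E}) where {E} = E ? :stochastic_exogenous : :stochastic_endogenous
setting(::DynamicBenchmark) = :dynamic

setting(ToyStatic())      # :static
setting(ToyStochastic())  # :stochastic_exogenous
setting(ToyDynamic())     # :dynamic
```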
Every benchmark exposes three key methods. For any static benchmark:
```julia
bench = ArgmaxBenchmark()
model = generate_statistical_model(bench; seed=0)  # Flux model
maximizer = generate_maximizer(bench)              # combinatorial oracle
dataset = generate_dataset(bench, 100; seed=0)     # Vector{DataSample}
```

- `generate_statistical_model`: returns an untrained neural network that maps input features `x` to cost parameters `θ`.
- `generate_maximizer`: returns a callable `(θ; context...) -> y` that solves the combinatorial problem given cost parameters.
- `generate_dataset`: returns labeled training data as a `Vector{DataSample}`.
At inference time these two pieces compose naturally as an end-to-end policy:
```julia
θ = model(sample.x)                  # predict cost parameters
y = maximizer(θ; sample.context...)  # solve the optimization problem
```

All data in the package is represented as `DataSample` objects.
| Field | Type | Description |
|---|---|---|
| `x` | any | Input features (fed to the statistical model) |
| `θ` | any | Intermediate cost parameters |
| `y` | any | Output decision / solution |
| `context` | `NamedTuple` | Solver kwargs spread into `maximizer(θ; sample.context...)` |
| `extra` | `NamedTuple` | Non-solver data (scenario, reward, step, …), never passed to the solver |
Depending on the setting, not all fields are populated in every sample. For convenience, named entries inside `context` and `extra` can be accessed directly on the sample via property forwarding:
```julia
sample.instance  # looks up :instance in context first, then in extra
sample.scenario  # looks up :scenario in context first, then in extra
```

For static benchmarks (`<:AbstractBenchmark`), `generate_dataset` may compute a default ground-truth label `y` if the benchmark implements it:
```julia
bench = ArgmaxBenchmark()
dataset = generate_dataset(bench, 100; seed=0)  # Vector{DataSample} with x, y, context
```

You can override the labels by providing a `target_policy`:
```julia
my_policy = sample -> DataSample(; sample.context..., x=sample.x, y=my_algorithm(sample.instance))
dataset = generate_dataset(bench, 100; seed=0, target_policy=my_policy)
```

For `AbstractStochasticBenchmark{true}` benchmarks, the default call returns unlabeled samples; each sample carries one scenario in `sample.extra.scenario`:
```julia
bench = StochasticVehicleSchedulingBenchmark()
dataset = generate_dataset(bench, 20; seed=0)  # y = nothing
```

Request multiple scenarios per instance with `nb_scenarios`:
```julia
dataset = generate_dataset(bench, 20; seed=0, nb_scenarios=5)
# returns 20 × 5 = 100 samples
```

To compute labels, wrap your algorithm as a `target_policy`:
```julia
anticipative = generate_anticipative_solver(bench)  # (scenario; kwargs...) -> y
policy = (sample, scenarios) -> [
    DataSample(; sample.context..., x=sample.x,
               y=anticipative(ξ; sample.context...))
    for ξ in scenarios
]
labeled = generate_dataset(bench, 20; seed=0, nb_scenarios=5, target_policy=policy)
```

Dynamic benchmarks use a two-step workflow:
```julia
bench = DynamicVehicleSchedulingBenchmark()

# Step 1: create environments (reusable across experiments)
envs = generate_environments(bench, 10; seed=0)

# Step 2: roll out a policy to collect training trajectories
policy = generate_baseline_policies(bench)[1]  # e.g. lazy policy
dataset = generate_dataset(bench, envs; target_policy=policy)
# dataset is a flat Vector{DataSample} of all steps across all trajectories
```

For dynamic benchmarks, `target_policy` is required to create datasets (there is no default label). It must be a callable `(env) -> Vector{DataSample}` that performs a full episode rollout and returns the resulting trajectory.
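The rollout contract can be illustrated with a fully self-contained toy. Everything here (`ToyEnv`, `observe`, `act!`, `isdone`, `ToySample`) is a made-up stand-in for illustration, not the package's environment API:

```julia
# Toy environment: counts steps up to a fixed horizon.
mutable struct ToyEnv
    step::Int
    horizon::Int
end
observe(env::ToyEnv) = [float(env.step)]
isdone(env::ToyEnv) = env.step >= env.horizon
function act!(env::ToyEnv, y)
    env.step += 1
    return -abs(y[1])  # toy reward
end

# Mimics the DataSample fields used in a trajectory.
struct ToySample
    x
    y
    extra::NamedTuple
end

# A policy in the (env) -> Vector of samples shape: run one full episode
# and return every step as a sample.
function my_rollout_policy(env::ToyEnv)
    traj = ToySample[]
    while !isdone(env)
        x = observe(env)
        y = x .+ 1.0  # placeholder decision rule
        r = act!(env, y)
        push!(traj, ToySample(x, y, (; reward=r, step=env.step)))
    end
    return traj
end

traj = my_rollout_policy(ToyEnv(0, 3))
length(traj)  # 3
```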
All `generate_dataset` and `generate_environments` calls accept either `seed`
(which creates an internal `MersenneTwister`) or `rng` for full control:

```julia
using Random
rng = MersenneTwister(42)
dataset = generate_dataset(bench, 50; rng=rng)
```

To evaluate a trained pipeline, two utilities are provided:

```julia
# average relative optimality gap across a dataset
gap = compute_gap(bench, dataset, model, maximizer)

# objective value of a decision y on a sample
obj = objective_value(bench, sample, y)
```

`generate_baseline_policies` returns a collection of named callables that can serve as reference points or as `target_policy` arguments:
```julia
policies = generate_baseline_policies(bench)
pol = policies[1]  # e.g. greedy, lazy, or anticipative policy
```

- Static / stochastic: `pol(sample) -> DataSample`
- Dynamic: `pol(env) -> Vector{DataSample}` (full episode trajectory)
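Baseline policies often supply the reference decisions against which a learned pipeline's optimality gap is measured. As a self-contained toy of what an average relative gap means (assuming a maximization objective; `relative_gap` here is an illustration, not the package's `compute_gap` implementation):

```julia
# Toy average relative optimality gap, assuming maximization:
# for each sample, (reference objective - model objective) / reference objective,
# averaged over the dataset.
function relative_gap(ref_objs, model_objs)
    gaps = [(r - m) / r for (r, m) in zip(ref_objs, model_objs)]
    return sum(gaps) / length(gaps)
end

relative_gap([10.0, 20.0], [9.0, 18.0])  # 0.1
```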
For dynamic benchmarks you can evaluate a policy over multiple episodes:
```julia
rewards, samples = evaluate_policy!(pol, envs, n_episodes)
```

Where implemented, benchmarks provide benchmark-specific plotting helpers:
```julia
plot_data(bench, sample)         # overview of a data sample
plot_instance(bench, instance)   # raw problem instance
plot_solution(bench, sample, y)  # overlay solution on instance
```