First version of VSP doc

BatyLeo · BatyLeo · commit 8d5f6c5798cf · 2026-04-01T14:56:28.000+02:00
diff --git a/docs/src/benchmarks/vsp.md b/docs/src/benchmarks/vsp.md
@@ -2,5 +2,92 @@
 
 [`StochasticVehicleSchedulingBenchmark`](@ref).
 
-!!! warning
-    Documentation for this benchmark is still under development. Please refer to the source code and API for more details.
+The Stochastic Vehicle Scheduling Problem (StoVSP) is a stochastic combinatorial optimization benchmark. The problem consists of assigning vehicles to cover a set of scheduled tasks while minimizing base operational costs and accounting for random delays that propagate along vehicle tours.
+
+## Problem Description
+
+### Overview
+
+In the **Vehicle Scheduling Problem (VSP)**, we consider a set of tasks $V$. Each task $v\in V$ has a scheduled beginning time $t_v^b$ and a scheduled end time $t_v^e$, such that $t_v^e > t_v^b$. We denote $t^{tr}_{(u, v)}$ the travel time from task $u$ to task $v$. A task $v$ can be scheduled consecutively after another task $u$ only if we can reach it in time, i.e.,
+```math
+t_v^b \geq t_u^e + t^{tr}_{(u, v)}
+```
+
+An instance of the VSP can be modeled as a directed acyclic graph whose nodes are tasks and whose edges represent feasible successions. A solution is a set of disjoint paths covering every task exactly once, chosen to minimize total cost. The constraint matrix of this deterministic version is totally unimodular, so the integrality constraints can be relaxed and the problem solved with a standard linear programming solver.
+
+In the **Stochastic Vehicle Scheduling Problem (StoVSP)**, we consider the same setting, but once the schedule is fixed, random delays are observed and propagate along the vehicle tours. The objective becomes minimizing the sum of the vehicles' base operational costs and the expected total delay cost over a finite set of scenarios $s \in S$.
+
+### Mathematical Formulation
+
+The deterministic problem can be formulated as a minimum-cost network flow problem. The stochastic version adds a scenario-dependent delay term to the objective function.
+
+**Variables:**
+Let $y_{u,v} \in \{0, 1\}$ be the binary decision variable indicating if a vehicle performs task $v$ immediately after task $u$. Formally, this defines the edges of the selected disjoint paths.
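+
+As a sketch of the flow formulation, assume the task graph is augmented with a source node $o$ and a sink node $o'$ standing for the depot, and write $E$ for the resulting edge set. These two nodes, the symbol $E$, and the edge costs $c_{u,v}$ are notational assumptions made for this illustration. Under these assumptions, the deterministic VSP can be written as:
+```math
+\begin{aligned}
+\min_{y} \quad & \sum_{(u,v) \in E} c_{u,v} \, y_{u,v} \\
+\text{s.t.} \quad & \sum_{u : (u,v) \in E} y_{u,v} = 1 && \forall v \in V \\
+& \sum_{w : (v,w) \in E} y_{v,w} = 1 && \forall v \in V \\
+& y_{u,v} \in \{0, 1\} && \forall (u,v) \in E
+\end{aligned}
+```
+Each task has exactly one incoming and one outgoing selected edge, so the selected edges form disjoint paths from $o$ to $o'$, one per vehicle.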
+
+**Delay Propagation:**
+
+For each task $v$, we denote:
+- $\gamma_v^s \in \mathbb{R}_+$: The intrinsic delay of task $v$ in scenario $s$.
+- $d_v^s \in \mathbb{R}_+$: The total delay accumulated by task $v$ in scenario $s$.
+- $\delta_{u, v}^s = t_v^b - (t_u^e + t^{tr}_{(u, v)})$: The slack time between tasks $u$ and $v$.
+
+These quantities follow the *delay propagation equation*. When $u$ and $v$ are operated consecutively by the same vehicle ($y_{u,v} = 1$), the delay of $u$ propagates to $v$ as follows:
+```math
+d_v^s = \gamma_v^s + \max(d_u^s - \delta_{u, v}^s, 0)
+```
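+
+As an illustration with made-up numbers, consider a single scenario $s$ and a tour $u \to v \to w$ with intrinsic delays $\gamma_u^s = 5$, $\gamma_v^s = 2$, $\gamma_w^s = 0$ and slacks $\delta_{u,v}^s = 3$, $\delta_{v,w}^s = 10$. Assuming the first task of a tour inherits no upstream delay, so that $d_u^s = \gamma_u^s$, the recursion gives:
+```math
+\begin{aligned}
+d_u^s &= 5 \\
+d_v^s &= 2 + \max(5 - 3, 0) = 4 \\
+d_w^s &= 0 + \max(4 - 10, 0) = 0
+\end{aligned}
+```
+The slack between $u$ and $v$ absorbs only part of the delay of $u$, while the larger slack between $v$ and $w$ absorbs the delay of $v$ entirely. The total delay of this tour in scenario $s$ is $5 + 4 + 0 = 9$.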
+
+The resulting problem is much harder to solve: the recursive $\max$ makes the delay cost a nonlinear function of $y$ and breaks the total unimodularity of the deterministic formulation. This makes it a useful benchmark for Decision-Focused Learning, where predicting base costs that account for expected future delays can yield better scheduling decisions.
+
+**Objective**: Find a scheduling policy (defined by $y$) that minimizes the total cost:
+```math
+\min_{y} \quad \sum_{(u,v)} c_{u,v} y_{u,v} + \mathbb{E}_{s \in S}\left[ \sum_v C_d d_v^s \right]
+```
+where $c_{u,v}$ are the deterministic transition costs and $C_d$ is the unit penalty for delays.
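+
+If the scenarios in $S$ are assumed equiprobable (an assumption made here for illustration), the expectation is an empirical mean and the objective expands to:
+```math
+\min_{y} \quad \sum_{(u,v)} c_{u,v} y_{u,v} + \frac{1}{|S|} \sum_{s \in S} \sum_v C_d \, d_v^s
+```
+where each $d_v^s$ depends on $y$ through the delay propagation equation.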
+
+## Key Components
+
+### [`StochasticVehicleSchedulingBenchmark`](@ref)
+
+The main benchmark configuration with the following parameters:
+
+- `nb_tasks`: Number of tasks to schedule in each instance (default: 25)
+- `nb_scenarios`: Number of scenarios to evaluate the expected delay costs (default: 10)
+
+### Instance Generation
+
+Each problem instance is generated by simulating a geographic city landscape with depots and task locations:
+- **Tasks**: Generated with realistic scheduled start and end times respecting spatial bounds.
+- **Scenarios**: Random intrinsic delays $\gamma$ drawn from probability distributions (e.g. Log-Normal).
+- **Features**: A 20-dimensional feature vector ($d=20$) describing the tasks and network properties (spatial coordinates, start times, route density, etc.).
+
+## Benchmark Policies
+
+The benchmark provides the following baseline policies:
+
+### Deterministic Policy
+[`svs_deterministic_policy`](@ref) solves the deterministic version of the VSP using a Mixed Integer Programming (MIP) solver. It completely ignores scenario delays and slack capacities.
+
+### Sample Average Approximation (SAA)
+This approach builds a stochastic instance using a finite set of $K$ available scenarios and minimizes the empirical expected cost. Two formulations are provided:
+- **SAA (col gen)** ([`svs_saa_policy`](@ref)): Solves the stochastic MIP using a column generation algorithm.
+- **SAA (exact MIP)** ([`svs_saa_mip_policy`](@ref)): Solves the exact stochastic MIP via a compact linearized formulation.
+
+### Local Search Policy
+[`svs_local_search_policy`](@ref) begins with a heuristic initialization (usually deterministic) and iteratively explores neighboring schedules, accepting moves that improve the expected cost over the sampled scenarios.
+
+## Decision-Focused Learning Policy
+
+```math
+\xrightarrow[\text{Features}]{x_t \in \mathbb{R}^{20}}
+\fbox{Neural network $\varphi_w$}
+\xrightarrow[\text{Predicted Cost}]{\hat{c}}
+\fbox{Deterministic VSP Solver}
+\xrightarrow[\text{Paths}]{y_t}
+```
+
+**Components**:
+
+1. **Neural Network** ``\varphi_w``: A linear model mapping each 20-dimensional feature vector to a single scalar, the adjusted edge cost ``\hat{c}_{u,v}`` of the corresponding assignment.
+2. **Optimization Layer (Maximizer)**: A deterministic mathematical programming solver, `StochasticVechicleSchedulingMaximizer`, that takes the predicted costs $\hat{c}$ and solves the tractable deterministic VSP to obtain a scheduling decision $y_t$.
+
+By training the neural network end-to-end with the combinatorial solver, the Decision-Focused Learning agent learns to produce adjusted costs $\hat{c}$ that serve as proxies, implicitly hedging against the actual stochastic delays while retaining the rapid evaluation of the deterministic solver.