diff --git a/docs/src/benchmarks/dvsp.md b/docs/src/benchmarks/dvsp.md index f76d443..2282597 100644 --- a/docs/src/benchmarks/dvsp.md +++ b/docs/src/benchmarks/dvsp.md @@ -6,7 +6,7 @@ The Dynamic Vehicle Scheduling Problem (DVSP) is a sequential decision-making pr ### Overview -In the dynamic vehicle scheduling problem, a fleet operator must decide at each time step which customer requests to serve immediately and which to postpone to future time steps. +In the dynamic vehicle scheduling problem, a fleet operator must decide at each time step which customer to serve immediately and which to postpone to future time steps. The goal is to serve all customers by the end of the planning horizon while minimizing total travel time. This is a simplified version of the more complex Dynamic Vehicle Routing Problem with Time Windows (DVRPTW), focusing on the core sequential decision-making aspects without capacity or time window constraints. @@ -24,18 +24,18 @@ The dynamic vehicle scheduling problem can be formulated as a finite-horizon Mar s_t = (R_t, D_t, t) ``` where: -- ``R_t`` are the pending customer requests (not yet served), where each request ``r_i \in R_t`` contains: +- ``R_t`` are the pending customer (not yet served), where each customer ``r_i \in R_t`` contains: - ``x_i, y_i``: 2d spatial coordinates of the customer location - ``\tau_i``: start time when the customer needs to be served - ``s_i``: service time required to serve the customer -- ``D_t`` indicates which requests must be dispatched this time step (i.e. that cannot be postponed further, otherwise they will be infeasible at the next time step because of their start time) +- ``D_t`` indicates which customers must be dispatched this time step (i.e. that cannot be postponed further, otherwise they will be infeasible at the next time step because of their start time) - ``t \in \{1, 2, \ldots, T\}`` is the current time step The state also implicitly includes (constant over time): - Travel duration matrix ``d_{ij}``: time to travel from location ``i`` to location ``j`` - Depot location -**Action Space** ``\mathcal{A}``: The action at time step ``t`` is a set of vehicle routes: +**Action Space** ``\mathcal{A}(s_t)``: The action at time step ``t`` is a set of vehicle routes: ```math a_t = \{r_1, r_2, \ldots, r_k\} ``` @@ -47,7 +47,7 @@ A route is feasible if: **Transition Dynamics** ``\mathcal{P}(s_{t+1} | s_t, a_t)``: After executing routes ``a_t``: -1. **Remove served customers** from the pending request set +1. **Remove served customers** from the pending customer set 2. **Generate new customer arrivals** according to the underlying exogenous distribution 3. **Update must-dispatch set** based on postponement rules @@ -70,7 +70,7 @@ where ``d_{ij}`` is the travel duration from location ``i`` to location ``j``, a The main benchmark configuration with the following parameters: -- `max_requests_per_epoch`: Maximum number of new customer requests per time step (default: 10) +- `max_requests_per_epoch`: Maximum number of new customers per time step (default: 10) - `Δ_dispatch`: Time delay between decision and vehicle dispatch (default: 1.0) - `epoch_duration`: Duration of each decision time step (default: 1.0) - `two_dimensional_features`: Whether to use simplified 2D features instead of full feature set (default: false) @@ -82,51 +82,64 @@ Problem instances are generated from static vehicle routing datasets and include - **Customer locations**: Spatial coordinates for pickup/delivery points - **Depot location**: Central starting and ending point for all routes - **Travel times**: Distance/duration matrix between all location pairs -- **Service requirements**: Time needed to serve each customer +- **Service times**: Service time each customer -The dynamic version samples new customer arrivals from the static instance, drawing new customers by independently sampling their locations and service times. +The dynamic version samples new customer arrivals from the static instance, drawing new customers by independently sampling: +- their locations from the set of static customer locations +- service times, uniformly from the range of service times in the static instance ### Features -The benchmark provides two feature representations: - -**Full Features** (14-dimensional): -- Start times for postponable requests -- End times (start + service time) -- Travel time from depot to request -- Travel time from request to depot -- Slack time until next time step -- Quantile-based travel times to other requests (9 quantiles) +The benchmark provides two feature matrix representations, containing one column per postponable customer in the state: + +**Full Features** (27-dimensional): +- Start times for postponable customers (1) +- End times (start + service time) (2) +- Travel time from depot to customer (3) +- Travel time from customer to depot (4) +- Slack time until next time step (5) +- % of must-dispatch customers that can reach this customer on time (6) +- % of customers reachable from this customer on time (7) +- % of customers that can reach this customer on time (8) +- % of customers reachable or that can reach this customer on time (9) +- Quantile-based travel times to other customers (9 quantiles) (10-18) +- Quantiles of % of reachable new customers (9 quantiles) (19-27) **2D Features** (simplified): -- Travel time from depot to request -- Mean travel time to other requests +- Travel time from depot to customer (1) +- Mean travel time to other customers (2) ## Benchmark Policies ### Lazy Policy -The lazy policy postpones all possible requests, serving only those that must be dispatched. +The lazy policy postpones all possible customers, serving only those that must be dispatched. ### Greedy Policy -The greedy policy serves all pending requests as soon as they arrive, without considering future consequences. +The greedy policy serves all pending customers as soon as they arrive, without considering future consequences. ## Decision-Focused Learning Policy ```math \xrightarrow[\text{State}]{s_t} \fbox{Neural network $\varphi_w$} -\xrightarrow[\text{Priorities}]{\theta} +\xrightarrow[\text{Prizes}]{\theta} \fbox{Prize-collecting VSP} \xrightarrow[\text{Routes}]{a_t} ``` **Components**: -1. **Neural Network** ``\varphi_w``: Takes current state features as input and predicts customer priorities ``\theta = (\theta_1, \ldots, \theta_n)`` -2. **Optimization Layer**: Solves the prize-collecting vehicle scheduling problem to determine optimal routes given the predicted priorities +1. **Neural Network** ``\varphi_w``: Takes current state features as input and predicts customer prizes ``\theta = (\theta_1, \ldots, \theta_n)``, one value per postponable customer. +2. **Optimization Layer**: Solves the prize-collecting vehicle scheduling problem to determine optimal routes given the predicted prizes, by maximizing total collected prizes minus travel costs: + ```math + \max_{a_t\in \mathcal{A}(s_t)} \sum_{r \in a_t} \left( \sum_{i \in r} \theta_i - \sum_{(i,j) \in r} d_{ij} \right) + ``` + This can be modeled as a flow linear program on a directed acyclic graph (DAG) and is solved using standard LP solvers. The neural network architecture adapts to the feature dimensionality: -- **2D features**: `Dense(2 => 1)` followed by vectorization -- **Full features**: `Dense(14 => 1)` followed by vectorization +- **2D features**: `Dense(2 => 1)`, applied in parallel to each postponable customer +- **Full features**: `Dense(27 => 1)` applied in parallel to each postponable customer + +**Note:** one can also use more complex architectures such as a deeper MLP or a graph neural network for better performance. diff --git a/src/DynamicVehicleScheduling/DynamicVehicleScheduling.jl b/src/DynamicVehicleScheduling/DynamicVehicleScheduling.jl index 7e8c23e..06fd54c 100644 --- a/src/DynamicVehicleScheduling/DynamicVehicleScheduling.jl +++ b/src/DynamicVehicleScheduling/DynamicVehicleScheduling.jl @@ -57,6 +57,13 @@ $TYPEDFIELDS two_dimensional_features::Bool = false end +""" +$TYPEDSIGNATURES + +Generate a dataset for the dynamic vehicle scheduling benchmark. +Returns a vector of [`DataSample`](@ref) objects, each containing an [`Instance`](@ref). +The dataset is generated from pre-existing DVRPTW files. +""" function Utils.generate_dataset(b::DynamicVehicleSchedulingBenchmark, dataset_size::Int=1) (; max_requests_per_epoch, Δ_dispatch, epoch_duration, two_dimensional_features) = b files = readdir(datadep"dvrptw"; join=true) @@ -74,6 +81,12 @@ function Utils.generate_dataset(b::DynamicVehicleSchedulingBenchmark, dataset_si ] end +""" +$TYPEDSIGNATURES + +Creates an environment from an [`Instance`](@ref) of the dynamic vehicle scheduling benchmark. +The seed of the environment is randomly generated using the provided random number generator. +""" function Utils.generate_environment( ::DynamicVehicleSchedulingBenchmark, instance::Instance, rng::AbstractRNG; kwargs... ) @@ -81,14 +94,32 @@ function Utils.generate_environment( return DVSPEnv(instance; seed) end +""" +$TYPEDSIGNATURES + +Returns a linear maximizer for the dynamic vehicle scheduling benchmark, of the form: +θ ↦ argmax_{y} θᵀg(y) + h(y) +""" function Utils.generate_maximizer(::DynamicVehicleSchedulingBenchmark) return LinearMaximizer(oracle; g, h) end +""" +$TYPEDSIGNATURES + +Generate a scenario for the dynamic vehicle scheduling benchmark. +This is a wrapper around the generic scenario generation function. +""" function Utils.generate_scenario(b::DynamicVehicleSchedulingBenchmark, args...; kwargs...) return Utils.generate_scenario(args...; kwargs...) end +""" +$TYPEDSIGNATURES + +Generate an anticipative solution for the dynamic vehicle scheduling benchmark. +The solution is computed using the anticipative solver with the benchmark's feature configuration. +""" function Utils.generate_anticipative_solution( b::DynamicVehicleSchedulingBenchmark, args...; kwargs... ) @@ -97,6 +128,14 @@ function Utils.generate_anticipative_solution( ) end +""" +$TYPEDSIGNATURES + +Generate baseline policies for the dynamic vehicle scheduling benchmark. +Returns a tuple containing: +- `lazy`: A policy that dispatches vehicles only when they are ready +- `greedy`: A policy that dispatches vehicles to the nearest customer +""" function Utils.generate_policies(b::DynamicVehicleSchedulingBenchmark) lazy = Policy( "Lazy", @@ -111,11 +150,18 @@ function Utils.generate_policies(b::DynamicVehicleSchedulingBenchmark) return (lazy, greedy) end +""" +$TYPEDSIGNATURES + +Generate a statistical model for the dynamic vehicle scheduling benchmark. +The model is a simple linear chain with a single dense layer that maps features to a scalar output. +The input dimension depends on whether two-dimensional features are used (2 features) or not (27 features). +""" function Utils.generate_statistical_model( b::DynamicVehicleSchedulingBenchmark; seed=nothing ) Random.seed!(seed) - return Chain(Dense((b.two_dimensional_features ? 2 : 14) => 1), vec) + return Chain(Dense((b.two_dimensional_features ? 2 : 27) => 1), vec) end export DynamicVehicleSchedulingBenchmark diff --git a/src/DynamicVehicleScheduling/features.jl b/src/DynamicVehicleScheduling/features.jl index 10e0ab8..3ce8657 100644 --- a/src/DynamicVehicleScheduling/features.jl +++ b/src/DynamicVehicleScheduling/features.jl @@ -1,3 +1,157 @@ +function must_dispatch_in_zone(state::DVSPState) + (; state_instance, is_must_dispatch) = state + + startTimes = state_instance.start_time + serviceTimes = state_instance.service_time + durations = state_instance.duration + + n = length(startTimes) + must_dispatch_counts = zeros(n) + + # For each customer j + for j in 1:n + # Count how many must-dispatch customers i can reach j + for i in 2:n + if is_must_dispatch[i] && i != j + # Check if customer i can reach customer j in time + if startTimes[i] + serviceTimes[i] + durations[i, j] < startTimes[j] + must_dispatch_counts[j] += 1 + end + end + end + end + + return must_dispatch_counts +end + +function count_reachable_from(state::DVSPState) + (; state_instance) = state + + startTimes = state_instance.start_time + serviceTimes = state_instance.service_time + durations = state_instance.duration + + n = length(startTimes) + reachable_counts = zeros(n) + + # For each customer j + for j in 1:n + # Count how many customers i are reachable from j + for i in 2:n + if i != j + # Check if customer i can reach customer j in time + if startTimes[j] + serviceTimes[j] + durations[j, i] < startTimes[i] + reachable_counts[j] += 1 + end + end + end + end + + return reachable_counts +end + +function count_reachable_to(state::DVSPState) + (; state_instance) = state + + startTimes = state_instance.start_time + serviceTimes = state_instance.service_time + durations = state_instance.duration + + n = length(startTimes) + reachable_counts = zeros(n) + + # For each customer j + for j in 1:n + # Count how many customers i can reach j + for i in 2:n + if i != j + # Check if customer i can reach customer j in time + if startTimes[i] + serviceTimes[i] + durations[i, j] < startTimes[j] + reachable_counts[j] += 1 + end + end + end + end + + return reachable_counts +end + +function quantile_reachable_new_requests( + state::DVSPState, + instance::Instance; + n_samples::Int=100, + quantiles=[i * 0.1 for i in 1:9], +) + (; state_instance, current_epoch) = state + (; static_instance, epoch_duration, Δ_dispatch, max_requests_per_epoch) = instance + + startTimes = state_instance.start_time + serviceTimes = state_instance.service_time + durations = state_instance.duration + n_current = length(startTimes) + + # Time window for next epoch + next_time = epoch_duration * current_epoch + Δ_dispatch + min_time = minimum(static_instance.start_time) + max_time = maximum(static_instance.start_time) + N = customer_count(static_instance) + + # Store reachability percentages for each customer across samples + reachability_matrix = zeros(Float64, n_current, n_samples) + + rng = MersenneTwister(42) + for s in 1:n_samples + # Sample new requests similar to scenario generation + coordinate_indices = sample_indices(rng, max_requests_per_epoch, N) + sampled_start_times = sample_times( + rng, max_requests_per_epoch, max(min_time, next_time), max_time + ) + service_time_indices = sample_indices(rng, max_requests_per_epoch, N) + + # Check feasibility (can reach from depot) + depot_durations = static_instance.duration[1, coordinate_indices] + is_feasible = next_time .+ depot_durations .<= sampled_start_times + + feasible_coords = coordinate_indices[is_feasible] + feasible_start_times = sampled_start_times[is_feasible] + feasible_service_times = static_instance.service_time[service_time_indices[is_feasible]] + + n_new = length(feasible_coords) + if n_new == 0 + continue # No reachable requests in this sample + end + + # For each current customer, count how many new requests it can reach + for j in 1:n_current + reachable_count = 0 + for k in 1:n_new + # Get duration from current customer location to new request location + customer_loc = state.location_indices[j] + new_loc = feasible_coords[k] + travel_time = static_instance.duration[customer_loc, new_loc] + travel_time_back = static_instance.duration[new_loc, customer_loc] + + # Check if customer j can reach new request k or if k can reach j + if startTimes[j] + serviceTimes[j] + travel_time < + feasible_start_times[k] || + startTimes[j] > + feasible_start_times[k] + feasible_service_times[k] + travel_time_back + reachable_count += 1 + end + end + reachability_matrix[j, s] = reachable_count / n_new + end + end + + # Compute quantiles for each customer + quantile_features = zeros(Float64, n_current, length(quantiles)) + for j in 1:n_current + quantile_features[j, :] = quantile(reachability_matrix[j, :], quantiles) + end + + return quantile_features +end + function get_features_quantileTimeToRequests(state::DVSPState, instance::Instance) quantiles = [i * 0.1 for i in 1:9] a = instance.static_instance.duration[state.location_indices, 2:end] @@ -6,7 +160,7 @@ function get_features_quantileTimeToRequests(state::DVSPState, instance::Instanc end function compute_model_free_features(state::DVSPState, instance::Instance) - (; state_instance, is_postponable) = state + (; state_instance, is_postponable, is_must_dispatch) = state startTimes = state_instance.start_time endTimes = startTimes .+ state_instance.service_time @@ -15,19 +169,34 @@ function compute_model_free_features(state::DVSPState, instance::Instance) slack_next_epoch = startTimes .- instance.epoch_duration + must_dispatch_counts = must_dispatch_in_zone(state) + nb_must_dispatch = sum(is_must_dispatch) + if nb_must_dispatch > 0 + must_dispatch_counts ./= nb_must_dispatch + end + + reachable_to_ratios = count_reachable_to(state) ./ (length(startTimes) - 1) + reachable_from_ratios = count_reachable_from(state) ./ (length(startTimes) - 1) + reachable_ratios = reachable_to_ratios .+ reachable_from_ratios + model_free_features = hcat( startTimes[is_postponable], # 1 endTimes[is_postponable], # 2 timeDepotRequest[is_postponable], # 3 timeRequestDepot[is_postponable], # 4 - slack_next_epoch[is_postponable], # 5-14 + slack_next_epoch[is_postponable], # 5 + must_dispatch_counts[is_postponable], # 6 + reachable_to_ratios[is_postponable], # 7 + reachable_from_ratios[is_postponable], # 8 + reachable_ratios[is_postponable], # 9 ) return model_free_features end function compute_model_aware_features(state::DVSPState, instance::Instance) quantileTimeToRequests = get_features_quantileTimeToRequests(state, instance) - model_aware_features = quantileTimeToRequests + quantileReachableNewRequests = quantile_reachable_new_requests(state, instance) + model_aware_features = hcat(quantileTimeToRequests, quantileReachableNewRequests) return model_aware_features[state.is_postponable, :] end diff --git a/src/DynamicVehicleScheduling/scenario.jl b/src/DynamicVehicleScheduling/scenario.jl index 9059477..4f7746e 100644 --- a/src/DynamicVehicleScheduling/scenario.jl +++ b/src/DynamicVehicleScheduling/scenario.jl @@ -29,19 +29,24 @@ function Utils.generate_scenario( new_service_time = Vector{Float64}[] new_start_time = Vector{Float64}[] + min_time, max_time = extrema(start_time) for epoch in 1:last_epoch time = epoch_duration * (epoch - 1) + Δ_dispatch coordinate_indices = sample_indices(rng, max_requests_per_epoch, N) - start_time_indices = sample_indices(rng, max_requests_per_epoch, N) + # sampled_start_time = sample_times(rng, max_requests_per_epoch, min_time, max_time) + sampled_start_time = sample_times( + rng, max_requests_per_epoch, max(min_time, time), max_time + ) + # start_time_indices = sample_indices(rng, max_requests_per_epoch, N) service_time_indices = sample_indices(rng, max_requests_per_epoch, N) - is_feasible = - time .+ duration[depot, coordinate_indices] .<= start_time[start_time_indices] + is_feasible = time .+ duration[depot, coordinate_indices] .<= sampled_start_time # start_time[start_time_indices] push!(new_indices, coordinate_indices[is_feasible]) push!(new_service_time, service_time[service_time_indices[is_feasible]]) - push!(new_start_time, start_time[start_time_indices[is_feasible]]) + push!(new_start_time, sampled_start_time[is_feasible]) + # push!(new_start_time, start_time[start_time_indices[is_feasible]]) end return Scenario(new_indices, new_service_time, new_start_time) end diff --git a/src/DynamicVehicleScheduling/static_vsp/parsing.jl b/src/DynamicVehicleScheduling/static_vsp/parsing.jl index 7bd7f92..8cc67be 100644 --- a/src/DynamicVehicleScheduling/static_vsp/parsing.jl +++ b/src/DynamicVehicleScheduling/static_vsp/parsing.jl @@ -7,8 +7,8 @@ It uses time window values to compute task times as the middle of the interval. Round all values to `Int` if `rounded=true`. Normalize all time values by the `normalization` parameter. """ -function read_vsp_instance(filepath::String; rounded::Bool=false, normalization=3600.0) - type = rounded ? Int : Float64 +function read_vsp_instance(filepath::String; normalization=3600.0, digits=2) + type = Float64 #rounded ? Int : Float64 mode = "" local edge_weight_type local edge_weight_format @@ -84,12 +84,17 @@ function read_vsp_instance(filepath::String; rounded::Bool=false, normalization= duration = mapreduce(permutedims, vcat, duration_matrix) coordinate = [ - Point(x / normalization, y / normalization) for + Point(round(x / normalization; digits), round(y / normalization; digits)) for (x, y) in zip(coordinates[:, 1], coordinates[:, 2]) ] service_time ./= normalization start_time ./= normalization duration ./= normalization - return StaticInstance(; coordinate, service_time, start_time, duration) + return StaticInstance(; + coordinate, + service_time=round.(service_time; digits), + start_time=round.(start_time; digits), + duration=round.(duration; digits), + ) end diff --git a/src/DynamicVehicleScheduling/utils.jl b/src/DynamicVehicleScheduling/utils.jl index bd1dfe8..1d9317c 100644 --- a/src/DynamicVehicleScheduling/utils.jl +++ b/src/DynamicVehicleScheduling/utils.jl @@ -8,6 +8,15 @@ sample_indices(rng::AbstractRNG, k, N) = randperm(rng, N)[1:k] .+ 1 """ $TYPEDSIGNATURES +Sample k random time values between min_time and max_time. +""" +function sample_times(rng::AbstractRNG, k, min_time, max_time; digits=2) + return round.(min_time .+ (max_time - min_time) .* rand(rng, k); digits=digits) +end + +""" +$TYPEDSIGNATURES + Compute the total cost of a set of routes given a distance matrix, i.e. the sum of the distances between each location in the route. Note that the first location is implicitly assumed to be the depot, and should not appear in the route. """ diff --git a/src/Utils/policy.jl b/src/Utils/policy.jl index 81d79f3..537335f 100644 --- a/src/Utils/policy.jl +++ b/src/Utils/policy.jl @@ -64,7 +64,7 @@ By default, the environment is reset before running the policy. function evaluate_policy!( policy, env::AbstractEnvironment, episodes::Int; seed=get_seed(env), kwargs... ) - total_reward = 0.0 + rewards = zeros(Float64, episodes) datasets = map(1:episodes) do _i if _i == 1 reset!(env; reset_rng=true, seed=seed) @@ -72,10 +72,10 @@ function evaluate_policy!( reset!(env; reset_rng=false) end reward, dataset = evaluate_policy!(policy, env; reset_env=false, kwargs...) - total_reward += reward + rewards[_i] = reward return dataset end - return total_reward / episodes, vcat(datasets...) + return rewards, datasets end """ @@ -88,11 +88,12 @@ function evaluate_policy!( policy, envs::Vector{<:AbstractEnvironment}, episodes::Int=1; kwargs... ) E = length(envs) - rewards = zeros(Float64, E) + avg_rewards = zeros(Float64, E) datasets = map(1:E) do e - reward, dataset = evaluate_policy!(policy, envs[e], episodes; kwargs...) - rewards[e] = reward + rewards, datasets = evaluate_policy!(policy, envs[e], episodes; kwargs...) + avg_rewards[e] = sum(rewards) / episodes + dataset = vcat(datasets...) return dataset end - return rewards, vcat(datasets...) + return avg_rewards, vcat(datasets...) end diff --git a/test/dynamic_vsp.jl b/test/dynamic_vsp.jl index 131ca89..ff01218 100644 --- a/test/dynamic_vsp.jl +++ b/test/dynamic_vsp.jl @@ -45,7 +45,7 @@ θ2 = model2(x2) y2 = maximizer(θ2; instance=instance2) @test size(x, 1) == 2 - @test size(x2, 1) == 14 + @test size(x2, 1) == 27 anticipative_value, solution = generate_anticipative_solution(b, env; reset_env=true) reset!(env; reset_rng=true)