Commit 9acdd14 (parent 9443ac7): update documentation

# Contextual Stochastic Argmax
[`ContextualStochasticArgmaxBenchmark`](@ref) is a minimalist contextual stochastic optimization benchmark problem.
The decision maker selects one item out of ``n``. Item values are uncertain at decision time: each is a base value plus a context-correlated perturbation revealed only after the decision is made. An observable context vector, correlated with the perturbation through a fixed linear map ``W``, lets the learner anticipate the perturbation and pick the right item.
## Problem Formulation
**Instance**: ``c_{\text{base}} \sim \mathcal{U}[0,1]^n``, base values for ``n`` items.
**Context**: ``x_{\text{raw}} \sim \mathcal{N}(0, I_d)``, a ``d``-dimensional signal correlated with item values. The feature vector passed to the model is ``x = [c_{\text{base}};\, x_{\text{raw}}] \in \mathbb{R}^{n+d}``.
**Scenario**: the realized item values are

```math
\xi = c_{\text{base}} + W x_{\text{raw}} + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)
```

where ``W \in \mathbb{R}^{n \times d}`` is a fixed matrix unknown to the learner.
**Decision**: ``y \in \{e_1, \ldots, e_n\}`` (one-hot vector selecting one item).
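
The generative model above is fully specified, so it can be sampled directly. The following is an illustrative pure-Python sketch of one instance/context/scenario draw (the package itself is Julia; the dimensions ``n = 3``, ``d = 2`` and ``\sigma = 0.1`` here are arbitrary choices for demonstration, not package defaults):

```python
import random

n, d, sigma = 3, 2, 0.1  # illustrative dimensions and noise level
rng = random.Random(0)

# Fixed linear map W (n x d), unknown to the learner.
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Instance: base values c_base ~ U[0,1]^n.
c_base = [rng.random() for _ in range(n)]

# Context: raw signal x_raw ~ N(0, I_d); features x = [c_base; x_raw].
x_raw = [rng.gauss(0, 1) for _ in range(d)]
x = c_base + x_raw  # concatenation, length n + d

# Scenario: realized values xi = c_base + W x_raw + eps.
Wx = [sum(W[i][j] * x_raw[j] for j in range(d)) for i in range(n)]
xi = [c_base[i] + Wx[i] + rng.gauss(0, sigma) for i in range(n)]

# Decision: one-hot vector selecting one item; the reward is <xi, y>.
best = max(range(n), key=lambda i: xi[i])
y = [1 if i == best else 0 for i in range(n)]
reward = sum(xi[i] * y[i] for i in range(n))
```

Note that the scenario ``\xi`` depends on the context only through ``x_{\text{raw}}``; the reward of a one-hot decision is simply the realized value of the selected item.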
## Policies
### DFL Policy
```math
\xrightarrow[\text{Features}]{x}
\fbox{Neural network $\varphi_w$}
\xrightarrow[\text{Predicted values}]{\hat{\theta}}
\fbox{\texttt{one\_hot\_argmax}}
\xrightarrow[\text{Decision}]{y}
```
The neural network predicts item values ``\hat{\theta} \in \mathbb{R}^n`` from the feature vector ``x \in \mathbb{R}^{n+d}``. The default architecture is `Dense(n+d => n; bias=false)`, which can exactly recover the optimal linear predictor ``[I_n \mid W]``, so a well-trained model should reach near-zero gap.
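
The recoverability claim can be checked by hand: applying the weight matrix ``[I_n \mid W]`` to ``x = [c_{\text{base}};\, x_{\text{raw}}]`` yields exactly ``c_{\text{base}} + W x_{\text{raw}}``, the conditional mean of ``\xi`` given the context (the noise ``\varepsilon`` has mean zero). A pure-Python sketch of this identity, with illustrative dimensions:

```python
import random

n, d = 3, 2  # illustrative dimensions
rng = random.Random(1)

W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
c_base = [rng.random() for _ in range(n)]
x_raw = [rng.gauss(0, 1) for _ in range(d)]
x = c_base + x_raw  # features, length n + d

# Build the n x (n+d) weight matrix [I_n | W].
weights = [[1.0 if j == i else 0.0 for j in range(n)] + W[i] for i in range(n)]

# theta_hat = [I_n | W] x, i.e. a linear layer with no bias.
theta_hat = [sum(weights[i][j] * x[j] for j in range(n + d)) for i in range(n)]

# Conditional mean of xi given the context (eps has mean zero).
mean_xi = [c_base[i] + sum(W[i][j] * x_raw[j] for j in range(d))
           for i in range(n)]

assert all(abs(a - b) < 1e-9 for a, b in zip(theta_hat, mean_xi))
```

Since `one_hot_argmax` only depends on the ordering of the predicted values, a model that matches the conditional mean makes the same decision as an oracle that knows ``W``.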
### SAA Policy
``y_{\text{SAA}} = \operatorname{argmax}\bigl(\frac{1}{S}\sum_{s=1}^{S} \xi^{(s)}\bigr)``, the one-hot vector selecting the item with the largest empirical mean value over ``S`` sampled scenarios. Because the objective is linear in ``y``, this is the exact SAA-optimal decision. It is accessible via `generate_baseline_policies(bench).saa`.
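
The SAA computation itself is a one-liner over sampled scenarios. A hedged pure-Python sketch (illustrative dimensions and sample count, not the package internals):

```python
import random

n, d, sigma, S = 3, 2, 0.1, 500  # illustrative dimensions and sample count
rng = random.Random(2)

W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
c_base = [rng.random() for _ in range(n)]
x_raw = [rng.gauss(0, 1) for _ in range(d)]

def sample_scenario():
    # xi = c_base + W x_raw + eps, with eps ~ N(0, sigma^2 I_n).
    return [c_base[i]
            + sum(W[i][j] * x_raw[j] for j in range(d))
            + rng.gauss(0, sigma)
            for i in range(n)]

# Average S scenarios for one fixed instance/context pair.
scenarios = [sample_scenario() for _ in range(S)]
mean_xi = [sum(s[i] for s in scenarios) / S for i in range(n)]

# The objective is linear in y, so the argmax of the empirical mean
# is the exact SAA-optimal decision.
best = max(range(n), key=lambda i: mean_xi[i])
y_saa = [1 if i == best else 0 for i in range(n)]
```

As ``S`` grows, the empirical mean concentrates around ``c_{\text{base}} + W x_{\text{raw}}``, so the SAA decision converges to the decision of the conditional-mean oracle.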
