Skip to content

Latest commit

 

History

History
37 lines (24 loc) · 1.89 KB

File metadata and controls

37 lines (24 loc) · 1.89 KB

Contextual Stochastic Argmax

ContextualStochasticArgmaxBenchmark is a minimalist contextual stochastic optimization benchmark problem.

The decision maker selects one item out of n. Item values are uncertain at decision time: they depend on a base utility plus a context-correlated perturbation revealed only after the decision is made. An observable context vector, correlated with the perturbation via a fixed linear map W, allows the learner to anticipate the perturbation and pick the right item.

Problem Formulation

Instance: c_{\text{base}} \sim \mathcal{U}[0,1]^n, base values for n items.

Context: x_{\text{raw}} \sim \mathcal{N}(0, I_d), a d-dimensional signal correlated with item values. The feature vector passed to the model is x = [c_{\text{base}};\, x_{\text{raw}}] \in \mathbb{R}^{n+d}.

Scenario: the realized item values are

$$\xi = c_{\text{base}} + W x_{\text{raw}} + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$$

where W \in \mathbb{R}^{n \times d} is a fixed matrix unknown to the learner.

Decision: y \in \{e_1, \ldots, e_n\} (one-hot vector selecting one item).

Policies

DFL Policy

$$\xrightarrow[\text{Features}]{x} \fbox{Neural network \$\varphi_w\$} \xrightarrow[\text{Predicted values}]{\hat{\theta}} \fbox{\texttt{one\_hot\_argmax}} \xrightarrow[\text{Decision}]{y}$$

The neural network predicts item values \hat{\theta} \in \mathbb{R}^n from the feature vector x \in \mathbb{R}^{n+d}. The default architecture is Dense(n+d => n; bias=false), which can exactly recover the optimal linear predictor [I_n \mid W], so a well-trained model should reach near-zero gap.

SAA Policy

y_{\text{SAA}} = \operatorname{argmax}\bigl(\frac{1}{S}\sum_s \xi^{(s)}\bigr) — the exact SAA-optimal decision for linear argmax, accessible via generate_baseline_policies(bench).saa.