## Summary
Fixes a server-side memory leak in the webapp's SSE helper. Every
aborted SSE connection (client tab close, navigation, timeout) was
pinning its full request/response graph indefinitely on Node 20, so any
long-running webapp process accumulated retained memory proportional to
streaming-request churn.
## Root cause
`apps/webapp/app/utils/sse.ts` combined the request abort, timeout, and
internal controller signals via `AbortSignal.any([requestAbortSignal,
timeoutSignal, internalController.signal])`. The composite signal tracks its source
signals in an internal `Set<WeakRef>` registered against a
`FinalizationRegistry`; under sustained traffic those entries accumulate
faster than they're cleaned up, pinning every source signal (and its
listeners, and anything those listeners close over) until the parent
signal itself is GC'd or aborts.
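For reference, the leaky composition had roughly this shape (a simplified sketch with illustrative names, not the exact `sse.ts` code):

```ts
// Simplified sketch of the leaky shape (illustrative names, not the exact sse.ts code).
function createSSEAbortSignal(request: Request, timeoutMs: number) {
  const internalController = new AbortController();
  const timeoutSignal = AbortSignal.timeout(timeoutMs);

  // The composite signal registers each source signal in Node's internal
  // Set<WeakRef>/FinalizationRegistry bookkeeping. Under churn those entries
  // outlive the request and pin the source signals, their listeners, and
  // everything those listeners close over.
  const signal = AbortSignal.any([
    request.signal,             // aborts when the client disconnects
    timeoutSignal,              // aborts after timeoutMs
    internalController.signal,  // aborts when the server closes the stream
  ]);

  signal.addEventListener("abort", () => {
    // stream teardown; this closure captures the request/response graph
  });

  return { signal, close: () => internalController.abort() };
}
```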
This is a long-standing Node issue with multiple open reports:
- nodejs/node#54614 — original report, still open. A follow-up comment from
ChainSafe on that issue describes the exact same shape in a Lodestar
production workload (request + timeout signals composed per request,
accumulating in a long-running worker) and the same mitigation: drop
`AbortSignal.any` and compose manually.
- nodejs/node#55351 — mechanism confirmed by Node member @jasnell: *"the set
of dependent signals known to the AbortSignal are kept in an internal Set
using WeakRefs. The AbortSignals are being properly gc'd but the Set is never
cleaned out of the WeakRefs making those leak."* Partially fixed by PR
nodejs/node#55354, shipped in Node 22.12.0 — but that only covers the
tight-loop case, not long-lived parent signals.
- nodejs/node#57584 — circular-dependency variant, still open.
- nodejs/node#62363 — regression in Node 24/25 from an unrelated V8 change
("Don't pretenure WeakCells"). Different root cause, same symptom.
A separate issue in `apps/webapp/app/entry.server.tsx` —
`setTimeout(abort, ABORT_DELAY)` with no `clearTimeout` on success paths
— kept the React render tree + `remixContext` alive for 30s per
successful HTML request. The same pattern was fixed upstream in the React
Router templates (remix-run/react-router#14200) but never backported to
Remix v2.
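The shape of that second leak, and the cleanup applied here, sketched against the standard `renderToPipeableStream` entry (simplified; not the exact file contents):

```tsx
// Simplified sketch based on the standard Remix streaming entry, not the exact file.
import { PassThrough } from "node:stream";
import { renderToPipeableStream } from "react-dom/server";

const ABORT_DELAY = 30_000;

function handleBrowserRequest(/* request, responseHeaders, remixContext, ... */) {
  return new Promise((resolve, reject) => {
    let abortTimer: NodeJS.Timeout;

    const { pipe, abort } = renderToPipeableStream(/* <RemixServer ... /> */ null, {
      onShellReady() {
        clearTimeout(abortTimer); // fix: stop pinning the render tree on success
        const body = new PassThrough();
        pipe(body);
        resolve(body);
      },
      onShellError(error: unknown) {
        clearTimeout(abortTimer); // fix: same on the error path
        reject(error);
      },
    });

    // Before the fix this timer kept `abort` (and the render tree plus
    // `remixContext` it closes over) alive for ABORT_DELAY on every request,
    // successful or not.
    abortTimer = setTimeout(abort, ABORT_DELAY);
  });
}
```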
## What changed
- **`apps/webapp/app/utils/sse.ts`** — single-signal abort chain (sketched
after this list). `AbortSignal.any` removed; `AbortSignal.timeout` replaced
by a plain `setTimeout` cleared when the controller aborts; named sentinel
constants used as stackless abort reasons; the request-abort handler is
explicitly removed on cleanup.
- **`apps/webapp/app/entry.server.tsx`** — clears the `setTimeout(abort,
ABORT_DELAY)` timer in `onShellReady` / `onAllReady` / `onShellError`.
- **`apps/webapp/app/v3/tracer.server.ts` + `env.server.ts`** — gates
OpenTelemetry `HttpInstrumentation` and `ExpressInstrumentation` behind
`DISABLE_HTTP_INSTRUMENTATION=true` as an escape hatch for future
OTel-listener retention patterns. Defaults to enabled.
- **`apps/webapp/app/presenters/v3/RunStreamPresenter.server.ts`** —
uses the shared `ABORT_REASON_SEND_ERROR` sentinel.
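A minimal sketch of the new single-signal chain in `sse.ts` (identifiers such as `createAbortChain` and the `ABORT_REASON_*` constant names are illustrative stand-ins for the sentinels described above; the real code may differ in detail):

```ts
// Simplified sketch of the single-signal abort chain (identifiers illustrative).
const ABORT_REASON_TIMEOUT = "timeout";
const ABORT_REASON_REQUEST_ABORTED = "request_aborted";

export function createAbortChain(requestSignal: AbortSignal, timeoutMs: number): AbortController {
  const controller = new AbortController();

  // Plain timer instead of AbortSignal.timeout(); cleared below so it never
  // outlives the stream.
  const timer = setTimeout(() => controller.abort(ABORT_REASON_TIMEOUT), timeoutMs);

  // Forward client disconnects manually instead of composing with AbortSignal.any().
  const onRequestAbort = () => controller.abort(ABORT_REASON_REQUEST_ABORTED);
  requestSignal.addEventListener("abort", onRequestAbort, { once: true });

  // Whichever way the stream ends, drop the timer and the request listener so
  // nothing keeps the request/response graph alive.
  controller.signal.addEventListener(
    "abort",
    () => {
      clearTimeout(timer);
      requestSignal.removeEventListener("abort", onRequestAbort);
    },
    { once: true }
  );

  return controller;
}
```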
## Verification
### Full-app reproduction (memlab)
Isolated local harness, 500 abrupt SSE disconnects against a
dev-presence route, GC between passes, heap snapshot diff with
[memlab](https://facebook.github.io/memlab/):
| Run | Heap delta after 500 conns + GC | memlab retained leaks |
| --- | --- | --- |
| Before | +16.0 MB (linear with request count) | 158 clusters; 250 `ServerResponse`, 1000 `AbortController`, 250 `SpanImpl` retained |
| After | **+3.3 MB (noise)** | **0 app-code leaks** |
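For context, the disconnect side of the harness amounted to the loop below (host, port, route path, and exact teardown are illustrative, not the literal script):

```ts
// Rough sketch of the abrupt-disconnect loop (URL and route path illustrative).
const SSE_URL = "http://localhost:3030/some-dev-presence-sse-route";

async function abruptDisconnect(): Promise<void> {
  const controller = new AbortController();
  const res = await fetch(SSE_URL, {
    signal: controller.signal,
    headers: { accept: "text/event-stream" },
  });
  // Read one chunk so the server has fully set up the stream, then drop it.
  await res.body?.getReader().read();
  controller.abort();
}

for (let i = 0; i < 500; i++) {
  await abruptDisconnect().catch(() => {}); // abort rejections are expected
}
// Heap snapshots were taken before and after (with a forced GC in between)
// and diffed with memlab to produce the numbers above.
```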
### Standalone mechanism isolation
To confirm *which* axis of the change is load-bearing, a separate
standalone Node script (`/tmp/abort-leak-test.mjs`) ran 2000 requests ×
200 KB payload per variant:
| Variant | Heap delta after GC |
| --- | --- |
| baseline (no signal machinery) | 0 MB |
| V1: `AbortSignal.any` + string abort reason | **+9.1 MB** |
| V2: `AbortSignal.any` only (no reason) | **+10.8 MB** |
| V3: string reason only (no `AbortSignal.any`) | 0 MB |
| V4: neither (the fix) | 0 MB |
| V5: `AbortSignal.any` with no listener on the composite | **+10.2 MB** |
This isolates `AbortSignal.any` as the sole retention mechanism. The abort
reason type (`.abort()` vs `.abort("string")`) is irrelevant — V3 is clean,
and V5 leaks even with no listener on the composite.
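Conceptually, the variant comparison reduces to the sketch below. This is a TypeScript transliteration of the idea, not the original `.mjs` script; it needs `--expose-gc`, and absolute numbers will differ from the table.

```ts
// Conceptual sketch of the variant isolation (not the original /tmp/abort-leak-test.mjs).
// Requires --expose-gc so the forced GC calls are available.

function simulateRequest(useAny: boolean): void {
  const requestController = new AbortController();   // stands in for request.signal
  const internalController = new AbortController();  // server-side teardown signal
  const perRequestState = { buffer: Buffer.alloc(1024) }; // per-request graph stand-in

  const signal = useAny
    ? AbortSignal.any([requestController.signal, internalController.signal]) // V2-style
    : internalController.signal;                                             // V4-style (the fix)

  signal.addEventListener("abort", () => perRequestState.buffer.length, { once: true });
  requestController.abort(); // simulate the client disconnect
}

function measure(useAny: boolean, iterations = 2000): number {
  (globalThis as any).gc?.();
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iterations; i++) simulateRequest(useAny);
  (globalThis as any).gc?.();
  return (process.memoryUsage().heapUsed - before) / 1e6; // MB
}

console.log(`AbortSignal.any variant: +${measure(true).toFixed(1)} MB`);
console.log(`single-signal variant:   +${measure(false).toFixed(1)} MB`);
```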
## Risk
- `sse.ts` is used by the dev-presence routes. Behaviour is equivalent —
timeouts and client disconnects still abort the stream. `signal.reason`
is now a named string sentinel (`"timeout"`, `"request_aborted"`, etc.)
instead of the previous string arg or default `AbortError`. No in-tree
reader of `signal.reason` exists.
- The `entry.server.tsx` change is a standard cleanup of an abort timer and
matches upstream React Router guidance.
- `tracer.server.ts` change is env-gated and defaults to current
behaviour.
- Three other webapp `AbortSignal.timeout()` callsites (alert delivery,
remote-build status) pass their signal straight to `fetch` and are
fire-and-forget — not composed with anything long-lived, so no retention
risk; left untouched.
## Test plan
- [ ] Existing SSE integration tests pass
- [ ] Dev-presence SSE behaves normally across tab open/close cycles
- [ ] No heap growth under sustained aborted-connection traffic (heap
snapshot diff)
## Follow-up
The same `AbortSignal.any([userSignal, internalSignal])` pattern exists
in several SDK/core callsites that ship to customers
(`packages/core/src/v3/realtimeStreams/manager.ts`,
`packages/trigger-sdk/src/v3/{ai,chat,chat-client,sessions}.ts`,
`packages/core/src/v3/workers/warmStartClient.ts`). Whether those leak in
practice depends on whether the caller passes a long-lived signal. Tracked
separately.