Commit fc71e7d
fix: handle fast-completion race in batch streaming seal check (#3427)
## Problem
When `batchTrigger()` is called with large payloads, each item's payload
is uploaded to R2 server-side during the streaming loop, before the item
is enqueued. This makes the loop slow, around 3 seconds per item. Workers
pick up and execute each item as soon as it is enqueued, so execution
runs concurrently with the rest of the stream.
For the last item in the batch, a race exists between the streaming loop
finishing and the batch completion cleanup:
1. The loop enqueues the last item and returns from `enqueueBatchItem()`.
2. A waiting worker picks up the item almost instantly and executes it.
3. `recordSuccess()` fires, `processedCount` reaches the expected total,
   and `finalizeBatch()` runs.
4. `cleanup()` deletes all Redis keys for the batch, including
   `enqueuedItemsKey`.
5. The streaming loop exits and calls `getBatchEnqueuedCount()`, which
   reads the now-deleted key and returns 0.

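The ordering in steps 1 to 5 can be reproduced deterministically with a small in-memory model. This is a sketch under stated assumptions: a `Map` stands in for Redis, the enqueued-items key is modelled as a simple counter, and only the helper names (`enqueueBatchItem`, `recordSuccess`, `getBatchEnqueuedCount`) come from the commit. Their bodies, the key format, `BatchState` and the `simulate` driver are hypothetical.

```typescript
// In-memory model of the fast-completion race. The Map stands in for Redis;
// the key format and helper bodies are hypothetical, only the helper names
// come from the commit message.
type FakeRedis = Map<string, number>;

interface BatchState {
  processedCount: number;
  runCount: number;
}

function enqueuedItemsKey(batchId: string): string {
  return `batch:${batchId}:enqueued`; // hypothetical key format
}

// Streaming loop side: enqueue one item and bump the enqueued counter.
function enqueueBatchItem(redis: FakeRedis, batchId: string): void {
  const key = enqueuedItemsKey(batchId);
  redis.set(key, (redis.get(key) ?? 0) + 1);
}

// Worker side: record a finished run. When the last run finishes,
// finalizeBatch() -> cleanup() deletes the batch's Redis keys.
function recordSuccess(redis: FakeRedis, batchId: string, state: BatchState): void {
  state.processedCount += 1;
  if (state.processedCount === state.runCount) {
    redis.delete(enqueuedItemsKey(batchId));
  }
}

// A missing key reads as 0, which is what the seal check saw.
function getBatchEnqueuedCount(redis: FakeRedis, batchId: string): number {
  return redis.get(enqueuedItemsKey(batchId)) ?? 0;
}

// Runs a 3-item batch and returns the count the seal check would read.
// `fastWorker` decides whether the last item completes before the check.
function simulate(fastWorker: boolean): number {
  const redis: FakeRedis = new Map();
  const batchId = "batch_1";
  const state: BatchState = { processedCount: 0, runCount: 3 };

  for (let i = 0; i < state.runCount; i++) {
    enqueueBatchItem(redis, batchId);
    const isLast = i === state.runCount - 1;
    // Earlier items finish while the loop is still streaming later ones.
    if (!isLast || fastWorker) recordSuccess(redis, batchId, state);
  }

  const count = getBatchEnqueuedCount(redis, batchId);
  // Slow ordering: the last run finishes only after the check.
  if (!fastWorker) recordSuccess(redis, batchId, state);
  return count;
}

console.log(simulate(false)); // 3: check runs before cleanup
console.log(simulate(true)); // 0: cleanup already deleted the key
```

With `fastWorker = true` the check reads 0 instead of the expected run count, even though every item was enqueued.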
The count check sees `enqueuedCount (0) !== batch.runCount` and falls
through to a Postgres fallback, but that fallback only checked `sealed`.
The BatchQueue completion path sets `status = COMPLETED` in Postgres
without setting `sealed = true` (sealing is the streaming endpoint's
job), so the fallback misses the completed batch as well.
As a result, the endpoint returns `sealed: false`. The SDK treats this
as retryable and retries up to 5 times with exponential backoff. Each
retry calls `enqueueBatchItem()`, which reads the batch meta key from
Redis (also deleted by `cleanup()`) and throws "Batch not found or not
initialized" (500). The final retry gets a 422 because the batch is
already COMPLETED. The SDK does not retry a 422, so an `ApiError` is
thrown from `await batchTrigger()` in the parent run, even though all
child runs completed successfully.
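The client-side sequence above can be sketched as a small retry loop. This only encodes what the commit states (`sealed: false` and 500 responses are retried up to 5 times with exponential backoff; a 422 is not retried) and is not the SDK's actual code: the `StreamResponse` type, this `ApiError` class, `runWithRetries`, the 1-second base delay and the 422 message are all hypothetical, and the real SDK may classify responses differently.

```typescript
// Hypothetical response shape; the real SDK types differ.
type StreamResponse =
  | { kind: "ok"; sealed: boolean }
  | { kind: "error"; status: number; message: string };

// Hypothetical stand-in for the SDK's ApiError.
class ApiError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

const MAX_RETRIES = 5;
const BASE_DELAY_MS = 1000; // assumed; the SDK's real backoff settings may differ

// Calls `attempt` until the batch reports sealed. Returns the backoff delays
// that would have been waited; throws ApiError on a non-retryable response
// or once the retries are exhausted.
function runWithRetries(attempt: (n: number) => StreamResponse): number[] {
  const delays: number[] = [];
  for (let n = 0; n <= MAX_RETRIES; n++) {
    const res = attempt(n);
    if (res.kind === "ok" && res.sealed) return delays;
    // sealed:false and 5xx are retried; other errors (e.g. 422) are not.
    const retryable = res.kind === "ok" || res.status >= 500;
    if (!retryable || n === MAX_RETRIES) {
      const status = res.kind === "error" ? res.status : 200;
      const message = res.kind === "error" ? res.message : "batch not sealed";
      throw new ApiError(status, message);
    }
    delays.push(BASE_DELAY_MS * 2 ** n); // exponential backoff
  }
  return delays;
}

// One reading of the failure: sealed:false, four 500s, then a final 422.
let calls = 0;
const scenario = (n: number): StreamResponse => {
  calls++;
  if (n === 0) return { kind: "ok", sealed: false };
  if (n < MAX_RETRIES) {
    return { kind: "error", status: 500, message: "Batch not found or not initialized" };
  }
  return { kind: "error", status: 422, message: "batch already completed" };
};

try {
  runWithRetries(scenario);
} catch (err) {
  // Surfaces in the parent run even though every child run succeeded.
  console.log((err as ApiError).status, calls); // 422 6
}
```

Because the 422 is treated as final, no amount of retrying recovers once `cleanup()` has run; the fix has to make the first response report the batch as sealed.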
## Fix
In the Postgres fallback inside `StreamBatchItemsService`, also check
`status === "COMPLETED"` alongside `sealed`. This covers the
fast-completion path where the BatchQueue finishes all runs before the
streaming endpoint gets to seal the batch normally.
It also switches `findUnique` to `findFirst`, per webapp convention.
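The change to the fallback can be sketched as a before/after pair of predicates. This is a simplification under stated assumptions: the real service reads the batch row from Postgres (now via Prisma's `findFirst`) and does more than return a boolean, whereas here the row is a plain object. `BatchRow`, `isSealedBeforeFix` and `isSealedAfterFix` are hypothetical names; only the `sealed`, `status` and `runCount` fields come from the commit.

```typescript
// Hypothetical subset of the batch row read from Postgres. Only `sealed`,
// `status` and `runCount` are named in the commit.
interface BatchRow {
  runCount: number;
  sealed: boolean;
  status: string;
}

// Pre-fix: trust the Redis count, else fall back to `sealed` only.
function isSealedBeforeFix(enqueuedCount: number, batch: BatchRow): boolean {
  if (enqueuedCount === batch.runCount) return true;
  return batch.sealed;
}

// Post-fix: a batch the BatchQueue already completed also counts, even
// though the streaming endpoint never got to set `sealed = true`.
function isSealedAfterFix(enqueuedCount: number, batch: BatchRow): boolean {
  if (enqueuedCount === batch.runCount) return true;
  return batch.sealed || batch.status === "COMPLETED";
}

// Fast-completion case: cleanup() wiped the Redis count, so it reads 0.
const fastCompleted: BatchRow = { runCount: 3, sealed: false, status: "COMPLETED" };
console.log(isSealedBeforeFix(0, fastCompleted)); // false: SDK starts retrying
console.log(isSealedAfterFix(0, fastCompleted)); // true: endpoint reports sealed
```

Checking `status` in addition to `sealed` leaves the normal path untouched: a batch that is neither sealed nor completed still fails the check when the count does not match.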
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

1 parent 8eb596f
2 files changed, 151 additions and 10 deletions:
- apps/webapp/app/runEngine/services (27 additions, 10 deletions)
- apps/webapp/test/engine (124 additions)