bug: Block read_pickle and class definitions, restrict custom CSV field to read_csv() only #6257
Draft
chloebyun-wd wants to merge 2 commits into main
Code Review
This pull request enhances security by introducing a specialized validator for custom CSV reading and expanding the list of forbidden Python patterns to prevent unsafe deserialization and class definitions. A security concern was raised regarding the regex for read_pickle, suggesting a broader match to prevent bypasses via variable assignment.
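The bypass the review describes can be illustrated with a small sketch. The PR's actual regex is not shown on this page, so both patterns below are hypothetical: `NARROW` stands in for a pattern that only matches a direct call, and `BROAD` for the wider match the review suggests.

```python
import re

# Hypothetical patterns for illustration only; the PR's real regex is not shown here.
NARROW = re.compile(r"\bpd\.read_pickle\s*\(")  # matches only a direct call
BROAD = re.compile(r"read_pickle")              # matches any mention of the name

direct_call = 'df = pd.read_pickle("data.pkl")'
# Bypass attempt: bind the function to a variable first, then call the variable.
aliased_call = 'loader = pd.read_pickle\ndf = loader("data.pkl")'


def is_blocked(pattern: "re.Pattern", code: str) -> bool:
    """Return True when the denylist pattern appears anywhere in the code."""
    return pattern.search(code) is not None
```

Under these assumptions, `is_blocked(NARROW, aliased_call)` is false while `is_blocked(BROAD, aliased_call)` is true, which is the gap the review points at. A name-based match can still be evaded if the attribute name is assembled at run time, so it works best as one layer among several.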
The vulnerability
The CSV Agent node lets users supply a "Custom Pandas Read_CSV Code" field. This input is injected into pd.${customReadCSVFunc} and executed via Pyodide. A denylist blocks dangerous Python keywords, but pd.read_pickle() wasn't on it, allowing an attacker to deserialize a malicious pickle payload and achieve remote code execution.
How the fix works
Two independent layers — either one blocks the PoC.
1. Allowlist for the user-supplied field
The field is meant for read_csv() calls only, so we now enforce exactly that: the input must begin with read_csv( or it is rejected. This rejects the PoC's isnull("") class MiniBytesIO: ... pd.read_pickle(...) payload at the very first check.
2. New denylist entries
Four patterns added, protecting both user input and LLM-generated code:
- read_pickle: the direct RCE vector
- pickle: the deserialization module
- marshal: another unsafe deserializer
- class definitions: used in the PoC to build a fake file-like object
How the PoC is blocked
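| PoC component | Blocked by |
| --- | --- |
| isnull("") instead of read_csv( | allowlist check |
| class MiniBytesIO | denylist: class |
| pd.read_pickle(...) | denylist: read_pickle, pickle |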
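The two layers described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's validator: the function name, the pattern list, and the stub PoC string are all hypothetical, and the prefix test is the only allowlist rule that can be recovered from this description (the real check may enforce more).

```python
import re
from typing import List

# Hypothetical names throughout; the project's real validator is not shown in this PR text.
# Layer 2: the four denylist entries named in the description.
FORBIDDEN_PATTERNS = [
    re.compile(r"read_pickle"),    # direct RCE vector
    re.compile(r"pickle"),         # deserialization module
    re.compile(r"marshal"),        # another unsafe deserializer
    re.compile(r"\bclass\s+\w+"),  # class definitions
]


def validate_custom_read_csv(code: str) -> List[str]:
    """Return the reasons the input is rejected; an empty list means it passed."""
    reasons = []
    # Layer 1: allowlist. The field must begin with a read_csv( call.
    if not code.lstrip().startswith("read_csv("):
        reasons.append("allowlist: input must start with read_csv(")
    # Layer 2: denylist. Checked independently, so either layer alone rejects the PoC.
    for pattern in FORBIDDEN_PATTERNS:
        if pattern.search(code):
            reasons.append("denylist: matched " + pattern.pattern)
    return reasons


# Shape of the PoC as quoted in the description; its body is elided there, so this is a stub.
poc = 'isnull("")\nclass MiniBytesIO:\n    pass\npd.read_pickle(MiniBytesIO())'
```

With this sketch, `validate_custom_read_csv(poc)` reports one allowlist failure and three denylist matches, while a plain `read_csv(csv_data)` input returns an empty list. Collecting every reason instead of stopping at the first makes it easy to confirm that each layer blocks the payload on its own.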