Use this workflow to create CodeQL data extensions (Models-as-Data) for third-party libraries and frameworks. Data extensions let you customize taint tracking without writing QL code — you author YAML files that declare which functions are sources, sinks, summaries, barriers, or barrier guards.
For format reference, read the MCP resource: codeql://learning/data-extensions
For language-specific guidance: codeql://languages/{{language}}/library-modeling
- Confirm the target library and language
  - Library name and version: {{libraryName}}
  - Target language: {{language}}
  - Determine the model format:
    - MaD tuple format (9–10 column tuples): C/C++ (`codeql/cpp-all`), C# (`codeql/csharp-all`), Go (`codeql/go-all`), Java/Kotlin (`codeql/java-all`)
    - API Graph format (3–5 column tuples): JavaScript/TypeScript (`codeql/javascript-all`), Python (`codeql/python-all`), Ruby (`codeql/ruby-all`)
  - Using the wrong format will cause the extension to silently fail to load.
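To make the difference between the two formats concrete, here is a hedged sketch of the same SQL sink written both ways. The library names (`com.example.db`, `examplelib`), the class `Client`, and the method `query` are hypothetical placeholders, not a real library. The column comments follow the Java and Python layouts as generally documented, so confirm them against the language-specific guide before reusing the rows.

```yaml
# File 1 (MaD tuple format, Java): a sinkModel row has 9 columns.
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      # package, type, subtypes, name, signature, ext, input, kind, provenance
      - ["com.example.db", "Client", True, "query", "(String)", "", "Argument[0]", "sql-injection", "manual"]
---
# File 2 (API Graph format, Python): a sinkModel row has 3 columns.
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      # type, path, kind
      - ["examplelib", "Member[Client].Instance.Member[query].Argument[0]", "sql-injection"]
```

The `---` separator only marks that these are two separate files; each would live in its own `.model.yml`.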
- Locate a CodeQL database
  - Tool: #list_codeql_databases
  - Or create one: #codeql_database_create
  - The database must contain code that exercises the target library
- Explore the library's API surface
  - Tool: #read_database_source (browse source files to identify relevant API calls)
  - Tool: #codeql_query_run with `queryName="PrintAST"` (visualize how library calls are represented)
  - Skim the library's public API docs, type stubs, or source code
For each public function or method on the library, classify it:
- Does it return data from outside the program (network, file, env, stdin)? → `sourceModel` with `kind` matching the threat model (usually `"remote"`)
- Does it consume data in a security-sensitive operation (SQL, exec, path, redirect, eval, deserialize)? → `sinkModel` with `kind` matching the vulnerability class (e.g. `"sql-injection"`, `"command-injection"`)
- Does it pass data through opaque library code (encode, decode, wrap, copy, iterate)? → `summaryModel` with `kind: "taint"` (derived) or `kind: "value"` (identity)
- Does it sanitize data so its output is safe for a specific sink kind? → `barrierModel` with `kind` matching the sink kind it neutralizes
- Does it return a boolean indicating whether data is safe? → `barrierGuardModel` with the appropriate `acceptingValue` (`"true"` or `"false"`) and matching `kind`
- Is the type a subclass of something already modeled? → `typeModel` (API Graph languages) or set `subtypes: True` (MaD tuple languages)
- Did the auto-generated model assign a wrong summary? → `neutralModel` to suppress it
A complete chain of source → (summary*) → sink is required for end-to-end findings; missing a single hop will cause false negatives.
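As an illustration of such a chain, the sketch below models one source, one summary, and one sink in the MaD tuple format for Java. Every package, class, and method name (`com.example.http.Request.getParam`, `com.example.codec.Codec.decode`, `com.example.db.Client.query`) is a hypothetical placeholder. The column orders in the comments are the commonly documented Java layouts and should be checked against the language-specific guide.

```yaml
extensions:
  # Source: a hypothetical HTTP wrapper whose return value is attacker-controlled.
  - addsTo:
      pack: codeql/java-all
      extensible: sourceModel
    data:
      # package, type, subtypes, name, signature, ext, output, kind, provenance
      - ["com.example.http", "Request", True, "getParam", "(String)", "", "ReturnValue", "remote", "manual"]

  # Summary: taint flows from the argument of decode() to its return value.
  - addsTo:
      pack: codeql/java-all
      extensible: summaryModel
    data:
      # package, type, subtypes, name, signature, ext, input, output, kind, provenance
      - ["com.example.codec", "Codec", True, "decode", "(String)", "", "Argument[0]", "ReturnValue", "taint", "manual"]

  # Sink: the first argument of query() is executed as SQL.
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      # package, type, subtypes, name, signature, ext, input, kind, provenance
      - ["com.example.db", "Client", True, "query", "(String)", "", "Argument[0]", "sql-injection", "manual"]
```

If the `decode` summary were left out, a value passing through `Codec.decode` between `getParam` and `query` would lose its taint and the finding would not be reported.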
Choose between two paths:
- Single-repo shortcut: drop `.model.yml` files under `.github/codeql/extensions/<pack-name>/` in the consuming repo. No `codeql-pack.yml` is required; Code Scanning auto-loads extensions from this directory. Use when the models only need to apply to one repo.
- Reusable model pack: create a pack directory with a `codeql-pack.yml` declaring `extensionTargets` and `dataExtensions`. Use when models will be consumed by multiple repos or by org-wide Default Setup.
- Create the model file
  - Use naming convention `<library>-<module>.model.yml` (lowercase, hyphen-separated)
  - Split per logical module rather than putting an entire ecosystem in one file
  - Read `codeql://languages/{{language}}/library-modeling` for the exact column layout and examples
- Write the YAML with correct extensible predicates

      extensions:
        - addsTo:
            pack: codeql/{{language}}-all
            extensible: sinkModel
          data:
            # Add tuples here; column count must exactly match the predicate schema
            - [...]

  - Every row must have the exact column count for its extensible predicate; an invalid row will fail silently or cause errors
  - Use `provenance: 'manual'` (MaD format) for hand-written rows
  - Ensure `kind` values match across the chain (e.g. a `"sql-injection"` barrier must guard a `"sql-injection"` sink)
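For the API Graph languages the same template takes shorter rows and has no `provenance` column. The following is a minimal sketch for Python, assuming a hypothetical package `examplelib` with a `connect()` function that returns a client object, a `decode()` helper, and a `query()` method. None of these names come from a real library, and the column orders in the comments should be verified against the language-specific guide.

```yaml
extensions:
  # typeModel: values returned by examplelib.connect() are treated as "examplelib.Client".
  - addsTo:
      pack: codeql/python-all
      extensible: typeModel
    data:
      # type1, type2, path
      - ["examplelib.Client", "examplelib", "Member[connect].ReturnValue"]

  # summaryModel: taint flows from the first argument of decode() to its result.
  - addsTo:
      pack: codeql/python-all
      extensible: summaryModel
    data:
      # type, path, input, output, kind
      - ["examplelib", "Member[decode]", "Argument[0]", "ReturnValue", "taint"]

  # sinkModel: the first argument of Client.query() is executed as SQL.
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      # type, path, kind
      - ["examplelib.Client", "Member[query].Argument[0]", "sql-injection"]
```

Note the differing row lengths (3, 5, and 3 columns): each row must match the schema of the predicate named in its own `extensible` key, not that of its neighbours.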
Skip this step if you chose the .github/codeql/extensions/ shortcut in Phase 3.
For a reusable pack, create or update `codeql-pack.yml`:

    name: <org>/<language>-<pack-name>
    version: 0.0.1
    library: true
    extensionTargets:
      codeql/<language>-all: '*'
    dataExtensions:
      - models/**/*.yml

- `library: true`: model packs are always libraries, never queries
- `extensionTargets`: names the upstream pack the extensions extend
- `dataExtensions`: a glob that picks up every `.model.yml` you author
- Install pack dependencies
  - Tool: #codeql_pack_install (resolve dependencies for the model pack)
Validate the model against a real database:
- Run a relevant security query with the extension applied
  - Tool: #codeql_query_run
  - Pass the model pack directory via the `additionalPacks` parameter
  - Pick a query whose sink kind matches what you modeled (e.g. a `sql-injection` query when adding SQL sinks)
  - Decode results: #codeql_bqrs_decode or #codeql_bqrs_interpret
- Verify expected findings appear
  - New sources/sinks should produce findings that were absent without the extension
  - Barriers/barrier guards should suppress findings that were previously reported
- Create a test case for the extension
  - Write a small test file that exercises the new source/sink/summary chain end-to-end
  - Include both positive cases (vulnerable code detected) and negative cases (safe code not flagged)
- Run the tests
  - Tool: #codeql_test_run
  - Pass the model pack directory via the `additionalPacks` parameter
  - Note: `codeql test run` does not accept `--model-packs`; extensions must be wired via `codeql-pack.yml` or `--additional-packs`
- Accept correct results
  - Tool: #codeql_test_accept (accept the `.actual` output as the `.expected` baseline once you confirm it is correct)
- If the `.model.yml` lives under `.github/codeql/extensions/` of the consuming repo, you are done: Code Scanning will load it on the next analysis.
- If you authored a reusable model pack and want it to apply across an organization, publish it to GHCR with `codeql pack publish` and configure it under org Code security → Global settings → CodeQL analysis → Model packs.
- Correct tuple format for the language (API Graph vs MaD)
- Every row has the exact column count for its extensible predicate
- Sink/barrier `kind` values match across the chain
- At least one end-to-end test exercises the new model and produces expected findings
- `codeql-pack.yml` `dataExtensions` glob actually matches the new files
- No regressions in pre-existing tests under the same pack
- `codeql://learning/data-extensions`: Common data extensions overview (both model formats)
- `codeql://languages/{{language}}/library-modeling`: Language-specific library modeling guide
- `codeql://templates/security`: Security query templates
- `codeql://learning/test-driven-development`: TDD workflow for CodeQL queries