Commit 2f1b6e3

feat(resources): add CodeQL MaD extensions support
Implements changes required for resolution of issue #261 and first-class support for CodeQL Models-as-Data (MaD) extensions as part of agentic CodeQL development. Add per-language library-modeling resources, a common data-extensions overview, and a procedural MCP prompt for data extension development workflows.

Resources:
- Add library-modeling for cpp, csharp, java, javascript, python, ruby (from template PR #42)
- Add data-extensions-overview.md covering MaD tuple and API Graph formats (codeql://learning/data-extensions)
- Update Go library-modeling with barrierModel and barrierGuardModel (CodeQL 2.25.2+)
- Register 6 new language resources in language-types.ts

Prompt:
- Add data_extension_development MCP prompt with 8-step procedural workflow (from template PR #48)

Docs:
- Update server-overview.md, server-prompts.md, server-queries.md with new URIs and references
1 parent 484bb46 commit 2f1b6e3

20 files changed

Lines changed: 2303 additions & 45 deletions

server/dist/codeql-development-mcp-server.js

Lines changed: 124 additions & 9 deletions
Large diffs are not rendered by default.

server/dist/codeql-development-mcp-server.js.map

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

server/src/lib/resources.ts

Lines changed: 8 additions & 0 deletions
@@ -8,6 +8,7 @@
88
*/
99

1010
// Static imports — esbuild inlines the file contents as string literals.
11+
import dataExtensionsOverviewContent from '../resources/data-extensions-overview.md';
1112
import dataflowMigrationContent from '../resources/dataflow-migration-v1-to-v2.md';
1213
import learningQueryBasicsContent from '../resources/learning-query-basics.md';
1314
import performancePatternsContent from '../resources/performance-patterns.md';
@@ -82,6 +83,13 @@ export function getQueryUnitTesting(): string {
8283
return queryUnitTestingContent;
8384
}
8485

86+
/**
87+
* Get the data extensions overview content
88+
*/
89+
export function getDataExtensionsOverview(): string {
90+
return dataExtensionsOverviewContent;
91+
}
92+
8593
/**
8694
* Get the dataflow migration (v1 to v2) guide content
8795
*/
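The getter added in this hunk follows the same accessor pattern as the other statically imported resources. As a rough sketch of how such a getter might be looked up by URI: the `resourceGetters` map and `readResource` helper below are hypothetical and are not part of this commit, the placeholder string stands in for the inlined Markdown, and only the `codeql://learning/data-extensions` URI is taken from the commit message.

```typescript
// Hypothetical stand-in for the string that esbuild inlines from the .md file.
const dataExtensionsOverviewContent = '# Data Extensions Overview (placeholder)';

// Mirrors the getter added in this commit.
function getDataExtensionsOverview(): string {
  return dataExtensionsOverviewContent;
}

// Hypothetical URI-to-getter table; the real server may register resources differently.
const resourceGetters = new Map<string, () => string>([
  ['codeql://learning/data-extensions', getDataExtensionsOverview],
]);

// Returns the resource text, or undefined when the URI is not registered.
function readResource(uri: string): string | undefined {
  const getter = resourceGetters.get(uri);
  return getter ? getter() : undefined;
}
```

Keeping each resource behind a small getter means the lookup table only needs the function reference, not the inlined content itself.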
Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
1+
---
2+
agent: agent
3+
---
4+
5+
# Data Extension Development Workflow
6+
7+
Use this workflow to create CodeQL data extensions (Models-as-Data) for third-party libraries and frameworks. Data extensions let you customize taint tracking without writing QL code — you author YAML files that declare which functions are sources, sinks, summaries, barriers, or barrier guards.
8+
9+
For format reference, read the MCP resource: `codeql://learning/data-extensions`
10+
For language-specific guidance: `codeql://languages/{{language}}/library-modeling`
11+
12+
## Workflow Checklist
13+
14+
### Phase 1: Identify the Target
15+
16+
- [ ] **Confirm the target library and language**
17+
- Library name and version: {{libraryName}}
18+
- Target language: {{language}}
19+
- Determine the model format:
20+
- **MaD tuple format** (9–10 column tuples): C/C++ (`codeql/cpp-all`), C# (`codeql/csharp-all`), Go (`codeql/go-all`), Java/Kotlin (`codeql/java-all`)
21+
- **API Graph format** (3–5 column tuples): JavaScript/TypeScript (`codeql/javascript-all`), Python (`codeql/python-all`), Ruby (`codeql/ruby-all`)
22+
- Using the wrong format will cause the extension to silently fail to load.
23+
24+
- [ ] **Locate a CodeQL database**
25+
- Tool: #list_codeql_databases
26+
- Or create one: #codeql_database_create
27+
- The database must contain code that exercises the target library
28+
29+
- [ ] **Explore the library's API surface**
30+
- Tool: #read_database_source — browse source files to identify relevant API calls
31+
- Tool: #codeql_query_run with `queryName="PrintAST"` — visualize how library calls are represented
32+
- Skim the library's public API docs, type stubs, or source code
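
The format split in the first checklist item can be captured as a small lookup. This is only a sketch: the `modelFormatFor` helper and the `ModelFormat` type are hypothetical names, while the language keys and pack names are the ones listed in the commit message and the checklist.

```typescript
type ModelFormat = 'mad-tuple' | 'api-graph';

interface FormatInfo {
  format: ModelFormat;
  pack: string;
}

// Language-to-format table, copied from the Phase 1 checklist.
const MODEL_FORMATS = new Map<string, FormatInfo>([
  ['cpp', { format: 'mad-tuple', pack: 'codeql/cpp-all' }],
  ['csharp', { format: 'mad-tuple', pack: 'codeql/csharp-all' }],
  ['go', { format: 'mad-tuple', pack: 'codeql/go-all' }],
  ['java', { format: 'mad-tuple', pack: 'codeql/java-all' }],
  ['javascript', { format: 'api-graph', pack: 'codeql/javascript-all' }],
  ['python', { format: 'api-graph', pack: 'codeql/python-all' }],
  ['ruby', { format: 'api-graph', pack: 'codeql/ruby-all' }],
]);

// Hypothetical helper: fails loudly instead of letting a wrong format slip through.
function modelFormatFor(language: string): FormatInfo {
  const info = MODEL_FORMATS.get(language.toLowerCase());
  if (!info) {
    throw new Error(`No data-extension format known for language: ${language}`);
  }
  return info;
}
```

Throwing on an unknown language is a deliberate choice here, since the checklist notes that a wrong format fails silently at load time.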
33+
34+
### Phase 2: Classify the API Surface
35+
36+
For each public function or method of the library, classify it:
37+
38+
1. **Does it return data from outside the program** (network, file, env, stdin)? → `sourceModel` with `kind` matching the threat model (usually `"remote"`)
39+
2. **Does it consume data in a security-sensitive operation** (SQL, exec, path, redirect, eval, deserialize)? → `sinkModel` with `kind` matching the vulnerability class (e.g. `"sql-injection"`, `"command-injection"`)
40+
3. **Does it pass data through opaque library code** (encode, decode, wrap, copy, iterate)? → `summaryModel` with `kind: "taint"` (derived) or `kind: "value"` (identity)
41+
4. **Does it sanitize data so its output is safe for a specific sink kind?** → `barrierModel` with `kind` matching the sink kind it neutralizes
42+
5. **Does it return a boolean indicating whether data is safe?** → `barrierGuardModel` with the appropriate `acceptingValue` (`"true"` or `"false"`) and matching `kind`
43+
6. **Is the type a subclass of something already modeled?** → `typeModel` (API Graph languages) or set `subtypes: True` (MaD tuple languages)
44+
7. **Did the auto-generated model assign a wrong summary?** → `neutralModel` to suppress it
45+
46+
A complete chain of **source → (summary\*) → sink** is required for end-to-end findings; missing a single hop will cause false negatives.
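
The classification questions above can be read as a simple decision function. The sketch below is illustrative only: the `ApiTraits` field names and the `classifyApi` helper are hypothetical, and question 6 is left out because its answer (`typeModel` versus the `subtypes` column) depends on the model format rather than on the function being classified.

```typescript
// Hypothetical answers to the Phase 2 questions for one library function.
interface ApiTraits {
  returnsExternalData?: boolean;          // question 1
  consumesInSensitiveOperation?: boolean; // question 2
  propagatesData?: boolean;               // question 3
  sanitizesOutput?: boolean;              // question 4
  returnsSafetyCheck?: boolean;           // question 5
  hasWrongGeneratedSummary?: boolean;     // question 7
}

type ExtensiblePredicate =
  | 'sourceModel'
  | 'sinkModel'
  | 'summaryModel'
  | 'barrierModel'
  | 'barrierGuardModel'
  | 'neutralModel';

// Maps each "yes" answer to the extensible predicate named in the checklist.
function classifyApi(traits: ApiTraits): ExtensiblePredicate[] {
  const predicates: ExtensiblePredicate[] = [];
  if (traits.returnsExternalData) predicates.push('sourceModel');
  if (traits.consumesInSensitiveOperation) predicates.push('sinkModel');
  if (traits.propagatesData) predicates.push('summaryModel');
  if (traits.sanitizesOutput) predicates.push('barrierModel');
  if (traits.returnsSafetyCheck) predicates.push('barrierGuardModel');
  if (traits.hasWrongGeneratedSummary) predicates.push('neutralModel');
  return predicates;
}
```

A single function can land in more than one category, which is why the helper returns a list instead of a single predicate.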
47+
48+
### Phase 3: Choose the Deployment Scope
49+
50+
Choose between two paths:
51+
52+
- **Single-repo shortcut** — drop `.model.yml` files under `.github/codeql/extensions/<pack-name>/` in the consuming repo. **No `codeql-pack.yml` is required**; Code Scanning auto-loads extensions from this directory. Use when the models only need to apply to one repo.
53+
- **Reusable model pack** — create a pack directory with a `codeql-pack.yml` declaring `extensionTargets` and `dataExtensions`. Use when models will be consumed by multiple repos or by org-wide Default Setup.
54+
55+
### Phase 4: Author the `.model.yml` File(s)
56+
57+
- [ ] **Create the model file**
58+
- Use naming convention `<library>-<module>.model.yml` (lowercase, hyphen-separated)
59+
- Split per logical module rather than putting an entire ecosystem in one file
60+
- Read `codeql://languages/{{language}}/library-modeling` for the exact column layout and examples
61+
62+
- [ ] **Write the YAML with correct extensible predicates**
63+
64+
```yaml
65+
extensions:
66+
- addsTo:
67+
pack: codeql/{{language}}-all
68+
extensible: sinkModel
69+
data:
70+
# Add tuples here — column count must exactly match the predicate schema
71+
- [...]
72+
```
73+
74+
- Every row must have the **exact column count** for its extensible predicate — an invalid row will fail silently or cause errors
75+
- Use `provenance: 'manual'` (MaD format) for hand-written rows
76+
- Ensure `kind` values match across the chain (e.g. a `"sql-injection"` barrier must guard a `"sql-injection"` sink)
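
Because a row with the wrong column count is easy to miss, a pre-flight check over the parsed rows may help. This is a minimal sketch, not tooling that ships with CodeQL: `findBadRows` is a hypothetical helper, the rows are assumed to be already parsed from YAML, and the expected count of 9 for a MaD-format `sinkModel` row is an assumption that should be confirmed against the language-specific library-modeling resource.

```typescript
// One data-extension row after YAML parsing: strings plus the boolean `subtypes` column.
type Row = ReadonlyArray<string | boolean>;

// Hypothetical helper: returns the indexes of rows whose length differs from the schema.
function findBadRows(rows: ReadonlyArray<Row>, expectedColumns: number): number[] {
  const bad: number[] = [];
  rows.forEach((row, index) => {
    if (row.length !== expectedColumns) bad.push(index);
  });
  return bad;
}

// Illustrative rows only; the second one is deliberately missing columns.
const sinkRows: Row[] = [
  ['java.sql', 'Statement', true, 'execute', '(String)', '', 'Argument[0]', 'sql-injection', 'manual'],
  ['java.sql', 'Statement', 'execute', 'Argument[0]', 'sql-injection'],
];

// Assumed column count for a MaD-format sinkModel row.
const badSinkRows = findBadRows(sinkRows, 9);
```

Reporting indexes rather than a boolean makes it straightforward to point at the offending row in the `.model.yml` file.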
77+
78+
### Phase 5: Configure `codeql-pack.yml` (Model-Pack Path Only)
79+
80+
Skip this phase if you chose the `.github/codeql/extensions/` shortcut in Phase 3.
81+
82+
For a reusable pack, create or update `codeql-pack.yml`:
83+
84+
```yaml
85+
name: <org>/<language>-<pack-name>
86+
version: 0.0.1
87+
library: true
88+
extensionTargets:
89+
codeql/<language>-all: '*'
90+
dataExtensions:
91+
- models/**/*.yml
92+
```
93+
94+
- `library: true` — model packs are always libraries, never queries
95+
- `extensionTargets` — names the upstream pack the extensions extend
96+
- `dataExtensions` — a glob that picks up every `.model.yml` you author
97+
98+
- [ ] **Install pack dependencies**
99+
- Tool: #codeql_pack_install — resolve dependencies for the model pack
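
One way to sanity-check that the `dataExtensions` glob picks up the model files is to test the paths against it before running an analysis. The sketch below assumes simplified glob semantics (`**/` spans zero or more directories, `*` stays within one path segment), which may not match the CodeQL CLI's own matching in every detail; `globToRegExp` and `unmatchedFiles` are hypothetical helpers.

```typescript
// Converts a simplified glob into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let pattern = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      pattern += '(?:[^/]+/)*'; // zero or more directory segments
      i += 3;
    } else if (glob.charAt(i) === '*') {
      pattern += '[^/]*'; // anything except a path separator
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      pattern += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Returns the files that none of the globs would pick up.
function unmatchedFiles(files: string[], globs: string[]): string[] {
  const matchers = globs.map(globToRegExp);
  return files.filter((file) => !matchers.some((re) => re.test(file)));
}
```

For example, `unmatchedFiles(['models/mylib-core.model.yml', 'extra/mylib-io.model.yml'], ['models/**/*.yml'])` would report only the file under `extra/`, which is the kind of mismatch the validation checklist asks about.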
100+
101+
### Phase 6: Test with `codeql query run`
102+
103+
Validate the model against a real database:
104+
105+
- [ ] **Run a relevant security query with the extension applied**
106+
- Tool: #codeql_query_run
107+
- Pass the model pack directory via the `additionalPacks` parameter
108+
- Pick a query whose sink kind matches what you modeled (e.g. a `sql-injection` query when adding SQL sinks)
109+
- Decode results: #codeql_bqrs_decode or #codeql_bqrs_interpret
110+
111+
- [ ] **Verify expected findings appear**
112+
- New sources/sinks should produce findings that were absent without the extension
113+
- Barriers/barrier guards should suppress findings that were previously reported
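
The verification step amounts to a set difference between two query runs. As a sketch, assuming each finding has been reduced to a stable string key such as a file path and line (the key format and the `diffFindings` helper are hypothetical, not output of `codeql bqrs decode`):

```typescript
interface FindingDiff {
  added: string[];   // present only with the extension (new sources/sinks)
  removed: string[]; // present only without it (barriers/barrier guards)
}

// Compares findings from a baseline run against a run with the extension applied.
function diffFindings(baseline: string[], withExtension: string[]): FindingDiff {
  const before = new Set(baseline);
  const after = new Set(withExtension);
  return {
    added: withExtension.filter((finding) => !before.has(finding)),
    removed: baseline.filter((finding) => !after.has(finding)),
  };
}
```

An empty `added` list after modeling a new sink suggests the extension did not load or the source-to-sink chain is still missing a hop.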
114+
115+
### Phase 7: Run Unit Tests with `codeql test run`
116+
117+
- [ ] **Create a test case for the extension**
118+
- Write a small test file that exercises the new source/sink/summary chain end-to-end
119+
- Include both positive cases (vulnerable code detected) and negative cases (safe code not flagged)
120+
121+
- [ ] **Run the tests**
122+
- Tool: #codeql_test_run
123+
- Pass the model pack directory via the `additionalPacks` parameter
124+
- Note: `codeql test run` does **not** accept `--model-packs`; extensions must be wired via `codeql-pack.yml` or `--additional-packs`
125+
126+
- [ ] **Accept correct results**
127+
- Tool: #codeql_test_accept — accept the `.actual` output as the `.expected` baseline once you confirm it is correct
128+
129+
### Phase 8: Decide Next Steps
130+
131+
- If the `.model.yml` lives under `.github/codeql/extensions/` of the consuming repo, you are **done** — Code Scanning will load it on the next analysis.
132+
- If you authored a reusable model pack and want it to apply across an organization, publish it to GHCR with `codeql pack publish` and configure it under org Code security → Global settings → CodeQL analysis → Model packs.
133+
134+
## Validation Checklist
135+
136+
- [ ] Correct tuple format for the language (API Graph vs MaD)
137+
- [ ] Every row has the exact column count for its extensible predicate
138+
- [ ] Sink/barrier `kind` values match across the chain
139+
- [ ] At least one end-to-end test exercises the new model and produces expected findings
140+
- [ ] `codeql-pack.yml` `dataExtensions` glob actually matches the new files
141+
- [ ] No regressions in pre-existing tests under the same pack
142+
143+
## Related Resources
144+
145+
- `codeql://learning/data-extensions` — Common data extensions overview (both model formats)
146+
- `codeql://languages/{{language}}/library-modeling` — Language-specific library modeling guide
147+
- `codeql://templates/security` — Security query templates
148+
- `codeql://learning/test-driven-development` — TDD workflow for CodeQL queries

server/src/prompts/workflow-prompts.ts

Lines changed: 80 additions & 0 deletions
@@ -191,6 +191,27 @@ export async function resolvePromptFilePath(
191191
// placeholder text shown in the VS Code input box.
192192
// ────────────────────────────────────────────────────────────────────────────
193193

194+
/**
195+
* Schema for data_extension_development prompt parameters.
196+
*
197+
* - `language` is **required** – the model format depends on the language.
198+
* - `libraryName` is optional – the library or framework to model.
199+
* - `database` is optional – path to a CodeQL database for testing.
200+
*/
201+
export const dataExtensionDevelopmentSchema = z.object({
202+
language: z
203+
.enum(SUPPORTED_LANGUAGES)
204+
.describe('Programming language for the data extension'),
205+
libraryName: z
206+
.string()
207+
.optional()
208+
.describe('Name of the library or framework to model'),
209+
database: z
210+
.string()
211+
.optional()
212+
.describe('Path to a CodeQL database for testing the extension'),
213+
});
214+
194215
/**
195216
* Schema for test_driven_development prompt parameters.
196217
*
@@ -621,6 +642,7 @@ export function createSafePromptHandler<T extends z.ZodObject<z.ZodRawShape>>(
621642
export const WORKFLOW_PROMPT_NAMES = [
622643
'check_for_duplicated_code',
623644
'compare_overlapping_alerts',
645+
'data_extension_development',
624646
'document_codeql_query',
625647
'explain_codeql_query',
626648
'find_overlapping_queries',
@@ -1044,6 +1066,60 @@ export function registerWorkflowPrompts(server: McpServer): void {
10441066
),
10451067
);
10461068

1069+
// Data Extension Development Prompt
1070+
server.prompt(
1071+
'data_extension_development',
1072+
'End-to-end workflow for creating CodeQL data extensions (Models-as-Data) for third-party libraries',
1073+
addCompletions(toPermissiveShape(dataExtensionDevelopmentSchema.shape)),
1074+
createSafePromptHandler(
1075+
'data_extension_development',
1076+
dataExtensionDevelopmentSchema,
1077+
async ({ language, libraryName, database }) => {
1078+
const template = loadPromptTemplate('data-extension-development.prompt.md');
1079+
1080+
const warnings: string[] = [];
1081+
let resolvedDatabase = database || '<database-path>';
1082+
if (database) {
1083+
const dbResult = await resolvePromptFilePath(database);
1084+
if (dbResult.blocked) return blockedPathError(dbResult, 'database path');
1085+
resolvedDatabase = dbResult.resolvedPath;
1086+
if (dbResult.warning) warnings.push(dbResult.warning);
1087+
}
1088+
1089+
const content = processPromptTemplate(template, {
1090+
language,
1091+
libraryName: libraryName || '<library-name>',
1092+
});
1093+
1094+
let contextSection = '## Data Extension Context\n\n';
1095+
contextSection += `- **Language**: ${language}\n`;
1096+
if (libraryName) {
1097+
contextSection += `- **Library**: ${libraryName}\n`;
1098+
}
1099+
if (database) {
1100+
contextSection += `- **Database**: ${markdownInlineCode(resolvedDatabase)}\n`;
1101+
}
1102+
contextSection += '\n';
1103+
1104+
const warningSection = warnings.length > 0
1105+
? warnings.join('\n') + '\n\n'
1106+
: '';
1107+
1108+
return {
1109+
messages: [
1110+
{
1111+
role: 'user',
1112+
content: {
1113+
type: 'text',
1114+
text: warningSection + contextSection + content,
1115+
},
1116+
},
1117+
],
1118+
};
1119+
},
1120+
),
1121+
);
1122+
10471123
// Document CodeQL Query Prompt
10481124
server.prompt(
10491125
'document_codeql_query',
@@ -1263,6 +1339,10 @@ ${workspaceUri ? `- **Workspace URI**: ${workspaceUri}
12631339
logger.info(`Registered ${WORKFLOW_PROMPT_NAMES.length} workflow prompts`);
12641340
}
12651341

1342+
// ────────────────────────────────────────────────────────────────────────────
1343+
// End of registerWorkflowPrompts — helper functions below
1344+
// ────────────────────────────────────────────────────────────────────────────
1345+
12661346
/**
12671347
* Build context section for tools query workflow
12681348
*/
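
The new handler combines two steps: filling `{{placeholder}}` values in the template and prepending a context section. The sketch below approximates both under stated assumptions: `renderTemplate` is a hypothetical stand-in for `processPromptTemplate` (whose implementation is not shown in this diff), and plain backticks replace the `markdownInlineCode` helper.

```typescript
// Hypothetical stand-in for processPromptTemplate: replaces {{name}} placeholders
// and leaves unknown placeholders untouched.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

// Mirrors the context section built in the handler; optional fields are omitted when absent.
function buildContextSection(language: string, libraryName?: string, database?: string): string {
  let section = '## Data Extension Context\n\n';
  section += `- **Language**: ${language}\n`;
  if (libraryName) section += `- **Library**: ${libraryName}\n`;
  if (database) section += `- **Database**: \`${database}\`\n`;
  return section + '\n';
}
```

Leaving unknown placeholders in place, rather than substituting an empty string, is a choice made for this sketch so that a missing value stays visible in the rendered prompt.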
