.github/prompts/data_extensions_development.prompt.md
Lines changed: 57 additions & 5 deletions
@@ -8,7 +8,7 @@ This prompt provides common guidance for developing CodeQL data extensions acros
 ## Product Documentation

-- [Extending coverage for a repository](https://docs.github.com/en/code-security/how-tos/scan-code-for-vulnerabilities/manage-your-configuration/editing-your-configuration-of-default-setup#extending-coverage-for-a-repository) - `.github/codeql/extensions directory` for local model pack refrences (does not need a qlpack.yml)
+- [Extending coverage for a repository](https://docs.github.com/en/code-security/how-tos/scan-code-for-vulnerabilities/manage-your-configuration/editing-your-configuration-of-default-setup#extending-coverage-for-a-repository) - `.github/codeql/extensions directory` for local model pack references (does not need a qlpack.yml)
 - [Extending coverage for all repositories in an organization](https://docs.github.com/en/code-security/how-tos/scan-code-for-vulnerabilities/manage-your-configuration/editing-your-configuration-of-default-setup#extending-coverage-for-all-repositories-in-an-organization) - publishing model packs and referencing them globally (must be done click button in UI)
 - [Creating a CodeQL model pack](https://docs.github.com/en/code-security/tutorials/customize-code-scanning/creating-and-working-with-codeql-packs?versionId=free-pro-team%40latest&productId=code-security&restPage=how-tos%2Cscan-code-for-vulnerabilities%2Cmanage-your-configuration%2Cediting-your-configuration-of-default-setup#creating-a-codeql-model-pack) - publishing a model pack + for dataExtensions via qlpack.yml
@@ -17,7 +17,7 @@ This prompt provides common guidance for developing CodeQL data extensions acros
 CodeQL analysis can be customized by adding library models in data extension YAML files to recognize libraries and frameworks that are not supported by default.
 Model packs can be used to expand code scanning analysis at scale. Model packs use data extensions, which are implemented as YAML and describe how to add data for new dependencies. When a model pack is specified, the data extensions in that pack will be added to the code scanning analysis automatically.

-Generally each language will allow customization of the following extensible prdicates:
+Generally each language will allow customization of the following extensible predicates:

 - sourceModel - This is used to model sources of potentially tainted data. The `kind` of the sources defined using this predicate determine which **threat model** they are associated with (e.g., `remote`, `local`, `file`, `commandargs`). Different threat models can be used to customize the sources used in an analysis.
 - sinkModel - This is used to model sinks where tainted data maybe used in a way that makes the code vulnerable. The `kind` identifies the vulnerability class (e.g., `sql-injection`, `command-injection`).
 All `.model.yml` files within a model pack are automatically picked up via the `dataExtensions` glob in `qlpack.yml` (e.g., `dataExtensions: models/**/*.yml`).

+### Common Workflows
+
+Data extensions support three primary workflows. An agent should follow the appropriate procedure end-to-end rather than jumping straight to YAML authoring.
+
+#### Workflow 1: Creating a new `.model.yml`
+
+1. **Identify the library to model** — review the library's API documentation or source code and classify public methods as sources, sinks, summaries, barriers, or barrier guards (see "What to Model in a Library" above)
+2. **Determine the correct format** — check whether the target language uses API Graph (Python, Ruby, JS/TS) or MaD (Java/Kotlin, C#, Go, C/C++) tuples (see "Two Model Formats" below)
+3. **Create the YAML file** — use the naming convention `<library>-<module>.model.yml` and the appropriate column format for the language
+4. **Place the file** — choose one of two paths depending on scope:
+   - **Single repository:** Place the `.model.yml` directly in `.github/codeql/extensions/<pack-name>/` — no `qlpack.yml` is needed; Code Scanning picks up extensions from this directory automatically
+   - **Model pack (reusable across repos):** Place the file under a pack directory (e.g., `languages/<language>/custom/src/`) with a `qlpack.yml` that declares `extensionTargets` and `dataExtensions`
+5. **Test locally** — run a targeted query against a sample database to confirm new findings appear (see "Model Pack / Data Extension Options" below for `--additional-packs` usage):
+   ```bash
+   codeql query run \
+     --database=/path/to/db \
+     --additional-packs=<path-to-pack-dir> \
+     --output=results.bqrs \
+     -- path/to/RelevantQuery.ql
+   ```
+6. **Validate results** — decode and inspect results with `codeql bqrs decode`; confirm expected findings appear and no false positives are introduced
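For step 3 of Workflow 1, the following is a minimal sketch of what a new `.model.yml` might contain for an API Graph language such as Python. The library name `acme_shell`, its `run` function, and the file name are hypothetical and exist only for illustration. The three-column `sinkModel` row (type, access path, kind) follows the API Graph tuple layout, and `command-injection` is one of the sink kinds mentioned earlier in this prompt.

```yaml
# acme_shell-core.model.yml
# Hypothetical file name following the <library>-<module>.model.yml convention.
extensions:
  - addsTo:
      pack: codeql/python-all   # target pack, matching the codeql/<language>-all pattern
      extensible: sinkModel     # the extensible predicate being extended
    data:
      # Columns: type, access path, kind.
      # Assumption: acme_shell.run(cmd) executes its first argument as a shell command.
      - ["acme_shell", "Member[run].Argument[0]", "command-injection"]
```

Whether this file goes into `.github/codeql/extensions/<pack-name>/` or into a model pack directory is a placement decision (step 4) and should not require changing the file contents.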
+
+#### Workflow 2: Updating an existing `.model.yml`
+
+1. **Find the existing model file** — check these locations in order:
+   - `.github/codeql/extensions/` in the current repository
+   - `languages/<lang>/custom/src/` in this template repository
+   - Published model packs (search GHCR or your org's CodeQL pack registry)
+   - **Note:** Models in upstream `codeql/<lang>-all` packs cannot be edited directly — create a custom model pack that adds new rows alongside the built-in models
+2. **Add new rows** to the appropriate extensible predicate section (`sinkModel`, `sourceModel`, `summaryModel`, etc.) — do not remove existing rows unless they are incorrect
+3. **Maintain consistency** — match the existing formatting, column count, and provenance values in the file
+4. **Re-test** — run the same query or test suite that covers the library to confirm:
+   - Existing findings are unchanged (no regressions)
+   - New coverage produces expected results
+5. **Bump the version** — if the model file lives in a published model pack, increment the `version` field in `qlpack.yml` before publishing
+
+#### Workflow 3: Publishing a model pack to GHCR
+
+1. **Ensure `qlpack.yml` is configured correctly:**
+   ```yaml
+   name: <org>/<language>-<pack-name>
+   version: 1.0.0
+   library: true
+   extensionTargets:
+     codeql/<language>-all: '*'
+   dataExtensions:
+     - models/**/*.yml
+   ```
+2. **Run `codeql pack publish`** to push the pack to the GitHub Container Registry
+3. **Configure for org-wide Default Setup** — in the GitHub organization settings, navigate to Code security → Default setup → Model packs and add `<org>/<language>-<pack-name>` (see [Extending coverage for all repositories in an organization](https://docs.github.com/en/code-security/how-tos/find-and-fix-code-vulnerabilities/manage-your-configuration/editing-your-configuration-of-default-setup#extending-codeql-coverage-with-codeql-model-packs-in-default-setup))
+4. **For updates to an already-published pack** — increment the `version` in `qlpack.yml`, then re-run `codeql pack publish`; Default Setup will pick up the new version automatically based on the version range configured
+
 ### Two Model Formats: API Graph vs MaD

 CodeQL data extensions use one of two tuple formats depending on the language. Using the wrong format for a language will produce invalid extensions.
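To illustrate why the format matters, here is a sketch of a single sink row in the MaD tuple format, modeled on the `java.sql.Statement.execute(String)` example commonly used in the CodeQL documentation for Java library models. A MaD `sinkModel` row carries nine columns, whereas an API Graph `sinkModel` row carries three (type, access path, kind), so a row copied from one family of languages to the other will not have the expected shape.

```yaml
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      # Columns: package, type, subtypes, name, signature, ext, input, kind, provenance
      - ["java.sql", "Statement", True, "execute", "(String)", "", "Argument[0]", "sql-injection", "manual"]
```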
 - Use specific `local` subcategories (e.g., `"file"`, `"commandargs"`) when modeling local input mechanisms — be precise rather than using the generic `"local"` parent
 - When in doubt, use `"remote"` — it provides the broadest default coverage

-### Query Quality Criteria
+### Model Quality Criteria

 Your generated CodeQL models will be evaluated on:

 1. **Code Quality**:
    - **Critical**: Extensions must be formatted without errors. Invalid extensions will fail the engine and have negative code quality.
-   - **Best Practice**: Follow CodeQL naming conventions and idioms, provide comments with sensible organizaiton
+   - **Best Practice**: Follow CodeQL naming conventions and idioms, provide comments with sensible organization

 ### Common Pitfalls
@@ -341,7 +393,7 @@ Your generated CodeQL models will be evaluated on:
 Access paths for data extensions are parsed using [shared/dataflow/codeql/dataflow/internal/AccessPathSyntax.qll](https://github.com/github/codeql/blob/main/shared/dataflow/codeql/dataflow/internal/AccessPathSyntax.qll)

-For languages that support API Graphs as the access paths can be most easilly tested by:
+For languages that support API Graphs as the access paths can be most easily tested by:

 1. creating a small codeql database with some sample code that has a full end to end flow for the suspected query
 2. writing/executing a sample codeql query using api graphs to verify with 100% certainty that the path to discover the suspected source/sink/summary is verified.
.github/prompts/python_data_extension_development.prompt.md
Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ mode: agent
 # Python Data Extension

 For general CodeQL data extension model development guidance, see [Common Data Extension Development](./data_extensions_development.prompt.md).
-For general CodeQL query development guidance, see [Common Query Development](./query_development.prompt.md).
+If you need to write a custom CodeQL query instead of a data extension, see [Common Query Development](./query_development.prompt.md).

 ## Python-Specific Documentation
@@ -14,7 +14,7 @@ For general CodeQL query development guidance, see [Common Query Development](./
 - [Customizing Library Models for Python](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-python/)
   - Can also be found at [Customizing Library Models for Python Docs](https://github.com/github/codeql/blob/main/docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst)

-- [Using API graphs in Python](https://codeql.github.com/docs/codeql-language-guides/using-api-graphs-in-python/) - the acess paths input to the extension tuple are powered by API graphs
+- [Using API graphs in Python](https://codeql.github.com/docs/codeql-language-guides/using-api-graphs-in-python/) - the access paths input to the extension tuple are powered by API graphs