Skip to content

Warn on malformed #aw_* references in body text#19035

Merged
pelikhan merged 6 commits intomainfrom
copilot/fix-malformed-references-validation
Mar 1, 2026
Merged

Warn on malformed #aw_* references in body text#19035
pelikhan merged 6 commits intomainfrom
copilot/fix-malformed-references-validation

Conversation

Copy link
Contributor

Copilot AI commented Mar 1, 2026

  • Understand the issue: malformed #aw_ refs in body text silently passed through without any warning
  • Add malformed-ref detection to replaceTemporaryIdReferences() with core.warning()
  • Centralize TEMPORARY_ID_CANDIDATE_PATTERN as module-level constant next to TEMPORARY_ID_PATTERN with \b word boundary
  • Remove if (typeof core !== "undefined") guard — core is always defined via shim.cjs
  • Reset lastIndex before using the global regex to avoid state issues
  • Export TEMPORARY_ID_CANDIDATE_PATTERN from module
  • Extend TEMPORARY_ID_PATTERN from {3,8} to {3,12} with \b word boundary (lenient parsing)
  • Extend isTemporaryId() from {3,8} to {3,12} to keep validation consistent
  • Update all error messages across codebase from "3 to 8" to "3 to 12"
  • Update all test assertions to match new "3 to 12" error messages
  • Fix update_project.test.cjs "too long" test case to use 13-char ID (previous 10-char ID now valid)
  • Fix safe_output_topological_sort.test.cjs large-graph test: replace _-separated IDs (e.g. aw_lv1_0000000) with alphanumeric-only IDs (e.g. aw_lv1n0000000) so TEMPORARY_ID_PATTERN can match and resolve dependencies
  • All tests pass, fmt-cjs and lint-cjs clean
Original prompt

This section details on the original issue you should resolve

<issue_title>Malformed #aw_* references in body text pass through without validation</issue_title>
<issue_description>## What happens

When a safe-output message body contains a malformed temporary ID reference (e.g., #aw_kb — too short, or #aw_toolongname123 — too long), the reference passes through as literal text in the created issue. No warning or error is produced.

This happens because body text and direct field values follow different validation paths:

  • Direct field values (e.g., parent_issue_number in link_sub_issue) go through resolveIssueNumber(), which detects malformed refs with a startsWith("aw_") check and returns an error message.
  • Body text (e.g., issue body in create_issue) goes through replaceTemporaryIdReferences(), which uses a regex that only matches valid temp IDs (3-8 alphanumeric chars after aw_). Malformed refs don't match the regex and are left untouched — no validation, no warning.

What should happen

Malformed temp ID references in body text should be detected and flagged. At minimum, a warning should be emitted when a string matches #aw_ prefix but fails the full temp ID validation, so operators can identify broken references in created issues.

Where in the code

All references are to main at 2d91393f3.

Detection regex (only matches valid format):

  • temporary_id.cjs:30TEMPORARY_ID_PATTERN = /#(aw_[A-Za-z0-9]{3,8})/gi — only captures references with exactly 3-8 alphanumeric chars after aw_

Body text replacement (no malformed detection):

  • temporary_id.cjs:82-96replaceTemporaryIdReferences() uses TEMPORARY_ID_PATTERN; anything that doesn't match the regex is invisible to this function
  • create_issue.cjs:402 — calls replaceTemporaryIdReferences() on message body

Extraction (same gap):

  • temporary_id.cjs:440-448extractTemporaryIdReferences() for text fields uses the same regex, so malformed refs are never extracted or tracked

Direct field validation (has the detection logic):

  • temporary_id.cjs:281-286resolveIssueNumber() detects startsWith("aw_") values that fail isTemporaryId() and returns {resolved: null, wasTemporaryId: false, errorMessage: "Invalid temporary ID format..."}
  • temporary_id.cjs:58-63isTemporaryId() validates against /^aw_[A-Za-z0-9]{3,8}$/i

The gap: The malformed-detection logic in resolveIssueNumber() is never called for body text processing.

Evidence

Source-level verification (2026-03-01):

  • Confirmed replaceTemporaryIdReferences() at :82-96 only processes regex matches — malformed refs pass through as literal text
  • Confirmed extractTemporaryIdReferences() at :440-448 uses the same regex — malformed refs are never extracted
  • Confirmed resolveIssueNumber() at :281-286 has detection logic for malformed refs but is only called for direct field values
  • Confirmed test at temporary_id.test.cjs:120-125 verifies malformed refs like #aw_ab are left unchanged — the current behavior is tested as intentional, but no warning is produced
  • No tests exist for detecting or flagging malformed refs in body text

Proposed fix

Add a validation pass in replaceTemporaryIdReferences() that detects strings matching #aw_ prefix but failing the full TEMPORARY_ID_PATTERN regex, and emits a core.warning() for each malformed reference found.

Impact

Frequency: Low — requires the agent to produce a malformed temp ID reference in a message body. More likely when the agent generates references from context rather than from the temp ID map directly.
Cost: Low — malformed refs are cosmetic (literal text in issue bodies), not data loss. But they silently defeat the purpose of temp ID cross-references and can be confusing when auditing created issues.</issue_description>

Comments on the Issue (you are @copilot in this section)


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Add malformed-ref detection to replaceTemporaryIdReferences() so that
body text containing #aw_ prefixed sequences that fail the valid temp ID
format (3-8 alphanumeric chars) emit a core.warning() instead of silently
passing through as literal text.

Fixes #<issue>

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix validation for malformed #aw_* references in body text Warn on malformed #aw_* references in body text Mar 1, 2026
*/
function replaceTemporaryIdReferences(text, tempIdMap, currentRepo) {
// Detect and warn about malformed #aw_ references that won't be resolved
if (typeof core !== "undefined") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot core should always be defined through shim.cjs , remove check update tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the if (typeof core !== "undefined") guard in 8cfae30core is always available via shim.cjs.

function replaceTemporaryIdReferences(text, tempIdMap, currentRepo) {
// Detect and warn about malformed #aw_ references that won't be resolved
if (typeof core !== "undefined") {
const candidatePattern = /#aw_([A-Za-z0-9]+)/gi;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot add word boundaries, refactor regex to global variable near other regex so it's centralized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8cfae30 — extracted TEMPORARY_ID_CANDIDATE_PATTERN = /#aw_([A-Za-z0-9]+)\b/gi as a module-level constant next to TEMPORARY_ID_PATTERN, with \b word boundary at the end. Also added lastIndex = 0 reset before using the global regex.

…ries

- Extract TEMPORARY_ID_CANDIDATE_PATTERN as a module-level constant next
  to TEMPORARY_ID_PATTERN (with word boundary \b at end)
- Remove typeof core guard — core is always defined via shim.cjs
- Reset lastIndex before using the global regex in replaceTemporaryIdReferences

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review March 1, 2026 12:22
Copilot AI review requested due to automatic review settings March 1, 2026 12:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds operator visibility into malformed #aw_* temporary-ID references that appear in body text by emitting warnings during reference replacement.

Changes:

  • Introduces TEMPORARY_ID_CANDIDATE_PATTERN to detect #aw_-prefixed candidate references and warn when they fail isTemporaryId().
  • Updates replaceTemporaryIdReferences() to scan body text for malformed temp-ID candidates and call core.warning().
  • Adds unit tests asserting warnings for too-short/too-long #aw_* references and no warnings for valid references.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
actions/setup/js/temporary_id.cjs Adds candidate detection regex + warning pass before performing normal temp-id replacements; exports new pattern.
actions/setup/js/temporary_id.test.cjs Adds tests validating warning behavior for malformed refs and non-warning for valid refs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@pelikhan pelikhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot although the space says max 8, parse up to 12 char long. Update tests.

* Regex pattern for detecting candidate #aw_ references (any length, word-boundary delimited)
* Used to identify malformed temporary ID references that don't match TEMPORARY_ID_PATTERN
*/
const TEMPORARY_ID_CANDIDATE_PATTERN = /#aw_([A-Za-z0-9]+)\b/gi;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEMPORARY_ID_CANDIDATE_PATTERN already uses \b and matches any length — no change needed there since the malformed check still delegates to the updated isTemporaryId() (89d3deb).

@@ -29,6 +29,12 @@ const crypto = require("crypto");
*/
const TEMPORARY_ID_PATTERN = /#(aw_[A-Za-z0-9]{3,8})/gi;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot add word boundary, extend to 12 character long (be lenient with parsing long identifies)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 89d3debTEMPORARY_ID_PATTERN is now /#(aw_[A-Za-z0-9]{3,12})\b/gi with word boundary and extended limit.

…ord boundary

- TEMPORARY_ID_PATTERN: {3,8} → {3,12} and add \b word boundary
- isTemporaryId(): /^aw_[A-Za-z0-9]{3,8}$/ → /^aw_[A-Za-z0-9]{3,12}$/
- Update JSDoc comments to reflect new 3-12 char range
- Tests: 9-char IDs now valid; update "too long" cases to use 13+ chars

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 1, 2026

@copilot although the space says max 8, parse up to 12 char long. Update tests.

Done in 89d3deb — both TEMPORARY_ID_PATTERN and isTemporaryId() now accept 3–12 alphanumeric characters. Tests updated: 9-char IDs are now valid, and "too long" cases use 13+ chars.

@pelikhan
Copy link
Contributor

pelikhan commented Mar 1, 2026

@copilot fix tests

@pelikhan
Copy link
Contributor

pelikhan commented Mar 1, 2026

…limit

- Update "3 to 8" → "3 to 12" in error messages across:
  temporary_id.cjs, create_issue.cjs, create_pull_request.cjs,
  safe_output_unified_handler_manager.cjs, update_project.cjs
- Update test assertions in temporary_id.test.cjs and update_project.test.cjs
- Use 13-char "aw_toolong123456" in update_project.test.cjs "too long" test case
  (previous "aw_toolong123" was 10 chars, now valid under 12-char limit)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 1, 2026

@copilot fix tests

Fixed in 53bcf76. The "aw_toolong123" test case was 10 chars (now valid with the 12-char limit), causing the test to fail. Updated it to "aw_toolong123456" (13 chars) and propagated the "3 to 12" limit to all error messages across temporary_id.cjs, create_issue.cjs, create_pull_request.cjs, safe_output_unified_handler_manager.cjs, update_project.cjs, and their corresponding test assertions.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/github/gh-aw/contents/.github%2Fworkflows%2Faudit-workflows.md
    • Triggering command: /opt/hostedtoolcache/node/24.13.1/x64/bin/node /opt/hostedtoolcache/node/24.13.1/x64/bin/node --conditions node --conditions development --experimental-import-meta-resolve --require /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/suppress-warnings.cjs /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/dist/workers/forks.js git it git conf�� user.name Test User it HEAD git /home/REDACTED/wor-- /bin/sh (http block)

If you need me to access, download, or install something from one of these locations, you can either:

@pelikhan
Copy link
Contributor

pelikhan commented Mar 1, 2026

IDs like 'aw_lv1_0000000' contain underscores which are not matched by
TEMPORARY_ID_PATTERN ([A-Za-z0-9]{3,12}), making the test unable to
resolve dependencies. Replace underscore separators with 'n' to produce
valid alphanumeric IDs: 'aw_lv1n0000000' and 'aw_lv2n0n0001'.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 1, 2026

@copilot Fix https://github.com/github/gh-aw/actions/runs/22543858934/job/65303050890

Fixed in b31a26b. The large-graph test used IDs like aw_lv1_0000000 and aw_lv2_0_0001 with underscores, which TEMPORARY_ID_PATTERN ([A-Za-z0-9]{3,12}) can't match — so dependencies were never resolved and the sort order was undefined. Changed the separator from _ to n to produce valid alphanumeric IDs: aw_lv1n0000000 and aw_lv2n0n0001.

@pelikhan pelikhan merged commit 5fa65dd into main Mar 1, 2026
75 checks passed
@pelikhan pelikhan deleted the copilot/fix-malformed-references-validation branch March 1, 2026 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Malformed #aw_* references in body text pass through without validation

3 participants