fix: use StringDecoder for stdout/stderr to prevent UTF-8 corruption #15387

Open
mfleming wants to merge 1 commit into anomalyco:dev from mfleming:mfleming/utf8-stdout-fix
Issue for this PR

Fixes #15385

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

src/tool/bash.ts and src/session/prompt.ts assemble child process stdout with output += chunk.toString(). Node delivers stdout as raw byte Buffers, and a multi-byte UTF-8 character (e.g. an em dash, U+2014, which is 3 bytes) can be split across two data events. Each chunk.toString() call decodes its fragment independently, so the incomplete byte sequences on either side of the split are decoded as U+FFFD replacement characters.
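The failure can be reproduced without a child process. The following is a minimal sketch, not code from this PR: it splits the three bytes of an em dash by hand to stand in for two data events, and the variable names are illustrative only.

```typescript
// Em dash U+2014 encodes to three bytes in UTF-8: e2 80 94.
const emDash = Buffer.from("\u2014", "utf8");

// Stand-ins for two data events that split the character after its first byte.
const chunkA = emDash.subarray(0, 1);
const chunkB = emDash.subarray(1);

// Decode each fragment independently, as output += chunk.toString() does.
let naive = "";
for (const chunk of [chunkA, chunkB]) {
  naive += chunk.toString();
}

// The naive result contains U+FFFD instead of the original character.
const corrupted = naive.includes("\uFFFD");

// Decoding the same bytes as one buffer yields the em dash intact.
const intact = Buffer.concat([chunkA, chunkB]).toString() === "\u2014";

console.log({ corrupted, intact });
```

Neither fragment is valid UTF-8 on its own, which is why the corruption appears only when the chunk boundary happens to fall inside a character.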

Fix: use StringDecoder from string_decoder, which holds back the bytes of an incomplete UTF-8 sequence at the end of a chunk and emits the character once the next chunk completes it.
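A minimal sketch of the decoding pattern, assuming one decoder per stream. The helper name decodeChunks is hypothetical and does not appear in the PR; only StringDecoder, write() and end() are the real Node API.

```typescript
import { StringDecoder } from "string_decoder";

// Hypothetical helper: decodes a sequence of chunks the way a data handler would.
function decodeChunks(chunks: Buffer[]): string {
  // One decoder per stream, so stdout and stderr would each get their own.
  const decoder = new StringDecoder("utf8");
  let output = "";
  for (const chunk of chunks) {
    // write() returns only complete characters and buffers any trailing partial sequence.
    output += decoder.write(chunk);
  }
  // end() flushes whatever is still buffered when the stream closes.
  output += decoder.end();
  return output;
}

// The em dash bytes e2 80 94, split after the first byte.
const decoded = decodeChunks([Buffer.from([0xe2]), Buffer.from([0x80, 0x94])]);
console.log(decoded === "\u2014");
```

In a real handler the decoder.write(chunk) call would replace chunk.toString() inside the data callback, with decoder.end() called once the stream closes.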

How did you verify your code works?

The test in test/tool/utf8-stdout.test.ts spawns a child process that deliberately splits an em dash across two stdout writes. It verifies that chunk.toString() corrupts the output and that StringDecoder.write() does not.
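The shape of such a test might look like the sketch below. This is an assumption about the approach, not the contents of test/tool/utf8-stdout.test.ts: the child script, the 100 ms delay, and the runSplitWrite name are all illustrative, and it assumes the runtime at process.execPath accepts -e to evaluate a script.

```typescript
import { spawn } from "child_process";
import { StringDecoder } from "string_decoder";

// Child script: writes the first byte of an em dash, then the other two after a delay.
const childScript = `
process.stdout.write(Buffer.from([0xe2]));
setTimeout(() => process.stdout.write(Buffer.from([0x80, 0x94])), 100);
`;

// Hypothetical helper: collects the child's stdout both ways for comparison.
function runSplitWrite(): Promise<{ naive: string; decoded: string }> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", childScript]);
    const decoder = new StringDecoder("utf8");
    let naive = "";
    let decoded = "";
    child.stdout.on("data", (chunk: Buffer) => {
      naive += chunk.toString(); // old behaviour: each chunk decoded on its own
      decoded += decoder.write(chunk); // new behaviour: partial sequences carried over
    });
    child.on("error", reject);
    child.on("close", () => {
      decoded += decoder.end();
      resolve({ naive, decoded });
    });
  });
}

runSplitWrite().then(({ naive, decoded }) => {
  console.log({ naiveCorrupted: naive.includes("\uFFFD"), decodedOk: decoded === "\u2014" });
});
```

The delay only makes it likely that the two writes arrive as separate data events; the operating system may still coalesce them, in which case the naive path happens to succeed. The StringDecoder path should yield the em dash either way.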

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Buffer.toString() on raw stdout chunks can split multi-byte UTF-8
characters (e.g. emdash U+2014 = 3 bytes) across data events,
producing replacement characters. Use StringDecoder which buffers
incomplete sequences across chunks.
Development

Successfully merging this pull request may close these issues.

Multi-byte UTF-8 characters in tool output render as replacement characters