Fix platform-dependent String.getBytes() calls to use explicit UTF-8 charset by saravadeo · Pull Request #10671 · DataDog/dd-trace-java

saravadeo · 2026-02-24T17:02:41Z

Summary

Specify StandardCharsets.UTF_8 in String.getBytes() calls that feed MessageDigest and other encoding-sensitive APIs. Without an explicit charset, getBytes() uses the JVM's default charset, which varies across systems and JDK versions (e.g., UTF-8 on most Linux systems vs. windows-1252 on Windows with JDKs before 18) and can produce inconsistent results.
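To make the divergence concrete, here is a minimal standalone sketch (not code from this PR). The class and method names are hypothetical, and ISO_8859_1 is used only as a stand-in for a legacy single-byte default charset, since it is guaranteed to exist on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of dd-trace-java.
final class CharsetDivergenceDemo {

  // Encodes with an explicitly chosen charset instead of the JVM default.
  static byte[] encode(String s, Charset charset) {
    return s.getBytes(charset);
  }

  public static void main(String[] args) {
    String userId = "jos\u00e9"; // "josé": three ASCII characters plus one non-ASCII character

    byte[] utf8 = encode(userId, StandardCharsets.UTF_8);
    // ISO_8859_1 stands in for a legacy single-byte platform default (an assumption for illustration).
    byte[] latin1 = encode(userId, StandardCharsets.ISO_8859_1);

    System.out.println("UTF-8:      " + Arrays.toString(utf8));
    System.out.println("ISO-8859-1: " + Arrays.toString(latin1));
    // The no-argument overload depends on whatever the running JVM reports here.
    System.out.println("default (" + Charset.defaultCharset() + "): "
        + Arrays.toString(userId.getBytes()));
  }
}
```

The non-ASCII character becomes two bytes in UTF-8 and one byte in ISO-8859-1, so any digest computed over these arrays differs. ASCII-only input encodes to the same bytes in both charsets.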

Changes

`AppSecEventTracker.anonymize()` (internal-api)

Bug fix: userId.getBytes() → userId.getBytes(StandardCharsets.UTF_8)
User ID anonymization hashes are now consistent across all platforms, even for non-ASCII user IDs
Resolved the TODO about MessageDigest caching with a clarifying comment referencing micro-benchmark data showing negligible overhead of getInstance()
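As a rough illustration of the pattern being fixed, the sketch below hashes a user ID over its UTF-8 bytes. It is not the actual `anonymize()` implementation: the class name, the method name `sha256Hex`, the choice of SHA-256, and the hex output format are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not the dd-trace-java implementation.
final class AnonymizeSketch {

  // Hashes the UTF-8 bytes of the user ID and returns the digest as lowercase hex.
  static String sha256Hex(String userId) {
    try {
      // A fresh instance per call avoids sharing a MessageDigest, which is not thread-safe.
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(userId.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        hex.append(Character.forDigit((b >> 4) & 0xF, 16));
        hex.append(Character.forDigit(b & 0xF, 16));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is one of the algorithms every Java platform must provide.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // With an explicit charset this value does not depend on the platform default.
    System.out.println(sha256Hex("jos\u00e9"));
  }
}
```

Because the charset is fixed, the same user ID produces the same hash on every host, which is what makes the anonymized values comparable across services.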

`Fingerprinter` (agent-debugger)

3× getBytes() → getBytes(StandardCharsets.UTF_8) for exception fingerprint hashing

`JsonStreamParser` (dd-trace-core)

raw.getBytes() → raw.getBytes(StandardCharsets.UTF_8) — JSON is UTF-8 by specification
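The sketch below shows why the encoding side has to match a consumer that decodes UTF-8. It is a standalone illustration, not `JsonStreamParser` code: the class and method names are hypothetical, and ISO_8859_1 again stands in for a non-UTF-8 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of dd-trace-core.
final class JsonCharsetSketch {

  // Encodes with the given charset, then decodes as UTF-8 the way a JSON consumer would.
  static String encodeThenDecodeUtf8(String raw, Charset encodeWith) {
    byte[] bytes = raw.getBytes(encodeWith);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String json = "{\"name\":\"Zo\u00eb\"}"; // contains one non-ASCII character

    // Matching charsets: the text survives the round trip.
    System.out.println(json.equals(encodeThenDecodeUtf8(json, StandardCharsets.UTF_8)));

    // Mismatched charsets (ISO_8859_1 is an assumed stand-in for a legacy default):
    // the non-ASCII byte is not valid UTF-8, so the decoded text no longer matches.
    System.out.println(json.equals(encodeThenDecodeUtf8(json, StandardCharsets.ISO_8859_1)));
  }
}
```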

`LLMObsSpanMapper` (dd-trace-core)

getKey().getBytes() → getKey().getBytes(StandardCharsets.UTF_8) — method is writeUTF8(), so the bytes should actually be UTF-8
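For a writer that expects UTF-8 input, both the payload and any length derived from it depend on the charset used to produce the bytes. The following sketch is not the real `writeUTF8()` API; `writeLengthPrefixedUtf8` and its 4-byte length prefix are hypothetical, chosen only to show the byte count being taken from the UTF-8 encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the dd-trace-core writer API.
final class Utf8WriteSketch {

  // Writes a 4-byte big-endian length followed by the UTF-8 bytes of the key.
  static byte[] writeLengthPrefixedUtf8(String key) {
    byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(Integer.BYTES + utf8.length)
        .putInt(utf8.length) // byte count of the encoded key, not key.length()
        .put(utf8)
        .array();
  }

  public static void main(String[] args) {
    String key = "caf\u00e9"; // 4 chars, 5 bytes in UTF-8
    byte[] frame = writeLengthPrefixedUtf8(key);
    System.out.println("chars=" + key.length() + ", frame bytes=" + frame.length);
  }
}
```

If the bytes came from a single-byte default charset instead, a downstream UTF-8 decoder would see an invalid sequence for the non-ASCII character.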

Testing

All existing tests pass (AppSecEventTrackerSpecification, Fingerprinter, JsonStreamParser, LLMObsSpanMapper)
For ASCII-only strings (which existing tests use), behavior is unchanged since UTF-8 and most default charsets encode ASCII identically
The fix matters for non-ASCII characters (e.g., Unicode user IDs) where platform charsets diverge

saravadeo · 2026-02-24T17:04:50Z

Hi maintainers 👋 Could you please add the appropriate labels? I'd suggest:

comp: core (dd-trace-core changes)
comp: appsec (AppSecEventTracker change)
comp: debugger (Fingerprinter change)
type: bugfix

Thank you!

…charset

Specify StandardCharsets.UTF_8 in String.getBytes() calls used with MessageDigest and other encoding-sensitive APIs. Without an explicit charset, getBytes() uses the platform's default charset, which can vary across systems and produce inconsistent results.

Files changed:
- AppSecEventTracker: user ID anonymization hash now uses UTF-8, ensuring consistent hashing across all platforms. Also resolved the TODO about MessageDigest caching with a clarifying comment referencing micro-benchmark data showing negligible overhead.
- Fingerprinter: exception fingerprint hashes now use UTF-8.
- JsonStreamParser: JSON byte conversion now uses UTF-8 (JSON spec).
- LLMObsSpanMapper: writeUTF8() now receives actual UTF-8 bytes.

saravadeo force-pushed the fix/explicit-charset-in-getbytes-calls branch from 8afdff4 to 0c152d3 · February 24, 2026 17:07

saravadeo added 2 commits February 24, 2026 22:37

Merge branch 'master' into fix/explicit-charset-in-getbytes-calls

1888f44

Merge branch 'master' into fix/explicit-charset-in-getbytes-calls

e84c6a6

saravadeo marked this pull request as ready for review February 25, 2026 03:17

saravadeo requested review from a team as code owners February 25, 2026 03:17

saravadeo requested review from daniel-romano-DD, evanchooly and manuel-alvarez-alvarez and removed request for a team February 25, 2026 03:17

evanchooly approved these changes Feb 25, 2026

saravadeo wants to merge 3 commits into DataDog:master from saravadeo:fix/explicit-charset-in-getbytes-calls
