[{"content":"A Spring Boot 3.4 / Java 21 demo application that showcases a well-known Java concurrency pitfall: sharing a stateful CharsetDecoder across threads. The service provides two decoder implementations (vulnerable and hardened), toggled via configuration, so you can run the same load test against both and compare results live.\nLinks Deep-dive blog post — full analysis of the race condition, attack vector, and fix Interactive demo — metrics dashboard, error evidence, and decode history from a real load test Source code — Spring Boot service, load runner, and site generator ","permalink":"https://jakegenia.com/projects/charset-decoder-race-demo/","summary":"\u003cp\u003eA Spring Boot 3.4 / Java 21 demo application that showcases a well-known Java concurrency pitfall: sharing a stateful \u003ccode\u003eCharsetDecoder\u003c/code\u003e across threads. The service provides two decoder implementations (vulnerable and hardened), toggled via configuration, so you can run the same load test against both and compare results live.\u003c/p\u003e\n\u003ch2 id=\"links\"\u003eLinks\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"/posts/charset-decoder-race/\"\u003eDeep-dive blog post\u003c/a\u003e — full analysis of the race condition, attack vector, and fix\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"/projects/charset-decoder-race/demo/\"\u003eInteractive demo\u003c/a\u003e — metrics dashboard, error evidence, and decode history from a real load test\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"https://github.com/JAEKts\"\u003eSource code\u003c/a\u003e — Spring Boot service, load runner, and site generator\u003c/li\u003e\n\u003c/ul\u003e","title":"CharsetDecoder Race Condition"},{"content":"","permalink":"https://jakegenia.com/projects/whispy/","summary":"","title":"Whispy"},{"content":"","permalink":"https://jakegenia.com/projects/awssigner-contributor/","summary":"","title":"AWSSigner (Contributor)"},{"content":"Java\u0026rsquo;s CharsetDecoder is a 
critical component for text processing. The JDK documentation (JDK 9–21) explicitly states that CharsetDecoder instances are not safe for use by multiple concurrent threads. Despite this, a common mistake persists in production systems: reusing a static or singleton CharsetDecoder instance across multiple threads without synchronization.\nWhen this happens, the decoder\u0026rsquo;s internal state machine (RESET → CODING → END) gets corrupted by concurrent access. One thread calls decode() while another resets the decoder mid-operation. The results:\nIllegalStateException thrown mid-decode\nCorrupted character output: the decoded string no longer matches the input\nSilent data corruption passed downstream to consumers, databases, and APIs\nThis isn\u0026rsquo;t theoretical. Real-world incidents include Apache Hive (HIVE-22898) causing data corruption in ETL pipelines, and IBM WebSphere (APAR IJ15284) causing application failures during concurrent text processing.\nThis post walks through the vulnerability, demonstrates it under load with a Spring Boot service, and shows how to fix it. If you want the raw data (metrics, decode history, error logs), the interactive demo has everything.\nThe Vulnerable Code The pattern looks innocent. 
Assuming a CharsetDecoder is expensive to create, a developer makes it (and its buffers) a static singleton:\n@Component(\u0026#34;vulnerableDecoder\u0026#34;) public class LogDecoder implements Decoder { private static final CharsetDecoder sharedDecoder = StandardCharsets.UTF_8.newDecoder(); private static final ByteBuffer sharedByteBuffer = ByteBuffer.allocateDirect(200_000); private static final CharBuffer sharedCharBuffer = CharBuffer.allocate(200_000); @Override public String decode(byte[] input) { try { sharedByteBuffer.clear(); sharedByteBuffer.put(input); sharedByteBuffer.flip(); sharedCharBuffer.clear(); sharedDecoder.reset(); sharedDecoder.decode(sharedByteBuffer, sharedCharBuffer, true); sharedDecoder.flush(sharedCharBuffer); sharedCharBuffer.flip(); return sharedCharBuffer.toString(); } catch (Exception e) { throw new RuntimeException( \u0026#34;Decoder failed (likely concurrency issue)\u0026#34;, e); } } } Three things are shared across threads here: the CharsetDecoder, the input ByteBuffer, and the output CharBuffer. The decoder\u0026rsquo;s state field is an int that the JIT can cache in a CPU register, making the race on the decoder alone nearly invisible. But sharing the output CharBuffer means concurrent threads write characters into the same char[] heap array simultaneously. Heap writes are not register-cached, so races produce garbled character sequences, BufferOverflowException, or IllegalStateException.\nHow the Race Condition Works CharsetDecoder maintains an internal state machine with four states: RESET, CODING, END, and FLUSHED. A normal decode cycle transitions through RESET → CODING → END → FLUSHED: the final decode(..., true) call moves the decoder to END, and flush() moves it to FLUSHED. 
The problem is that none of these transitions are synchronized.\nWhen two threads hit the same decoder concurrently:\nThread T1 calls decode(), moving the state out of RESET\nThread T2 calls reset() while T1 is mid-decode, slamming the state back to RESET\nT1\u0026rsquo;s decode is now operating in an invalid state: the decoder thinks it\u0026rsquo;s in RESET while T1 is still writing to the output buffer\nThe result is one of three outcomes:\nIllegalStateException: the decoder detects an invalid state transition and throws, for example when T1 calls flush() after T2 has reset the state to RESET, or when one thread calls decode() after another has already flushed the decoder\nCorrupted output: the decoded string contains garbage characters from the other thread\u0026rsquo;s operation\nSilent pass-through: the corruption happens to produce valid-looking output that is quietly wrong\nThe third outcome is the most dangerous. The service returns HTTP 200 with corrupted data, and nothing in the logs indicates a problem. The corruption propagates downstream to databases, APIs, and consumers.\nWhat It Looks Like Under Load The companion demo application is a Spring Boot service that exposes a /logs/decode endpoint. It can run with either the vulnerable or hardened decoder, toggled by configuration. 
Running 50 concurrent threads with 500 requests against each mode produces dramatically different results.\nVulnerable mode (shared singleton decoder):\nSome requests return HTTP 200 with corrupted data (silent corruption)\nIllegalStateException and MalformedInputException are thrown mid-decode\nThe downstream.data.corrupted counter climbs steadily\nHTTP 500 errors are returned to callers\nHardened mode (ThreadLocal decoder):\nEvery request returns HTTP 200 with correct data\nZero exceptions\ndownstream.data.corrupted stays at 0\ndownstream.data.valid matches the total request count\nThe interactive demo shows the full metrics dashboard, error evidence, and decode history from a real load test run.\nAttack Vector — Concurrency-Flood DoS When thread-unsafe components like CharsetDecoder are shared in high-concurrency services, attackers, or even legitimate users during traffic spikes, can cause widespread failure simply by sending enough concurrent requests.\nA request flood drives N concurrent threads into the shared decoder. The state corruption cascades into CPU spikes from exception handling, exception storms that fill log buffers, data corruption in downstream systems, and ultimately service outage.\nWarning While deliberate exploitation typically requires an exposed endpoint whose requests reach the shared decoder, accidental concurrency surges in high-traffic systems can lead to identical degradation. The failure mode is the same whether the traffic is malicious or organic.\nThe key insight is that this isn\u0026rsquo;t a traditional DoS where you exhaust bandwidth or memory. The attacker exploits a correctness bug (the shared mutable state) to turn the application\u0026rsquo;s own thread pool against itself. A relatively small number of concurrent requests can trigger cascading failures.\nThe Fix — Thread-Safe Decoder Patterns ThreadLocal Pattern (Recommended) Use ThreadLocal\u0026lt;CharsetDecoder\u0026gt; to give each thread its own independent decoder instance. 
This is the recommended approach for high-throughput services where decoder reuse per thread improves performance:\n@Component(\u0026#34;hardenedDecoder\u0026#34;) public class HardenedLogDecoder implements Decoder { private static final ThreadLocal\u0026lt;CharsetDecoder\u0026gt; DECODER = ThreadLocal.withInitial( () -\u0026gt; StandardCharsets.UTF_8.newDecoder()); @Override public String decode(byte[] input) { CharsetDecoder decoder = DECODER.get(); decoder.reset(); try { return decoder.decode(ByteBuffer.wrap(input)).toString(); } catch (Exception e) { throw new RuntimeException(\u0026#34;Decoder failed\u0026#34;, e); } } } Each thread gets its own decoder on first access. The decoder is reused across calls on the same thread, so you get the performance benefit of reuse without the concurrency risk. The ByteBuffer.wrap(input) call creates a lightweight view over the input array (no copying, no shared buffer), and the convenience decode(ByteBuffer) method allocates a fresh output CharBuffer on every call. That method also resets the decoder before decoding, so the explicit reset() call is defensive rather than required.\nNew Instance Per Operation Alternatively, create a new CharsetDecoder for each decode call. Because decode(ByteBuffer) throws the checked CharacterCodingException, the call still needs a try/catch (or a throws clause):\npublic String decode(byte[] input) { CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); try { return decoder.decode(ByteBuffer.wrap(input)).toString(); } catch (CharacterCodingException e) { throw new RuntimeException(\u0026#34;Decoder failed\u0026#34;, e); } } This is the simplest approach and eliminates any possibility of shared mutable state. The cost is one CharsetDecoder allocation per call. For most services, this is negligible: CharsetDecoder construction is cheap compared to the I/O that typically surrounds it. Profile before assuming the ThreadLocal pattern is necessary.\nWhich to Use\nPattern | Best for | Trade-off\nThreadLocal | High-throughput services, tight loops | Slightly more complex; each thread holds its decoder for its whole lifetime, so thread pools with unbounded growth hold an unbounded number of decoders\nNew instance per call | Most services, utility methods | One allocation per call; simplest to reason about\nBoth patterns ensure no shared mutable state exists between threads, completely eliminating the race condition.\nBroader Lessons This bug class extends beyond CharsetDecoder. 
Any stateful, non-thread-safe object that gets shared across threads is vulnerable to the same pattern:\nSimpleDateFormat (the classic Java concurrency trap)\nMessageDigest\nCipher\nMatcher (from java.util.regex)\nJDBC Statement and ResultSet objects\nThe general principle: treat all shared mutable state in concurrent environments with extreme caution. Favor immutability, thread-local resources, and defensive copying. When you see a static final instance of a stateful object in a multi-threaded context, that\u0026rsquo;s a code smell worth investigating.\nSource Code and Demo\nInteractive demo: View the full metrics dashboard, error evidence, and decode history\nSource code: The Spring Boot service, load runner, and site generator are available on GitHub\nReferences\nCharsetDecoder JavaDoc (JDK 21)\nThreadLocal JavaDoc (JDK 21)\nApache Hive HIVE-22898\nIBM APAR IJ15284\nOpenJDK JDK-8230843\nJJWT Issue #787\nSonarQube Rule S2885 (Multi-threading)\n","permalink":"https://jakegenia.com/posts/charset-decoder-race/","summary":"Java\u0026rsquo;s CharsetDecoder is not thread-safe. Sharing a singleton instance across threads produces corrupted output, silent data loss, and service outages. Here\u0026rsquo;s how the race condition works and how to eliminate it.","title":"Java's CharsetDecoder Is Not Thread-Safe"},{"content":"What I do I do penetration testing for a living. Web applications, APIs, cloud-native architectures, and the AI-adjacent systems that are increasingly wired into all of them. External engagements, internal engagements, source code review when the scope calls for it. I\u0026rsquo;ve worked across a wide range of clients and industries.\nThe work covers the full spectrum of application security. SQL injection, cross-site scripting, broken access controls, server-side request forgery, insecure deserialization, denial of service via SlowHTTP or request flooding. I\u0026rsquo;ve built multi-step proofs of concept that chain low-severity findings into critical impact. 
If it\u0026rsquo;s on the OWASP Top 10 or any of the common vulnerability taxonomies, I\u0026rsquo;ve found it in production.\nWhat I\u0026rsquo;m curious about Outside of client work, I spend time on the problems that don\u0026rsquo;t fit neatly into a checklist.\nConcurrency bugs that only surface under production load. Authorization logic that breaks at the seam between two services that were each configured correctly. Trust boundaries in cloud-native architectures where the provisioning layer, the network layer, and the data plane each assume someone else is enforcing the constraint. The kinds of things that survive audits because they require crossing a boundary to see.\nI\u0026rsquo;m also interested in how AI is changing the attack surface. Not the hype, but the practical reality: what happens when you wire an LLM into an application that was never designed for adversarial input, or when AI-driven tooling creates new classes of testing that weren\u0026rsquo;t possible before.\nWhat I build I write tools. When I find a pattern worth repeating, I automate it. When I find a bug worth explaining, I build a demo.\nThe CharsetDecoder race condition project is a good example: a Spring Boot service that reproduces a Java concurrency vulnerability under load, with an interactive visualization of the results. AWSSigner is a Burp Suite extension for AWS request signing that I contribute to. Whispy is an AI-powered transcription tool I built in Python.\nMore of my work is on GitHub and Codeberg.\nWhat I write about This site is where I think through problems in public. Some posts are deep dives on a specific vulnerability. Some are notes on a class of bug I keep seeing across engagements. Some are about the tooling itself. 
I try to be specific about what I found, how I found it, and what the fix looks like.\nVerify me The canonical handles for anything I publish:\nGitHub: @JAEKts\nCodeberg: jaekts\nLinkedIn: jacob-genia-77b317149\nFor security-related contact, see .well-known/security.txt.\n","permalink":"https://jakegenia.com/about/","summary":"\u003ch2 id=\"what-i-do\"\u003eWhat I do\u003c/h2\u003e\n\u003cp\u003eI do penetration testing for a living. Web applications, APIs, cloud-native architectures, and the AI-adjacent systems that are increasingly wired into all of them. External engagements, internal engagements, source code review when the scope calls for it. I\u0026rsquo;ve worked across a wide range of clients and industries.\u003c/p\u003e\n\u003cp\u003eThe work covers the full spectrum of application security. SQL injection, cross-site scripting, broken access controls, server-side request forgery, insecure deserialization, denial of service via SlowHTTP or request flooding. I\u0026rsquo;ve built multi-step proofs of concept that chain low-severity findings into critical impact. If it\u0026rsquo;s on the OWASP Top 10 or any of the common vulnerability taxonomies, I\u0026rsquo;ve found it in production.\u003c/p\u003e","title":"About"}]