In a previous article, I explored Event-Driven Architecture from the conceptual side -- events vs. commands, Event Sourcing, CQRS, Saga patterns, and the cultural shift required to move from request-response thinking to event-driven thinking.
Theory is essential. But theory without code is a PowerPoint.
This post is about that transition. I built an open-source microservice called Mars Enterprise Kit Lite that implements EDA with Java 25, Spring Boot 4.0, Kafka (via Redpanda), and PostgreSQL 16. The project is real, it runs, and you can clone it right now.
But here is the twist: the project has a deliberate flaw. It implements the Dual Write anti-pattern -- and I left it there on purpose.
Understanding the problem is the first step to solving it.
Here is a scenario most of us have faced.
A customer places an order. Your service saves it to PostgreSQL. Then it publishes an order.created event to Kafka so downstream services can react -- inventory, billing, notifications.
Two writes. Two systems. One operation.
What happens when the database commit succeeds but the Kafka publish fails? The order exists in the database. No event was published. Downstream consumers never learn about it. Inventory is never reserved. The billing service never charges.
sequenceDiagram
participant HR as HTTP Request
participant OS as Order Service
participant DB as Database
participant K as Kafka
HR->>OS: POST /orders
OS->>DB: INSERT order
DB-->>OS: OK
OS-xK: publish("order.created") ❌
Note over OS,K: FAILURE — event never delivered
The reverse is equally dangerous: Kafka receives the event, but the database transaction rolls back. Now downstream consumers act on an order that was never persisted.
No retry. No compensation. Silent inconsistency.
It works... until it doesn't.
The Dual Write problem occurs when a service writes to two separate systems -- such as a database and a message broker -- without atomic guarantees across both. If the first write succeeds but the second fails, the systems become silently inconsistent.
A @Transactional annotation covers the database. But Kafka lives outside that transactional boundary. There is no native atomicity across both. This gap is what the Mars Enterprise Kit Lite project exposes -- intentionally.
That flaw is there deliberately: seeing the failure is the fastest way to understand it. Later in this article, I show how the Transactional Outbox Pattern closes the gap — and how the Mars Enterprise Kit Pro implements that pattern end-to-end.
Now let's look at the consistency gap more closely. Here is the actual CreateOrderUseCase — the exact code running in this project:
// domain/usecase/CreateOrderUseCase.java
@Service
public class CreateOrderUseCase {

    @Transactional
    public UUID execute(final Input input) {
        var result = Order.create(input.customerId(), input.items());
        orderRepository.save(result.domain());

        // ⚠️ DUAL WRITE: no atomicity guarantee between DB and Kafka.
        // If publish fails, the order exists in PostgreSQL but the event is silently lost.
        try {
            orderEventPublisher.publish(result.event());
        } catch (Exception e) {
            log.warn("DUAL WRITE FAILURE — EVENT LOST for orderId={}. " +
                     "Order saved in DB but event NOT published to Kafka. Cause: {}",
                     result.domain().id(), e.getMessage());
        }
        return result.domain().id();
    }
}
The @Transactional annotation wraps the database operation. Kafka lives outside that transactional boundary — there is no native atomicity across both.
Here is the exact timeline when Kafka goes down:
t=0ms -> POST /orders arrives
t=1ms -> @Transactional begins
t=3ms -> orderRepository.save(order) [DB INSERT, within transaction]
t=5ms -> orderEventPublisher.publish() [Kafka send, OUTSIDE transaction — throws]
t=5ms -> catch(Exception e) [exception swallowed, WARN logged]
t=6ms -> @Transactional commits [DB commit succeeds]
t=6ms -> HTTP 201 returned to client [client sees SUCCESS]
Result: Order EXISTS in PostgreSQL. Event does NOT exist in Kafka.
The client got 201. Nobody knows the event was lost.
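The mechanics are easy to reproduce outside Spring. Here is a tiny, self-contained Java simulation of the Lost Event timeline above — the class and names (`LostEventDemo`, `kafkaDown`) are illustrative, not project code:

```java
import java.util.*;

public class LostEventDemo {
    static List<String> db = new ArrayList<>();      // committed orders
    static List<String> broker = new ArrayList<>();  // events that reached "Kafka"
    static boolean kafkaDown = true;                 // simulate "docker-compose stop redpanda"

    static String createOrder(String orderId) {
        db.add(orderId);                             // DB INSERT — will commit
        try {
            if (kafkaDown) throw new RuntimeException("Send failed");
            broker.add("order.created:" + orderId);  // Kafka publish
        } catch (Exception e) {
            // exception swallowed — only a WARN in the real code
            System.out.println("WARN DUAL WRITE FAILURE — EVENT LOST for orderId=" + orderId);
        }
        return orderId;                              // commit succeeds, client gets "201"
    }

    public static void main(String[] args) {
        String id = createOrder("order-7");
        System.out.println("db has order = " + db.contains(id));       // true
        System.out.println("broker has event = " + !broker.isEmpty()); // false
    }
}
```

The swallowed exception is the whole bug: the method returns normally, the transaction commits, and the only trace is a log line nobody reads.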
This pattern appears constantly in production code:
try {
    orderEventPublisher.publish(result.event());
} catch (Exception e) {
    log.warn("Failed to publish event: {}", e.getMessage());
}
It looks defensive. It looks resilient. It is actually hiding a data consistency failure.
The database committed. The client received 201 Created. The WARN log fires and gets ignored in the noise of a busy system. Downstream consumers never receive the event. Inventory is never reserved. Billing never charges. The order exists in the database and nowhere else in the system.
Two failure modes from the same root cause: the Lost Event (the database commits, the publish fails) and the Phantom Event (the publish succeeds, the database rolls back).
The code looks correct. It compiles. It passes unit tests. It works in dev. It works... until it doesn't.
This is why the flaw is intentional. You need to see it to understand why patterns like the Transactional Outbox exist.
There's another dimension to this problem that most developers miss. The timeline above assumes orderRepository.save() fires the SQL immediately. It doesn't.
JPA defers the INSERT to flush time — which happens just before the transaction commits. This creates a subtle but critical execution order:
t=0ms -> @Transactional begins
t=1ms -> orderRepository.save(order) [JPA queues INSERT — NO SQL yet]
t=3ms -> orderEventPublisher.publish() [KafkaTemplate.send() dispatches IMMEDIATELY]
t=4ms -> Kafka broker receives event ✅
t=5ms -> @Transactional prepares to commit
t=5ms -> JPA flushes → SQL INSERT fires
t=6ms -> PostgreSQL evaluates constraints
t=6ms -> CONSTRAINT VIOLATION → ROLLBACK
Result: Event EXISTS in Kafka. Order does NOT exist in PostgreSQL.
No chaos endpoint needed. No PhantomEventChaosAspect.
This is a natural Phantom Event.
The most common real-world trigger: a client retries a request and you add an idempotency_key constraint to prevent duplicate orders.
-- V2__add_idempotency_key.sql
ALTER TABLE orders ADD COLUMN idempotency_key VARCHAR(255);
CREATE UNIQUE INDEX orders_idempotency_key_idx ON orders(idempotency_key);
// Client sends the same request twice (network retry, button double-click, etc.)
// First request: succeeds normally.
// Second request (same idempotency_key):
// 1. @Transactional begins
// 2. orderRepository.save(order) <- JPA queues INSERT, no SQL yet
// 3. orderEventPublisher.publish() <- Kafka receives the event ✅
// 4. JPA flush fires INSERT
// 5. PostgreSQL: UNIQUE VIOLATION on idempotency_key -> ROLLBACK
//
// Result: Kafka has a duplicate event. PostgreSQL has no duplicate order.
// Downstream consumers process an order that does not exist.
No AOP. No chaos profile. Just a constraint, a retry, and JPA's normal flush behavior.
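You can model that flush ordering in a few lines of plain Java. This self-contained sketch (`PhantomEventDemo` is an illustrative name, not project code) defers the "INSERT" the way JPA does and dispatches the event eagerly the way `KafkaTemplate.send()` does:

```java
import java.util.*;

public class PhantomEventDemo {
    static Set<String> db = new HashSet<>();         // committed rows, unique by idempotency key
    static List<String> broker = new ArrayList<>();  // events that reached "Kafka"

    static boolean createOrder(String idempotencyKey) {
        List<String> pendingInserts = new ArrayList<>();
        pendingInserts.add(idempotencyKey);            // save(): JPA queues the INSERT — no SQL yet
        broker.add("order.created:" + idempotencyKey); // publish(): dispatched immediately
        for (String row : pendingInserts) {            // commit: flush fires the INSERT
            if (!db.add(row)) {                        // unique violation -> rollback
                return false;                          // DB rolled back, but the event is already out
            }
        }
        return true;
    }

    public static void main(String[] args) {
        createOrder("key-1");                // first request succeeds
        boolean ok = createOrder("key-1");   // client retry with the same key: rollback
        System.out.println("second commit ok = " + ok);             // false
        System.out.println("orders in db = " + db.size());          // 1
        System.out.println("events in broker = " + broker.size());  // 2  <- phantom event
    }
}
```

The broker ends up with one more event than the database has orders — exactly the inconsistency the retry scenario above describes.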
Talking about failure is one thing. Watching it happen is another.
The project includes two chaos testing scenarios you can run to see the Dual Write breaking on your machine. This is not a simulation -- it is real inconsistency between PostgreSQL and Kafka.
The problem: an order.created event exists in Kafka, but the order does not exist in PostgreSQL. Any consumer processing this event will reference a phantom order.
The project includes a built-in chaos endpoint (POST /chaos/phantom-event) that uses an AOP interceptor to force a DB rollback after the Kafka event has already been published. To activate it, start the application with the chaos profile:
# Start the app with the chaos profile
SPRING_PROFILES_ACTIVE=chaos mvn spring-boot:run
# Trigger the phantom event scenario
curl -s -X POST http://localhost:8082/chaos/phantom-event \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "quantity": 2, "unitPrice": 149.95}
]
}'
Response:
{
  "orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "existsInDb": false,
  "eventSentToKafka": true,
  "dbRolledBack": true,
  "explanation": "PHANTOM EVENT: The order.created event was published to Kafka, but the order does NOT exist in PostgreSQL. Any consumer processing this event will reference a non-existent order."
}
Verify it yourself -- the order does not exist in the database, but the event is in Kafka:
# Order does NOT exist in PostgreSQL (rolled back)
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT * FROM orders WHERE id = '<orderId>';"
# → (0 rows)
# Event DOES exist in Kafka
docker-compose exec redpanda rpk topic consume order.created --num 1 --offset end
# → Event payload with the phantom orderId
How it works internally:
`PhantomEventChaosAspect` is an AOP `@Around` advice that intercepts `ChaosOrderExecutor.execute()`. It lets the use case run completely (DB INSERT + Kafka publish), then throws a `PhantomEventSimulationException`. Since the exception occurs inside the `@Transactional` boundary, Spring rolls back the DB -- but `KafkaTemplate.send()` already dispatched the event. All chaos beans use `@Profile("chaos")` and do not exist in the default profile.
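The same mechanism can be simulated without Spring or AOP. In this self-contained, illustrative sketch (all names are hypothetical), `inTransaction` plays the role of `@Transactional` and `phantomEventAdvice` plays the `@Around` advice: the use case completes, the advice throws, the "DB" rolls back, and the "broker" keeps the event:

```java
import java.util.*;
import java.util.function.Supplier;

public class ChaosAspectDemo {
    static List<String> db = new ArrayList<>();      // "PostgreSQL"
    static List<String> broker = new ArrayList<>();  // "Kafka" — not transactional

    // Simulated @Transactional: commits on success, restores the snapshot on exception.
    static <T> T inTransaction(Supplier<T> body) {
        List<String> snapshot = new ArrayList<>(db);
        try {
            return body.get();
        } catch (RuntimeException e) {
            db.clear();
            db.addAll(snapshot);  // rollback — only the DB is undone
            throw e;
        }
    }

    // Simulated @Around chaos advice: let the use case finish, then throw.
    static <T> T phantomEventAdvice(Supplier<T> useCase) {
        useCase.get();  // DB insert + Kafka publish both happen first
        throw new RuntimeException("PhantomEventSimulation");
    }

    static String createOrder(String id) {
        db.add(id);                         // INSERT, within the transaction
        broker.add("order.created:" + id);  // Kafka send — outside transactional control
        return id;
    }

    public static void main(String[] args) {
        try {
            inTransaction(() -> phantomEventAdvice(() -> createOrder("order-1")));
        } catch (RuntimeException e) {
            System.out.println("rolled back: " + e.getMessage());
        }
        System.out.println("orders in db = " + db.size());          // 0
        System.out.println("events in broker = " + broker.size());  // 1
    }
}
```

The rollback only reaches the store that participates in the transaction — the broker keeps whatever was already dispatched, which is the entire point of the chaos aspect.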
The problem: an order is persisted in PostgreSQL, but the order.created event is never published. Downstream consumers never learn the order was created. The client receives HTTP 201 — and has no idea anything went wrong.
This scenario does not need a special endpoint. Stop Redpanda before creating an order:
# 1. Create a baseline order (everything healthy)
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"550e8400-e29b-41d4-a716-446655440000","items":[{"productId":"6ba7b810-9dad-11d1-80b4-00c04fd430c8","quantity":1,"unitPrice":50.00}]}'
# → 201 Created — order in DB, event in Kafka ✅
# 2. Kill Kafka
docker-compose stop redpanda
# 3. Create another order — Kafka is down
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee","items":[{"productId":"11111111-2222-3333-4444-555555555555","quantity":1,"unitPrice":99.99}]}'
# → 201 Created — but the event was silently lost ⚠️
# 4. Bring Kafka back
docker-compose start redpanda
sleep 10
# 5. Compare: DB has 2 orders, Kafka has only 1 event
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT COUNT(*) FROM orders;"
# → 2
docker-compose exec redpanda rpk topic consume order.created \
--format '%v\n' | wc -l
# → 1
Order #2 exists in the database, the client received 201, but there is no corresponding event in Kafka. No downstream consumer knows it exists. No error was returned. The inconsistency is invisible.
In the application logs you will find only a WARN — not an error, not an alert:
WARN DUAL WRITE FAILURE — EVENT LOST for orderId=eebd2af8-...
Order saved in DB but event NOT published to Kafka. Cause: Send failed
That log line is easy to miss. And in a busy production system, it often is.
Both scenarios are caused by the same root issue: no atomicity between PostgreSQL and Kafka.
| | Scenario 1: Phantom Event | Scenario 2: Lost Event |
|---|---|---|
| Trigger | AOP forces DB rollback after publish | Kafka is down during order creation |
| PostgreSQL | Order does NOT exist (rolled back) | Order EXISTS (committed) |
| Kafka | Event EXISTS (already sent) | Event does NOT exist (publish failed) |
| Impact | Consumers process a non-existent order | Consumers never learn the order was created |
| HTTP Response | 500 (DB rolled back by AOP) | 201 — client sees success, event is gone |
| Reproduction | POST /chaos/phantom-event (requires chaos profile) | docker-compose stop redpanda + POST /orders |
| Fix | Transactional Outbox Pattern | Transactional Outbox Pattern |
Both failures are silent in production. No errors in the logs, no alerts, no retries. The system continues operating with inconsistent state between the database and the message broker.
Want to reproduce these scenarios on your machine? The repository is open: mars-enterprise-kit-lite. Five minutes to see the Dual Write breaking for real.
The entire stack runs with Docker Compose. Clone the repository and you are three commands away from a running system:
# 1. Start infrastructure (PostgreSQL 16 + Redpanda)
docker-compose up -d
# 2. Build the project
mvn clean install
# 3. Run the application
mvn spring-boot:run
Create an order:
curl -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{
"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"quantity": 2,
"unitPrice": 149.95
}
]
}'
# Response: 201 Created
# { "orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
The Redpanda Console at http://localhost:8888 lets you inspect Kafka topics, see the order.created event, and verify the payload.
Redpanda is a Kafka-compatible streaming platform that runs without ZooKeeper or a JVM. It implements the Kafka protocol natively, so your application code does not change -- only the broker does.
| Component | Technology | Version |
|---|---|---|
| Language | Java | 25 |
| Framework | Spring Boot | 4.0.3 |
| Build Tool | Maven (single-module) | - |
| Database | PostgreSQL | 16 (alpine) |
| Messaging | Redpanda (Kafka-compatible) | v24.3.1 |
| Event Format | JSON (Jackson) | - |
| ORM | Spring Data JPA | - |
| Schema Management | Flyway | - |
| Testing | JUnit 5, Mockito, Testcontainers, REST Assured | - |
If you find this useful, drop a star on the repository — it helps other developers discover the project.
Here is where things get interesting.
The project was designed from the start to be operated by an AI agent. Not as an afterthought -- as a first-class design constraint.
The CLAUDE.md file is not just documentation. It is a prompt disguised as a README. It tells Claude Code the architecture rules, dependency directions, domain invariants, naming conventions, and how to run the project end-to-end.
The .mars/docs/ directory contains the AI knowledge base -- architecture decision records, module responsibilities, and coding conventions.
Custom Claude Code commands and skills extend the workflow:
- `/generate-prp` -- generates a Product Requirements Prompt for a new feature, based on the existing architecture
- `/execute-prp` -- implements the feature following TDD (Red, Green, Refactor)
- `chaos-phantom-event` -- runs the Phantom Event scenario end-to-end: starts the app with the chaos profile, calls POST /chaos/phantom-event, and verifies the event exists in Kafka but the order does not exist in PostgreSQL
- `chaos-testing` -- runs the Lost Event scenario: stops Redpanda, creates an order, and verifies the order exists in the database but the event was lost in Kafka

The AI can spin up the entire environment, create an order via REST, verify the Kafka event, trigger a cancellation through the consumer, and validate the final state. A full end-to-end smoke test:
1. docker compose up -d (PostgreSQL 16 + Redpanda)
2. Wait for services to be healthy
3. Verify Kafka topics exist (order.created + order.cancelled)
4. POST /orders -> validate 201 Created + orderId
5. Consume order.created event -> validate orderId matches
6. Publish order.cancelled -> { orderId, reason: "smoke-test" }
7. GET /orders/{orderId} -> validate status = CANCELLED
8. docker compose down
9. Report PASS or FAIL with logs
Beyond the smoke test, Claude Code runs the chaos skills autonomously:
# Phantom Event -- proves Kafka has an event for an order that does not exist in the DB
Run the chaos-phantom-event skill
# Lost Event -- proves the DB has an order with no event in Kafka
Run the chaos-testing skill with scenario lost-event
But the design is not about replacing developers. When you structure a project so an AI can operate it, you are forced to make things explicit. Architecture rules live in a document, not in someone's head. Conventions are written down, not tribal knowledge. The test sequence is a script, not a mental checklist.
This benefits every developer on the team, not just the AI. The CLAUDE.md doubles as onboarding documentation. The smoke test sequence is the acceptance criteria for "the system works."
AI-First design is Context Engineering applied to development infrastructure.
After building this project and several other event-driven systems, this is the lesson I keep coming back to.
You just saw the problem breaking. Phantom events, lost events, silent inconsistency. In dev, this is an exercise. In production, this is a Friday night with your phone ringing.
The Dual Write problem in this project is intentional. I left it there so you can see it, understand it, and feel why it matters. In a production system, you would never ship this without a solution.
The Transactional Outbox Pattern solves the Dual Write problem by writing the event to an outbox table within the same database transaction as the business data. A separate process polls the outbox and publishes events to the message broker. Because the business write and the event write share a single transaction, atomicity is guaranteed.
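As a rough, self-contained sketch of that idea (illustrative names, not the Pro kit's actual code): the business row and the event row are written to the same store in one step, and a separate relay moves events to the broker, deleting them only after a successful publish:

```java
import java.util.*;

public class OutboxDemo {
    static List<String> orders = new ArrayList<>();   // "orders" table
    static Deque<String> outbox = new ArrayDeque<>(); // "outbox" table (unsent events)
    static List<String> broker = new ArrayList<>();   // the message broker

    static void createOrder(String orderId) {
        // Single "transaction": both rows go to the same store,
        // so either both exist after commit or neither does.
        orders.add(orderId);
        outbox.add("order.created:" + orderId);
    }

    static void pollOutbox() {
        // Separate relay process: publish, then mark as sent.
        // If the publish fails, the row stays in the outbox and is retried.
        while (!outbox.isEmpty()) {
            String event = outbox.peek();
            broker.add(event);  // publish to Kafka
            outbox.poll();      // delete only after a successful publish
        }
    }

    public static void main(String[] args) {
        createOrder("order-42");
        // Even if the broker is down right here, the event waits in the outbox.
        pollOutbox();
        System.out.println(orders);  // [order-42]
        System.out.println(broker);  // [order.created:order-42]
    }
}
```

One consequence worth noting: because the relay deletes only after publishing, a crash between publish and delete re-sends the event. The Outbox gives at-least-once delivery, so consumers must be idempotent.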
The Mars Enterprise Kit Pro solves it with the Transactional Outbox Pattern implemented end-to-end — three fully-built services (Order, Inventory, Payment), ArchUnit-enforced boundaries, and a CI pipeline ready on day one.
You need to understand the problem before you appreciate the solution. That is why the Lite comes first.
| Feature | Lite (Free) | Enterprise Kit Pro |
|---|---|---|
| Kafka + PostgreSQL | Yes | Yes |
| AI-First design | Yes | Yes |
| 3 reference services (Order, Inventory, Payment) | No | Yes |
| Transactional Outbox Pattern | No | Yes |
| ArchUnit architecture enforcement | No | Yes |
| 21 Architecture Decision Records | No | Yes |
| GitHub Actions CI pipeline | No | Yes |
| SAGA orchestration | No | Planned (Phase 1) |
| OpenTelemetry observability | No | Planned (Phase 2) |
| Helm / Kubernetes | No | Planned (Phase 4) |
| Production-Ready | No | Yes |
The Lite teaches you the problem. The Pro gives you the solution. See what changes →
Event-Driven Architecture is not a theoretical exercise. It is a set of tradeoffs that show up in real code, in real failure modes, in real production incidents.
The Mars Enterprise Kit Lite gives you a working codebase to explore those tradeoffs. Clone it, run it, break it. And now you can prove the problem on your own machine: fire POST /chaos/phantom-event and watch the ghost event appear in Kafka, or stop Redpanda and watch the event disappear. This is not theory -- it is real inconsistency you can observe, debug, and understand.
Read the domain layer and notice the absence of frameworks. Trace the Dual Write through the code. Run the chaos tests. Then look at how the Transactional Outbox Pattern eliminates that gap.
Mars Enterprise Kit Lite is free, open-source, and runs on your machine in 5 minutes. Clone it, start Docker Compose, run the chaos tests, and watch the Dual Write breaking for real.
github.com/andrelucasti/mars-enterprise-kit-lite
Read the CLAUDE.md and let Claude Code reproduce the Dual Write failures for you.
If this project helped you understand the Dual Write problem, drop a star on the repo. It costs nothing and helps other developers find this content.
Mars Enterprise Kit Pro solves the Dual Write with the Transactional Outbox Pattern implemented end-to-end — three production-grade services, ArchUnit-enforced Onion Architecture, and a CI pipeline ready from day one.
Discover Mars Enterprise Kit Pro →
If you have questions or want to discuss event-driven patterns, feel free to connect on LinkedIn.
