In a previous article, I explored Event-Driven Architecture from the conceptual side -- events vs. commands, Event Sourcing, CQRS, Saga patterns, and the cultural shift required to move from request-response thinking to event-driven thinking.
Theory is essential. But theory without code is a PowerPoint.
This post is about that transition. I built an open-source microservice called Mars Enterprise Kit Lite that implements EDA with Java 25, Spring Boot 4.0, Kafka (via Redpanda), and PostgreSQL 16. The project is real, it runs, and you can clone it right now.
But here is the twist: the project has a deliberate flaw. It implements the Dual Write anti-pattern -- and I left it there on purpose.
Understanding the problem is the first step to solving it.
Here is a scenario most of us have faced.
A customer places an order. Your service saves it to PostgreSQL. Then it publishes an order.created event to Kafka so downstream services can react -- inventory, billing, notifications.
Two writes. Two systems. One operation.
What happens when the database commit succeeds but the Kafka publish fails? The order exists in the database. No event was published. Downstream consumers never learn about it. Inventory is never reserved. The billing service never charges.
sequenceDiagram
participant HR as HTTP Request
participant OS as Order Service
participant DB as Database
participant K as Kafka
HR->>OS: POST /orders
OS->>DB: INSERT order
DB-->>OS: OK
OS-xK: publish("order.created") ❌
Note over OS,K: FAILURE — event never delivered
The reverse is equally dangerous: Kafka receives the event, but the database transaction rolls back. Now downstream consumers act on an order that was never persisted.
No retry. No compensation. Silent inconsistency.
It works... until it doesn't.
The Dual Write problem occurs when a service writes to two separate systems -- such as a database and a message broker -- without atomic guarantees across both. If the first write succeeds but the second fails, the systems become silently inconsistent.
A @Transactional annotation covers the database. But Kafka lives outside that transactional boundary. There is no native atomicity across both. This gap is what the Mars Enterprise Kit Lite project exposes -- intentionally.
That flaw is there deliberately: seeing the failure is the fastest way to understand it. Later in this article, I show how the Transactional Outbox Pattern closes the gap — and how the Mars Enterprise Kit Pro implements that pattern end-to-end.
Now let's look at the consistency gap more closely. Here is the actual CreateOrderUseCase — the exact code running in this project:
// domain/usecase/CreateOrderUseCase.java
@Service
public class CreateOrderUseCase {

    @Transactional
    public UUID execute(final Input input) {
        var result = Order.create(input.customerId(), input.items());
        orderRepository.save(result.domain());

        // ⚠️ DUAL WRITE: no atomicity guarantee between DB and Kafka.
        // If publish fails, the order exists in PostgreSQL but the event is silently lost.
        try {
            orderEventPublisher.publish(result.event());
        } catch (Exception e) {
            log.warn("DUAL WRITE FAILURE — EVENT LOST for orderId={}. " +
                     "Order saved in DB but event NOT published to Kafka. Cause: {}",
                     result.domain().id(), e.getMessage());
        }
        return result.domain().id();
    }
}
The @Transactional annotation wraps the database operation. Kafka lives outside that transactional boundary — there is no native atomicity across both.
Here is the exact timeline when Kafka goes down:
t=0ms -> POST /orders arrives
t=1ms -> @Transactional begins
t=3ms -> orderRepository.save(order) [DB INSERT, within transaction]
t=5ms -> orderEventPublisher.publish() [Kafka send, OUTSIDE transaction — throws]
t=5ms -> catch(Exception e) [exception swallowed, WARN logged]
t=6ms -> @Transactional commits [DB commit succeeds]
t=6ms -> HTTP 201 returned to client [client sees SUCCESS]
Result: Order EXISTS in PostgreSQL. Event does NOT exist in Kafka.
The client got 201. Nobody knows the event was lost.
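The mechanics are easy to reproduce outside Spring. Here is a tiny, self-contained Java simulation of the Lost Event timeline above — the class and names (`LostEventDemo`, `kafkaDown`) are illustrative, not project code:

```java
import java.util.*;

public class LostEventDemo {
    static List<String> db = new ArrayList<>();      // committed orders
    static List<String> broker = new ArrayList<>();  // events that reached "Kafka"
    static boolean kafkaDown = true;                 // simulate "docker-compose stop redpanda"

    static String createOrder(String orderId) {
        db.add(orderId);                             // DB INSERT — will commit
        try {
            if (kafkaDown) throw new RuntimeException("Send failed");
            broker.add("order.created:" + orderId);  // Kafka publish
        } catch (Exception e) {
            // exception swallowed — only a WARN in the real code
            System.out.println("WARN DUAL WRITE FAILURE — EVENT LOST for orderId=" + orderId);
        }
        return orderId;                              // commit succeeds, client gets "201"
    }

    public static void main(String[] args) {
        String id = createOrder("order-7");
        System.out.println("db has order = " + db.contains(id));       // true
        System.out.println("broker has event = " + !broker.isEmpty()); // false
    }
}
```

The swallowed exception is the whole bug: the method returns normally, the transaction commits, and the only trace is a log line nobody reads.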
This pattern appears constantly in production code:
try {
    orderEventPublisher.publish(result.event());
} catch (Exception e) {
    log.warn("Failed to publish event: {}", e.getMessage());
}
It looks defensive. It looks resilient. It is actually hiding a data consistency failure.
The database committed. The client received 201 Created. The WARN log fires and gets ignored in the noise of a busy system. Downstream consumers never receive the event. Inventory is never reserved. Billing never charges. The order exists in the database and nowhere else in the system.
Two failure modes from the same root cause: the Lost Event (the database commits, the publish fails) and the Phantom Event (the publish succeeds, the database rolls back).
The code looks correct. It compiles. It passes unit tests. It works in dev. It works... until it doesn't.
This is why the flaw is intentional. You need to see it to understand why patterns like the Transactional Outbox exist.
There's another dimension to this problem that most developers miss. The timeline above assumes orderRepository.save() fires the SQL immediately. It doesn't.
JPA defers the INSERT to flush time — which happens just before the transaction commits. This creates a subtle but critical execution order:
t=0ms -> @Transactional begins
t=1ms -> orderRepository.save(order) [JPA queues INSERT — NO SQL yet]
t=3ms -> orderEventPublisher.publish() [KafkaTemplate.send() dispatches IMMEDIATELY]
t=4ms -> Kafka broker receives event ✅
t=5ms -> @Transactional prepares to commit
t=5ms -> JPA flushes → SQL INSERT fires
t=6ms -> PostgreSQL evaluates constraints
t=6ms -> CONSTRAINT VIOLATION → ROLLBACK
Result: Event EXISTS in Kafka. Order does NOT exist in PostgreSQL.
No chaos endpoint needed. No PhantomEventChaosAspect.
This is a natural Phantom Event.
The most common real-world trigger: a client retries a request and you add an idempotency_key constraint to prevent duplicate orders.
-- V2__add_idempotency_key.sql
ALTER TABLE orders ADD COLUMN idempotency_key VARCHAR(255);
CREATE UNIQUE INDEX orders_idempotency_key_idx ON orders(idempotency_key);
// Client sends the same request twice (network retry, button double-click, etc.)
// First request: succeeds normally.
// Second request (same idempotency_key):
// 1. @Transactional begins
// 2. orderRepository.save(order) <- JPA queues INSERT, no SQL yet
// 3. orderEventPublisher.publish() <- Kafka receives the event ✅
// 4. JPA flush fires INSERT
// 5. PostgreSQL: UNIQUE VIOLATION on idempotency_key -> ROLLBACK
//
// Result: Kafka has a duplicate event. PostgreSQL has no duplicate order.
// Downstream consumers process an order that does not exist.
No AOP. No chaos profile. Just a constraint, a retry, and JPA's normal flush behavior.
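You can model that flush ordering in a few lines of plain Java. This self-contained sketch (`PhantomEventDemo` is an illustrative name, not project code) defers the "INSERT" the way JPA does and dispatches the event eagerly the way `KafkaTemplate.send()` does:

```java
import java.util.*;

public class PhantomEventDemo {
    static Set<String> db = new HashSet<>();         // committed rows, unique by idempotency key
    static List<String> broker = new ArrayList<>();  // events that reached "Kafka"

    static boolean createOrder(String idempotencyKey) {
        List<String> pendingInserts = new ArrayList<>();
        pendingInserts.add(idempotencyKey);            // save(): JPA queues the INSERT — no SQL yet
        broker.add("order.created:" + idempotencyKey); // publish(): dispatched immediately
        for (String row : pendingInserts) {            // commit: flush fires the INSERT
            if (!db.add(row)) {                        // unique violation -> rollback
                return false;                          // DB rolled back, but the event is already out
            }
        }
        return true;
    }

    public static void main(String[] args) {
        createOrder("key-1");                // first request succeeds
        boolean ok = createOrder("key-1");   // client retry with the same key: rollback
        System.out.println("second commit ok = " + ok);             // false
        System.out.println("orders in db = " + db.size());          // 1
        System.out.println("events in broker = " + broker.size());  // 2  <- phantom event
    }
}
```

The broker ends up with one more event than the database has orders — exactly the inconsistency the retry scenario above describes.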
Talking about failure is one thing. Watching it happen is another.
The project includes two chaos testing scenarios you can run to see the Dual Write breaking on your machine. This is not a simulation -- it is real inconsistency between PostgreSQL and Kafka.
The problem: an order.created event exists in Kafka, but the order does not exist in PostgreSQL. Any consumer processing this event will reference a phantom order.
The project includes a built-in chaos endpoint (POST /chaos/phantom-event) that uses an AOP interceptor to force a DB rollback after the Kafka event has already been published. To activate it, start the application with the chaos profile:
# Start the app with the chaos profile
SPRING_PROFILES_ACTIVE=chaos mvn spring-boot:run
# Trigger the phantom event scenario
curl -s -X POST http://localhost:8082/chaos/phantom-event \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "quantity": 2, "unitPrice": 149.95}
]
}'
Response:
{
  "orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "existsInDb": false,
  "eventSentToKafka": true,
  "dbRolledBack": true,
  "explanation": "PHANTOM EVENT: The order.created event was published to Kafka, but the order does NOT exist in PostgreSQL. Any consumer processing this event will reference a non-existent order."
}
Verify it yourself -- the order does not exist in the database, but the event is in Kafka:
# Order does NOT exist in PostgreSQL (rolled back)
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT * FROM orders WHERE id = '<orderId>';"
# → (0 rows)
# Event DOES exist in Kafka
docker-compose exec redpanda rpk topic consume order.created --num 1 --offset end
# → Event payload with the phantom orderId
How it works internally:
`PhantomEventChaosAspect` is an AOP `@Around` advice that intercepts `ChaosOrderExecutor.execute()`. It lets the use case run completely (DB INSERT + Kafka publish), then throws a `PhantomEventSimulationException`. Since the exception occurs inside the `@Transactional` boundary, Spring rolls back the DB -- but `KafkaTemplate.send()` already dispatched the event. All chaos beans use `@Profile("chaos")` and do not exist in the default profile.
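The same mechanism can be simulated without Spring or AOP. In this self-contained, illustrative sketch (all names are hypothetical), `inTransaction` plays the role of `@Transactional` and `phantomEventAdvice` plays the `@Around` advice: the use case completes, the advice throws, the "DB" rolls back, and the "broker" keeps the event:

```java
import java.util.*;
import java.util.function.Supplier;

public class ChaosAspectDemo {
    static List<String> db = new ArrayList<>();      // "PostgreSQL"
    static List<String> broker = new ArrayList<>();  // "Kafka" — not transactional

    // Simulated @Transactional: commits on success, restores the snapshot on exception.
    static <T> T inTransaction(Supplier<T> body) {
        List<String> snapshot = new ArrayList<>(db);
        try {
            return body.get();
        } catch (RuntimeException e) {
            db.clear();
            db.addAll(snapshot);  // rollback — only the DB is undone
            throw e;
        }
    }

    // Simulated @Around chaos advice: let the use case finish, then throw.
    static <T> T phantomEventAdvice(Supplier<T> useCase) {
        useCase.get();  // DB insert + Kafka publish both happen first
        throw new RuntimeException("PhantomEventSimulation");
    }

    static String createOrder(String id) {
        db.add(id);                         // INSERT, within the transaction
        broker.add("order.created:" + id);  // Kafka send — outside transactional control
        return id;
    }

    public static void main(String[] args) {
        try {
            inTransaction(() -> phantomEventAdvice(() -> createOrder("order-1")));
        } catch (RuntimeException e) {
            System.out.println("rolled back: " + e.getMessage());
        }
        System.out.println("orders in db = " + db.size());          // 0
        System.out.println("events in broker = " + broker.size());  // 1
    }
}
```

The rollback only reaches the store that participates in the transaction — the broker keeps whatever was already dispatched, which is the entire point of the chaos aspect.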
The problem: an order is persisted in PostgreSQL, but the order.created event is never published. Downstream consumers never learn the order was created. The client receives HTTP 201 — and has no idea anything went wrong.
This scenario does not need a special endpoint. Stop Redpanda before creating an order:
# 1. Create a baseline order (everything healthy)
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"550e8400-e29b-41d4-a716-446655440000","items":[{"productId":"6ba7b810-9dad-11d1-80b4-00c04fd430c8","quantity":1,"unitPrice":50.00}]}'
# → 201 Created — order in DB, event in Kafka ✅
# 2. Kill Kafka
docker-compose stop redpanda
# 3. Create another order — Kafka is down
curl -s -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{"customerId":"aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee","items":[{"productId":"11111111-2222-3333-4444-555555555555","quantity":1,"unitPrice":99.99}]}'
# → 201 Created — but the event was silently lost ⚠️
# 4. Bring Kafka back
docker-compose start redpanda
sleep 10
# 5. Compare: DB has 2 orders, Kafka has only 1 event
docker-compose exec postgres psql -U mars -d orders_db -c \
"SELECT COUNT(*) FROM orders;"
# → 2
docker-compose exec redpanda rpk topic consume order.created \
--format '%v\n' | wc -l
# → 1
Order #2 exists in the database, the client received 201, but there is no corresponding event in Kafka. No downstream consumer knows it exists. No error was returned. The inconsistency is invisible.
In the application logs you will find only a WARN — not an error, not an alert:
WARN DUAL WRITE FAILURE — EVENT LOST for orderId=eebd2af8-...
Order saved in DB but event NOT published to Kafka. Cause: Send failed
That log line is easy to miss. And in a busy production system, it often is.
Both scenarios are caused by the same root issue: no atomicity between PostgreSQL and Kafka.
| | Scenario 1: Phantom Event | Scenario 2: Lost Event |
|---|---|---|
| Trigger | AOP forces DB rollback after publish | Kafka is down during order creation |
| PostgreSQL | Order does NOT exist (rolled back) | Order EXISTS (committed) |
| Kafka | Event EXISTS (already sent) | Event does NOT exist (publish failed) |
| Impact | Consumers process a non-existent order | Consumers never learn the order was created |
| HTTP Response | 500 (DB rolled back by AOP) | 201 — client sees success, event is gone |
| Reproduction | POST /chaos/phantom-event (requires chaos profile) | docker-compose stop redpanda + POST /orders |
| Fix | Transactional Outbox Pattern | Transactional Outbox Pattern |
Both failures are silent in production. No errors in the logs, no alerts, no retries. The system continues operating with inconsistent state between the database and the message broker.
Want to reproduce these scenarios on your machine? The repository is open: mars-enterprise-kit-lite. Five minutes to see the Dual Write breaking for real.
The entire stack runs with Docker Compose. Clone the repository and you are three commands away from a running system:
# 1. Start infrastructure (PostgreSQL 16 + Redpanda)
docker-compose up -d
# 2. Build the project
mvn clean install
# 3. Run the application
mvn spring-boot:run
Create an order:
curl -X POST http://localhost:8082/orders \
-H "Content-Type: application/json" \
-d '{
"customerId": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{
"productId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"quantity": 2,
"unitPrice": 149.95
}
]
}'
# Response: 201 Created
# { "orderId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
The Redpanda Console at http://localhost:8888 lets you inspect Kafka topics, see the order.created event, and verify the payload.
Redpanda is a Kafka-compatible streaming platform that runs without ZooKeeper or a JVM. It implements the Kafka protocol natively, so your application code does not change -- only the broker does.
| Component | Technology | Version |
|---|---|---|
| Language | Java | 25 |
| Framework | Spring Boot | 4.0.3 |
| Build Tool | Maven (single-module) | - |
| Database | PostgreSQL | 16 (alpine) |
| Messaging | Redpanda (Kafka-compatible) | v24.3.1 |
| Event Format | JSON (Jackson) | - |
| ORM | Spring Data JPA | - |
| Schema Management | Flyway | - |
| Testing | JUnit 5, Mockito, Testcontainers, REST Assured | - |
If you find this useful, drop a star on the repository — it helps other developers discover the project.
Here is where things get interesting.
The project was designed from the start to be operated by an AI agent. Not as an afterthought -- as a first-class design constraint.
The CLAUDE.md file is not just documentation. It is a prompt disguised as a README. It tells Claude Code the architecture rules, dependency directions, domain invariants, naming conventions, and how to run the project end-to-end.
The .mars/docs/ directory contains the AI knowledge base -- architecture decision records, module responsibilities, and coding conventions.
Custom Claude Code commands and skills extend the workflow:
- `/generate-prp` -- generates a Product Requirements Prompt for a new feature, based on the existing architecture
- `/execute-prp` -- implements the feature following TDD (Red, Green, Refactor)
- `chaos-phantom-event` -- runs the Phantom Event scenario end-to-end: starts the app with the chaos profile, calls POST /chaos/phantom-event, and verifies the event exists in Kafka but the order does not exist in PostgreSQL
- `chaos-testing` -- runs the Lost Event scenario: stops Redpanda, creates an order, and verifies the order exists in the database but the event was lost in Kafka

The AI can spin up the entire environment, create an order via REST, verify the Kafka event, trigger a cancellation through the consumer, and validate the final state. A full end-to-end smoke test:
1. docker compose up -d (PostgreSQL 16 + Redpanda)
2. Wait for services to be healthy
3. Verify Kafka topics exist (order.created + order.cancelled)
4. POST /orders -> validate 201 Created + orderId
5. Consume order.created event -> validate orderId matches
6. Publish order.cancelled -> { orderId, reason: "smoke-test" }
7. GET /orders/{orderId} -> validate status = CANCELLED
8. docker compose down
9. Report PASS or FAIL with logs
Beyond the smoke test, Claude Code runs the chaos skills autonomously:
# Phantom Event -- proves Kafka has an event for an order that does not exist in the DB
Run the chaos-phantom-event skill
# Lost Event -- proves the DB has an order with no event in Kafka
Run the chaos-testing skill with scenario lost-event
But the design is not about replacing developers. When you structure a project so an AI can operate it, you are forced to make things explicit. Architecture rules live in a document, not in someone's head. Conventions are written down, not tribal knowledge. The test sequence is a script, not a mental checklist.
This benefits every developer on the team, not just the AI. The CLAUDE.md doubles as onboarding documentation. The smoke test sequence is the acceptance criteria for "the system works."
AI-First design is Context Engineering applied to development infrastructure.
After building this project and several other event-driven systems, this is the lesson I keep coming back to.
You just saw the problem breaking. Phantom events, lost events, silent inconsistency. In dev, this is an exercise. In production, this is a Friday night with your phone ringing.
The Dual Write problem in this project is intentional. I left it there so you can see it, understand it, and feel why it matters. In a production system, you would never ship this without a solution.
The Transactional Outbox Pattern solves the Dual Write problem by writing the event to an outbox table within the same database transaction as the business data. A separate process polls the outbox and publishes events to the message broker. Because the business write and the event write share a single transaction, atomicity is guaranteed.
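As a rough, self-contained sketch of that idea (illustrative names, not the Pro kit's actual code): the business row and the event row are written to the same store in one step, and a separate relay moves events to the broker, deleting them only after a successful publish:

```java
import java.util.*;

public class OutboxDemo {
    static List<String> orders = new ArrayList<>();   // "orders" table
    static Deque<String> outbox = new ArrayDeque<>(); // "outbox" table (unsent events)
    static List<String> broker = new ArrayList<>();   // the message broker

    static void createOrder(String orderId) {
        // Single "transaction": both rows go to the same store,
        // so either both exist after commit or neither does.
        orders.add(orderId);
        outbox.add("order.created:" + orderId);
    }

    static void pollOutbox() {
        // Separate relay process: publish, then mark as sent.
        // If the publish fails, the row stays in the outbox and is retried.
        while (!outbox.isEmpty()) {
            String event = outbox.peek();
            broker.add(event);  // publish to Kafka
            outbox.poll();      // delete only after a successful publish
        }
    }

    public static void main(String[] args) {
        createOrder("order-42");
        // Even if the broker is down right here, the event waits in the outbox.
        pollOutbox();
        System.out.println(orders);  // [order-42]
        System.out.println(broker);  // [order.created:order-42]
    }
}
```

One consequence worth noting: because the relay deletes only after publishing, a crash between publish and delete re-sends the event. The Outbox gives at-least-once delivery, so consumers must be idempotent.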
The Mars Enterprise Kit Pro solves it with the Transactional Outbox Pattern implemented end-to-end — three fully-built services (Order, Inventory, Payment), ArchUnit-enforced boundaries, and a CI pipeline ready on day one.
You need to understand the problem before you appreciate the solution. That is why the Lite comes first.
| Feature | Lite (Free) | Enterprise Kit Pro |
|---|---|---|
| Kafka + PostgreSQL | Yes | Yes |
| AI-First design | Yes | Yes |
| 3 reference services (Order, Inventory, Payment) | No | Yes |
| Transactional Outbox Pattern | No | Yes |
| ArchUnit architecture enforcement | No | Yes |
| 21 Architecture Decision Records | No | Yes |
| GitHub Actions CI pipeline | No | Yes |
| SAGA orchestration | No | Planned (Phase 1) |
| OpenTelemetry observability | No | Planned (Phase 2) |
| Helm / Kubernetes | No | Planned (Phase 4) |
| Production-Ready | No | Yes |
The Lite teaches you the problem. The Pro gives you the solution. See what changes →
Event-Driven Architecture is not a theoretical exercise. It is a set of tradeoffs that show up in real code, in real failure modes, in real production incidents.
The Mars Enterprise Kit Lite gives you a working codebase to explore those tradeoffs. Clone it, run it, break it. And now you can prove the problem on your own machine: fire POST /chaos/phantom-event and watch the ghost event appear in Kafka, or stop Redpanda and watch the event disappear. This is not theory -- it is real inconsistency you can observe, debug, and understand.
Read the domain layer and notice the absence of frameworks. Trace the Dual Write through the code. Run the chaos tests. Then look at how the Transactional Outbox Pattern eliminates that gap.
Mars Enterprise Kit Lite is free, open-source, and runs on your machine in 5 minutes. Clone it, start Docker Compose, run the chaos tests, and watch the Dual Write breaking for real.
github.com/andrelucasti/mars-enterprise-kit-lite
Read the CLAUDE.md and let Claude Code reproduce the Dual Write failures for you.
If this project helped you understand the Dual Write problem, drop a star on the repo. It costs nothing and helps other developers find this content.
Mars Enterprise Kit Pro solves the Dual Write with the Transactional Outbox Pattern implemented end-to-end — three production-grade services, ArchUnit-enforced Onion Architecture, and a CI pipeline ready from day one.
Discover Mars Enterprise Kit Pro →
If you have questions or want to discuss event-driven patterns, feel free to connect on LinkedIn.
