Testing with Real Databases

Why Mocks Fall Short and Containers Save the Day

๐Ÿ“Š

Key Facts

30s
Disk-backed tests
<5s
tmpfs-backed tests
0
Shared state between tests
100%
Schema reproducibility

Testing with Real Databases: Why Mocks Fall Short and Containers Save the Day

You've written a service that processes customer orders. It validates input, applies business rules, persists data to PostgreSQL, and publishes events. The logic is clean, the code compiles, and now you need to prove it works.

So you write a test. And immediately hit the question every team hits: what do we do about the database?

The Mocking Trap

The instinct is to mock it. Abstract the database behind an interface, inject a fake, and test the logic in isolation. It sounds clean. In practice, it's a trap.

Consider an order service that creates an order, checks inventory, and updates stock levels — all within a transaction. A mock can verify that the right methods were called in the right order. What it can't verify:

Mocks test your assumptions about the database. They don't test the database.

There's a more subtle problem too. Mocks couple your tests to implementation details. Refactor the repository layer — change how queries are batched, switch from individual inserts to a CTE — and every mock breaks, even though the behaviour is identical. You end up spending more time maintaining mocks than writing features.

This doesn't mean mocks are useless. They're excellent for isolating external services — an EventBridge publisher, an S3 client, a third-party API. But for the data layer that your business logic fundamentally depends on? You need the real thing.

What the Database Actually Does for You

The mocking problem goes deeper than most testing courses acknowledge. A modern relational database isn't just a place to store rows — it's an active participant in your application's correctness. When you mock it away, you're not just skipping storage. You're skipping behaviour.

Consider what PostgreSQL alone brings to the table:

Constraints as business rules. A CHECK (quantity > 0) constraint guarantees no negative quantities exist — ever, regardless of which code path writes the data. A UNIQUE constraint on (tenant_id, email) enforces uniqueness across concurrent requests in a way that application-level checks simply cannot. An EXCLUDE USING gist constraint prevents overlapping date ranges on room bookings — try mocking that.

Triggers and side effects. An audit trigger that logs every update to a history table. A trigger that maintains a denormalised order_total column when line items change. A trigger that prevents deletion of referenced records with a custom error message. These fire inside the database, inside the transaction. A mock never sees them.

Transaction isolation. Two users simultaneously buying the last item in stock. With SERIALIZABLE isolation, the database detects the write skew and aborts one transaction. With READ COMMITTED, your application needs explicit locking. The difference between these is the difference between a correct system and one that oversells inventory under load. No mock reproduces this.

Upsert and conflict handling. INSERT ... ON CONFLICT DO UPDATE behaves differently depending on which unique index is violated. The logic lives in the index definition, not in your application code. Mock the repository and you're testing a fiction.

Partial indexes and query behaviour. A partial index on WHERE status = 'active' changes which rows a query returns when combined with the planner's decisions. Your ORM generates a query, the database optimises it against real indexes and statistics, and the result set depends on that interplay.

Row-level security. Policies that filter rows based on the current session's role or a runtime parameter. The same query returns different results for different users — enforced by the database, invisible to the application layer, and completely absent from any mock.

Every one of these is a correctness mechanism that your production system relies on. Every one of them disappears when you substitute a mock. And the insidious part is that your tests still pass — they just don't prove what you think they prove.

This is the gap that unit testing courses rarely address. They teach you to isolate dependencies, which is sound advice for stateless services and external APIs. But a database is not just a dependency — it's a co-author of your application's behaviour. Testing without it means testing half the system.

The Environment Problem

Running tests against a real database introduces a different problem: environment consistency. "It works on my machine" becomes "it works against my database." Developer A has a schema from last Tuesday. Developer B ran a migration manually. CI has a shared instance that three pipelines are writing to simultaneously.

You need:

This is where containers change everything.

Containers as Test Infrastructure

A Docker container running PostgreSQL gives you a disposable, reproducible database that starts in seconds. Every test run gets a pristine instance. No shared state, no drift, no "did someone drop that table?"

A minimal docker-compose.yml is all it takes:

services:
  postgres:
    image: postgres:17
    environment:
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    tmpfs:
      - /var/lib/postgresql/data

The tmpfs mount is a small but important detail — it puts the data directory in memory, which makes the database significantly faster for test workloads where durability doesn't matter. Tests that took 30 seconds with disk-backed storage often finish in under 5.

In CI, this container starts as a service alongside your build. Locally, developers run it in the background. Either way, the database is identical. The schema comes from the same SQL files. The fixtures come from the same test data. The environment is the code.

From Container to Test Lifecycle

Having a clean container is necessary but not sufficient. You still need to:

  1. Create the test database
  2. Apply the schema
  3. Load fixture data
  4. Run the tests
  5. Clean up

Doing this manually — or with shell scripts that run before the test binary — is fragile. The schema step fails silently, the test runs against stale tables, and you spend an hour debugging a test that was never actually testing what you thought.

What you want is the test framework itself managing the database lifecycle. The setup happens as part of the test run, not before it. If a SQL file fails to apply, the test fails with a clear error pointing at the file and the statement. No ambiguity.

This is the approach behind the Catch2 SQL Bootstrap library. It registers as a Catch2 event listener and hooks into the test lifecycle:

The SQL files live in your project, version-controlled alongside the code:

order-service/
โ”œโ”€โ”€ SQL/
โ”‚   โ”œโ”€โ”€ 001 - schema.sql
โ”‚   โ””โ”€โ”€ 002 - tables.sql
โ””โ”€โ”€ tests/
    โ”œโ”€โ”€ SQL/
    โ”‚   โ””โ”€โ”€ 001 - base-data.sql
    โ””โ”€โ”€ order_service_tests.cpp

Files are sorted lexicographically and executed in order. The convention is simple: SQL/ for schema, tests/SQL/ for fixtures. Numeric prefixes control execution order. That's it.

Testing Business Behaviour, Not SQL

With the database lifecycle handled, tests can focus on what matters: does the service behave correctly?

Here's what this looks like in practice. Imagine an order service with a create_order method that validates the request, persists the order, deducts inventory, and returns the created order. The test doesn't execute SQL directly — it calls the service and asserts on the result:

SCENARIO("Creating orders", "[db-rebuild]") {

    GIVEN("A product with 10 units in stock") {
        order_service service;
        service.init();

        WHEN("Creating a valid order for 3 units") {
            auto result = service.create_order({"WIDGET-001", 3});

            THEN("The order is created successfully") {
                REQUIRE(result.has_value());
                REQUIRE(result->product_id() == "WIDGET-001");
                REQUIRE(result->quantity() == 3);
                REQUIRE(result->status() == "confirmed");
            }

            THEN("The stock is reduced") {
                auto stock = service.get_stock("WIDGET-001");
                REQUIRE(stock.available() == 7);
            }
        }

        WHEN("Creating an order that exceeds available stock") {
            auto result = service.create_order({"WIDGET-001", 50});

            THEN("The order is rejected") {
                REQUIRE_FALSE(result.has_value());
                REQUIRE(result.error() == "insufficient_stock");
            }

            THEN("The stock is unchanged") {
                auto stock = service.get_stock("WIDGET-001");
                REQUIRE(stock.available() == 10);
            }
        }
    }
}

Notice what's happening here. The [db-rebuild] tag tells the framework to rebuild the database for each GIVEN section. The GIVEN("A product with 10 units in stock") block starts with a clean database where the fixture SQL has already inserted a product with 10 units. Each WHEN branch tests a different path through the service. Each THEN asserts on business outcomes — order status, stock levels — not on row counts or SQL results.

The fixture data that primes the database lives in a subdirectory matching the scenario name:

tests/
โ””โ”€โ”€ SQL/
    โ”œโ”€โ”€ 001 - base-data.sql
    โ””โ”€โ”€ Creating orders/
        โ””โ”€โ”€ A product with 10 units in stock/
            โ””โ”€โ”€ 001 - inventory.sql
-- 001 - inventory.sql
INSERT INTO inventory.product_t(product_id, name, available)
VALUES('WIDGET-001', 'Standard Widget', 10);

The framework loads these automatically — base fixtures first, then scenario-level, then given-level. No setup code in the test. No teardown. The test reads like a specification.

Isolation Without Compromise

The BDD structure maps naturally to database state. Each GIVEN describes a precondition — "a product with 10 units in stock", "an empty orders table", "a customer with an expired subscription". The framework rebuilds the database for each one, so preconditions are always exact.

This solves the classic integration test problem: test ordering. When tests share a database, the order they run in matters. Test A inserts a row, test B assumes the table is empty, test B fails — but only when run after test A. These failures are intermittent, maddening, and waste hours.

With a per-test-case rebuild, every test starts from a known state. Run them in any order. Run them in parallel across different databases. Run a single test in isolation. The result is always the same.

The Workflow in Practice

A typical development cycle looks like this:

  1. Write the schemaSQL/001 - schema.sql defines your tables, constraints, indexes
  2. Write base fixturestests/SQL/001 - base-data.sql inserts reference data that most tests need
  3. Write scenario fixturestests/SQL/<scenario>/<given>/001 - setup.sql primes specific preconditions
  4. Write the test — BDD-style, calling service methods, asserting on business outcomes
  5. Run it — the framework handles everything else

When the schema changes, you update the SQL files. The next test run picks up the changes automatically. When a fixture needs updating, it's a SQL file in version control with a clear diff. When a test fails, the error message tells you which SQL file failed and what the database said — no guessing.

When to Use This Approach

This pattern works best when:

It's not the right choice for everything. Pure domain logic with no persistence? Unit test it with plain objects. External API integrations? Mock them. But for the service layer that sits between your domain and your database — the layer where most bugs actually live — testing against a real database with automated lifecycle management is the most reliable approach there is.

Key Takeaways

The database is not something to be mocked away. It's a critical part of your system, and your tests should exercise it. The trick is making that exercise painless — and with containers and automated lifecycle management, it is.

โ† Back to Insights