Practical CI: kumo for AWS Integration Tests

Set up kumo as a lightweight AWS emulator in CI to run deterministic S3, SQS, DynamoDB and Lambda tests, with tips for isolation and speed.

When your CI pipeline runs integration tests against S3, SQS, DynamoDB and Lambda, you want two things: fidelity and speed. kumo — a single-binary AWS emulator — balances both. It's lightweight, starts fast, and can run deterministically in CI without AWS credentials. This guide walks through a production-ready CI setup that uses kumo as an AWS emulator, plus practical tips for data persistence, test isolation, and speeding up feedback loops.

Why use an emulator like kumo?

Testing against real AWS in every CI job is slow, costly and brittle. Emulators let you validate integration logic reliably without touching production cloud resources. kumo is particularly well-suited for CI because it:

is a single binary (or container) that’s easy to distribute,
requires no authentication (great for isolated CI runners),
starts quickly and uses minimal resources, and
can persist state via KUMO_DATA_DIR so you can snapshot environments.

It supports a broad set of services, but in this article we focus on S3, SQS, DynamoDB and Lambda — the services commonly used in backend integrations.

Core concepts for a production-ready setup

Before configuring CI, decide on a few cross-cutting rules that keep tests deterministic and fast:

Environment parity: point SDKs to kumo endpoints rather than real AWS in test environments.
Test isolation: ensure parallel jobs cannot interfere by using unique prefixes, account IDs or per-job data directories.
Data persistence policy: choose whether to persist data between runs (fast warm start) or start clean (deterministic tests).
Warm vs cold server handling: decide if CI should reuse a kumo service across jobs or recreate it per job for strict isolation.

Quick start: running kumo locally and in CI

kumo can run either as a native binary or inside Docker. In CI the container approach is convenient:

docker run -d --name kumo -p 4566:4566 -v $PWD/kumo-data:/var/lib/kumo -e KUMO_DATA_DIR=/var/lib/kumo ghcr.io/sivchari/kumo:latest

Notes:

By default kumo exposes AWS-compatible endpoints (commonly on port 4566). Adjust the port if yours differs.
KUMO_DATA_DIR enables optional data persistence — see the persistence section below.

Point your SDKs at kumo

Most SDKs accept a custom endpoint. For example, with the AWS CLI or SDKs you can use --endpoint-url (CLI) or pass an endpoint parameter when creating a client. Example commands to create an S3 bucket and DynamoDB table against kumo:

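# kumo does not check credentials, but the AWS CLI still needs placeholder values and a region to sign requests.
export AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=test AWS_DEFAULT_REGION=us-east-1
AWS_ENDPOINT=http://localhost:4566
aws --endpoint-url "$AWS_ENDPOINT" s3 mb s3://test-bucket
aws --endpoint-url "$AWS_ENDPOINT" dynamodb create-table --table-name Users --attribute-definitions AttributeName=id,AttributeType=S --key-schema AttributeName=id,KeyType=HASH --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5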
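The same redirection can be done from test code. The following is a minimal sketch, assuming a Python test suite and the AWS_ENDPOINT variable used in this article; the helper name kumo_client_kwargs and the placeholder credentials are illustrative assumptions, not part of kumo.

```python
import os


def kumo_client_kwargs(env=None):
    """Build keyword arguments for an SDK client aimed at the emulator.

    AWS_ENDPOINT is the variable name used in this article. The fallback
    credentials and region are placeholders (assumptions): the SDK needs
    some value to sign requests even when the server does not check them.
    """
    env = os.environ if env is None else env
    return {
        "endpoint_url": env.get("AWS_ENDPOINT", "http://localhost:4566"),
        "region_name": env.get("AWS_DEFAULT_REGION", "us-east-1"),
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID", "test"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", "test"),
    }


# Usage with boto3 (left as a comment so the sketch stays stdlib-only):
#   import boto3
#   s3 = boto3.client("s3", **kumo_client_kwargs())
#   s3.create_bucket(Bucket="test-bucket")
```

Building every client from one helper makes it harder for a test to reach real AWS by accident.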

CI examples: GitHub Actions and GitLab CI

Below are compact job definitions that start kumo, run integration tests, and tear it down. They include health checks and optional data persistence.

GitHub Actions (job snippet)

jobs:
  integration:
    runs-on: ubuntu-latest
    services:
      kumo:
        image: ghcr.io/sivchari/kumo:latest
        ports:
          - 4566:4566
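        options: --user 1000:1000
        # The runner context is not available in service definitions, so use a
        # named volume rather than a path under runner.temp.
        volumes:
          - kumo-data:/var/lib/kumo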
        env:
          KUMO_DATA_DIR: /var/lib/kumo
    steps:
      - uses: actions/checkout@v4
      - name: Wait for kumo
        run: |
          # Give up after 30 seconds instead of polling until the job times out.
          timeout 30 bash -c 'until curl -s -o /dev/null http://localhost:4566/; do sleep 0.2; done'
      - name: Run integration tests
        env:
          AWS_ENDPOINT: http://localhost:4566
        run: make test-integration

GitLab CI (job snippet)

integration:
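  # docker:20 is Alpine-based and does not ship make; use an image that contains your test toolchain.
  image: docker:20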
  services:
    - name: ghcr.io/sivchari/kumo:latest
      alias: kumo
  variables:
    KUMO_DATA_DIR: /var/lib/kumo
  script:
    - timeout 30 sh -c 'until nc -z kumo 4566; do sleep 0.2; done'
    - AWS_ENDPOINT=http://kumo:4566 make test-integration
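The readiness check can also live in the test harness, which keeps it identical across CI systems. Here is a minimal sketch of a bounded wait, assuming a plain TCP connect is a good enough readiness signal for your setup; wait_for_port is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds pass,
    so a missing emulator fails the job quickly instead of hanging it.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test session could call wait_for_port("localhost", 4566) once at startup and abort with a clear message when it returns False.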

Test isolation strategies

To avoid flaky tests caused by cross-test interference, use one or more of these techniques:

1. Per-job data directories

Set KUMO_DATA_DIR to a unique directory per CI job (or mount an ephemeral volume). This guarantees a clean state or allows per-job snapshots.
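As a rough sketch, a launcher script could derive the data directory from the job ID and build the docker command from it. The function name kumo_docker_args and the directory naming scheme are assumptions for illustration; the image, port, and KUMO_DATA_DIR values come from the command shown earlier.

```python
import os
import tempfile


def kumo_docker_args(job_id, base_dir=None, port=4566):
    """Create a per-job data directory and return a docker argv that mounts it."""
    base_dir = base_dir or tempfile.gettempdir()
    data_dir = os.path.join(base_dir, f"kumo-data-{job_id}")
    os.makedirs(data_dir, exist_ok=True)
    return [
        "docker", "run", "-d", "--rm",
        "--name", f"kumo-{job_id}",          # one container per job
        "-p", f"{port}:4566",
        "-v", f"{data_dir}:/var/lib/kumo",   # job-specific host directory
        "-e", "KUMO_DATA_DIR=/var/lib/kumo",
        "ghcr.io/sivchari/kumo:latest",
    ]


# Usage: subprocess.run(kumo_docker_args(os.environ["CI_JOB_ID"]), check=True)
```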

2. Resource prefixes

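Generate a unique prefix (UUID or CI run ID) and prepend it to every resource name, for example run-1234-users and run-1234-queue. A run ID is shared by all jobs in the same run, so add the matrix or shard index when one run fans out into several jobs. Because names never collide, matrix jobs can then run in parallel against the same emulator.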
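One way to sketch prefix generation is shown below. GITHUB_RUN_ID and CI_JOB_ID are the variables set by GitHub Actions and GitLab CI respectively; the function names, the "t-" marker, and the optional shard argument are assumptions for illustration.

```python
import os
import re
import uuid


def resource_prefix(env=None, shard=None):
    """Return a prefix that is unique per CI job (or per local run)."""
    env = os.environ if env is None else env
    # Fall back to a random ID when no CI variable is present.
    run_id = env.get("GITHUB_RUN_ID") or env.get("CI_JOB_ID") or uuid.uuid4().hex[:8]
    parts = ["t", str(run_id)]
    if shard is not None:
        # A run ID is shared by every job in the run, so add the shard index.
        parts.append(str(shard))
    # Lower-case letters, digits, and hyphens are safe in S3 bucket names.
    return re.sub(r"[^a-z0-9-]", "-", "-".join(parts).lower())


def scoped(prefix, name):
    """Build a resource name such as 't-1234-users'."""
    return f"{prefix}-{name}"
```

Tests then create scoped(prefix, "users") instead of a hard-coded users table.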

3. Namespaces or simulated account IDs

When services support multi-tenant keys or accounts, use a synthetic account ID per job. Even when they don't, logical namespaces combined with tags help locate and clean resources after tests.

4. Test fixtures and teardown

Implement deterministic fixtures (seed data) and rigorous teardown steps. A teardown script that removes anything matching the test prefix avoids accumulating orphaned resources in persisted directories.
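A prefix-based teardown can be sketched independently of any one service by passing in the list and delete operations. The helper name teardown_by_prefix is hypothetical, and the boto3 calls in the comment are one possible wiring rather than a requirement.

```python
def teardown_by_prefix(prefix, list_names, delete):
    """Delete every resource whose name starts with `prefix`.

    `list_names` returns an iterable of names and `delete` removes one name.
    Returns the names that were removed so the caller can log them.
    """
    removed = []
    # Copy the listing first so deletion cannot disturb iteration.
    for name in list(list_names()):
        if name.startswith(prefix):
            delete(name)
            removed.append(name)
    return removed


# Possible wiring for DynamoDB with boto3 (list_tables is paginated, so large
# accounts need a paginator):
#   teardown_by_prefix(
#       "t-1234-",
#       lambda: ddb.list_tables()["TableNames"],
#       lambda n: ddb.delete_table(TableName=n),
#   )
```

Pass the prefix with its trailing hyphen so that t-1- does not also match t-10-users.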

Data persistence: choosing between speed and determinism

kumo's KUMO_DATA_DIR option can make tests faster by avoiding cold initialization, but it introduces statefulness. Choose a policy:

Fresh every job: the safest option. Start with an empty data dir so tests are deterministic, at the cost of re-creating tables, buckets, and seed data on every run.
Persisted snapshots: snapshot a seeded environment (tables, buckets) and restore it into KUMO_DATA_DIR for subsequent jobs. This reduces setup time while keeping a predictable base state.
Hybrid: persist only read-only fixtures (like static reference data) and recreate mutable resources per test run.

Practical approach: create a job that runs once per commit or nightly to rebuild a canonical snapshot. Store that snapshot as an artifact or container image layer and mount it in PR jobs for fast startup.
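The snapshot and restore steps might look like the sketch below. It assumes the contents of KUMO_DATA_DIR can be copied as ordinary files while kumo is stopped, which you should verify for your kumo version; save_snapshot and restore_snapshot are hypothetical helper names.

```python
import os
import shutil
import tarfile


def save_snapshot(data_dir, archive_path):
    """Pack a seeded data directory into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores paths relative to the data directory.
        tar.add(data_dir, arcname=".")


def restore_snapshot(archive_path, data_dir):
    """Replace data_dir with the archive contents so each job starts identical."""
    shutil.rmtree(data_dir, ignore_errors=True)  # drop leftovers from earlier runs
    os.makedirs(data_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only restore archives produced by your own pipeline.
        tar.extractall(data_dir)
```

The nightly job would call save_snapshot after seeding and upload the archive as an artifact; PR jobs would download it and call restore_snapshot before starting kumo.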

Making Lambda tests deterministic

Testing Lambda integrations can be tricky. kumo supports Lambda emulation — you can pre-deploy functions into kumo and invoke them from your application components just like in real AWS. Tips:

Build and package function artifacts during CI and upload them to kumo's Lambda service as part of the job setup.
Use deterministic timestamps or stub time sources in Lambda code so test assertions don't rely on wall-clock time.
Invoke Lambdas synchronously when asserting immediate results, or use SQS/SNS event assertions for asynchronous flows.
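The time-stubbing tip can be illustrated with a handler that takes its clock as a parameter. This is a minimal sketch assuming a Python Lambda; the make_handler factory and the event fields are invented for the example.

```python
from datetime import datetime, timezone


def make_handler(now=lambda: datetime.now(timezone.utc)):
    """Build a Lambda handler whose time source can be replaced in tests."""
    def handler(event, context):
        return {
            "order_id": event["order_id"],
            # The timestamp comes from the injected clock, not the wall clock.
            "processed_at": now().isoformat(),
        }
    return handler


# Entry point configured for the deployed function: uses the real clock.
handler = make_handler()
```

A test can then build make_handler(lambda: fixed_time) and assert on the exact processed_at value.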

Speeding up feedback loops

The faster your CI runs, the sooner developers get feedback. Use these techniques to accelerate integration tests that rely on kumo:

Parallelize tests: split test suites into independent shards that each use unique prefixes or data directories.
Warm snapshots: reuse a pre-seeded kumo data snapshot to skip heavy setup steps.
Keep expensive setup out of PR runs: run a smaller smoke integration suite on PRs and the full suite on nightly or merge-to-main pipelines.
Run kumo as a long-lived service on self-hosted runners: if you control CI runners, keep kumo warmed between jobs to avoid startup overhead.
Cache artifacts: cache compiled Lambda artifacts or pre-built container layers that your tests rely on.
Use fast assertions: avoid overly broad retries. Make tests fail fast on genuine faults.
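For the sharding tip, one possible approach is to assign each test to a shard by hashing its ID, so every CI job computes the same split without coordination. The function names below are assumptions for illustration.

```python
import hashlib


def shard_for(test_id, total_shards):
    """Map a test ID to a shard index in the range [0, total_shards)."""
    digest = hashlib.sha256(test_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards


def select_tests(test_ids, shard_index, total_shards):
    """Return the tests that belong to one shard, preserving input order."""
    return [t for t in test_ids if shard_for(t, total_shards) == shard_index]
```

A cryptographic hash is used instead of Python's built-in hash() because string hashes are randomized per process, which would give each job a different split.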

Debugging and observability

Even in CI you should be able to inspect kumo state. Suggestions:

Expose debug endpoints or use the SDK to list resources and dump state when tests fail.
Record network traces or SDK request logs for failing tests.
Store kumo data snapshots when a job fails so you can reproduce locally.
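A failure hook could collect resource listings into one JSON file that CI uploads as an artifact. The sketch below is service-agnostic: dump_state and the file name kumo-state.json are hypothetical, and the listers would wrap whatever SDK list calls your tests already use.

```python
import json
import os


def dump_state(listers, out_dir):
    """Write the result of each lister to out_dir/kumo-state.json.

    `listers` maps a label (e.g. "tables") to a zero-argument callable that
    returns an iterable. A failing lister is recorded, not raised, so one
    broken service does not hide the rest of the state.
    """
    os.makedirs(out_dir, exist_ok=True)
    state = {}
    for label, lister in listers.items():
        try:
            state[label] = list(lister())
        except Exception as exc:
            state[label] = {"error": repr(exc)}
    path = os.path.join(out_dir, "kumo-state.json")
    with open(path, "w", encoding="utf-8") as fh:
        # default=str keeps non-JSON values such as datetimes readable.
        json.dump(state, fh, indent=2, sort_keys=True, default=str)
    return path
```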

Comparing kumo to LocalStack and other alternatives

kumo and LocalStack share the goal of emulating AWS locally, but different tradeoffs apply:

kumo: single binary, lightweight, fast startup, no auth required — good for CI and constrained runners.
LocalStack: feature-rich and widely used; heavier and often requires more resources and configuration in CI.

If you want a minimal, fast local AWS emulator or a LocalStack alternative for CI, kumo is worth trying. It pairs well with any language whose SDK supports custom endpoint configuration. Be aware of gaps in service coverage and edge-case differences from real AWS, and always run some tests against real AWS periodically (e.g., nightly) to catch provider-specific behavior.


Actionable checklist

Use this checklist to adopt kumo in your CI pipeline:

Decide on persistence policy: fresh vs snapshot vs hybrid.
Update CI job to run kumo as a service (container or binary) and wait for readiness.
Ensure all SDK clients in tests are pointed to the kumo endpoint.
Implement per-job isolation (prefixes or per-job KUMO_DATA_DIR).
Seed deterministic fixtures or create a snapshot image for fast startup.
Parallelize tests safely using prefixes or namespaces.
Add observability: resource dumps and logs on failures.
Run full suite against real AWS on a schedule (e.g., nightly) to validate provider differences.

Conclusion

kumo offers a pragmatic path to deterministic, fast integration tests in CI. Its single-binary nature and optional data persistence make it a powerful alternative to heavier emulators. With clear isolation strategies, snapshotting, and a staged testing approach (fast PR smoke tests + thorough nightly runs), you can get the best of both speed and confidence. Implement the patterns above and your team will ship integration-safe changes more quickly.