Using AWS Data Migration Service for Near–Real-Time Redshift, Part 1
Stop batching, start streaming
On nearly every consulting engagement, I’m asked to help design or improve a data warehouse. For early-stage startups running on AWS, that usually means Redshift.
The next question is predictable: how do we sync production data — typically from RDS Postgres — into Redshift?
Most teams reach for tools like Fivetran, Stitch, or Airbyte. These tools have their place. But if you’re already operating inside AWS and replicating from RDS to Redshift, there’s a built-in service that’s often overlooked:
AWS Database Migration Service (DMS).
Used correctly, DMS can keep your warehouse within seconds of production — without introducing another vendor or relying on overnight batch jobs.
Under normal load, changes in Postgres can appear in Redshift 10–30 seconds later.
That changes how you work with data.
The Problem with Batch Thinking
In most organizations, warehouse data is at least a day behind reality.
Dashboards reflect “yesterday.” Reports are scheduled overnight. Stakeholders are trained to accept that delay as normal.
But that delay becomes painfully visible when you’re iterating quickly.
If you ship a change to onboarding at 10:00am, do you really want to wait until tomorrow to see the impact? If something breaks in production, do you want to wait for the next batch to understand what happened?
Near–real-time replication removes that blind spot.
It allows you to:
- Explore data in Redshift without granting access to production.
- Avoid runaway queries impacting your primary database.
- Measure product changes minutes after they ship.
- Debug operational issues using current data instead of stale snapshots.
We’ve become so conditioned to batch processing that real-time analytics can feel exotic. It isn’t. The primitives have been there for years.
What Is AWS DMS?
AWS Database Migration Service has been generally available since 2016. It isn’t flashy. It hasn’t been rebranded with “AI.” It’s a replication engine.
And it works.
DMS can perform an initial load of your source database and then continuously replicate changes. For RDS Postgres → Redshift, that typically means:
- Full load of existing tables.
- Ongoing change data capture (CDC) using Postgres logical replication.
- Inserts, updates, deletes, and supported DDL changes streamed into Redshift.
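Concretely, a DMS task is driven by a JSON "table mappings" document that selects which schemas and tables to replicate. Here's a minimal sketch of building one in Python; the schema and table names are placeholders for your own:

```python
import json

def build_table_mappings(tables, schema="public"):
    """Build a minimal DMS table-mappings document that includes
    the given tables from one schema. Each entry is a selection
    rule; rule-ids must be unique within the document."""
    rules = []
    for i, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(i),
            "rule-name": f"include-{table}",
            "object-locator": {
                "schema-name": schema,
                "table-name": table,
            },
            "rule-action": "include",
        })
    return {"rules": rules}

# Example: replicate two application tables from the public schema.
mappings = build_table_mappings(["users", "orders"])
print(json.dumps(mappings, indent=2))
```

The resulting JSON is what you'd pass as the task's table mappings when creating it (in the console, via the CLI, or with infrastructure-as-code). We'll build out a full task configuration in Part 2.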
It’s not perfect. It can be difficult to debug. Misconfiguration can cause lag or failure. There are edge cases that even AWS support struggles with.
But for the common case — RDS Postgres replicating into Redshift — there are stable, production-ready configurations that avoid most of the footguns. I’ve used this pattern across multiple teams for over five years, and it continues to hold up.
How It Works
Every implementation I’ve built relies on the same underlying Postgres mechanisms.
The Watcher on the WAL
Postgres uses a Write-Ahead Log (WAL) to record changes before they are applied to the data files on disk.
Every transaction is first written to the WAL. That log supports:
- Crash recovery
- Point-in-time restore
- Replication
For our purposes, the WAL is the source of truth for database changes.
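For DMS to read the WAL logically, the database has to be configured for it: `wal_level` must be `logical` (on RDS, you get there by setting `rds.logical_replication = 1` in the parameter group), and there must be headroom in `max_replication_slots` and `max_wal_senders`. A small sketch of that check as a pure function; the helper name and the idea of passing settings in as a dict (e.g. gathered from `SHOW` commands) are my own, not an AWS API:

```python
def check_logical_replication_ready(settings):
    """Given a dict of Postgres settings (as strings, the way
    `SHOW` reports them), return a list of problems that would
    prevent logical replication. Empty list means ready."""
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical' "
                        "(on RDS, set rds.logical_replication = 1)")
    if int(settings.get("max_replication_slots", 0)) < 1:
        problems.append("max_replication_slots must allow at least one slot")
    if int(settings.get("max_wal_senders", 0)) < 1:
        problems.append("max_wal_senders must allow at least one sender")
    return problems

# A correctly configured instance reports no problems.
ok = check_logical_replication_ready(
    {"wal_level": "logical",
     "max_replication_slots": "10",
     "max_wal_senders": "10"})

# A default `replica` wal_level does not support logical decoding.
bad = check_logical_replication_ready({"wal_level": "replica"})
```

We'll walk through the actual parameter-group changes in Part 2; the point here is simply that the WAL only becomes a consumable change stream once these settings are in place.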
Replication Station
A replication slot acts like a bookmark.
When a replication client connects, it reads changes from the WAL. The replication slot tracks which changes have been consumed. Postgres retains WAL records until the connected client confirms they’ve been processed.
This guarantees that changes aren’t lost — but it also means misconfigured replication can cause WAL files to accumulate. Slot management matters.
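You can see how far a slot's bookmark trails the head of the WAL by comparing LSNs (log sequence numbers, which Postgres writes as two hex numbers like `0/16B6C50`). A sketch of the arithmetic, assuming you've already queried `pg_current_wal_lsn()` and the slot's `restart_lsn` from `pg_replication_slots`:

```python
def lsn_to_bytes(lsn):
    """Convert a Postgres LSN string like '0/16B6C50' into an
    absolute byte position: high 32 bits / low 32 bits, in hex."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def retained_wal_bytes(current_lsn, slot_restart_lsn):
    """Bytes of WAL Postgres must keep around for a slot: the
    distance between the current write position and the slot's
    bookmark. This grows when the consumer stalls."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(slot_restart_lsn)

# A slot whose bookmark is stuck at 0/16B6C50 while the server has
# written up to 1/0 is pinning roughly 4 GB of WAL on disk.
lag = retained_wal_bytes("1/0", "0/16B6C50")
```

This is exactly the number to watch: a stalled DMS task leaves its slot's bookmark in place, the retained WAL grows, and on RDS that eventually eats your instance's storage.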
How DMS Consumes Changes
DMS connects to Postgres using logical replication. It reads from a replication slot, consumes change events, and applies them to Redshift.
Instead of maintaining a physical replica database, DMS translates those change events into operations against your warehouse tables.
Under normal conditions, this pipeline introduces only seconds of delay.
Heavy write throughput, network constraints, or poor replication instance sizing can increase lag — which is why configuration and monitoring are critical. We’ll cover that in detail in Part 2.
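DMS publishes per-task latency metrics to CloudWatch (`CDCLatencySource` and `CDCLatencyTarget`, in seconds). A hedged sketch of the alerting logic you might run over recent datapoints; the threshold, window, and the fetch itself (e.g. via boto3) are assumptions, not a DMS feature:

```python
def lag_alert(datapoints, threshold_seconds=60, sustained=3):
    """Return True if the last `sustained` latency datapoints
    (seconds, oldest first) all exceed the threshold. A single
    spike is normal during bursts of writes; sustained lag usually
    means the replication instance or target can't keep up."""
    recent = datapoints[-sustained:]
    return (len(recent) == sustained
            and all(v > threshold_seconds for v in recent))

# One 90-second spike doesn't fire; three consecutive high readings do.
spike = lag_alert([5, 4, 90, 6, 5])
alarm = lag_alert([5, 70, 95, 120])
```

Distinguishing source latency (reading the WAL) from target latency (applying to Redshift) is what tells you which side of the pipeline to fix, and it's where we'll start in Part 2.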
Why This Matters
Near–real-time warehousing changes how teams think.
It removes the artificial constraint of “yesterday’s data.” It shortens feedback loops. It enables safer access patterns for analytics without touching production.
Most importantly, it demonstrates that batch processing is often a habit — not a requirement.
If your production database is already emitting a stream of changes, you can consume that stream.
What’s Next
In Part 2, we’ll walk through a production-ready RDS Postgres → Redshift configuration, including:
- Replication instance sizing
- Logical replication setup
- Slot management
- Table mapping and task configuration
- Monitoring and failure modes
In Part 3, we’ll cover operational lessons: debugging lag, handling schema changes, and the edge cases that tend to surprise teams the first time they deploy DMS.
Real-time analytics on AWS doesn’t require another vendor or a complex streaming platform. In many cases, it just requires using the tools you already have.