Postgres WAL and Replication

The Write-Ahead Log (WAL) is Postgres's durability mechanism: every change is written to the WAL before the corresponding heap page is modified. WAL is also the foundation of replication — both physical streaming replication and logical replication derive their data from the same WAL stream. Understanding WAL explains replication lag, standby failover, and why certain operations are replication-incompatible.

How WAL provides durability

When a transaction commits, Postgres guarantees that WAL records for that transaction have been flushed to durable storage before returning success to the client. The heap pages themselves may still be dirty in shared_buffers — a background writer and checkpointer flush them asynchronously. On crash, Postgres replays WAL records from the last checkpoint forward, restoring all committed transactions.

WAL is written to segment files in pg_wal/, each 16 MB by default. The LSN (Log Sequence Number) is a 64-bit monotonically increasing byte offset into this stream. Every operation — inserts, updates, deletes, DDL — has an LSN.
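You can inspect the current LSN and compute byte distances between two positions directly in SQL. A quick sketch — the literal LSN values here are made up for illustration:

```sql
-- current write position in the WAL stream
SELECT pg_current_wal_lsn();

-- byte distance between two LSNs (pg_lsn values; these are illustrative)
SELECT pg_wal_lsn_diff('0/3000148', '0/3000000') AS bytes_apart;
```

pg_wal_lsn_diff is the same function used later in this article to measure replication-slot lag.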

wal_level: what gets written

The wal_level GUC controls how much information goes into WAL records:

  • minimal — only what crash recovery needs. Lets Postgres skip WAL for bulk operations on tables created or truncated in the same transaction (COPY into a new table, CREATE TABLE AS SELECT). Cannot support replication.
  • replica (default) — adds information required for physical streaming replication and base backups. Sufficient for read replicas and high availability.
  • logical — adds the information needed to decode row-level changes (old-row values per the table's replica identity). Required for logical replication, CDC pipelines, and tools like Debezium. Noticeably increases WAL volume on write-heavy workloads, especially for tables with REPLICA IDENTITY FULL.

Changing wal_level requires a server restart. Check current value:

SHOW wal_level;

Physical streaming replication

Physical replication sends raw WAL byte-for-byte to standby servers. The standby continuously replays WAL records against an exact byte-copy of the primary's data directory. This produces an identical Postgres instance — same page layout, same file structure.

Setup requires a replication slot (or enough wal_keep_size headroom) and a primary_conninfo connection string on the standby. Since Postgres 12, recovery.conf is gone: an empty standby.signal file marks the server as a standby, and primary_conninfo lives in postgresql.conf. The key parameters on the primary:

# postgresql.conf
wal_level = replica
max_wal_senders = 10          # max concurrent replication connections
wal_keep_size = 1GB           # retain this much WAL for slow standbys
hot_standby = on              # allow reads on standby (set on standby)

Physical standbys can serve read queries but cannot have different schemas, additional databases, or run DDL independently.
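To confirm a server is running as a standby and that streaming is healthy, two standby-side checks are useful (pg_stat_wal_receiver has one row only while the WAL receiver process is running):

```sql
-- true on a standby that is replaying WAL
SELECT pg_is_in_recovery();

-- connection status of the WAL receiver process
SELECT status, sender_host, slot_name
FROM pg_stat_wal_receiver;
```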

Logical replication

Logical replication decodes WAL into row-level changes (INSERT/UPDATE/DELETE) and streams them to subscribers. Unlike physical replication, subscribers can be different Postgres versions, have their own schemas, apply only a subset of tables, or even be non-Postgres systems via logical decoding plugins.

Create a publication on the primary:

-- publish specific tables
CREATE PUBLICATION my_pub FOR TABLE orders, customers;

-- or publish everything
CREATE PUBLICATION my_pub FOR ALL TABLES;

Subscribe on the target:

CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=primary port=5432 dbname=mydb user=replicator password=...'
  PUBLICATION my_pub;

Logical replication requires wal_level = logical and that replicated tables have a replica identity. By default this is the primary key; tables without a primary key need ALTER TABLE t REPLICA IDENTITY FULL (writes the entire old row to WAL on updates).
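You can audit replica identities via pg_class, which exposes them as one-letter codes. The table names below are the ones from the publication example above:

```sql
-- relreplident: 'd' = default (primary key), 'n' = nothing,
--               'f' = full row, 'i' = a specific unique index
SELECT relname, relreplident
FROM pg_class
WHERE relname IN ('orders', 'customers');
```

A table whose relreplident is 'n' and that lacks a primary key will reject UPDATE and DELETE once it is part of a publication.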

Replication slots and WAL retention risk

A replication slot causes the primary to retain WAL until the subscriber has consumed it. If a subscriber goes offline, the primary accumulates WAL indefinitely. On a write-heavy database, this can fill the disk and crash the primary.

Monitor slot lag:

SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
           coalesce(confirmed_flush_lsn, restart_lsn))) AS lag
FROM pg_replication_slots;

To reclaim space from an abandoned slot, drop it:

-- drop a slot that's been inactive too long
SELECT pg_drop_replication_slot('my_slot');

max_slot_wal_keep_size (Postgres 13+) caps how much WAL a slot can retain before Postgres invalidates it automatically:

# postgresql.conf
max_slot_wal_keep_size = 10GB
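With this cap in place, pg_replication_slots (Postgres 13+) also reports whether each slot is approaching invalidation:

```sql
-- wal_status: 'reserved' (safe), 'extended' (past wal_keep_size),
--             'unreserved' (at risk), 'lost' (invalidated)
SELECT slot_name, wal_status, safe_wal_size
FROM pg_replication_slots;
```

safe_wal_size is the number of bytes that can still be written before the slot crosses the max_slot_wal_keep_size limit.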

Monitoring replica lag

On the primary, pg_stat_replication shows all connected standbys:

SELECT client_addr,
       state,
       sent_lsn,
       write_lsn,
       flush_lsn,
       replay_lsn,
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication;

The replay_lag column shows how far behind the standby is in wall-clock time. A standby with replay_lag > 30s on a production HA setup warrants investigation. Common causes:

  • Network bandwidth between primary and standby is saturated.
  • Standby is I/O-bound applying WAL — check pg_stat_activity for the startup process.
  • Long-running queries on the standby pause WAL application — replay waits up to max_standby_streaming_delay before canceling conflicting queries.
  • recovery_min_apply_delay is set intentionally (delayed standby for accidental-delete protection).
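The standby has its own view of lag. Run on the standby, these functions show how far receipt and replay have each progressed:

```sql
-- run on the standby
SELECT pg_last_wal_receive_lsn() AS received,
       pg_last_wal_replay_lsn()  AS replayed,
       now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

A large gap between received and replayed points at slow apply (I/O or query conflicts); a stale received points at the network or the primary.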

Synchronous vs asynchronous replication

By default, replication is asynchronous: the primary commits as soon as WAL is flushed locally, without waiting for standbys. This is fastest but means standby lag is theoretically unbounded — a failover could lose recently-committed transactions.

Synchronous replication waits for at least one standby to acknowledge WAL receipt before returning to the client:

# postgresql.conf
synchronous_standby_names = 'ANY 1 (standby1, standby2)'
synchronous_commit = remote_write   # or 'on' for full flush

remote_write waits for the standby to write WAL to its OS buffer (not flush to disk). on waits for a full fsync on the standby. Synchronous replication adds commit latency equal to the network round-trip to the standby.
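synchronous_commit can also be relaxed per transaction, so only critical writes pay the network round trip. A sketch — audit_log is a hypothetical table:

```sql
BEGIN;
SET LOCAL synchronous_commit = 'off';  -- this transaction alone skips the standby wait
INSERT INTO audit_log (msg) VALUES ('non-critical event');
COMMIT;
```

SET LOCAL scopes the change to the current transaction, so subsequent commits on the session revert to the configured level.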

Quick checklist

  • Set wal_level = logical only if needed for CDC or logical replication — it adds WAL volume.
  • Monitor pg_replication_slots — an inactive slot accumulating lag can fill disk.
  • Set max_slot_wal_keep_size to cap WAL retention from dead slots.
  • Alert on replay_lag > 30s for HA standbys.
  • Use synchronous_commit = remote_write to avoid full-fsync latency — accepting that a simultaneous OS crash on the standby could lose the acknowledged-but-unflushed WAL.
  • Tables in logical replication must have a replica identity (primary key recommended).

Related: Postgres MVCC · Reading EXPLAIN Plans