Workload Attribution with Sqlcommenter
pg_stat_statements ranks queries by total execution time, but it groups them by fingerprint — not by the controller, route, or background job that issued them. A fingerprint that burns 18% of database time is useful; knowing that 14 of those 18 points come from one report-generation cron and the rest from the customer-facing checkout is what actually drives prioritization. Sqlcommenter is the bridge: it prepends a structured comment to every statement, Postgres preserves the comment in pg_stat_statements, and a few lines of SQL roll execution time up to whichever dimension you tagged.
This guide assumes sqlcommenter is already wired into your app — if not, start with Sqlcommenter for Postgres for the ORM-by-ORM setup. What follows is the database-side half: how to write the attribution SQL yourself, how to read the results, and how to decide what to instrument first.
The shape of an attributed row
With sqlcommenter on, every row in pg_stat_statements keeps the leading comment intact. A SELECT from pg_stat_statements looks like this:
SELECT left(query, 120), calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 3;
-- /*controller='checkout',action='create',route='/cart/checkout'*/ INSERT INTO orders ...
-- /*controller='reports',action='daily',job='report_cron'*/ SELECT ... FROM orders JOIN ...
-- /*controller='catalog',action='show',route='/product/:id'*/ SELECT ... FROM products WHERE id = $1

Each comment is a comma-separated list of key='url-encoded-value' pairs. The same controller/action combo will appear with the same comment across every shape of query it issues, which is what makes the rollup cheap.
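The same key='url-encoded-value' format is easy to parse outside SQL as well, if you want to post-process exported rows. A minimal Python sketch (the parse_sqlcommenter helper is illustrative, not part of any sqlcommenter package):

```python
import re
from urllib.parse import unquote

COMMENT_RE = re.compile(r"/\*(.*?)\*/", re.DOTALL)
PAIR_RE = re.compile(r"(\w+)='([^']*)'")

def parse_sqlcommenter(query: str) -> dict:
    """Extract sqlcommenter key/value pairs from a statement's comment.

    Values are URL-decoded, matching the sqlcommenter wire format.
    Returns {} for untagged statements.
    """
    m = COMMENT_RE.search(query)
    if not m:
        return {}
    return {k: unquote(v) for k, v in PAIR_RE.findall(m.group(1))}

sql = "/*controller='checkout',action='create',route='%2Fcart%2Fcheckout'*/ INSERT INTO orders ..."
tags = parse_sqlcommenter(sql)
# tags == {'controller': 'checkout', 'action': 'create', 'route': '/cart/checkout'}
```

The same two regexes do the work that the SQL rollups below do with regexp_match, so the two views of the data stay consistent.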
Extracting the call-site key
The minimum useful key is controller + action. Pull them out with a regex:
WITH tagged AS (
SELECT
queryid,
calls,
total_exec_time,
rows,
(regexp_match(query, $$controller='([^']+)'$$))[1] AS controller,
(regexp_match(query, $$action='([^']+)'$$))[1] AS action,
(regexp_match(query, $$route='([^']+)'$$))[1] AS route
FROM pg_stat_statements
)
SELECT
coalesce(controller, '<untagged>') AS controller,
coalesce(action, '<untagged>') AS action,
count(*) AS fingerprints,
sum(calls) AS calls,
round(sum(total_exec_time)::numeric, 1) AS total_ms,
round(sum(total_exec_time)::numeric / nullif(sum(calls), 0), 2) AS avg_ms
FROM tagged
GROUP BY controller, action
ORDER BY total_ms DESC
LIMIT 25;

The output is your call-site ledger: which controller+action pair owns the most database time, how many distinct query fingerprints it spans, and how much each invocation costs on average. A high fingerprints count for one controller usually means dynamic query building (an ORM stitching joins together by argument shape) — those call-sites are the most fragile under schema change.
The unattributed share is a number, not a footnote
Untagged rows fall into the <untagged> bucket. The fraction of total execution time in that bucket is the single most important number on the rollup:
WITH tagged AS (
SELECT
total_exec_time,
(regexp_match(query, $$controller='([^']+)'$$))[1] AS controller
FROM pg_stat_statements
),
totals AS (
SELECT
sum(total_exec_time) FILTER (WHERE controller IS NOT NULL) AS attributed_ms,
sum(total_exec_time) FILTER (WHERE controller IS NULL) AS unattributed_ms,
sum(total_exec_time) AS total_ms
FROM tagged
)
SELECT
round(100.0 * unattributed_ms / nullif(total_ms, 0), 1) AS pct_unattributed,
unattributed_ms,
attributed_ms,
total_ms
FROM totals;

Interpretation:
- Under 5%: legitimately background — vacuum, autoanalyze, replication-slot queries, internal pg_catalog traffic. Leave it alone.
- 5–25%: some service in the fleet has not enabled sqlcommenter. Likely candidates: a Sidekiq queue, a Lambda, an admin Rails console, a Go service that hasn't had the comment-injection middleware merged. Hunt by client_addr in pg_stat_activity.
- Above 25%: attribution data is misleading. Any prioritization off the rollup is going to over-blame whichever services are tagged, because untagged ones are invisible. Fix coverage first.
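Those thresholds are judgment calls, not hard limits, but they are mechanical enough to encode in a monitoring check. A Python sketch (helper names and sample numbers are made up; the cutoffs mirror the rule of thumb above):

```python
def pct_unattributed(rows):
    """rows: (controller_or_None, total_exec_time_ms) pairs, one per fingerprint."""
    total = sum(ms for _, ms in rows)
    untagged = sum(ms for ctl, ms in rows if ctl is None)
    return 100.0 * untagged / total if total else 0.0

def triage(pct: float) -> str:
    """Map the unattributed share of execution time to a next action."""
    if pct < 5:
        return "ok"                  # background noise: vacuum, autoanalyze, pg_catalog
    if pct <= 25:
        return "coverage-gap"        # some service has not enabled sqlcommenter
    return "fix-coverage-first"      # rollup is misleading; don't prioritize off it

rows = [("checkout", 5000.0), ("reports", 3000.0), (None, 2000.0)]
pct = pct_unattributed(rows)  # 20.0 -> "coverage-gap"
```

Wiring triage() into an alert keeps the coverage question from silently decaying as new services join the fleet.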
Choosing what to tag
The default sqlcommenter packages emit controller, action, route, and framework. Most teams should add a couple more keys for the dimensions they care about:
- job — for background work (Sidekiq, Celery, RQ). Without this, every cron looks like a long-running anonymous transaction.
- service — for shops with more than one app on the same database (a checkout monolith plus a reporting service plus an admin tool). Lets you slice by app before rolling up to controller.
- tenant_class — coarse buckets like free, paid, enterprise. Never the tenant ID itself — that explodes cardinality and leaks data into the slow-query log.
- db_driver — emitted by sqlcommenter by default. Useful for spotting old driver versions still in the fleet (Django 3.2 vs 5.x).
Resist the urge to tag every field a developer might want. pg_stat_statements rows are bounded by track_activity_query_size (1024 bytes by default), and Postgres will truncate the SQL itself once the comment eats the budget. Five keys is the practical ceiling.
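A quick way to sanity-check that budget before shipping a new key is to render the comment the way sqlcommenter would and measure it against track_activity_query_size. A hedged sketch (build_comment is illustrative, not the library's API):

```python
from urllib.parse import quote

TRACK_ACTIVITY_QUERY_SIZE = 1024  # Postgres default, in bytes

def build_comment(tags: dict) -> str:
    """Render tags in sqlcommenter's key='url-encoded-value' format."""
    body = ",".join(f"{k}='{quote(str(v), safe='')}'" for k, v in sorted(tags.items()))
    return f"/*{body}*/"

tags = {
    "controller": "checkout",
    "action": "create",
    "route": "/cart/checkout",
    "job": "order_worker",
    "service": "storefront",
}
comment = build_comment(tags)
overhead = len(comment.encode())
# Every byte the comment consumes is a byte unavailable for the SQL text
# before pg_stat_statements truncates the row.
sql_budget = TRACK_ACTIVITY_QUERY_SIZE - overhead
```

Five short keys cost on the order of a hundred bytes; long URL-encoded route values are usually what blows the budget.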
From rollup to action
The rollup tells you who. To know what to do, join the rollup back to the underlying fingerprints:
WITH tagged AS (
SELECT
queryid,
query,
calls,
total_exec_time,
rows,
shared_blks_hit + shared_blks_read AS blocks,
(regexp_match(query, $$controller='([^']+)'$$))[1] AS controller,
(regexp_match(query, $$action='([^']+)'$$))[1] AS action
FROM pg_stat_statements
)
SELECT
controller,
action,
queryid,
calls,
round(total_exec_time::numeric, 1) AS total_ms,
round((total_exec_time / calls)::numeric, 2) AS avg_ms,
blocks
FROM tagged
WHERE controller = 'checkout' AND action = 'create'
ORDER BY total_exec_time DESC;

For one controller+action, you now have every fingerprint it emits, ranked by total cost. That is the shortlist for an EXPLAIN sweep. The three most common findings from this drill-down:
- One fingerprint dominates — typical when an ORM emits one well-shaped query and 20 small ones. The big one is the only one worth tuning.
- Many fingerprints, similar cost — usually missing an index that the whole call-site would share. Confirm with HypoPG.
- A long tail of one-call fingerprints — almost always a dynamic WHERE-clause builder or an IN (...) list with variable arity. The fix is on the application side (use parameter arrays), not the database.
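The IN-arity problem is visible in miniature: every list length produces a different statement text, while an array parameter keeps one. A Python sketch of the two shapes an app might emit (placeholder style is psycopg's %s, purely illustrative):

```python
def in_list_query(ids):
    # One placeholder per element: the SQL text changes with len(ids),
    # so pg_stat_statements records a separate fingerprint per arity.
    placeholders = ", ".join(["%s"] * len(ids))
    return f"SELECT * FROM products WHERE id IN ({placeholders})"

def any_array_query(ids):
    # One array parameter: the SQL text is constant regardless of len(ids),
    # so every call collapses into a single fingerprint.
    return "SELECT * FROM products WHERE id = ANY(%s)"

sizes = (1, 2, 5, 20)
shapes_in = {in_list_query(list(range(n))) for n in sizes}   # 4 distinct shapes
shapes_any = {any_array_query(list(range(n))) for n in sizes}  # 1 shape
```

The ANY(array) form is what "use parameter arrays" means in practice: the long tail of one-call fingerprints folds into a single tunable row.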
Joining attribution to live activity
pg_stat_statements aggregates historically. pg_stat_activity shows what is running now. The two join cleanly on the comment because Postgres preserves it through both:
SELECT
pid,
now() - query_start AS age,
state,
(regexp_match(query, $$controller='([^']+)'$$))[1] AS controller,
(regexp_match(query, $$action='([^']+)'$$))[1] AS action,
left(query, 80) AS snippet
FROM pg_stat_activity
WHERE state = 'active'
AND query NOT LIKE '%pg_stat_activity%'
ORDER BY age DESC;

This is the call you make when something is on fire — it tells you which routes are currently stuck, not which routes have been historically slow. Combine the two views in your dashboard (TuneStack does this on the CallSiteAttribution card) and you get past-tense aggregates plus a real-time leaderboard from one tagging investment.
Watch the resets
pg_stat_statements values accumulate from the last reset. If you ran SELECT pg_stat_statements_reset() two hours ago, the rollup represents two hours, not the deploy-to-deploy steady state you might expect. Snapshot pg_stat_statements every 5 minutes (and the attribution rollup with it) so you can compute deltas and spot sudden call-site shifts that historical aggregates would average away.
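The snapshot table below feeds the diff; the per-tick delta arithmetic itself is simple, with one trap: a reset between ticks makes the counters drop. A Python sketch of the diff step, assuming rows keyed by queryid (names are illustrative):

```python
def tick_delta(prev: dict, curr: dict) -> dict:
    """Diff two pg_stat_statements snapshots keyed by queryid.

    prev/curr map queryid -> (calls, total_exec_time_ms). Counters only
    grow between ticks; a drop means pg_stat_statements_reset() ran, in
    which case the current value IS the delta since the reset.
    """
    delta = {}
    for qid, (calls, ms) in curr.items():
        p_calls, p_ms = prev.get(qid, (0, 0.0))
        if calls < p_calls:
            delta[qid] = (calls, ms)            # reset happened between ticks
        else:
            delta[qid] = (calls - p_calls, ms - p_ms)
    return delta

prev = {101: (50, 1200.0), 202: (10, 90.0)}
curr = {101: (80, 2000.0), 202: (3, 30.0), 303: (5, 40.0)}
d = tick_delta(prev, curr)  # 101 grew, 202 was reset, 303 is new
```

Roll the per-queryid deltas up by controller/action and you get a call-site leaderboard for the last tick instead of a since-reset average.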
-- Snapshot once per tick; diff against previous tick.
CREATE TABLE IF NOT EXISTS pgss_snapshot (
taken_at timestamptz NOT NULL DEFAULT now(),
queryid bigint NOT NULL,
calls bigint NOT NULL,
total_exec_time double precision NOT NULL,
controller text,
action text
);
INSERT INTO pgss_snapshot (queryid, calls, total_exec_time, controller, action)
SELECT
queryid,
calls,
total_exec_time,
(regexp_match(query, $$controller='([^']+)'$$))[1],
(regexp_match(query, $$action='([^']+)'$$))[1]
FROM pg_stat_statements;

Quick checklist
- Compute pct_unattributed first. Fix coverage before prioritizing.
- Limit tag keys to controller, action, route, job, service — under 5 dimensions keeps comments inside track_activity_query_size.
- Never tag per-tenant. Use buckets, not IDs.
- Snapshot pg_stat_statements every 5 minutes; roll up call-sites against the diff, not the cumulative view.
- Drill from rollup to fingerprints before touching EXPLAIN — the rollup tells you which 3 queries to look at, not all 800.
Related: Sqlcommenter wiring (Django/Rails/SQLAlchemy/OTel) · pg_stat_statements · Query regression detection