
Workload Attribution with Sqlcommenter

pg_stat_statements ranks queries by total cost, but the unit it ranks is the fingerprint — not the controller, route, or background job that issued it. A fingerprint that burns 18% of database time is useful; knowing that 14 of those 18 points come from one report-generation cron and the rest from the customer-facing checkout is what actually drives prioritization. Sqlcommenter is the bridge: it prepends a structured comment to every statement, Postgres preserves the comment in pg_stat_statements, and a few lines of SQL roll execution time up to whichever dimension you tagged.

This guide assumes sqlcommenter is already wired into your app — if not, start with Sqlcommenter for Postgres for the ORM-by-ORM setup. What follows is the database-side half: how to write the attribution SQL yourself, how to read the results, and how to decide what to instrument first.

The shape of an attributed row

With sqlcommenter on, every row in pg_stat_statements keeps the leading comment intact. A SELECT from pg_stat_statements looks like this:

SELECT left(query, 120), calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 3;

-- /*controller='checkout',action='create',route='/cart/checkout'*/ INSERT INTO orders ...
-- /*controller='reports',action='daily',job='report_cron'*/ SELECT ... FROM orders JOIN ...
-- /*controller='catalog',action='show',route='/product/:id'*/ SELECT ... FROM products WHERE id = $1

Each comment is a comma-separated list of key='url-encoded-value' pairs. The same controller/action combo will appear with the same comment across every shape of query it issues, which is what makes the rollup cheap.

Extracting the call-site key

The minimum useful key is controller + action. Pull them out with a regex:

WITH tagged AS (
  SELECT
    queryid,
    calls,
    total_exec_time,
    rows,
    (regexp_match(query, $$controller='([^']+)'$$))[1] AS controller,
    (regexp_match(query, $$action='([^']+)'$$))[1]     AS action,
    (regexp_match(query, $$route='([^']+)'$$))[1]      AS route
  FROM pg_stat_statements
)
SELECT
  coalesce(controller, '<untagged>') AS controller,
  coalesce(action,     '<untagged>') AS action,
  count(*)                            AS fingerprints,
  sum(calls)                          AS calls,
  round(sum(total_exec_time)::numeric, 1) AS total_ms,
  round(sum(total_exec_time)::numeric / nullif(sum(calls), 0), 2) AS avg_ms
FROM tagged
GROUP BY controller, action
ORDER BY total_ms DESC
LIMIT 25;

The output is your call-site ledger: which controller+action pair owns the most database time, how many distinct query fingerprints it spans, and how much each invocation costs on average. A high fingerprints count for one controller usually means dynamic query building (an ORM stitching joins together by argument shape) — those call-sites are the most fragile under schema change.
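When custom keys multiply, one regex per key gets tedious. The pairs can also be exploded generically — a sketch, assuming Postgres 10+ (regexp_matches with the 'g' flag returns one row per key='value' pair; values stay URL-encoded here, so decode them application-side if they matter):

SELECT queryid,
       m[1] AS tag_key,
       m[2] AS tag_value
FROM pg_stat_statements,
     LATERAL regexp_matches(query, $$(\w+)='([^']*)'$$, 'g') AS m
LIMIT 20;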

The unattributed share is a number, not a footnote

Untagged rows fall into the <untagged> bucket. The fraction of total execution time in that bucket is the single most important number on the rollup:

WITH tagged AS (
  SELECT
    total_exec_time,
    (regexp_match(query, $$controller='([^']+)'$$))[1] AS controller
  FROM pg_stat_statements
),
totals AS (
  SELECT
    sum(total_exec_time) FILTER (WHERE controller IS NOT NULL) AS attributed_ms,
    sum(total_exec_time) FILTER (WHERE controller IS NULL)     AS unattributed_ms,
    sum(total_exec_time)                                       AS total_ms
  FROM tagged
)
SELECT
  round(100.0 * unattributed_ms / nullif(total_ms, 0), 1) AS pct_unattributed,
  unattributed_ms,
  attributed_ms,
  total_ms
FROM totals;

Interpretation:

  • Under 5%: legitimately background — vacuum, autoanalyze, replication-slot queries, internal pg_catalog traffic. Leave it alone.
  • 5–25%: some service in the fleet has not enabled sqlcommenter. Likely candidates: a Sidekiq queue, a Lambda, an admin Rails console, a Go service that hasn't had the comment-injection middleware merged. Hunt by client_addr in pg_stat_activity.
  • Above 25%: attribution data is misleading. Any prioritization off the rollup is going to over-blame whichever services are tagged, because untagged ones are invisible. Fix coverage first.
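A sketch for the 5–25% hunt — it assumes, as throughout this guide, that sqlcommenter comments are prepended, so any active statement not starting with /* is untagged:

-- Which clients are issuing untagged statements right now?
SELECT client_addr,
       application_name,
       count(*) AS active_untagged
FROM pg_stat_activity
WHERE state = 'active'
  AND backend_type = 'client backend'
  AND query NOT LIKE '/*%'
GROUP BY client_addr, application_name
ORDER BY active_untagged DESC;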

Choosing what to tag

The default sqlcommenter packages emit controller, action, route, and framework. Most teams should add a few more keys for the dimensions they care about:

  • job — for background work (Sidekiq, Celery, RQ). Without this, every cron looks like a long-running anonymous transaction.
  • service — for shops with more than one app on the same database (a checkout monolith plus a reporting service plus an admin tool). Lets you slice by app before rolling up to controller.
  • tenant_class — coarse buckets like free, paid, enterprise. Never the tenant ID itself — that explodes cardinality and leaks data into the slow-query log.
  • db_driver — emitted by sqlcommenter by default. Useful for spotting old driver versions still in the fleet (Django 3.2 vs 5.x).
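Before committing to a key set, check cardinality per key — a sketch that counts distinct values for every tag, so a runaway dimension (a raw tenant ID that slipped in, say) shows up immediately:

SELECT m[1]                 AS tag_key,
       count(DISTINCT m[2]) AS distinct_values
FROM pg_stat_statements,
     LATERAL regexp_matches(query, $$(\w+)='([^']*)'$$, 'g') AS m
GROUP BY 1
ORDER BY distinct_values DESC;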

Resist the urge to tag every field a developer might want. Comment bytes compete with the SQL itself for space: pg_stat_activity truncates query text at track_activity_query_size (1024 bytes by default), so a comment that eats the budget pushes the statement out of the live-activity view. Five keys is the practical ceiling.
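A quick way to see how much of that budget the comments already consume — a sketch that measures the leading comment's length against the current setting (the non-greedy *? needs Postgres's default advanced-regex mode):

SELECT round(avg(length(substring(query FROM '^/\*.*?\*/')))) AS avg_comment_bytes,
       current_setting('track_activity_query_size')           AS query_size_budget
FROM pg_stat_statements
WHERE query LIKE '/*%';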

From rollup to action

The rollup tells you who. To know what to do, join the rollup back to the underlying fingerprints:

WITH tagged AS (
  SELECT
    queryid,
    query,
    calls,
    total_exec_time,
    rows,
    shared_blks_hit + shared_blks_read AS blocks,
    (regexp_match(query, $$controller='([^']+)'$$))[1] AS controller,
    (regexp_match(query, $$action='([^']+)'$$))[1]     AS action
  FROM pg_stat_statements
)
SELECT
  controller,
  action,
  queryid,
  calls,
  round(total_exec_time::numeric, 1)              AS total_ms,
  round((total_exec_time / calls)::numeric, 2)    AS avg_ms,
  blocks
FROM tagged
WHERE controller = 'checkout' AND action = 'create'
ORDER BY total_exec_time DESC;

For one controller+action, you now have every fingerprint it emits, ranked by total cost. That is the shortlist for an EXPLAIN sweep. The three most common findings from this drill-down:

  • One fingerprint dominates — typical when an ORM emits one well-shaped query and 20 small ones. The big one is the only one worth tuning.
  • Many fingerprints, similar cost — usually missing an index that the whole call-site would share. Confirm with HypoPG.
  • A long tail of one-call fingerprints — almost always a dynamic WHERE-clause builder or an IN (...) with variable arity. The fix is on the application side (use parameter arrays), not the database.
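The HypoPG path for the middle case can be sketched like this — the extension must be installed, and the table and column names here are hypothetical placeholders:

CREATE EXTENSION IF NOT EXISTS hypopg;

-- Propose the shared index without building it.
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON orders (customer_id, created_at)'  -- hypothetical columns
);

-- Re-plan one of the call-site's fingerprints; if the hypothetical
-- index shows up in the plan, it is worth building for real.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at DESC;

SELECT hypopg_reset();  -- drop all hypothetical indexes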

Joining attribution to live activity

pg_stat_statements aggregates historically. pg_stat_activity shows what is running now. The two join cleanly on the comment because Postgres preserves it through both:

SELECT
  pid,
  now() - query_start AS age,
  state,
  (regexp_match(query, $$controller='([^']+)'$$))[1] AS controller,
  (regexp_match(query, $$action='([^']+)'$$))[1]     AS action,
  left(query, 80)                                    AS snippet
FROM pg_stat_activity
WHERE state = 'active'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY age DESC;

This is the call you make when something is on fire — it tells you which routes are currently stuck, not which routes have been historically slow. Combine the two views in your dashboard (TuneStack does this on the CallSiteAttribution card) and you get past-tense aggregates plus a real-time leaderboard from one tagging investment.

Watch the resets

pg_stat_statements values accumulate from the last reset. If you ran SELECT pg_stat_statements_reset() two hours ago, the rollup represents two hours, not the deploy-to-deploy steady state you might expect. Snapshot pg_stat_statements every 5 minutes (and the attribution rollup with it) so you can compute deltas and spot sudden call-site shifts that historical aggregates would average away.

-- Snapshot once per tick; diff against previous tick.
CREATE TABLE IF NOT EXISTS pgss_snapshot (
  taken_at        timestamptz NOT NULL DEFAULT now(),
  queryid         bigint      NOT NULL,
  calls           bigint      NOT NULL,
  total_exec_time double precision NOT NULL,
  controller      text,
  action          text
);

INSERT INTO pgss_snapshot (queryid, calls, total_exec_time, controller, action)
SELECT
  queryid,
  calls,
  total_exec_time,
  (regexp_match(query, $$controller='([^']+)'$$))[1],
  (regexp_match(query, $$action='([^']+)'$$))[1]
FROM pg_stat_statements;
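With snapshots accumulating, the per-call-site delta between the two most recent ticks is one join away — a sketch (negative deltas mean a pg_stat_statements_reset() happened between ticks; clamp with greatest(..., 0) if that bothers your dashboard):

WITH ticks AS (
  SELECT DISTINCT taken_at
  FROM pgss_snapshot
  ORDER BY taken_at DESC
  LIMIT 2
),
cur AS (
  SELECT * FROM pgss_snapshot
  WHERE taken_at = (SELECT max(taken_at) FROM ticks)
),
prev AS (
  SELECT * FROM pgss_snapshot
  WHERE taken_at = (SELECT min(taken_at) FROM ticks)
)
SELECT
  coalesce(cur.controller, '<untagged>') AS controller,
  coalesce(cur.action,     '<untagged>') AS action,
  sum(cur.calls - coalesce(prev.calls, 0)) AS calls_delta,
  round(sum(cur.total_exec_time - coalesce(prev.total_exec_time, 0))::numeric, 1) AS ms_delta
FROM cur
LEFT JOIN prev USING (queryid)
GROUP BY 1, 2
ORDER BY ms_delta DESC
LIMIT 25;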

Quick checklist

  • Compute pct_unattributed first. Fix coverage before prioritizing.
  • Limit tag keys to controller, action, route, job, service — under 5 dimensions keeps comments inside track_activity_query_size.
  • Never tag per-tenant. Use buckets, not IDs.
  • Snapshot pg_stat_statements every 5 minutes; roll up call-sites against the diff, not the cumulative view.
  • Drill from rollup to fingerprints before touching EXPLAIN — the rollup tells you which 3 queries to look at, not all 800.

Related: Sqlcommenter wiring (Django/Rails/SQLAlchemy/OTel) · pg_stat_statements · Query regression detection
