Concepts
This page walks through the building blocks of QueryInsights one at a time. If you have read the Architecture overview, this fills in what each piece actually does.
Poller
The poller captures queries from databases that expose their workload as statistics tables. It is intentionally a thin data collector: it does not parse, normalize, or store anything. It reads, filters, samples, and forwards.
- PostgreSQL — reads
pg_stat_statements, joiningpg_databaseto resolve each query's database name. Becausepg_stat_statementsis cluster-wide, the poller can fall back to the maintenancepostgresdatabase if the DSN's own database doesn't have the extension installed. - CockroachDB — reads
crdb_internal.statement_statistics, skipping internal application names and DDL statements.
Each configured instance gets its own goroutine with an independent sampling
rate and poll interval. Sampling is probabilistic — every captured query
passes a rand() < sampling_rate check — so you can dial down volume on a busy
instance. Internal and utility statements (system catalogs, SET, SHOW,
transaction control, DDL, and known admin tools) are filtered out before
anything is forwarded.
Captured queries are sent to the engine via the SubmitQuery gRPC call, tagged
with the engine name, the source instance, and the originating database.
Engine
The engine is the centralized brain. It is a gRPC server that owns all processing and all writes to the metastore. For every query it receives it:
- Parses the SQL into a structured analysis (query type, tables, columns, joins, clause presence, subquery count).
- Normalizes it into a fingerprint by replacing literals with placeholders.
- Upserts the normalized pattern, deduplicating by
(normalized_query, engine), and records a raw-query sample. - Recommends — for a new pattern with resolvable tables, it kicks off a background per-query recommendation.
The engine also runs the workload-insight ticker, which periodically produces holistic, cross-pattern advice per engine/instance (see Recommendations below).
Internally the engine keeps a map of recommendation services keyed by engine
name (trino, cockroachdb, postgresql), each wired to the matching metadata
provider, and selects the right one based on the query's engine label.
Parsing and normalization
Parsing turns raw SQL into a structured Analysis, and normalization turns it
into a stable fingerprint. The engine routes by engine type:
- Trino — the official Trino SQL grammar compiled with ANTLR; a visitor walks the parse tree to extract metadata, and a token-stream normalizer replaces literals.
- CockroachDB — a yacc-style lexer and structural analyzer.
- PostgreSQL / others — a regex-based normalizer that strips comments, replaces string/numeric/dollar-quoted literals, and collapses whitespace.
Normalization is what makes the product tractable: a thousand variants of the same statement collapse into one pattern, so you optimize shapes, not noise.
-- Original
SELECT name FROM users WHERE id = 42 AND status = 'active'
-- Normalized
SELECT name FROM users WHERE id = ? AND status = ?
Metadata providers (schema/DDL)
Recommendations are only useful if they are grounded in your real schema. The
SchemaProvider interface fetches the DDL for the tables a query touches:
| Engine | How it fetches DDL |
|---|---|
| Trino | SHOW CREATE TABLE via the Trino client |
| CockroachDB | SHOW CREATE TABLE over the PostgreSQL wire protocol |
| PostgreSQL | Reconstructed from information_schema, enriched with pg_stats |
Providers are multi-instance: the engine keeps one provider per configured instance and connects lazily to the specific database a query came from. Table names are validated before being interpolated into any metadata query.
LLM provider
The LLM provider is the abstraction over the model backend that writes the actual recommendations. It exposes two operations — one for per-query rewrites and one for workload insights — behind a single interface, so support for a new backend is a matter of implementing that interface.
Two backends ship today:
- OpenAI — chat completions using OpenAI's Structured Outputs, so the model
is forced to return a strict
{ summary, details }JSON envelope. - Anthropic — Claude models.
Which backend runs is decided from the configured model name:
- If the model name contains
claudeand an Anthropic API key is set → Anthropic. - If the model name contains
claudebut only an OpenAI key is set → the OpenAI-compatible client (this is the expected path when routing Claude through an OpenAI-compatible gateway such as LiteLLM or Portkey). - Otherwise → OpenAI.
A custom base URL lets you point either client at a gateway instead of the vendor's public API. Generation uses a low temperature and a hard token cap so output stays focused and per-call cost is bounded. If no API key is configured, the engine simply starts without recommendation features — ingestion and the dashboard still work.
See Configuration for the exact variables.
Recommendations: two scopes, one feed
QueryInsights produces two deliberately non-overlapping kinds of advice, surfaced
together in the dashboard with scope and kind chips:
Query rewrite (scope=query) | Workload schema (scope=workload) | |
|---|---|---|
| Subject | One normalized query pattern | An entire engine/instance workload |
| Trigger | Automatically, on first sight of a new pattern | The engine's periodic ticker, or an on-demand dashboard request |
| Question | "How can this query be written better?" | "Given all queries hitting this engine, what schema changes help the whole workload?" |
| Output | Rewrites, predicate ordering, join shape — no schema advice | Indexes, covering/storing clauses, layout — no per-query rewrites |
Keeping the two scopes non-overlapping means the advice stays complementary instead of contradictory.
Idempotency and safety
Because recommendation generation calls an LLM, the engine takes care to never do it twice for the same pattern:
- An
is_recommendation_generatedflag is checked before any LLM call. SELECT ... FOR UPDATEinside a transaction serializes concurrent workers racing on the same pattern.- In-flight claims and per-table caps prevent a burst of new patterns from spawning unbounded concurrent LLM calls.
- Failed attempts are counted and permanently skipped after a configurable limit.
These controls make ingestion safe to run continuously against a live, high-volume workload.