Architecture
QueryInsights is composed of four independently runnable services that share a single PostgreSQL database. Each service does one job and communicates with the others over gRPC.
+-----------+ gRPC +----------+
Trino Events ---->| Webhook |------------------>| |
+-----------+ | Engine |
| |
+------------+ gRPC | parse |
CockroachDB ----->| Poller |----------------->| store |
(multi-instance) +------------+ | recommend (per-query)
PostgreSQL ---->| | | workload insight ticker
(multi-instance) +------------+ +----------+
|
v
+------------+
| PostgreSQL |
+------------+
|
+------------+ |
Browser --------->| Dashboard |<----------------------+
+------------+ gRPC (on-demand insights)
(Go API + React UI)
Services at a glance
| Service | Port | Responsibility |
|---|---|---|
| Webhook | 2427 | HTTP ingress that receives Trino event-listener payloads and forwards them to the engine over gRPC. |
| Engine | 50051 | The gRPC brain: parses, normalizes, stores, generates per-query recommendations, and runs the workload-insight ticker. |
| Poller | — | Polls CockroachDB crdb_internal.statement_statistics and/or PostgreSQL pg_stat_statements and forwards captured queries to the engine over gRPC. |
| Dashboard | 3001 | HTTP API + React SPA for browsing query patterns, recommendations, and workload insights. |
All four connect to the same PostgreSQL metastore. Only the engine writes to it; the dashboard reads from it.
Data flow
- Ingestion. Queries enter through one of two non-invasive paths:
- The webhook receives Trino event-listener callbacks and calls the
engine's
SubmitQueryRPC. - The poller reads statement statistics directly from your databases and
calls the same
SubmitQueryRPC. It is a thin collector — it never parses, normalizes, or stores anything itself.
- The webhook receives Trino event-listener callbacks and calls the
engine's
- Processing. The engine parses each query, normalizes it into a fingerprint, and upserts it. New patterns trigger a per-query recommendation in the background.
- Storage. Normalized patterns, raw queries, recommendations, and workload insights are written to PostgreSQL.
- Presentation. The dashboard API serves aggregated stats, ranked patterns, and recommendations to the React UI, and can ask the engine to generate a workload insight on demand.
Why the poller and webhook are separate
The two ingestion paths exist because databases expose their workload differently:
- Trino emits an event when a query finishes, so a lightweight HTTP webhook is the natural fit.
- PostgreSQL and CockroachDB expose cumulative statistics tables
(
pg_stat_statements,crdb_internal.statement_statistics), so a poller that samples those tables on an interval is the natural fit.
Both funnel into the exact same SubmitQuery RPC, so the engine treats every
query identically regardless of where it came from — it only branches on the
engine label to pick the right parser and metadata provider.
Where state lives
| State | Store | Written by |
|---|---|---|
| Normalized query patterns | normalized_analyses (PostgreSQL) | Engine |
| Raw query samples | raw_queries (PostgreSQL) | Engine |
| Per-query recommendations | recommendations (PostgreSQL) | Engine |
| Workload insights (history) | workload_insights (PostgreSQL) | Engine |
The engine is the only writer, which keeps the concurrency model simple: the poller and webhook are stateless forwarders, and the dashboard is read-only.
Extending to new engines
The architecture is deliberately pluggable. Adding a new database engine means implementing a parser, a schema/metadata provider, and an engine-specific LLM prompt, then either writing a new poller or submitting queries through the existing gRPC RPC. See Concepts for the interfaces involved.