Architecture

QueryInsights is composed of four independently runnable services that share a single PostgreSQL database. Each service does one job and communicates with the others over gRPC.

                    +-----------+       gRPC        +----------+
  Trino Events ---->|  Webhook  |------------------>|          |
                    +-----------+                   |  Engine  |
                                                    |          |
                    +------------+      gRPC        |  parse   |
  CockroachDB ----->|   Poller   |----------------->|  store   |
  (multi-instance)  +------------+                  |  recommend (per-query)
  PostgreSQL   ---->|            |                  |  workload insight ticker
  (multi-instance)  +------------+                  +----------+
                                                         |
                                                         v
                                                   +------------+
                                                   | PostgreSQL |
                                                   +------------+
                                                         |
                    +------------+                       |
  Browser --------->| Dashboard  |<----------------------+
                    +------------+         gRPC (on-demand insights)
                    (Go API + React UI)

Services at a glance

ServicePortResponsibility
Webhook2427HTTP ingress that receives Trino event-listener payloads and forwards them to the engine over gRPC.
Engine50051The gRPC brain: parses, normalizes, stores, generates per-query recommendations, and runs the workload-insight ticker.
PollerPolls CockroachDB crdb_internal.statement_statistics and/or PostgreSQL pg_stat_statements and forwards captured queries to the engine over gRPC.
Dashboard3001HTTP API + React SPA for browsing query patterns, recommendations, and workload insights.

All four connect to the same PostgreSQL metastore. Only the engine writes to it; the dashboard reads from it.

Data flow

  1. Ingestion. Queries enter through one of two non-invasive paths:
    • The webhook receives Trino event-listener callbacks and calls the engine's SubmitQuery RPC.
    • The poller reads statement statistics directly from your databases and calls the same SubmitQuery RPC. It is a thin collector — it never parses, normalizes, or stores anything itself.
  2. Processing. The engine parses each query, normalizes it into a fingerprint, and upserts it. New patterns trigger a per-query recommendation in the background.
  3. Storage. Normalized patterns, raw queries, recommendations, and workload insights are written to PostgreSQL.
  4. Presentation. The dashboard API serves aggregated stats, ranked patterns, and recommendations to the React UI, and can ask the engine to generate a workload insight on demand.

Why the poller and webhook are separate

The two ingestion paths exist because databases expose their workload differently:

  • Trino emits an event when a query finishes, so a lightweight HTTP webhook is the natural fit.
  • PostgreSQL and CockroachDB expose cumulative statistics tables (pg_stat_statements, crdb_internal.statement_statistics), so a poller that samples those tables on an interval is the natural fit.

Both funnel into the exact same SubmitQuery RPC, so the engine treats every query identically regardless of where it came from — it only branches on the engine label to pick the right parser and metadata provider.

Where state lives

StateStoreWritten by
Normalized query patternsnormalized_analyses (PostgreSQL)Engine
Raw query samplesraw_queries (PostgreSQL)Engine
Per-query recommendationsrecommendations (PostgreSQL)Engine
Workload insights (history)workload_insights (PostgreSQL)Engine

The engine is the only writer, which keeps the concurrency model simple: the poller and webhook are stateless forwarders, and the dashboard is read-only.

Extending to new engines

The architecture is deliberately pluggable. Adding a new database engine means implementing a parser, a schema/metadata provider, and an engine-specific LLM prompt, then either writing a new poller or submitting queries through the existing gRPC RPC. See Concepts for the interfaces involved.