This guide assumes that you have a dashboard or report up and running, and that you want to improve its performance by leveraging OLAP best practices. It will guide you through:
This guide proposes an AI-copilot-centric approach, but each of these steps can be done manually as well. It also assumes that you are using Visual Studio Code as your IDE, configured with GitHub Copilot as your coding assistant.
This guide is written for a TypeScript developer. If you are interested in a Python guide, let us know!
This section will guide you through:
See the appendix for instructions on setting up WSL2 if you do not already have a Linux environment running on your Windows machine.
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose

Make sure to restart your terminal after installing MooseStack.
You have two options for setting up your MooseStack project:
If you already have a repository with MooseStack configured:
git clone <YOUR_REPOSITORY_URL>
cd <your-project>
pnpm install

If you have an existing PNPM monorepo (e.g., a Next.js app) and want to add MooseStack:
From your monorepo root:
mkdir moosestack
cd moosestack
moose-cli init . typescript

Edit your root pnpm-workspace.yaml to include the moosestack directory:
packages:
  - . # Your main app (e.g., Next.js)
  - moosestack # MooseStack project

Edit moosestack/package.json to set it up as a workspace package:
{ "name": "moosestack", "private": true, "version": "0.0.1", "main": "./dist/index.js", "types": "./dist/index.d.ts", "exports": { ".": { "types": "./dist/index.d.ts", "default": "./dist/index.js" } }, "scripts": { "build": "moose-cli build", "dev": "moose-cli dev", "moose": "moose-cli" }, "dependencies": { "@514labs/moose-lib": "latest", "typia": "^10.1.0" }, "devDependencies": { "@514labs/moose-cli": "latest", "@types/node": "^24", "typescript": "^5" }}Edit your root package.json to add MooseStack scripts and pnpmOnlyBuiltDependencies:
{ "scripts": { "dev": "next dev", // Your app dev server "dev:moose": "pnpm -C moosestack run dev", // MooseStack dev server "moose": "pnpm -C moosestack run moose", // MooseStack CLI "build": "next build", // Your app build "build:moose": "pnpm -C moosestack run build" // MooseStack build }, "dependencies": { "moosestack": "workspace:*", // Reference the moosestack workspace package // ... your other dependencies }, "pnpm": { "onlyBuiltDependencies": [ "@confluentinc/kafka-javascript", "@514labs/kafka-javascript" ] }}From your monorepo root:
pnpm install

Your monorepo will have the following structure:
Ensure Docker Desktop is running first.
Navigate to the MooseStack project directory (it contains the moose.config.toml file):
cd moosestack-service # or wherever your moose project is
pnpm install
moose dev

From your monorepo root, you can run MooseStack using the convenience script:
pnpm dev:moose

Or navigate to the moosestack directory:
cd moosestack
pnpm run dev

If the development server runs correctly, you'll see a list of the available API endpoints printed:
📍 Available Routes:
Base URL: http://localhost:4000

Static Routes:
METHOD  ENDPOINT               DESCRIPTION
------  --------               -----------
GET     /admin/inframap        Admin: Get infrastructure map
GET     /admin/reality-check   Admin: Reality check - provides a diff when drift is detected between the running instance of moose and the db it is connected to
GET     /health                Health check endpoint
GET     /ready                 Readiness check endpoint
...

Ingestion Routes:
METHOD  ENDPOINT                      DESCRIPTION
------  --------                      -----------
POST    /ingest/DebeziumChangeEvent   Ingest data to DebeziumChangeEvent (vlatest)

Consumption Routes:
METHOD  ENDPOINT              DESCRIPTION
------  --------              -----------
GET     /api/example-api/v1   Consumption API endpoint
...

WebApp Routes:
METHOD  ENDPOINT   DESCRIPTION
------  --------   -----------
*       /dataset   Express WebApp endpoint
*       /tools     Express WebApp endpoint

And your local services will be running in Docker containers:
This subsection sets up a couple of services that can help you and your copilots/agents work with the MooseStack project:
get_infra_map, query_olap, and get_logs will be useful in accelerating your path through this guide.

To install the Moose Dev MCP with VS Code 1.102+, add this to your project's .vscode/mcp.json or User Settings:
{ "servers": { "moose-dev": { "type": "http", "url": "http://localhost:4000/mcp" } }}Confirm the server is running and integrated with VSCode by running MCP: List Servers command from the Command Palette.
This section walks you through how to set up your MSSQL database to integrate with Debezium for change data capture. Before you start, make sure you have the following credentials for your database:
This setup comprises 3 main steps:
The following example shows how CDC tracks changes to a customers table:
1. App executes: UPDATE dbo.customers SET status='active' WHERE customer_id=123
2. Transaction Log: Log record written (LSN: 00000042:00000123:0001, BEFORE: 'inactive', AFTER: 'active')
3. Capture Job: Reads log, finds change to CDC-enabled table dbo.customers
4. Tracking Table: INSERT INTO cdc.dbo_customers_CT (__$operation=4, customer_id=123, status='active', ...)
5. Debezium: Polls tracking table, creates JSON: {"op":"u", "before":{...}, "after":{"status":"active"}}
6. HTTP Sink: Receives POST with change event payload (an example envelope is sketched below)
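To make steps 5 and 6 concrete, here is a sketch of the kind of envelope Debezium POSTs to the HTTP sink for this update. The shape follows the `DebeziumEventEnvelope` model defined later in this guide; the LSNs, timestamps, connector version, and database name are illustrative assumptions, not values from a real system.

```typescript
// Illustrative CDC envelope for the customers update above.
// All concrete values (LSNs, timestamps, version, db name) are placeholders.
const exampleEnvelope = {
  schema: {}, // Debezium schema metadata, omitted here
  payload: {
    before: { customer_id: 123, status: "inactive" },
    after: { customer_id: 123, status: "active" },
    source: {
      version: "2.7.0.Final",   // assumed to match the Debezium Server image used later
      connector: "sqlserver",
      name: "mssql",
      ts_ms: 1734367730000,     // commit timestamp (epoch ms)
      db: "appdb",              // hypothetical database name
      schema: "dbo",
      table: "customers",
      change_lsn: "00000042:00000123:0001",
      commit_lsn: "00000042:00000123:0003",
    },
    op: "u",                    // 'u' = update
    ts_ms: 1734367730123,       // event timestamp (epoch ms)
    transaction: null,
  },
};
```

The streaming function shown later reads `payload.source.table` and `payload.op` from exactly this structure to route the event.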
You'll need to first make sure your MSSQL Server is properly configured to work with Debezium.
The following PowerShell command is useful for executing SQL commands throughout this guide.
Command:
Invoke-Sqlcmd -ServerInstance "$host,$port" -Database "dbName" -Username "username" -Password "<YOUR_PASSWORD>" -TrustServerCertificate -Query $sql

Single line $sql statement:
$sql = "SELECT 1"Multiline $sql statement:
$sql = @"
SELECT 1;
"@Verify connectivity to the SQL Server instance.
Command:
SELECT 1;

Expected Response:
Column1
-------
1

This enables CDC infrastructure on the database (creates the cdc schema, system tables, and jobs).
Command:
EXEC sys.sp_cdc_enable_db;

Expected Response:
Command(s) completed successfully.

Check that CDC system tables exist after enabling CDC.
Command:
-- Check if cdc schema and tables exist
SELECT s.name AS SchemaName, t.name AS TableName, t.create_date AS CreatedDate
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE s.name = 'cdc'
  AND t.name IN ('change_tables', 'captured_columns', 'ddl_history', 'index_columns', 'lsn_time_mapping')
ORDER BY t.name;

Expected Response:
SchemaName  TableName          CreatedDate
----------  -----------------  -------------------
cdc         captured_columns   2025-12-16 16:48:50
cdc         change_tables      2025-12-16 16:48:50
cdc         ddl_history        2025-12-16 16:48:50
cdc         index_columns      2025-12-16 16:48:50
cdc         lsn_time_mapping   2025-12-16 16:48:50

CDC System Tables:
| Table | Purpose |
|---|---|
| cdc.change_tables | Registry of all CDC-enabled tables and their capture instances |
| cdc.captured_columns | Lists which columns are being tracked for each capture instance |
| cdc.ddl_history | Records DDL changes (ALTER TABLE, etc.) on CDC-enabled tables |
| cdc.index_columns | Stores index column information for tables with net changes support |
| cdc.lsn_time_mapping | Maps LSN values to commit timestamps (for time-based queries) |
Enable CDC for each table you want to track. The command differs based on whether the table has a primary key.
Command:
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'customers',    -- Replace with your table name
    @role_name = NULL,
    @supports_net_changes = 1;      -- Requires primary key

Expected Response:
Job 'cdc.dbo_customers_capture' started successfully.
Job 'cdc.dbo_customers_cleanup' started successfully.

Command:
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'TableName',    -- Replace with your table name
    @role_name = NULL,
    @supports_net_changes = 0;      -- No primary key exists

Expected Response:
Job 'cdc.dbo_TableName_capture' started successfully.
Job 'cdc.dbo_TableName_cleanup' started successfully.

Note: Above commands generate the CDC tracking table with the format: cdc.<schema>_<table>_CT
Verify which tables now have CDC enabled.
Command:
SELECT s.name AS SchemaName, t.name AS TableName
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_tracked_by_cdc = 1
ORDER BY s.name, t.name;

Expected Response:
SchemaName  TableName
----------  -----------
dbo         customers
dbo         orders
dbo         order_items
dbo         products

Confirm that SQL Server Agent is running (required for CDC to function).
Command:
EXEC xp_servicecontrol 'QueryState', 'SQLServerAGENT';

Expected Response:
Current Service State
---------------------
Running.

⚠️ Critical: SQL Server Agent MUST be running for CDC to capture changes!
If the service is not running, start it with:
EXEC xp_servicecontrol 'START', 'SQLServerAGENT';

Verify that CDC capture and cleanup jobs were created.
Command:
EXEC sys.sp_cdc_help_jobs;

Expected Response:
job_id          : 9f9c5160-1bea-4b22-9818-50f16653be9e
job_type        : capture
job_name        : cdc.appdb_capture
maxtrans        : 500
maxscans        : 10
continuous      : True
pollinginterval : 5
retention       : 0
threshold       : 0

job_id          : ee746d5b-9c6d-471c-97e7-757ad9418341
job_type        : cleanup
job_name        : cdc.appdb_cleanup
maxtrans        : 0
maxscans        : 0
continuous      : False
pollinginterval : 0
retention       : 4320
threshold       : 5000

What this shows:
Source Table (E-commerce customers table):
CREATE TABLE dbo.customers (
    customer_id INT PRIMARY KEY,
    customer_name NVARCHAR(100),
    status NVARCHAR(20),
    created_at DATETIME,
    updated_at DATETIME
);

CDC Tracking Table: cdc.dbo_customers_CT
-- Auto-generated by SQL Server
CREATE TABLE cdc.dbo_customers_CT (
    -- CDC System Columns (always present)
    __$start_lsn BINARY(10) NOT NULL,      -- LSN when change committed
    __$end_lsn BINARY(10) NULL,            -- Always NULL (reserved)
    __$seqval BINARY(10) NOT NULL,         -- Sequence within transaction
    __$operation INT NOT NULL,             -- 1=DEL, 2=INS, 3=UPD(before), 4=UPD(after)
    __$update_mask VARBINARY(128) NULL,    -- Bitmask of changed columns
    -- Your Table Columns (copied from source)
    customer_id INT,
    customer_name NVARCHAR(100),
    status NVARCHAR(20),
    created_at DATETIME,
    updated_at DATETIME
);

This section describes the CDC architecture used in our demo environment, where network constraints prevent Debezium from connecting directly to Kafka. In environments without these constraints, CDC can be configured to stream changes directly from Debezium into Kafka topics instead.
In this architecture, all changes from your MSSQL database tables are captured using Debezium and sent to a single, Moose-managed Kafka (or Redpanda) Stream via a Moose-managed HTTP Ingest API endpoint (there is no per-table endpoint or per-table ingest stream at this stage).
From there, CDC events are explicitly fanned out based on their metadata. Each MSSQL source table ultimately maps to:
This architecture implements a Streaming Function that inspects each raw CDC event from the shared stream to identify the source table and routes it to the appropriate table-specific stream. Each of these streams feeds a corresponding ClickHouse table.
To summarize, the high level data flow is:
Before continuing, confirm:
If CDC is not enabled yet, complete the MSSQL Server CDC Configuration section first.
This endpoint will receive CDC events from Debezium.
In your MooseStack project, create a new Ingest API sink that accepts Debezium ChangeEvent payloads:
File location: moosestack/src/cdc/DebeziumChangeEvent.model.ts
import { IngestApi, Stream, DeadLetterQueue, Int64 } from "@514labs/moose-lib";

/**
 * Debezium SQL Server Change Event Structure
 * Documentation: https://debezium.io/documentation/reference/stable/connectors/sqlserver.html
 *
 * This model represents the standardized event structure sent by Debezium
 * for all change data capture events from SQL Server.
 */
export interface DebeziumSource {
  version: string;
  connector: string;
  name: string;
  ts_ms: Int64; // Timestamp in milliseconds (epoch)
  snapshot?: string;
  db: string;
  sequence?: string;
  schema: string;
  table: string;
  change_lsn?: string;
  commit_lsn?: string;
  event_serial_no?: number;
}

export interface DebeziumTransaction {
  id?: string;
  total_order?: number;
  data_collection_order?: number;
}

/**
 * Main Debezium Change Event Payload
 * The 'before' and 'after' fields contain the actual row data
 * and their shape varies by table, so we use Record<string, any>
 */
export interface DebeziumChangeEvent {
  // Row data before the change (null for INSERT operations)
  before?: Record<string, any> | null;
  // Row data after the change (null for DELETE operations)
  after?: Record<string, any> | null;
  // Source metadata identifying where this change came from
  source: DebeziumSource;
  // Operation type: 'c' = create, 'u' = update, 'd' = delete, 'r' = read (snapshot)
  op: string;
  // Timestamp in milliseconds
  ts_ms: Int64;
  // Transaction metadata (optional)
  transaction?: DebeziumTransaction | null;
}

/**
 * Full Debezium Event Envelope (what actually gets POSTed by Debezium)
 * Debezium sends events with both schema and payload wrapped together
 */
export interface DebeziumEventEnvelope {
  schema?: Record<string, any>;
  payload: DebeziumChangeEvent;
}

/**
 * Stream for CDC events - fans out to table-specific streams via streaming function
 */
export const DebeziumChangeEventStream = new Stream<DebeziumEventEnvelope>("DebeziumChangeEvent");

/**
 * Dead Letter Queue for failed Debezium events
 */
export const DebeziumChangeEventDLQ = new DeadLetterQueue<DebeziumEventEnvelope>(
  "DebeziumChangeEvent_DLQ"
);

/**
 * Ingestion API endpoint for Debezium CDC events
 * Creates: POST /ingest/DebeziumChangeEvent
 *
 * Debezium sends events here, which flow through the streaming function
 * to fan out to table-specific Redpanda topics.
 */
export const DebeziumChangeEventIngestApi = new IngestApi<DebeziumEventEnvelope>(
  "DebeziumChangeEvent",
  {
    destination: DebeziumChangeEventStream,
    deadLetterQueue: DebeziumChangeEventDLQ,
  }
);

Before implementing the streaming function, define an explicit mapping between MSSQL source table names and their corresponding Moose Streams.
File location: moosestack/src/cdc/tableStreamMap.ts
import { ProductStream } from "../models/Product.model";
import { CustomerStream } from "../models/Customer.model";
import { OrderStream } from "../models/Order.model";
import { OrderItemStream } from "../models/OrderItem.model";

export const TABLE_STREAM_MAP: Record<string, any> = {
  products: ProductStream,
  customers: CustomerStream,
  orders: OrderStream,
  order_items: OrderItemStream,
};

This mapping is what makes the fan-out deterministic and ensures each source table's changes flow through the correct stream and into the correct ClickHouse table.
This function acts as our fanout point.
When CDC events are posted from Debezium, we need to read the table name from the payload and route each event to the correct stream.
File location: moosestack/src/cdc/processDebeziumEvent.ts
import {
  DebeziumEventEnvelope,
  DebeziumChangeEventStream,
} from "./DebeziumChangeEvent.model";
import { TABLE_STREAM_MAP } from "./tableStreamMap";

/**
 * Process and route CDC events to table-specific Redpanda topics
 *
 * ReplacingMergeTree CDC fields:
 * - ts_ms: Version column from payload.ts_ms (used to determine newest row)
 * - isDeleted: 1 for delete operations, 0 otherwise (ReplacingMergeTree collapses deleted rows)
 */
export default async function processDebeziumEvent(envelope: DebeziumEventEnvelope): Promise<void> {
  console.log(`[CDC] Processing event: ${JSON.stringify(envelope)}`);

  const event = envelope.payload;
  const { source, op, before, after, ts_ms } = event;
  const sourceTable = source.table;
  const targetStream = TABLE_STREAM_MAP[sourceTable];

  // Unknown table - log and skip
  if (!targetStream) {
    console.warn(`[CDC] Unknown table: ${sourceTable}`);
    return;
  }

  // Determine data and deleted flag based on operation type
  let rowData: Record<string, any> | null = null;
  let isDeleted: number = 0;

  switch (op) {
    case "c": // CREATE
    case "r": // READ (snapshot)
    case "u": // UPDATE
      rowData = after ?? null;
      isDeleted = 0;
      break;
    case "d": // DELETE - use 'before' data since 'after' is null for deletes
      rowData = before ?? null;
      isDeleted = 1;
      break;
    default:
      console.warn(`[CDC] Unknown op: ${op} for ${sourceTable}`);
      return;
  }

  if (!rowData) {
    console.warn(`[CDC] No data in ${op} event for ${sourceTable}`);
    return;
  }

  // Add CDC metadata columns for ReplacingMergeTree
  // Ensure isDeleted is explicitly UInt8 (0 or 1) for ClickHouse
  // Use bitwise OR with 0 to ensure it's an integer, not Float64
  const data = {
    ...rowData,
    ts_ms: ts_ms, // Version column - determines which row is newest
    isDeleted: (isDeleted | 0) as 0 | 1, // isDeleted flag - 1 for deletes, 0 otherwise (UInt8)
  };

  // Publish directly to table's Redpanda topic
  try {
    await targetStream.send(data);
    console.log(
      `[CDC] ${op.toUpperCase()} ${sourceTable} → Redpanda topic (ts_ms=${ts_ms}, isDeleted=${isDeleted})`
    );
  } catch (error: any) {
    console.error(`[CDC] Failed to publish ${sourceTable}:`, error.message);
    throw error; // Trigger DLQ
  }
}

// Wire up the streaming function
DebeziumChangeEventStream.addConsumer(processDebeziumEvent);

Placeholders to replace:
<YOUR_DATABASE_NAME> - your MS SQL database name (required)
<YOUR_HTTP_SINK_URL> - your HTTP sink endpoint URL (required)
<YOUR_MSSQL_PASSWORD> - your MS SQL SA password (required)

kubectl apply -f namespace.yaml
# namespace.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: poc-debezium
  labels:
    name: poc-debezium
    app: debezium-kafka-connect

kubectl apply -f configmap.yaml
# configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: debezium-server-config
  namespace: poc-debezium
data:
  application.properties: |
    # Debezium Server Source - MS SQL Server
    debezium.source.connector.class=io.debezium.connector.sqlserver.SqlServerConnector
    debezium.source.offset.storage.file.filename=/debezium/data/offsets.dat
    debezium.source.offset.flush.interval.ms=10000

    # MS SQL Server Connection
    debezium.source.database.hostname=mssql.poc-mssql.svc.cluster.local
    debezium.source.database.port=1433
    debezium.source.database.user=sa
    debezium.source.database.password=<YOUR_MSSQL_PASSWORD>
    debezium.source.database.names=<YOUR_DATABASE_NAME>
    debezium.source.database.encrypt=false
    debezium.source.database.trustServerCertificate=true

    # Server name - used as namespace for topic names and schema names
    debezium.source.topic.prefix=mssql
    debezium.source.database.server.name=mssql-server

    # Table whitelist
    debezium.source.table.include.list=*

    # Schema history - MS SQL Server CDC requires this
    # Use file-based schema history
    debezium.source.schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
    debezium.source.schema.history.internal.file.filename=/debezium/data/schema-history.dat

    # Snapshot mode
    debezium.source.snapshot.mode=schema_only

    # Snapshot optimization - conservative for low-impact CDC
    debezium.source.snapshot.max.threads=1
    debezium.source.snapshot.fetch.size=1000

    # Skip schema change events - only emit data changes
    debezium.source.include.schema.changes=false

    # HTTP Sink Configuration
    debezium.sink.type=http
    debezium.sink.http.url=<YOUR_HTTP_SINK_URL>
    debezium.sink.http.timeout.ms=60000

    # HTTP Sink Retry Settings
    debezium.sink.http.retries=5
    debezium.sink.http.retry.interval.ms=3000

    # Connector Error Handling and Retries
    debezium.source.errors.max.retries=5
    debezium.source.errors.retry.delay.initial.ms=300
    debezium.source.errors.retry.delay.max.ms=10000

    # Consumer configuration - conservative for low-impact
    debezium.source.max.batch.size=2048
    debezium.source.poll.interval.ms=120000

    # Queue capacity - reduced for lower memory footprint
    debezium.source.max.queue.size=8192
    debezium.source.max.queue.size.in.bytes=104857600

    # JSON format configuration
    debezium.format.key=json
    debezium.format.value=json
    debezium.format.value.json.schemas.enable=false

    # Logging
    quarkus.log.level=INFO
    quarkus.log.category."io.debezium.server.http".level=DEBUG
    quarkus.log.console.json=false

kubectl apply -f secret.yaml
# secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: debezium-secrets
  namespace: poc-debezium
type: Opaque
stringData:
  # The password for the MS SQL Server 'sa' user
  MSSQL_SA_PASSWORD: "<YOUR_MSSQL_PASSWORD>"

kubectl apply -f statefulset.yaml
# statefulset.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: debezium-server
  namespace: poc-debezium
  labels:
    app: debezium-server
spec:
  serviceName: debezium-server
  replicas: 1
  selector:
    matchLabels:
      app: debezium-server
  template:
    metadata:
      labels:
        app: debezium-server
    spec:
      securityContext:
        fsGroup: 1001
        runAsUser: 1001
        runAsNonRoot: true
      initContainers:
        - name: fix-permissions
          image: busybox:latest
          command:
            - sh
            - -c
            - chmod -R 777 /debezium/data && chown -R 1001:1001 /debezium/data
          securityContext:
            runAsUser: 0
            runAsNonRoot: false
          volumeMounts:
            - name: data
              mountPath: /debezium/data
      containers:
        - name: debezium-server
          image: quay.io/debezium/server:2.7
          env:
            - name: MSSQL_SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: debezium-secrets
                  key: MSSQL_SA_PASSWORD
            - name: JAVA_OPTS
              value: "-Xms4g -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
          volumeMounts:
            - name: config
              mountPath: /debezium/conf
            - name: data
              mountPath: /debezium/data
          resources:
            requests:
              memory: "4Gi"
              cpu: "2000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - test -e /proc/1/exe
            initialDelaySeconds: 120
            periodSeconds: 60
            timeoutSeconds: 10
            failureThreshold: 5
      volumes:
        - name: config
          configMap:
            name: debezium-server-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi

kubectl apply -f service.yaml
# service.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: debezium-server
  namespace: poc-debezium
  labels:
    app: debezium-server
spec:
  selector:
    app: debezium-server
  ports:
    - name: http
      port: 8080
      targetPort: 8080
  type: ClusterIP

This section will guide you through the process of selecting a dashboard/report component to be improved, through:
A branch is used not just for managing your code versioning, but also for spinning up "preview branches" in Boreal.
git checkout -b <new-branch-name>

moose seed

You can seed your local development environment with the moose seed command.
Command:
moose seed clickhouse --connection-string <BOREAL_CONNECTION_STRING> --limit 100

You can set the limit to whatever amount of data you want to bring locally, or, alternatively, you can copy all the data from remote to local (be careful with this approach):
Command:
moose seed clickhouse --connection-string <BOREAL_CONNECTION_STRING> --all

BOREAL_CONNECTION_STRING is your connection string to a remote/cloud database. In Boreal, you can find that here:
Note: there are a few forms of connection string; this command requires the HTTPS URL.
You can set your Boreal connection string as a local environment variable for ease of use:
Command:
echo 'BOREAL_CONNECTION_STRING="PASTE_THE_HTTPS_URL_FROM_BOREAL_HERE"' >> .env.local && set -a && source .env.local && set +a

Or export directly:
export BOREAL_CONNECTION_STRING=PASTE_THE_HTTPS_URL_FROM_BOREAL_HERE
Your analytical components in your frontend can be made drastically more performant with Fiveonefour. In order to accomplish this, you need to migrate dashboard reports from the existing transactional-backed implementation to an analytical-backed implementation (using the Fiveonefour stack) without changing how the frontend talks to the backend.
The recommended approach allows you to keep:
The primary change is where the data comes from (ClickHouse tables/materialized views) and how the SQL is executed (ClickHouse instead of MSSQL).
A dataset definition is the shared model used by both the frontend and backend to define the expected operations for a dataset: which fields it exposes, which filters it accepts, and how results can be ordered and paginated.
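For illustration, a dataset definition for the order fulfillment example used later in this guide might look roughly like the following. This is a hypothetical sketch; the type and property names are assumptions based on the fields, filters, and pagination parameters that appear in the examples below, not the exact shape in your project.

```typescript
// Hypothetical shape of a dataset definition shared by frontend and backend.
// Names (DatasetDefinition, OrderFulfillmentDataset, etc.) are illustrative.
interface DatasetDefinition {
  id: string;                                    // dataset identifier sent as datasetId
  fields: string[];                              // logical field names the frontend may request
  filters: Record<string, "eq" | "in" | "gte">;  // filter names and the operators they allow
  defaultOrderBy: { field: string; direction: "ASC" | "DESC" }[];
  pagination: { defaultLimit: number; maxLimit: number };
}

const OrderFulfillmentDataset: DatasetDefinition = {
  id: "OrderFulfillment",
  fields: ["MerchantName", "FulfillmentRate", "TotalOrders"],
  filters: { days: "eq", merchantIds: "in" },
  defaultOrderBy: [{ field: "MerchantName", direction: "ASC" }],
  pagination: { defaultLimit: 10, maxLimit: 1000 },
};
```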
Since each dataset definition has its own distinct API handler, we want to ensure that we do not hand-write the same dynamic SQL in every handler. Instead, this approach implements a shared set of "query helpers" (utils/query-helpers.ts) that you can use to convert the dataset definition model and the runtime API request payload into ClickHouse SQL.
The query helpers (query-helpers.ts) allow you to reuse common SQL building logic for each dataset:
For each distinct dataset, you define a thin adapter (called a ParamMap) that declares the mapping between the frontend dataset definition model and the corresponding column names (or SQL expressions for derived metrics) in the ClickHouse tables/materialized views that contain the data.
The shared query helpers used in this guide are not yet part of main.
They currently live on the query-helpers branch, but you do not need to switch branches to use them.
Instead, simply copy the relevant folders from the query-helpers branch into your current working branch. Specifically:
moosestack/src/utils/query-helpers.ts
moosestack/src/utils/USAGE.md

Once copied, the helpers can be used directly in your existing branch, and all examples below assume those files are present locally.
These helpers are still evolving, which is why they haven't been merged into main yet. Until then, copying them into your branch is the intended workflow.
The query-helpers.ts module provides a three-layer architecture:
This layer allows you to easily create a runtime validator based on the incoming request payload for the dataset handler:
import { PaginationParams, assertValidOrThrow } from "./query-helpers";
import typia from "typia";

interface MyDatasetParams extends PaginationParams {
  merchantId?: number;
  orderStatus?: string;
}

const validator = typia.createValidate<MyDatasetParams>();
const validated = assertValidOrThrow(validator(params));

This layer allows you to map the parameter names in the incoming request payload to the corresponding column names in the ClickHouse tables/materialized views. This mapping is then used to build the ClickHouse SQL for the dataset handler, using the validated request payload.
import { createParamMap } from "./query-helpers";

const orderMetricsParamMap = createParamMap<OrderFilters, OrderMetricsRow, OrderField>(OrderMetricsTable, {
  filters: {
    days: { column: "days" },
    merchantIds: { column: "merchant_id", operator: "in" },
  },
  fields: {
    MerchantName: { column: "merchant_name", alias: "MerchantName" },
    FulfillmentRate: {
      toSql: () => sql`round(100 * sum(fulfilled_orders) / nullIf(sum(total_orders), 0), 2)`,
      alias: "Fulfillment Rate (%)",
      isAggregate: true, // Triggers automatic GROUP BY
    },
  },
  defaultSelect: ["merchant_name", "order_status"],
  defaultOrderBy: [{ column: "merchant_name", direction: "ASC" }],
});

This layer allows you to generate the complete ClickHouse SQL for the dataset handler, using the validated request payload and the ParamMap configuration.
Option A: Simple queries - Generate complete SQL with toQuerySql:
import { toQuerySql } from "./query-helpers";

const intent = orderMetricsParamMap.toIntent({
  fields: ["MerchantName", "FulfillmentRate"],
  filters: { days: 30 },
  pagination: { limit: 100, offset: 0 },
});

const query = toQuerySql(OrderMetricsTable, intent);
const result = await client.query.execute(query);

Option B: Complex queries - For cases where you need to build a more complex query with multiple CTEs, JOINs, or other custom logic, you can use the individual converters for each part of the query:
import { toSelectSql, toWhereSql, toOrderBySql, toGroupBySql } from "./query-helpers";

const selectClause = toSelectSql(OrderMetricsTable, intent.select);
const whereClause = toWhereSql(OrderMetricsTable, intent.where);
const groupByClause = toGroupBySql(intent.groupBy);
const orderByClause = toOrderBySql(intent.orderBy);

// Compose custom query with CTEs
const customQuery = sql`
  WITH MerchantOrders AS (${getMerchantOrdersQuery(merchantNames)})
  SELECT ${selectClause}
  FROM ${OrderMetricsTable}
  WHERE ${whereClause}
  ${groupByClause}
  ${orderByClause}
  LIMIT ${intent.pagination.limit}
`;

Because query generation is centralized in the shared helpers, every dataset handler starts from the same contract: a standard dataset definition plus the incoming API request (filters, ordering, pagination, etc.). That helper emits consistent SQL fragments, which means new handlers don't need to re-implement boilerplate logic like ORDER BY, filter parsing, or paging rules. They only supply the dataset-specific mapping and let the mapper and helpers do the rest.
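Put together, every dataset handler built on these helpers tends to follow the same thin shape: validate the payload, convert it to an intent via the ParamMap, generate SQL, and execute. The sketch below simply recombines the snippets above; the handler name, request interface, and import paths are illustrative assumptions.

```typescript
import typia from "typia";
import { assertValidOrThrow, toQuerySql } from "./query-helpers";
// Hypothetical modules: adjust to wherever your ParamMap, table model, and client live.
import { orderMetricsParamMap, OrderFilters, OrderField } from "./order-metrics.map";
import { OrderMetricsTable } from "../models/OrderMetrics.model";
import { client } from "./clickhouse-client";

// Hypothetical request shape for this dataset.
interface OrderMetricsRequest {
  fields: OrderField[];
  filters: OrderFilters;
  pagination: { limit: number; offset: number };
}

const validateRequest = typia.createValidate<OrderMetricsRequest>();

// Same contract for every dataset; only the ParamMap and request type differ.
export async function fetchOrderMetrics(rawParams: unknown) {
  const params = assertValidOrThrow(validateRequest(rawParams)); // Layer 1: validate
  const intent = orderMetricsParamMap.toIntent(params);          // Layer 2: map to a query intent
  const query = toQuerySql(OrderMetricsTable, intent);           // Layer 3: generate ClickHouse SQL
  return client.query.execute(query);                            // execute and return rows
}
```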
In practice, you can translate datasets manually one at a time using this pattern. But once the structure and conventions are clear, this is also a strong candidate for automation—AI copilots can reliably generate the required mappings, queries, and glue code when given the right context and constraints.
For each dataset translation, the AI should produce the same set of artifacts:
The general concept of the approach is this:
In this case, we recommend providing the AI access to the following main sources of context:
For this particular example, you would provide context similar to a specification document that details the business requirements, expected input/output schemas, and sample test data. Test data can be manually extracted from the frontend or generated from the existing system.
In previous implementations, expected output (to validate accuracy) arrived later, so it was used only to validate the AI-generated endpoints. If you have this data upfront, include it in the main context document so it can inform the reasoning process, not just the final validation.
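If you are assembling such a specification document yourself, even a small, explicit fixture helps: one request payload plus the exact rows you expect back. The shape below is a hypothetical example that mirrors the OrderFulfillment request and response format used later in this guide; the values are placeholders.

```typescript
// Hypothetical test fixture to include in (or alongside) the specification document.
// Field names mirror the OrderFulfillment example; values are illustrative placeholders.
export const orderFulfillmentFixture = {
  request: {
    datasetId: "<YOUR_DATASET_GUID>",
    parameters: { limit: "10", offset: "0", orderby: "MerchantName Asc" },
  },
  expectedRecords: [
    { value: "Acme Corp", metric: 95.5, TotalOrders: 1250 },
    // ...remaining rows extracted from the existing system
  ],
};
```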
Choose a specific dashboard component or report to migrate. For this example, let's use an e-commerce order fulfillment dashboard:
OrderFulfillment

Select Plan mode (found near the model selector in the chat pane) and choose an advanced reasoning model of your choice (like Claude Opus 4.5, GPT Codex 5.1 Max, or GPT 5.2).
This prompt directs the agent to the chosen component and the context files, and ensures that the patterns used in the project are followed. It is a prompt we have had success with, but feel free to adapt it to your preferred style of prompting.
It points the agent at the existing project structure, at the test data in the local database, at the remote data it can test against, and at the specification document containing business logic and test data.
Note: to find the base URL of the production deployment used in the prompt, see above.
Task: implement dataset "OrderFulfillment" for the dataset API endpoint.

The MSSQL used in the original system, the input payload and the expected output are in the specification document for this component. I want you to test the system and the endpoint (curl the endpoint). Iterate on the query until it is correct. If you need to use cascading queries to emulate materialized views that will later be implemented as part of testing, do so.

Use the query-helpers patterns from:
@moosestack/src/utils/query-helpers.ts
@moosestack/src/utils/USAGE.md

Reference existing handlers in your project for the implementation pattern.

Use the MCP to test that the objects you are creating are being validly created (note, there is about a one minute reload from changing files to the local dev server updating, so wait).

The local database has been seeded with data for testing. You can also use `moose query` from moosestack/ to query the development database.

Test the endpoint against this route:
Method: POST
Endpoint: /dataset
Description: Express WebApp endpoint

The base URL is:
https://default:<PASSWORD>@<HOST>:8443/<DATABASE_NAME>

This will allow you to test the ClickHouse queries against production data.

Make sure that all of your queries return data before even starting to implement MooseStack code. Once the query is working, then work on implementing in code. Show me each test at each step.

This will generate a plan, using the relevant context.
See example generated plan in the appendix.
The following steps can help refine the generated plan.
Check if stored procedures referenced in the plan have already been implemented.
For example, the generated plan might refer to getUserPermissions:
1. **User Permissions**: The MSSQL query uses `getUserPermissions` stored procedure. For initial implementation, we may need to:

You can list these in the follow up prompt for best results:
Analyse the MooseStack project to see if the referenced stored procedures are already implemented: getUserPermissions

If they haven't been implemented, it is likely that they need to be implemented as a materialized view:
Add implementation of that to the plan (you will likely have to create a materialized view)

Out of an abundance of caution, ensure again that you've highlighted the test criteria:
Ensure that the output is EXACTLY as defined in the document. Ensure this is adhered to.

Add documentation tasks you want the agent to produce alongside the implementation, e.g.
Generate OpenAPI spec for the created endpoints
Add a README summarizing endpoints and data lineage
Include example curl requests + sample responses, schema diagrams / entity summary (if relevant), and a "how to test locally" section
Add inline JSDoc for handlers, params, and response shapes

Once you are satisfied with the plan, hit Build. This will experiment with SQL queries, build up materialized views and implement the new endpoint.
You will likely be asked for various permissions along the way, including for MCP tool use to check work against local data, and curl commands to check work against the remote database.
Example of tool-use in the process of implementing the plan:
You can ask the LLM to generate curl commands that you can use to test the generated API endpoints:
Generate curl commands to test this new endpoint

Which returns:
Command:
curl -X POST http://localhost:4000/dataset \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR_API_TOKEN>" \
-d '{
"datasetId": "<YOUR_DATASET_GUID>",
"parameters": {
"limit": "10",
"offset": "0",
"orderby": "MerchantName Asc"
}
}' | jq '.'

You can iterate on this curl command to test against known data values.
You can also instruct the agent to take the success criteria defined in the specification document and create a test suite, showing you the results. It is best practice to audit and get comfortable with the test suite and the output to ensure that it covers actual usage.
Generate a test script that iterates through the generated API, with the test cases defined in the specification document

Which generates a test script with multiple test cases, testing general API functionality like security, pagination, etc., as well as tests for input and output schemas, and exact input and output data for known values, e.g.:
Test cases:

✅ Basic request with default parameters (limit 10, offset 0, ascending)
✅ Request with limit 5
✅ Request with offset (pagination - offset 10)
✅ Request with descending order
✅ Request with cacheId
✅ Request with security filter (merchant)
✅ Request with parameter.datasource parameter
✅ Request with orderby leading space (edge case)
✅ Exact structure validation (validates response structure matches document)
✅ Exact value validation (validates first 10 records match document exactly)

At this point in the migration, you typically have a report that is functionally correct but still not where you want it performance-wise.
Continuing the order fulfillment dataset example from the previous section, the original transactional implementation often relies on layered logic: permission filtering, joins across multiple tables, and derived metrics computed on the fly.
A simplified version of that original logic might look like this:
WITH UserMerchants AS (
    SELECT merchant_id FROM getUserPermissions(:user_id)
),
OrdersWithMerchant AS (
    SELECT o.created_at, m.merchant_name, o.total_orders, o.fulfilled_orders
    FROM order_metrics o
    JOIN merchants m ON m.id = o.merchant_id
    WHERE o.merchant_id IN (SELECT merchant_id FROM UserMerchants)
),
AggregatedMetrics AS (
    SELECT merchant_name,
           count(*) AS order_rows,
           sum(fulfilled_orders) AS fulfilled_orders,
           sum(total_orders) AS total_orders
    FROM OrdersWithMerchant
    GROUP BY merchant_name
)
SELECT *
FROM AggregatedMetrics
ORDER BY merchant_name;

When migrating this dataset to ClickHouse using MooseStack, the first step is to translate this logic directly and wire it into a dataset handler using the shared query helpers. At this stage, the priority is parity: matching the existing API behavior and results exactly.
However, notice what's happening here:
At low traffic, this may be acceptable. At production scale, where the same underlying dataset is queried frequently with different filters or sort orders, it becomes the dominant performance bottleneck.
This is the inflection point where you stop asking:
"How do I translate this query?"
and start asking:
"Which parts of this query should only be computed once?"
That's where Materialized Views come in.
Materialized Views (MVs) are the primary mechanism for turning a correct report migration into a fast one.
When migrating a dashboard or report from a transactional system to MooseStack, the first goal is functional parity: reproduce the same results using ClickHouse without changing frontend request/response contracts. In practice, this often starts with a direct translation of the original SQL. This sometimes involves long CTE chains, complex joins, window functions, or layered business logic that was previously executed at read time.
If the translated query is correct but still complex or expensive to run per request, that is the signal to introduce a Materialized View.
The core idea is to shift expensive computation from read time to insert time:
If your final dataset query depends on multiple CTEs or significant data reshaping before the final SELECT, consider moving that logic into a Materialized View so it is computed once as data arrives, rather than on every API request.
In MooseStack, Materialized Views allow you to precompute joins, aggregations, and derived fields and persist the results in a physical table. Your API handlers then query these precomputed tables instead of executing the full, complex query at request time. This keeps dataset handlers thin and predictable, and makes performance largely independent of query complexity.
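As a rough sketch of what this looks like in a MooseStack project: you define the schema of the precomputed rows, express the expensive part of the query (here, the join) once, and register it as a Materialized View that writes to a backing table. The exact MaterializedView constructor options shown below (selectStatement, selectTables, tableName, materializedViewName) are assumptions based on moose-lib patterns; check the MooseStack documentation for the current signature, and treat the table imports as hypothetical.

```typescript
import { MaterializedView, sql } from "@514labs/moose-lib";
// Hypothetical OlapTable objects assumed to be defined elsewhere in your project.
import { OrderMetricsTable, MerchantsTable } from "../models";

// Schema of the precomputed rows that dataset handlers will query.
interface OrderMetricsByMerchantRow {
  created_at: Date;
  merchant_name: string;
  total_orders: number;
  fulfilled_orders: number;
}

// Sketch only: option names are assumptions, not a verified signature.
export const OrderMetricsByMerchantMV = new MaterializedView<OrderMetricsByMerchantRow>({
  materializedViewName: "order_metrics_by_merchant_mv",
  tableName: "order_metrics_by_merchant", // backing table the view writes TO
  selectTables: [OrderMetricsTable, MerchantsTable],
  selectStatement: sql`
    SELECT o.created_at AS created_at,
           m.merchant_name AS merchant_name,
           o.total_orders AS total_orders,
           o.fulfilled_orders AS fulfilled_orders
    FROM ${OrderMetricsTable} o
    JOIN ${MerchantsTable} m ON m.id = o.merchant_id
  `,
});
```

Here the join is computed once as rows arrive; the dataset handler's aggregation and filtering then run over a single flat, precomputed table.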
You should strongly consider adding an MV if the migrated query includes one or more of the following:
In these cases, Materialized Views act as a boundary between data preparation and data serving.
This is the most common change you'll make. Typical reasons include:
To do this correctly, follow these steps:
Locate the Materialized View definition in your MooseStack project. Each MV explicitly defines its output schema.
Add the new column to the schema with the correct ClickHouse type. Use the type of the final computed value, not the raw source column (a sketch of the full change appears after these steps).
Because the MV's backing table is already typed from this schema, saving the file will automatically apply the schema change locally.
Once saved, watch the moose dev logs. You should see the backing table updated with an additive column change.
After adding the column to the schema, you will likely see an error. This is expected.
At this point, the backing table expects the new column, but the MV's SELECT statement does not yet produce it.
Edit the SQL query defining the Materialized View and ensure that:
the new column is included in the MV's SELECT list

Once the SQL is updated and saved, the error should resolve and the MV will resume populating normally for new data.
Note that only new data will begin populating the new column. Existing rows in the MV are unchanged until you explicitly backfill (covered later in this guide).
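Putting these steps together, a minimal sketch of the change might look like this. The column, table, and view names are illustrative, and the `sql` import and table reference follow the patterns used earlier in this guide.

```typescript
import { sql } from "@514labs/moose-lib";
// Hypothetical source table from the earlier examples.
import { OrderMetricsTable } from "../models/OrderMetrics.model";

// Steps 1-2: add the new, correctly typed column to the MV's output schema.
// The type reflects the final computed value, not the raw source column.
interface OrderMetricsByMerchantRow {
  merchant_name: string;
  total_orders: number;
  fulfilled_orders: number;
  fulfillment_rate: number; // NEW: derived percentage, stored as a Float64
}

// Step 3: update the MV's SELECT so it actually produces the new column.
const selectStatement = sql`
  SELECT merchant_name,
         total_orders,
         fulfilled_orders,
         round(100 * fulfilled_orders / nullIf(total_orders, 0), 2) AS fulfillment_rate -- NEW
  FROM ${OrderMetricsTable}
`;
```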
Add the files you created above (or just the functional files if you don't want to commit your test scripts) and push to your version control. Create a Pull Request.
This branch will later be used in Boreal for a Branch Deployment, automatically triggered by creating the PR.
One suggested pattern for accelerating iterations and end-to-end testing is to run your entire stack locally. This includes your frontend, backend / gateway API, and your moose dev server. You can connect the rest of your application to your new analytical APIs in the local moose dev server using token authentication. Once connected, you can see data in your frontend and test the whole application without having to go through any cloud deployment.
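For example, the piece of your backend or gateway that currently calls the dataset API can be pointed at the local moose dev server with an environment variable and a bearer token. A minimal sketch, assuming hypothetical MOOSE_BASE_URL and MOOSE_API_TOKEN environment variables and Node 18+ (global fetch):

```typescript
// Sketch: route dataset requests to the local moose dev server during end-to-end testing.
// MOOSE_BASE_URL and MOOSE_API_TOKEN are hypothetical environment variable names.
const MOOSE_BASE_URL = process.env.MOOSE_BASE_URL ?? "http://localhost:4000";
const MOOSE_API_TOKEN = process.env.MOOSE_API_TOKEN ?? "";

export async function callDataset(datasetId: string, parameters: Record<string, string>) {
  const response = await fetch(`${MOOSE_BASE_URL}/dataset`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${MOOSE_API_TOKEN}`,
    },
    body: JSON.stringify({ datasetId, parameters }),
  });
  if (!response.ok) throw new Error(`Dataset request failed: HTTP ${response.status}`);
  return response.json();
}
```

Because the request and response contracts are unchanged, switching back to the deployed endpoint later is just a change of base URL and token.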
This section will guide you through the process of applying your local changes to production:
(Skip this step if you already have completed Boreal onboarding.)
To deploy changes to production, you need access to Boreal and to the organization that owns the project.
Once you've joined the correct organization, you should be able to see the project in Boreal and proceed with your production rollout.
Return to your IDE and confirm the following before moving on:
moose dev starts without errors

If all three checks pass, then you're ready for the final pre-production step: ensuring your changes can be deployed without breaking anything in production. To do this, you'll generate and review a migration plan.
Open your terminal (ensure you cd to your MooseStack project root). Then run:
Command:
moose generate migration --save --url <BOREAL_HOST> --token <BOREAL_ADMIN_API_BEARER_TOKEN>

Parameters:
BOREAL_HOST is the host for your production deployment in Boreal. Copy it from the URL in your project overview dashboard:
BOREAL_ADMIN_API_BEARER_TOKEN is sent in the request header when calling the Boreal Admin API at BOREAL_HOST. This is the API key. It is a secret and must not be committed to source control. Store it securely in a password manager.

After successfully running moose generate migration with the correct --url and --token, a new /migrations directory should appear at the root of your MooseStack project. Open the plan.yaml file in that directory and review it carefully.
Review the migration plan to confirm which SQL resources will be created or modified. Make sure it matches exactly what you intend to ship. As a rule of thumb:
This is expected when optimizing queries. Any new materialized view should result in:
CreateTable operation that creates the backing table for the view
SqlResource operation containing the CREATE MATERIALIZED VIEW statement, with the view explicitly writing TO that backing table

Seeing both confirms the Materialized View is being added cleanly and additively. For every new Materialized View in your branch, there should be exactly one CreateTable and one SqlResource defining it.
Column-level changes are uncommon. If you encounter them:
There are a small number of cases where column changes are expected:
AddTableColumn operation applied to the backing table for that view.
DropTableColumn followed by an AddTableColumn. If this rename was intentional, replace those two operations with a single RenameTableColumn operation instead.

Outside of these cases, column-level changes should be treated with caution, especially DropTableColumn or ModifyTableColumn operations. These changes are strongly discouraged. Instead, stick to strictly additive migrations. Undo the delete or modification in your OlapTable object, and introduce a new column instead.
DropTable operations

If you see any DropTable operations, proceed with extreme caution and review your changes carefully. They may indicate that an OlapTable or MaterializedView object defined in the codebase (and currently used in production) is being deleted, which can result in irreversible data loss if applied unintentionally.
If the plan shows changes you did not anticipate, stop and resolve that before proceeding.
Once the plan looks correct, you're ready to continue with preview and production rollout.
Commit your changes, push them to a new branch, and open a pull request targeting main.
Once the PR is open, Boreal automatically deploys an isolated preview (staging) environment for the branch. Your code and the generated plan.yaml are applied to a staging database forked from production, so changes are created exactly as they would be in prod.
Confirm that boreal-cloud bot appears in the PR comments.
This confirms that:
If the bot does not appear, double check that you have correctly integrated your Github account with your Boreal account. If something doesn't look right, reach out to the 514 team for help.
In the boreal-cloud bot comment, you'll see a table. Click the link in the Project column (the link text will match your branch name). This opens the Boreal project dashboard with your preview environment selected.
From here, you'll inspect the database state and validate that the resources created by your migration plan match what you reviewed and expected before proceeding.
In this step, you'll query the staging ClickHouse database directly using ClickHouse's HTTPS interface.
First, get the database HTTPS connection string from Boreal using the same steps you followed earlier. Make sure the Boreal dashboard is set to your feature branch, not main. You can confirm this by checking the branch selector in the left sidebar of the project dashboard.
Create a temporary environment variable for your staging database URL:
Command:
export STAGING_DB=<your-staging-db-connection>You can now safely use $STAGING_DB to run queries against the staging database via curl.
In a terminal, run:
Command:
curl -sS \
$STAGING_DB \
--data-binary 'SHOW TABLES'

Expected Response:
You should see a plain-text list of all tables in the staging database if the command executed successfully:
customers
products
orders
order_items
merchants
order_metrics_daily
...

Use this output to confirm that:
the tables created by plan.yaml exist

If the list of tables does not match what you reviewed in the migration plan, stop here and fix the issue before proceeding.
Do not merge until the preview environment reflects exactly the database resources and behavior you expect to see in production.
If everything lines up as you expect, you're ready to merge!
Merge your PR, then do the same thing as before: click the link in the boreal-cloud bot comment to view the deployment page. You should see the deployment logs and status there. The deployment should take a few minutes.
If your migration introduces any new Materialized Views, they will start populating only for new incoming data. To apply them to historical data, you must explicitly backfill them from existing ClickHouse tables.
This step uses the same HTTPS + curl workflow as before, but targets the production (main) database and performs a write operation to apply the backfill.
Open the migration plan.yaml you just shipped and find the new materialized views you created (look for the CREATE MATERIALIZED VIEW statements in SqlResource).
For each one, note two things:
the backing table the view writes TO

Backfilling is done by inserting historical rows into the MV's backing table using the same SELECT logic used by the view.
In a terminal, run:
Command:
curl -sS \
"$BOREAL_CONNECTION_STRING" \
--data-binary "
INSERT INTO <mv_backing_table>
SELECT ...
"Use the exact SELECT statement from the CREATE MATERIALIZED VIEW definition (or the underlying SELECT you used when building it) and paste it in place of SELECT ....
After each backfill, sanity check that the backing table now has rows:
Command:
curl -sS \
$BOREAL_CONNECTION_STRING \
--data-binary 'SELECT count() FROM <mv_backing_table>'

Expected Response:
If the count is non-zero (and roughly matches what you expect), the backfill is complete.
Only backfill the MVs introduced in this change. Avoid reprocessing older MVs unless you intentionally want to rebuild them.
Command:
wsl --install

This one command enables the necessary Windows features and downloads the latest Ubuntu Linux distribution by default [4]. If prompted, restart your computer to complete the WSL installation. (On Windows 11, a restart may happen automatically.)
Note: If WSL was already partially installed and the above command just shows help text, you can list available distros with wsl --list --online and then install a specific one with wsl --install -d Ubuntu [5].
Initial Ubuntu setup: After reboot, Windows should automatically launch the newly installed Ubuntu for its first-time setup. If it doesn't open on its own, you can launch "Ubuntu" from the Start menu. A console window will appear as Ubuntu initializes (this may take a minute as files decompress) [6].
Create Linux user: You will be prompted to create a new UNIX username and password for the Ubuntu instance (this is separate from your Windows credentials) [7]. Enter a username and a secure password — you won't see characters as you type the password (that's normal). This will create a new user account in the Ubuntu environment and then drop you into a Linux shell.
Verify WSL installation: Once setup is complete, you should see a Linux terminal prompt (e.g. something like username@DESKTOP-XYZ:~$). At this point, you have an Ubuntu WSL instance running. You can check that it's using WSL version 2 by opening a PowerShell/Command Prompt (not the Ubuntu shell) and running wsl -l -v. It should list your Ubuntu distro with version "2". If it says version "1", upgrade it with wsl --set-version Ubuntu 2.
Now that Ubuntu is running in WSL, update its package lists and upgrade installed packages:
Update apt repositories: In the Ubuntu terminal, run:
Command:
sudo apt update && sudo apt upgrade -y

This fetches the latest package listings and installs any updates. It's good practice to do this right after installing a new Linux distro [8]. (The -y flag auto-confirms prompts during upgrade.)
(Optional) Install basic tools: You may want to install some common utilities. For example, you can run sudo apt install -y build-essential curl git to ensure you have compilers and Git available. This is not strictly required for MooseStack, but can be useful for general development.
MooseStack uses Docker containers under the hood for components like ClickHouse (database), Redpanda (streaming), etc. On Windows, the recommended way to run Docker with WSL2 is via Docker Desktop [9].
Install Docker Desktop:
Download and Install: If you don't have it already, download Docker Desktop for Windows from the Docker website and run the installer. During installation, it should prompt to enable WSL2 features if not already enabled (which we did in Step 1). Follow the prompts to install Docker Desktop.
Start Docker Desktop: After installation, launch Docker Desktop. In the taskbar, you'll see the Docker whale icon appear (it may take a few seconds to start up the first time).
Configure Docker Desktop with WSL2
Integrate with Ubuntu: Still in Settings, go to Resources > WSL Integration. You should see a list of your WSL distributions. Find Ubuntu (or the distro you installed) and enable integration for it (toggle it on). This allows Docker containers to be managed from within that WSL instance.
Allocate sufficient resources: Docker Desktop by default might allocate limited resources. MooseStack's stack requires at least 2.5 GB of memory for Docker. Go to Resources > Advanced (or Resources > Memory) and increase the memory to 3-4 GB to be safe. You can also adjust CPUs if needed (2 CPUs is usually fine for dev).
Apply settings: Click "Apply & Restart" if you changed any Docker settings. Docker Desktop will restart its engine to apply the new configuration.
Test Docker in WSL: Open a new Ubuntu WSL terminal (or use the existing one). Run docker --version to ensure the Docker CLI is accessible, and then run a test container:
Command:
docker run hello-worldExpected Response:
The hello-world container should download and run, printing a "Hello from Docker!" message and exiting. This confirms that Docker is working inside WSL (via Docker Desktop). You can also try docker ps (which should list no running containers, until the Moose dev containers start later) to verify the Docker daemon is reachable from WSL.
Troubleshooting: If docker run hello-world fails, ensure Docker Desktop is running and that WSL integration is enabled for your distro. In some cases, you might need to install the Docker CLI tools inside WSL, but Docker Desktop usually handles that by exposing the docker command in WSL [10]. Also verify that your Ubuntu WSL is set to version 2 (as Docker won't work with WSL1).
You now have a working Ubuntu instance on your Windows machine with Docker installed. You can proceed to install MooseStack and its requirements (Node.js 20+ and/or Python 3.12+) as if you were installing on a regular Linux machine. Just make sure to install from the Linux terminal prompt in the Ubuntu instance.
Primarily driven by change volume (rows changed / day) and retention period
| Component | Expected Impact |
|---|---|
| Tracking tables (cdc.*_CT) | ~10-30% of source table sizes (not 2x) |
| Retention | Default 3 days, then auto-purged |
| Transaction log | May grow if capture job lags |
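As a rough way to reason about this, you can estimate tracking-table growth from your own change volume. The numbers below are placeholder assumptions, not measurements:

```typescript
// Back-of-the-envelope estimate for CDC tracking-table storage.
// All inputs are assumptions: substitute your own change volume and row sizes.
const rowsChangedPerDay = 1_000_000; // rows changed per day across CDC-enabled tables
const avgTrackedRowBytes = 200;      // tracked columns plus the CDC system columns
const retentionDays = 3;             // default CDC retention before cleanup

const estimatedBytes = rowsChangedPerDay * avgTrackedRowBytes * retentionDays;
console.log(`~${(estimatedBytes / 1024 ** 3).toFixed(2)} GiB of tracking-table storage`);
// With these placeholder inputs: ~0.56 GiB, typically a small fraction of the source tables.
```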
Primarily driven by change rate (changes / sec) and the number of CDC enabled tables
| Component | Expected Overhead |
|---|---|
| CDC Capture Job | Low additional CPU |
| Multiple tables | Low-moderate (well within normal) |
| Polling interval (e.g. 2-min) | Negligible (just reads) |
Primarily driven by write volume to tracking tables and transaction log read rate
| Operation | Expected Overhead |
|---|---|
| Transaction log reads | Low additional read I/O |
| Tracking table writes | Low additional write I/O |
| Debezium queries (periodic) | Minimal - batched reads |
Primarily driven by the number of CDC-enabled tables and their row size
| Component | Expected Overhead |
|---|---|
| Capture job buffers | Low, measured in MB |
| Tracking table indexes | Proportional to change volume |
The Moose LSP for TypeScript is an experimental feature available for early access. It enables SQL validation and syntax highlighting for SQL strings in TypeScript objects in your Moose project. It currently has known limitations around nested SQL and SQL fragments, which will be incorrectly highlighted as errors. If you would like access to experiment with this feature, let us know!
Beta distributions in IDE extension marketplaces (installation/access not guaranteed):
Prompt your copilot to seed your local database from each remote table:
Goal: seed local ClickHouse from a remote Boreal ClickHouse via HTTP using `moose query`.

Assumptions:
- `BOREAL_CONNECTION_STRING` is set and looks like: `https://default:PASSWORD@HOST:8443/DB_NAME`
- Local ClickHouse has the same tables already created.
- Use `url()` + `JSONEachRow` to stream data from remote into local.

Steps:

1) Extract remote base URL + database name from `BOREAL_CONNECTION_STRING`.
2) Get list of tables (exclude system + materialized views).
3) For each table, generate a schema string ("col1 Type1, col2 Type2, ...") from `DESCRIBE TABLE ... FORMAT JSONEachRow`.
4) For each table, generate a seed SQL file with this pattern:

   INSERT INTO <table>
   SELECT *
   FROM url(
     '<base_url>/?database=<db>&query=SELECT+*+FROM+<table>+FORMAT+JSONEachRow',
     JSONEachRow,
     '<schema_string>'
   );

   (No LIMIT clauses: copy all rows.)

5) Run all generated seed SQL files with `moose query`.
6) Verify row counts for each table (use FINAL) using `moose query`.

Implement as a single bash script.

This is an example of a plan generated by an AI copilot for implementing a dataset API endpoint. Use this as a reference for the level of detail and structure your own AI-generated plans should have.
# Implement Order Fulfillment Dataset API Endpoint

## Overview

Implement the OrderFulfillment dataset endpoint. The endpoint should return order fulfillment metrics by merchant and time period, matching the MSSQL query behavior from the original system.

## Analysis

### MSSQL Query Structure (from specification)

The original MSSQL query:

1. Gets user permissions via `getUserPermissions` stored procedure
2. Joins `merchants` → `merchant_properties` → `order_metrics`
3. Returns: `MerchantId`, `MerchantName`, `FulfillmentRate`, `TotalOrders`
4. Filters based on user permissions (if user doesn't have view-all access, filters by authorized merchants)

### Expected Response Format

Based on the API payload/response in the specification:

- **Payload fields**: `distinct([MerchantName]) AS [value], [FulfillmentRate] AS [metric], [TotalOrders]`
- **Response structure**:
  ```json
  {
    "value": "Acme Corp",   // MerchantName
    "metric": 95.5,         // FulfillmentRate
    "TotalOrders": 1250     // TotalOrders
  }
  ```

### ClickHouse Tables Available

- `merchants` - Merchant/company information (`merchant_name`)
- `merchant_properties` - Merchant properties and configurations
- `order_metrics` - Order and fulfillment data
- `orders` - Individual order records
- `order_items` - Line items for orders

## Implementation Plan

### Phase 1: Query Development & Testing

1. **Convert MSSQL to ClickHouse query**
   - Map MSSQL syntax to ClickHouse equivalents
   - Handle user permissions (simplify initially - may need to implement `getUserPermissions` logic later)
   - Join: `merchants` → `merchant_properties` → `order_metrics`
   - Calculate: FulfillmentRate = (fulfilled_orders / total_orders) * 100
   - Return: `MerchantName`, `FulfillmentRate`, `TotalOrders`
2. **Test query against production ClickHouse**
   - Base URL: `https://default:<PASSWORD>@<HOST>:8443/<DATABASE_NAME>`
   - Verify query returns data
   - Check column names and data types match expected format
   - Iterate until query is correct
3. **Handle cascading queries if needed**
   - If materialized views are needed, use CTEs or subqueries to emulate them
   - Test each step of the query chain

### Phase 2: Handler Implementation

1. **Create handler file**: `moosestack/src/analytics/apis/order-fulfillment.api.ts`
   - Follow pattern from existing API handlers
   - Use `ApiUtil` with `client` and `sql` helpers
   - Handle security/merchant filtering if needed
2. **Implement query with ClickHouse syntax**
   - Use `sql` template tag for type-safe queries
   - Import required table models
   - Handle user permissions (may need to simplify for initial implementation)
3. **Format response to match expected structure**
   - Return format matching the API response structure:
     ```typescript
     {
       cacheId: string;
       offset: number;
       limit: number;
       totalRecordCount: number;
       columns: Array<{name, dataField, displayName, type, precision, scale}>;
       recordsCount: number;
       records: Array<{value, metric, TotalOrders}>;
       failures: {};
       warnings: {};
     }
     ```

### Phase 3: Registration & Testing

1. **Register handler in dataset registry**
   - Update `moosestack/src/dataset/registry.ts`
   - Add handler for the OrderFulfillment dataset
2. **Test endpoint via curl**
   - Test against route: `http://localhost:4000/dataset` (POST)
   - Use proper authentication header
   - Test with sample payload matching the specification format
   - Verify response matches expected structure
3. **Iterate and fix**
   - Fix any query issues
   - Adjust response formatting if needed
   - Handle edge cases (empty results, permissions, etc.)

## Files to Modify

1. **New file**: `moosestack/src/analytics/apis/order-fulfillment.api.ts`
   - Handler function: `fetchOrderFulfillmentData`
   - Query implementation
   - Response formatting
2. **Update**: `moosestack/src/dataset/registry.ts`
   - Import `fetchOrderFulfillmentData`
   - Register handler for OrderFulfillment dataset

## Key Considerations

1. **User Permissions**: The MSSQL query uses `getUserPermissions` stored procedure. For initial implementation, we may need to:
   - Simplify to return all merchants (if user context isn't available)
   - Or implement a simplified version of the permission logic
2. **Field Mapping**: Ensure correct mapping between frontend field names and database columns.
3. **Security Filtering**: May need to handle merchant-level filtering for multi-tenant scenarios.
4. **Ordering**: Implement sorting by merchant name or fulfillment rate as specified.
5. **Pagination**: Implement `limit` and `offset` for large result sets.

## Testing Strategy

1. Test ClickHouse query directly first (before implementing handler)
2. Show query results at each iteration
3. Test handler function in isolation
4. Test full endpoint with curl
5. Validate response structure and data accuracy

Security appendix coming soon. Security protocols have already been implemented in this project.