Static Report Generation
TL;DR
- Static reports are pre-generated snapshots (PDF, Excel, CSV) that capture data at a specific point in time
- They remain essential for compliance, auditability, and stakeholder workflows despite interactive dashboards
- Moose + ClickHouse provides a developer-first approach: define TypeScript models, get auto-generated infrastructure
This guide will:
- Help you understand when static reports are the right choice
- Walk through a complete implementation with file ingestion, materialized views, and PDF generation
- Explain the underlying concepts so you can adapt the approach to your needs
How to use this guide
Different starting points depending on what you need:
Business leaders and operations teams: Read Parts 1-2 for the business case and use cases. Skim Part 3 for a concrete example, then hand off to developers.
Developers: Skip to Part 4 and start building. Parts 1-2 have business context if you need it later.
Data engineers evaluating this approach: Parts 1-2 cover the business case, Part 3 is a concrete scenario, Parts 4-8 are implementation, Part 9 explains the underlying concepts.
Prerequisites
- Node.js 20+ (required for pdfmake 0.3)
- Docker Desktop running
- pnpm (npm install -g pnpm)
- Basic TypeScript and SQL knowledge
Verify your environment
Before starting, confirm your tools are ready:
```bash
node --version   # Should be v20.x.x or higher
docker --version
docker info > /dev/null 2>&1 && echo "Docker is running" || echo "Docker is NOT running"
```
If Docker isn't running, start Docker Desktop before proceeding.
Tips for success
Keep moose dev running throughout the tutorial. Use a dedicated terminal tab or window. If you accidentally close it, run moose dev again from the project directory.
First startup takes longer. Docker images download on first run (2-5 minutes depending on connection). Subsequent starts are much faster.
Services need time to initialize. After creating new APIs or views, wait 20-30 seconds before testing. Watch the moose dev terminal for compilation messages.
File watching is automatic. Save a file and the system rebuilds—no manual restart needed. Type errors block startup, so fix them before proceeding.
Logs are invaluable for debugging. All logs are stored in ~/.moose/YYYY-MM-DD-cli.log. Watch in real-time:
```bash
tail -f ~/.moose/$(date +%Y-%m-%d)-cli.log
```
Verify data at each stage. Don't assume it worked—validate with queries. Test APIs with curl before running scripts that depend on them.
Part 1: What are static reports?
Static reports are pre-generated snapshots of data, delivered as fixed artifacts—most commonly PDF or Excel files, though CSV exports and dashboard views are also common formats.
These reports capture what the data looked like at a specific point in time, with a defined and unchanging scope, metric logic, and layout. Once generated, a static report remains frozen until it is explicitly regenerated.
Key characteristics
- Pre-generated snapshots: delivered as fixed artifacts (PDF, Excel, CSV)
- Point-in-time capture: defined, unchanging scope and metric logic
- Stable reference: frozen documents preferred by many stakeholders over interactive dashboards
- Auditability support: enables sign-off workflows and compliance requirements
- Portable handoffs: act as handoff layers for Finance, Operations, and Compliance
Why static reports matter
Despite the proliferation of interactive dashboards and real-time analytics tools, static reports remain essential to business operations:
| Reason | Business Value |
|---|---|
| Stable reference points | Stakeholders prefer frozen documents they can review, annotate, and archive rather than dashboards that show different numbers each time |
| Reduced version confusion | When everyone references the same PDF from the same date, there's no ambiguity about which numbers are being discussed |
| Compliance and auditability | Regulatory requirements often mandate immutable records that can be retrieved and verified years later |
| Universal accessibility | Not everyone has system access—many stakeholders need a PDF they can email or an Excel file they can manipulate offline |
| Period-close reliability | Financial close processes depend on definitive snapshots that won't change after sign-off |
Part 2: Business use cases
Understanding where static reports create value helps organizations prioritize their data infrastructure investments.
Selling data assets to third parties
Static reports enable revenue generation through data monetization:
Data archival and retrieval:
- Medical patient record retrieval for healthcare providers and researchers
- Mortgage records and refinancing documentation for financial institutions
- Historical transaction records for audits and legal proceedings
Data broker services:
- Medical research records for pharmaceutical companies and academic institutions
- Vehicle history reports (CarFax-style) for automotive transactions
- Credit history reports for lending decisions
- Purchase history for marketing and customer analytics
- Real estate transaction history for property valuations
Compliance and regulated reporting
For organizations in regulated industries, static reports provide critical capabilities:
- Immutable, versioned records that support approval workflows and retention policies
- External sharing capabilities without requiring system access
- Reproducibility through stored run metadata including dataset version, query parameters, pipeline run timestamp, and result hash
- Easy cross-department distribution while maintaining audit trails
Analyst workflows
Many analysts and business users prefer working with exported data rather than interactive dashboards:
- Exporting curated datasets (not raw data) into Excel or Google Sheets
- Faster pivot table creation and data exploration
- Reconciliation workflows comparing multiple data sources
- Annotation and commentary that lives alongside the data
Automations and operational handoffs
Static reports serve as predictable, machine-readable inputs to downstream processes:
- Email-to-ticket systems that process report contents
- Slack integrations that distribute key metrics
- RPA (Robotic Process Automation) tools that consume standardized formats
- System migrations enabling side-by-side comparison between old and new systems
Understanding the challenges
Before investing in static report generation capabilities, understand the technical challenges:
| Challenge | Impact |
|---|---|
| Data discovery | Teams spend hours finding duplicates, missing fields, and inconsistent definitions before reports can be generated |
| Data cleaning | Format inconsistencies, NULL handling, and duplicate entities cause report errors |
| Multiple formats | Same logical data arrives in different structures from different sources |
| Validation and trust | Without structured UAT and golden records, errors reach executives or regulators |
| Timing and orchestration | Reports generated before data is complete deliver flawed results |
Evaluating solutions
When evaluating static report generation solutions, consider:
| Capability | What to Look For |
|---|---|
| Data quality | Automated profiling, duplicate detection, validation rules |
| Format support | Native handling of Excel, PDF, CSV, JSON, XML |
| Audit trail | Complete lineage from source to report with timestamps |
| Scalability | OLAP-based processing for large datasets |
| Orchestration | Dependency management, freshness checks, retry logic |
| Self-service | User-friendly export interfaces with consistent formatting |
Why Moose for static reports?
Moose addresses these challenges with a developer-first approach:
| Traditional Approach | With Moose |
|---|---|
| Manual ClickHouse setup | Data model → auto infrastructure |
| Custom ingestion code | Auto-generated /ingest API |
| Hand-written DDL | Type-safe schema from TypeScript |
| Slow iteration cycles | Hot reload, local-first development |
| Manual API boilerplate | Type-safe Consumption APIs |
| Risky production changes | Lifecycle management prevents accidents |
Part 3: The scenario
Apex Retail daily sales consolidation
You're a data engineer at Apex Retail, a chain with 12 stores across 4 regions. Each night, store POS (point-of-sale) systems export the day's sales as CSV or JSON files.
The current process: An analyst downloads files from each store, copies data into Excel, and emails a summary to regional managers. Takes 2 hours every morning. When the analyst is sick, reports don't go out.
What you'll build: A pipeline that ingests store uploads, stores them in ClickHouse, pre-aggregates daily summaries, and generates PDF reports. The core Moose code (model, materialized view, API) is about 60 lines of TypeScript.
Why automate?
| Factor | Manual | Automated |
|---|---|---|
| Time per report | 2 hours | < 1 minute |
| Annual analyst hours | 500+ | ~10 (maintenance) |
| Error rate | 5-10% | 0% |
| Availability | One person | 24/7 |
| Audit trail | None | Git history |
| Scalability | Linear | Constant |
Sound familiar?
- Reports late when the analyst is out
- "The numbers don't match" disputes
- Hours debugging Excel formulas
- Can't pull last Tuesday's numbers quickly
Why not existing tools?
| Approach | Problem |
|---|---|
| Excel macros | No version control, breaks when formats change |
| BI dashboards | Good for exploration, PDF export still needs code |
| Scheduled SQL | Raw SQL gets unmaintainable |
| Manual scripts | No streaming, no type safety |
This approach:
- Type-safe models catch schema mismatches at compile time
- Streaming ingestion via Redpanda
- Materialized views make queries instant
- Git history tracks all changes
- Same inputs produce same outputs
Architecture
CSV/JSON files → Ingest API → Redpanda stream → ClickHouse table → Materialized view → Report API → PDF
End result
- File loader script for CSV/JSON ingestion
- Pre-computed daily aggregations (millisecond queries)
- Parameterized JSON API endpoint
- PDF export via local script or API endpoint
Part 4: Setup and data model
Step 1: Install Moose CLI
```bash
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
Verify:
```bash
moose --version
```
Step 2: Initialize project
```bash
moose init sales-reports typescript-empty
cd sales-reports
pnpm install
```
Moose uses pnpm workspaces. npm install won't work.
Step 3: Start dev server
Stop other Moose projects first
If you have other Moose projects running, stop them before continuing. Multiple Moose instances will conflict on ports 4000, 9000, and 18123. Use docker ps to check for running containers and stop them if needed.
Open a new terminal window (or terminal tab) and run:
```bash
cd sales-reports
moose dev
```
Important
moose dev runs continuously and doesn't exit. Keep this terminal open for the entire tutorial. You'll run all other commands (curl, scripts, etc.) in a separate terminal.
The dev server auto-reloads on file changes. First run may auto-upgrade the CLI to match moose-lib version.
Wait for services to start
First startup takes 2-5 minutes while Docker images download. Subsequent starts are faster (10-30 seconds).
Watch the terminal for these indicators that services are ready:
```
✓ ClickHouse is ready
✓ Redpanda is ready
HTTP server started on port 4000
```
Verify services
In your other terminal, test the services:
```bash
# Test the API server
curl http://localhost:4000/health
```
Should return JSON with healthy and unhealthy arrays listing service status.
```bash
curl http://localhost:18123/ping
```
Should return Ok.
If checks fail, wait another 20-30 seconds. Common startup issues:
- Connection refused: Services still initializing—wait longer
- Docker errors: Ensure Docker Desktop is running (check menu bar icon on Mac)
What started:
| Service | Port | Description |
|---|---|---|
| HTTP Server | 4000 | API endpoints |
| ClickHouse | 18123 | Columnar database for analytics |
| Redpanda | 19092 | Message queue for data streaming (Kafka-compatible) |
| Temporal | 8080 | Scheduled task runner |
All in Docker. No cloud accounts needed.
Debugging tip
ClickHouse has a web UI at http://localhost:18123/play for running queries interactively—useful for debugging.
Step 4: Define data model
Create app/ingest/SalesTransaction.ts:
```typescript
import { IngestPipeline } from "@514labs/moose-lib";

// Define the data model - matches store POS export format
export interface SalesTransaction {
  // Unique identifier
  transactionId: string; // e.g., "TXN-W001-20250115-0001"

  // Timestamp - the star of time-series analytics
  saleTimestamp: Date;

  // Store & location
  storeId: string; // e.g., "STORE-W001"
  region: string; // West, East, Central, South

  // Product info
  productId: string;
  productCategory: string; // Electronics, Clothing, Food, Home, Sports

  // Customer (optional - from loyalty card)
  customerId: string;

  // Transaction amounts
  quantity: number;
  unitPrice: number;
  totalAmount: number;
}

// IMPORTANT: Always include the type parameter <SalesTransaction> - it's required
// for the compiler plugin to generate the correct schema
export const salesPipeline = new IngestPipeline<SalesTransaction>("sales", {
  ingestApi: true,
  stream: true,
  table: {
    // Time-first ordering for efficient range queries
    // The first field becomes the primary key in ClickHouse
    orderByFields: ["saleTimestamp", "transactionId"],
  },
});
```
The IngestPipeline must include the type parameter. Without it, the dev server fails with: "Supply the type param T so that the schema is inserted by the compiler plugin."
This declaration creates:
- ClickHouse table matching the interface
- Redpanda stream
- REST endpoint at /ingest/sales
Step 5: Export models
Update app/index.ts:
```typescript
export * from "./ingest/SalesTransaction";
```
Important
Every new model, view, or API must be exported from app/index.ts. Forgetting this is a common mistake—without the export, tables and endpoints won't be created, and there won't be any errors in the logs.
Verify
Check the moose dev terminal for:
```
✓ Detected model changes
✓ Created table: sales
✓ Created stream: sales
✓ Registered API endpoint: /ingest/sales
```
If not visible, check that the file saved and look for TypeScript errors.
From that one interface, Moose inferred the ClickHouse schema, created the table with time-first ordering, set up a Redpanda topic, and generated a REST endpoint with validation.
Step 6: Test ingestion
```bash
curl -X POST http://localhost:4000/ingest/sales \
  -H "Content-Type: application/json" \
  -d '{
    "transactionId": "TXN-W001-20250115-0001",
    "saleTimestamp": "2025-01-15T10:30:00Z",
    "storeId": "STORE-W001",
    "region": "West",
    "productId": "PROD-ELE-001",
    "productCategory": "Electronics",
    "customerId": "CUST-1001",
    "quantity": 2,
    "unitPrice": 299.99,
    "totalAmount": 599.98
  }'
```
Returns SUCCESS.
The data flows through Redpanda into ClickHouse.
Verify in ClickHouse (credentials are dev defaults, don't use in production):
```bash
curl -s "http://localhost:18123/?user=panda&password=pandapass" \
  -d "SELECT * FROM local.sales FORMAT Pretty"
```
Part 5: File ingestion
Stores upload CSV or JSON files rather than calling APIs directly. Here's a file loader.
Step 1: Install dependencies
```bash
pnpm add papaparse date-fns
pnpm add -D @types/papaparse ts-node
```
ts-node lets you run TypeScript files directly without compiling first.
Step 2: Create file loader
Create scripts/load-file.ts:
```typescript
import Papa from "papaparse";
import fs from "fs";
import path from "path";

// Note: saleTimestamp can be string or Date because papaparse dynamicTyping
// converts ISO date strings to Date objects
interface SalesTransaction {
  transactionId: string;
  saleTimestamp: string | Date;
  storeId: string;
  region: string;
  productId: string;
  productCategory: string;
  customerId: string | null; // null for customers without loyalty cards
  quantity: number;
  unitPrice: number;
  totalAmount: number;
}

async function loadFile(filePath: string): Promise<void> {
  const ext = path.extname(filePath).toLowerCase();
  const content = fs.readFileSync(filePath, "utf-8");

  let records: SalesTransaction[];

  if (ext === ".csv") {
    const { data } = Papa.parse<SalesTransaction>(content, {
      header: true,
      dynamicTyping: true,
    });
    // Filter empty rows and normalize fields
    records = data
      .filter((r) => r.transactionId)
      .map((r) => ({
        ...r,
        // papaparse dynamicTyping converts Date objects - convert back to ISO string
        saleTimestamp:
          r.saleTimestamp instanceof Date
            ? r.saleTimestamp.toISOString()
            : r.saleTimestamp,
        // papaparse dynamicTyping converts empty CSV fields to null
        // but the API expects empty strings for optional string fields
        customerId: r.customerId ?? "",
      }));
  } else if (ext === ".json") {
    const parsed = JSON.parse(content);
    const rawRecords: SalesTransaction[] = Array.isArray(parsed) ? parsed : [parsed];
    // Apply same normalization as CSV: filter empty rows and normalize fields
    records = rawRecords
      .filter((r) => r.transactionId)
      .map((r) => ({
        ...r,
        // Normalize null customerId to empty string (API expects string, not null)
        customerId: r.customerId ?? "",
      }));
  } else {
    throw new Error(`Unsupported file format: ${ext}`);
  }

  console.log(`Parsed ${records.length} records from ${filePath}`);

  // Send records to ingest API (collecting errors for partial failure handling)
  const errors: Array<{ index: number; error: string }> = [];
  for (let i = 0; i < records.length; i++) {
    const response = await fetch("http://localhost:4000/ingest/sales", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(records[i]),
    });
    if (!response.ok) {
      errors.push({ index: i, error: response.statusText });
    }
  }

  if (errors.length > 0) {
    console.error(`Failed to ingest ${errors.length} records:`, errors);
  }
  console.log(
    `Successfully loaded ${records.length - errors.length} of ${records.length} records`
  );
}

// CLI execution
const filePath = process.argv[2];
if (!filePath) {
  console.error("Usage: pnpm exec ts-node scripts/load-file.ts <path-to-file>");
  process.exit(1);
}

loadFile(filePath).catch(console.error);
```
This sends records one at a time. For production, batch them; one approach is sketched below.
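As a rough illustration of the batching idea, the sketch below chunks the records and sends each chunk's requests concurrently. The loadInBatches helper is hypothetical (not part of the tutorial code) and still uses the single-record /ingest/sales endpoint, so the only new assumption is the chunk size:

```typescript
// Hypothetical helper: send records in concurrent chunks instead of one at a time.
// Uses the same single-record /ingest/sales endpoint as load-file.ts.
async function loadInBatches<T>(records: T[], chunkSize = 100): Promise<number> {
  let failures = 0;
  for (let start = 0; start < records.length; start += chunkSize) {
    const chunk = records.slice(start, start + chunkSize);
    // Fire one request per record, but at most chunkSize in flight at once
    const results = await Promise.allSettled(
      chunk.map((record) =>
        fetch("http://localhost:4000/ingest/sales", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(record),
        }).then((res) => {
          if (!res.ok) throw new Error(res.statusText);
        })
      )
    );
    failures += results.filter((r) => r.status === "rejected").length;
  }
  return failures;
}
```

Chunking bounds memory and connection count while still overlapping network latency.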
Step 3: Create sample data
Create sample-data/west-store-001.csv:
```csv
transactionId,saleTimestamp,storeId,region,customerId,productId,productCategory,quantity,unitPrice,totalAmount
TXN-W001-20250115-0001,2025-01-15T09:15:00Z,STORE-W001,West,CUST-1001,PROD-ELE-001,Electronics,2,299.99,599.98
TXN-W001-20250115-0002,2025-01-15T09:32:00Z,STORE-W001,West,CUST-1002,PROD-CLO-015,Clothing,1,49.99,49.99
TXN-W001-20250115-0003,2025-01-15T10:05:00Z,STORE-W001,West,,PROD-FOO-042,Food,3,12.99,38.97
TXN-W001-20250115-0004,2025-01-15T10:45:00Z,STORE-W001,West,CUST-1003,PROD-HOM-008,Home,1,89.99,89.99
TXN-W001-20250115-0005,2025-01-15T11:20:00Z,STORE-W001,West,CUST-1004,PROD-SPO-022,Sports,2,34.99,69.98
TXN-W001-20250115-0006,2025-01-15T12:00:00Z,STORE-W001,West,,PROD-ELE-055,Electronics,1,149.99,149.99
TXN-W001-20250115-0007,2025-01-15T13:30:00Z,STORE-W001,West,CUST-1005,PROD-CLO-033,Clothing,3,29.99,89.97
TXN-W001-20250115-0008,2025-01-15T14:15:00Z,STORE-W001,West,CUST-1001,PROD-FOO-018,Food,5,8.99,44.95
TXN-W001-20250115-0009,2025-01-15T15:45:00Z,STORE-W001,West,CUST-1006,PROD-HOM-041,Home,1,199.99,199.99
TXN-W001-20250115-0010,2025-01-15T16:30:00Z,STORE-W001,West,,PROD-ELE-012,Electronics,1,599.99,599.99
```
Empty customerId = no loyalty card.
Step 4: Load sample data
```bash
mkdir -p sample-data
# Create the CSV file above, then:
pnpm exec ts-node scripts/load-file.ts sample-data/west-store-001.csv
```
Script shortcut
You can also add a script to package.json: "load-file": "ts-node scripts/load-file.ts" and run pnpm run load-file sample-data/west-store-001.csv.
Output:
```
Parsed 10 records from sample-data/west-store-001.csv
Successfully loaded 10 of 10 records
```
Verify:
```bash
curl -s "http://localhost:18123/?user=panda&password=pandapass" \
  -d "SELECT count() FROM local.sales"
```
Note on duplicates
Running the file loader multiple times creates duplicate records—ClickHouse doesn't enforce unique constraints. To start fresh, stop moose dev and remove the project's Docker volumes. Caution: docker volume prune deletes all unused volumes on your machine. Instead, use docker volume ls to list volumes, identify the ones for your project (typically containing clickhouse or redpanda), and remove them selectively with docker volume rm <volume_name>.
Step 5: Add validation (recommended)
Production data is messy. Add validation to catch issues early; the sketch below shows one way.
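A minimal pre-ingest validation pass might look like this. It reuses the loader's SalesTransaction interface; the validateRecord helper and its rules (known regions, positive quantities, totals reconciling to quantity × unitPrice) are illustrative assumptions, not part of the generated project:

```typescript
// Illustrative validation pass - field names match the loader's SalesTransaction
// interface; the rules are assumptions about what "valid" means for this dataset.
function validateRecord(r: SalesTransaction): string[] {
  const problems: string[] = [];
  if (!r.transactionId) problems.push("missing transactionId");
  if (Number.isNaN(Date.parse(String(r.saleTimestamp)))) {
    problems.push(`unparseable saleTimestamp: ${r.saleTimestamp}`);
  }
  if (!["West", "East", "Central", "South"].includes(r.region)) {
    problems.push(`unknown region: ${r.region}`);
  }
  if (typeof r.quantity !== "number" || r.quantity <= 0) {
    problems.push(`invalid quantity: ${r.quantity}`);
  }
  // Catch rows where totals don't reconcile (allowing for float rounding)
  if (Math.abs(r.quantity * r.unitPrice - r.totalAmount) > 0.01) {
    problems.push("totalAmount != quantity * unitPrice");
  }
  return problems;
}

// Usage inside loadFile, before sending to the ingest API:
// const bad = records.filter((r) => validateRecord(r).length > 0);
```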
Part 6: Time-series aggregation
Create a materialized view that pre-aggregates daily summaries.
How ClickHouse materialized views work
- Views execute on INSERT, not on a schedule
- Data aggregates per batch, not globally
- Background merges consolidate eventually, but always use GROUP BY for accurate results
- SummingMergeTree only works for sum() and count()
Step 1: Create materialized view
Create app/views/DailySalesSummary.ts:
```typescript
import { MaterializedView, sql } from "@514labs/moose-lib";
import { salesPipeline } from "../ingest/SalesTransaction";

// Define the aggregated data model
export interface DailySalesSummary {
  saleDate: Date;
  region: string;
  productCategory: string;
  transactionCount: number;
  totalQuantity: number;
  totalRevenue: number;
}

// Get table reference from pipeline (with null check for TypeScript)
const salesTable = salesPipeline.table;
if (!salesTable) {
  throw new Error("Sales table not configured in pipeline");
}

// Create materialized view with auto-created target table
export const dailySalesMV = new MaterializedView<DailySalesSummary>({
  tableName: "DailySalesSummary",
  materializedViewName: "DailySalesSummary_MV",
  orderByFields: ["saleDate", "region", "productCategory"],
  selectTables: [salesTable],
  selectStatement: sql`
    SELECT
      toStartOfDay(saleTimestamp) AS saleDate,
      region,
      productCategory,
      count() AS transactionCount,
      sum(quantity) AS totalQuantity,
      sum(totalAmount) AS totalRevenue
    FROM ${salesTable}
    GROUP BY saleDate, region, productCategory
  `,
});
```
toStartOfDay() truncates to daily granularity. ORDER BY puts saleDate first for efficient date-range queries. Always use GROUP BY when querying.
Step 2: Update exports
Add to app/index.ts:
```typescript
export * from "./ingest/SalesTransaction";
export * from "./views/DailySalesSummary";
```
Step 3: Verify
```bash
curl -s "http://localhost:18123/?user=panda&password=pandapass" \
  -d "SHOW TABLES FROM local"
```
Should show both sales and DailySalesSummary.
On each INSERT to sales, the materialized view (MV) runs and groups transactions by day, collapsing potentially millions of rows into one per day/region/category. Results land in DailySalesSummary. Since MVs aggregate per-batch, you need GROUP BY when querying.
Performance difference:
| Target | Rows | Query time |
|---|---|---|
| Raw transactions (1 year) | ~10M | 2-10 sec |
| Daily summaries (1 year) | ~1,500 | 10-50 ms |
That's 100x faster because aggregation happens once at insert time, not on every query.
Step 4: Query aggregated data
Always use GROUP BY:
```bash
curl -s "http://localhost:18123/?user=panda&password=pandapass" -d "
  SELECT
    saleDate,
    region,
    sum(transactionCount) AS transactions,
    sum(totalRevenue) AS revenue
  FROM local.DailySalesSummary
  GROUP BY saleDate, region
  ORDER BY saleDate, region
  FORMAT Pretty
"
```
ClickHouse time functions
| Function | Output | Use case |
|---|---|---|
| toStartOfHour(ts) | 2025-01-15 10:00:00 | Intraday |
| toStartOfDay(ts) | 2025-01-15 | Daily |
| toStartOfWeek(ts, 1) | 2025-01-13 | Weekly (Monday) |
| toStartOfMonth(ts) | 2025-01-01 | Monthly |
| toHour(ts) | 10 | Hour-of-day |
| toDayOfWeek(ts) | 3 (Wed) | Day-of-week |
Part 7: Report API
Build a consumption API with parameterized date ranges.
Step 1: Create API
Create app/apis/salesReport.ts:
```typescript
import { Api } from "@514labs/moose-lib";
import { dailySalesMV } from "../views/DailySalesSummary";

// Define query parameters
export interface SalesReportParams {
  startDate: string; // Format: YYYY-MM-DD
  endDate: string;   // Format: YYYY-MM-DD
  region?: string;   // Optional filter
}

// Define response interface
export interface SalesReportResponse {
  saleDate: string;
  region: string;
  productCategory: string;
  transactionCount: number;
  totalQuantity: number;
  totalRevenue: number;
}

// Create the API endpoint
export const salesReportApi = new Api<SalesReportParams, SalesReportResponse[]>(
  "sales-report",
  async ({ startDate, endDate, region }, { client, sql }) => {
    // Access the target table from the materialized view
    const targetTable = dailySalesMV.targetTable;

    // Build query with optional region filter
    // Note: We use separate queries for clarity. Production code could
    // consolidate these using conditional query building.
    let query;
    if (region) {
      query = sql`
        SELECT
          saleDate, region, productCategory,
          sum(transactionCount) AS transactionCount,
          sum(totalQuantity) AS totalQuantity,
          sum(totalRevenue) AS totalRevenue
        FROM ${targetTable}
        WHERE saleDate BETWEEN ${startDate} AND ${endDate}
          AND region = ${region}
        GROUP BY saleDate, region, productCategory
        ORDER BY saleDate DESC, totalRevenue DESC
      `;
    } else {
      query = sql`
        SELECT
          saleDate, region, productCategory,
          sum(transactionCount) AS transactionCount,
          sum(totalQuantity) AS totalQuantity,
          sum(totalRevenue) AS totalRevenue
        FROM ${targetTable}
        WHERE saleDate BETWEEN ${startDate} AND ${endDate}
        GROUP BY saleDate, region, productCategory
        ORDER BY saleDate DESC, totalRevenue DESC
      `;
    }

    const result = await client.query.execute<SalesReportResponse>(query);
    return result.json();
  }
);
```
Step 2: Update exports
Add to app/index.ts:
```typescript
export * from "./ingest/SalesTransaction";
export * from "./views/DailySalesSummary";
export * from "./apis/salesReport";
```
This creates a REST endpoint at /consumption/sales-report with startDate, endDate, and optional region parameters. It queries the MV (fast), uses parameterized SQL (no injection), and returns typed JSON.
PDF generator and dashboards both hit this endpoint.
Step 3: Test
curl "http://localhost:4000/consumption/sales-report?startDate=2025-01-01&endDate=2025-01-31"A successful response returns JSON like this:
[ { "saleDate": "2025-01-15", "region": "West", "productCategory": "Electronics", "transactionCount": 3, "totalQuantity": 4, "totalRevenue": 1349.96 }, { "saleDate": "2025-01-15", "region": "West", "productCategory": "Clothing", "transactionCount": 2, "totalQuantity": 4, "totalRevenue": 139.96 }]If you get an empty array [], verify data was ingested in Part 3 and the date range matches your data.
Filter by region:
curl "http://localhost:4000/consumption/sales-report?startDate=2025-01-15&endDate=2025-01-15®ion=West"Part 8: PDF export
Transform JSON into PDF. You'll implement two complementary approaches:
- Local script - For batch generation, scheduled jobs, and local development
- API endpoint - For on-demand generation and external system integration
Implement both to support the full range of PDF generation use cases.
Approach A: Local script
Step 1: Install pdfmake
```bash
pnpm add pdfmake
pnpm add -D @types/pdfmake
```
Node.js requirement
This guide uses pdfmake 0.3.x which requires Node.js 20 LTS (Long-Term Support) or higher. The API changed significantly from version 0.2.x—see the pdfmake 0.3 documentation for details.
Step 2: Create PDF generator
Create scripts/generate-pdf-report.ts:
```typescript
import { TDocumentDefinitions } from "pdfmake/interfaces";

// Use require for pdfmake as it doesn't have proper ESM exports
const pdfmake = require("pdfmake");

// Use standard PDF fonts (no external files needed)
// Note: In pdfmake 0.3, these are built-in standard fonts
const fonts = {
  Roboto: {
    normal: "Helvetica",
    bold: "Helvetica-Bold",
    italics: "Helvetica-Oblique",
    bolditalics: "Helvetica-BoldOblique",
  },
};

// Register fonts with pdfmake (required for pdfmake 0.3)
pdfmake.addFonts(fonts);

// Report data interface (matches API response)
interface SalesReportRow {
  saleDate: string;
  region: string;
  productCategory: string;
  transactionCount: number;
  totalQuantity: number;
  totalRevenue: number;
}

async function fetchReportData(
  startDate: string,
  endDate: string
): Promise<SalesReportRow[]> {
  const url = `http://localhost:4000/consumption/sales-report?startDate=${startDate}&endDate=${endDate}`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`API request failed: ${response.statusText}`);
  }
  return response.json();
}

function aggregateByRegion(
  data: SalesReportRow[]
): Map<string, { revenue: number; transactions: number }> {
  const regionMap = new Map<string, { revenue: number; transactions: number }>();
  for (const row of data) {
    const existing = regionMap.get(row.region) || { revenue: 0, transactions: 0 };
    regionMap.set(row.region, {
      revenue: existing.revenue + row.totalRevenue,
      transactions: existing.transactions + row.transactionCount,
    });
  }
  return regionMap;
}

function aggregateByCategory(
  data: SalesReportRow[]
): Map<string, { revenue: number; transactions: number }> {
  const categoryMap = new Map<string, { revenue: number; transactions: number }>();
  for (const row of data) {
    const existing = categoryMap.get(row.productCategory) || {
      revenue: 0,
      transactions: 0,
    };
    categoryMap.set(row.productCategory, {
      revenue: existing.revenue + row.totalRevenue,
      transactions: existing.transactions + row.transactionCount,
    });
  }
  return categoryMap;
}

function formatCurrency(amount: number): string {
  return `$${amount.toLocaleString("en-US", {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  })}`;
}

async function generatePdf(
  data: SalesReportRow[],
  reportDate: string,
  outputPath: string
): Promise<void> {
  // Calculate totals
  const totalRevenue = data.reduce((sum, row) => sum + row.totalRevenue, 0);
  const totalTransactions = data.reduce(
    (sum, row) => sum + row.transactionCount,
    0
  );

  // Aggregate by region and category
  const regionData = aggregateByRegion(data);
  const categoryData = aggregateByCategory(data);

  // Sort regions by revenue descending
  const sortedRegions = Array.from(regionData.entries()).sort(
    (a, b) => b[1].revenue - a[1].revenue
  );

  // Sort categories by revenue descending
  const sortedCategories = Array.from(categoryData.entries()).sort(
    (a, b) => b[1].revenue - a[1].revenue
  );

  const docDefinition: TDocumentDefinitions = {
    content: [
      // Header
      { text: "APEX RETAIL", style: "companyName" },
      { text: "Daily Sales Summary", style: "reportTitle" },
      { text: reportDate, style: "reportDate" },
      {
        canvas: [
          { type: "line", x1: 0, y1: 5, x2: 515, y2: 5, lineWidth: 1 },
        ],
      },
      // Executive Summary
      {
        text: "EXECUTIVE SUMMARY",
        style: "sectionHeader",
        margin: [0, 15, 0, 10],
      },
      {
        columns: [
          {
            text: [
              { text: "Total Revenue: ", bold: true },
              formatCurrency(totalRevenue),
            ],
          },
          {
            text: [
              { text: "Transactions: ", bold: true },
              totalTransactions.toLocaleString(),
            ],
          },
        ],
        margin: [0, 0, 0, 15],
      },
      // Revenue by Region
      {
        text: "REVENUE BY REGION",
        style: "sectionHeader",
        margin: [0, 10, 0, 10],
      },
      {
        table: {
          headerRows: 1,
          widths: ["*", "auto", "auto", "auto"],
          body: [
            [
              { text: "Region", style: "tableHeader" },
              { text: "Revenue", style: "tableHeader" },
              { text: "Transactions", style: "tableHeader" },
              { text: "Share", style: "tableHeader" },
            ],
            ...sortedRegions.map(([region, stats]) => [
              region,
              { text: formatCurrency(stats.revenue), alignment: "right" as const },
              { text: stats.transactions.toLocaleString(), alignment: "right" as const },
              {
                text: `${totalRevenue > 0 ? ((stats.revenue / totalRevenue) * 100).toFixed(1) : "0.0"}%`,
                alignment: "right" as const,
              },
            ]),
          ],
        },
        layout: "lightHorizontalLines",
      },
      // Revenue by Category
      {
        text: "REVENUE BY CATEGORY",
        style: "sectionHeader",
        margin: [0, 20, 0, 10],
      },
      {
        table: {
          headerRows: 1,
          widths: ["*", "auto", "auto", "auto"],
          body: [
            [
              { text: "Category", style: "tableHeader" },
              { text: "Revenue", style: "tableHeader" },
              { text: "Transactions", style: "tableHeader" },
              { text: "Share", style: "tableHeader" },
            ],
            ...sortedCategories.map(([category, stats]) => [
              category,
              { text: formatCurrency(stats.revenue), alignment: "right" as const },
              { text: stats.transactions.toLocaleString(), alignment: "right" as const },
              {
                text: `${totalRevenue > 0 ? ((stats.revenue / totalRevenue) * 100).toFixed(1) : "0.0"}%`,
                alignment: "right" as const,
              },
            ]),
          ],
        },
        layout: "lightHorizontalLines",
      },
    ],
    footer: {
      text: `Generated: ${new Date().toISOString()} | Apex Retail Confidential`,
      style: "footer",
      margin: [40, 0],
    },
    styles: {
      companyName: { fontSize: 24, bold: true, alignment: "center" },
      reportTitle: {
        fontSize: 16,
        alignment: "center",
        margin: [0, 5, 0, 5],
      },
      reportDate: { fontSize: 12, alignment: "center", color: "gray" },
      sectionHeader: { fontSize: 12, bold: true },
      tableHeader: { bold: true, fillColor: "#eeeeee" },
      footer: { fontSize: 8, color: "gray", alignment: "center" },
    },
    defaultStyle: { font: "Roboto", fontSize: 10 },
    pageMargins: [40, 40, 40, 60],
  };

  // Create PDF using pdfmake 0.3 API
  const pdf = pdfmake.createPdf(docDefinition);

  // Use the promise-based write() method (pdfmake 0.3+)
  await pdf.write(outputPath);
  console.log(`PDF report saved to ${outputPath}`);
}

// Main execution
async function main() {
  const startDate = process.argv[2] || "2025-01-01";
  const endDate = process.argv[3] || "2025-12-31";
  const outputPath = process.argv[4] || "apex-retail-report.pdf";

  console.log(`Fetching report data from ${startDate} to ${endDate}...`);

  try {
    const data = await fetchReportData(startDate, endDate);
    if (data.length === 0) {
      console.log("No data found for the specified date range.");
      return;
    }
    console.log(`Generating PDF with ${data.length} data rows...`);
    await generatePdf(data, `${startDate} to ${endDate}`, outputPath);
  } catch (error) {
    console.error("Error generating report:", error);
    process.exit(1);
  }
}

main();
```
Step 3: Generate report
With moose dev running, first verify the API returns data:
curl "http://localhost:4000/consumption/sales-report?startDate=2025-01-15&endDate=2025-01-15"If the API returns valid JSON (not an empty []), generate the PDF:
pnpm exec ts-node scripts/generate-pdf-report.ts 2025-01-15 2025-01-15 january-15-report.pdfExpected output:
Fetching report data from 2025-01-15 to 2025-01-15...Generating PDF with 5 data rows...PDF report saved to january-15-report.pdfTest first
Always test the API with curl before running the PDF generator. If you see "No data found" or the script hangs, verify the API returns data for your date range.
Step 4: Verify
Actually open the PDF file to verify formatting and data accuracy—the console output won't show layout issues. The PDF should contain:
- Header with company name and date range
- Executive summary (revenue, transactions)
- Revenue by region with percentages
- Revenue by category with percentages
- Footer with timestamp
PDF generates from pre-aggregated data, so it's fast regardless of raw transaction count.
Approach B: API endpoint
For production use cases, you may want to generate PDFs on-demand via an API endpoint. This approach:
- Returns PDF binary directly in the HTTP response
- Allows external systems to request reports via HTTP
- Supports direct download via curl or browser
Why WebApp instead of Api?
The standard Api class cannot return binary data—it hardcodes Content-Type: application/json and JSON-serializes all responses. The WebApp class provides direct access to the HTTP handler, allowing you to set custom headers and write binary data.
Step 1: Create the PDF API endpoint
Create app/apis/salesReportPdf.ts:
```typescript
import { WebApp, getMooseUtils } from "@514labs/moose-lib";
import { IncomingMessage, ServerResponse } from "http";
import { parse as parseUrl } from "url";
import { TDocumentDefinitions } from "pdfmake/interfaces";
import { dailySalesMV } from "../views/DailySalesSummary";

// Use require for pdfmake as it doesn't have proper ESM exports
const pdfmake = require("pdfmake");

// Use standard PDF fonts (no external files needed)
const fonts = {
  Roboto: {
    normal: "Helvetica",
    bold: "Helvetica-Bold",
    italics: "Helvetica-Oblique",
    bolditalics: "Helvetica-BoldOblique",
  },
};
pdfmake.addFonts(fonts);

// Report data interface (matches materialized view output)
interface SalesReportRow {
  saleDate: string;
  region: string;
  productCategory: string;
  transactionCount: number;
  totalQuantity: number;
  totalRevenue: number;
}

function formatCurrency(amount: number): string {
  return `$${amount.toLocaleString("en-US", {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  })}`;
}

// Generate PDF buffer using pdfmake 0.3 getBuffer() method
async function generatePdfBuffer(
  data: SalesReportRow[],
  reportDateRange: string
): Promise<Buffer> {
  // Calculate totals
  const totalRevenue = data.reduce((sum, row) => sum + row.totalRevenue, 0);
  const totalTransactions = data.reduce(
    (sum, row) => sum + row.transactionCount,
    0
  );

  // Aggregate by region
  const regionMap = new Map<string, { revenue: number; transactions: number }>();
  for (const row of data) {
    const existing = regionMap.get(row.region) || { revenue: 0, transactions: 0 };
    regionMap.set(row.region, {
      revenue: existing.revenue + row.totalRevenue,
      transactions: existing.transactions + row.transactionCount,
    });
  }
  const sortedRegions = Array.from(regionMap.entries()).sort(
    (a, b) => b[1].revenue - a[1].revenue
  );

  // Aggregate by category
  const categoryMap = new Map<string, { revenue: number; transactions: number }>();
  for (const row of data) {
    const existing = categoryMap.get(row.productCategory) || {
      revenue: 0,
      transactions: 0,
    };
    categoryMap.set(row.productCategory, {
      revenue: existing.revenue + row.totalRevenue,
      transactions: existing.transactions + row.transactionCount,
    });
  }
  const sortedCategories = Array.from(categoryMap.entries()).sort(
    (a, b) => b[1].revenue - a[1].revenue
  );

  const docDefinition: TDocumentDefinitions = {
    content: [
      { text: "APEX RETAIL", style: "companyName" },
      { text: "Daily Sales Summary", style: "reportTitle" },
      { text: reportDateRange, style: "reportDate" },
      {
        canvas: [
          { type: "line", x1: 0, y1: 5, x2: 515, y2: 5, lineWidth: 1 },
        ],
      },
      {
        text: "EXECUTIVE SUMMARY",
        style: "sectionHeader",
        margin: [0, 15, 0, 10],
      },
      {
        columns: [
          {
            text: [
              { text: "Total Revenue: ", bold: true },
              formatCurrency(totalRevenue),
            ],
          },
          {
            text: [
              { text: "Transactions: ", bold: true },
              totalTransactions.toLocaleString(),
            ],
          },
        ],
        margin: [0, 0, 0, 15],
      },
      {
        text: "REVENUE BY REGION",
        style: "sectionHeader",
        margin: [0, 10, 0, 10],
      },
      {
        table: {
          headerRows: 1,
          widths: ["*", "auto", "auto", "auto"],
          body: [
            [
              { text: "Region", style: "tableHeader" },
              { text: "Revenue", style: "tableHeader" },
              { text: "Transactions", style: "tableHeader" },
              { text: "Share", style: "tableHeader" },
            ],
            ...sortedRegions.map(([region, stats]) => [
              region,
              { text: formatCurrency(stats.revenue), alignment: "right" as const },
              { text: stats.transactions.toLocaleString(), alignment: "right" as const },
              {
                text: `${totalRevenue > 0 ? ((stats.revenue / totalRevenue) * 100).toFixed(1) : "0.0"}%`,
                alignment: "right" as const,
              },
            ]),
          ],
        },
        layout: "lightHorizontalLines",
      },
      {
        text: "REVENUE BY CATEGORY",
        style: "sectionHeader",
        margin: [0, 20, 0, 10],
      },
      {
        table: {
          headerRows: 1,
          widths: ["*", "auto", "auto", "auto"],
          body: [
            [
              { text: "Category", style: "tableHeader" },
              { text: "Revenue", style: "tableHeader" },
              { text: "Transactions", style: "tableHeader" },
              { text: "Share", style: "tableHeader" },
            ],
            ...sortedCategories.map(([category, stats]) => [
              category,
              { text: formatCurrency(stats.revenue), alignment: "right" as const },
              { text: stats.transactions.toLocaleString(), alignment: "right" as const },
              {
                text: `${totalRevenue > 0 ? ((stats.revenue / totalRevenue) * 100).toFixed(1) : "0.0"}%`,
                alignment: "right" as const,
              },
            ]),
          ],
        },
        layout: "lightHorizontalLines",
      },
    ],
    footer: {
      text: `Generated: ${new Date().toISOString()} | Apex Retail Confidential`,
      style: "footer",
      margin: [40, 0],
    },
    styles: {
      companyName: { fontSize: 24, bold: true, alignment: "center" },
      reportTitle: {
        fontSize: 16,
        alignment: "center",
        margin: [0, 5, 0, 5],
      },
      reportDate: { fontSize: 12, alignment: "center", color: "gray" },
      sectionHeader: { fontSize: 12, bold: true },
      tableHeader: { bold: true, fillColor: "#eeeeee" },
      footer: { fontSize: 8, color: "gray", alignment: "center" },
    },
    defaultStyle: { font: "Roboto", fontSize: 10 },
    pageMargins: [40, 40, 40, 60],
  };

  const pdf = pdfmake.createPdf(docDefinition);
  // pdfmake 0.3 provides promise-based getBuffer()
  return pdf.getBuffer();
}

const handler = async (req: IncomingMessage, res: ServerResponse) => {
  const url = parseUrl(req.url || "", true);
  const startDate = url.query.startDate as string;
  const endDate = url.query.endDate as string;
  const region = url.query.region as string | undefined;

  // Validate required parameters
  if (!startDate || !endDate) {
    res.writeHead(400, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "startDate and endDate query parameters are required" }));
    return;
  }

  // Validate date format (YYYY-MM-DD)
  const dateRegex = /^\d{4}-\d{2}-\d{2}$/;
  if (!dateRegex.test(startDate) || !dateRegex.test(endDate)) {
    res.writeHead(400, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "Dates must be in YYYY-MM-DD format" }));
    return;
  }

  // Validate date order
  if (startDate > endDate) {
    res.writeHead(400, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "startDate must be before or equal to endDate" }));
    return;
  }

  try {
    // Query ClickHouse directly using Moose utilities
    const { client, sql } = await getMooseUtils();
    const targetTable = dailySalesMV.targetTable;

    // Build query with optional region filter
    let query;
    if (region) {
      query = sql`
        SELECT
          saleDate, region, productCategory,
          sum(transactionCount) AS transactionCount,
          sum(totalQuantity) AS totalQuantity,
          sum(totalRevenue) AS totalRevenue
        FROM ${targetTable}
        WHERE saleDate BETWEEN ${startDate} AND ${endDate}
          AND region = ${region}
        GROUP BY saleDate, region, productCategory
        ORDER BY saleDate DESC, totalRevenue DESC
      `;
    } else {
      query = sql`
        SELECT
          saleDate, region, productCategory,
          sum(transactionCount) AS transactionCount,
          sum(totalQuantity) AS totalQuantity,
          sum(totalRevenue) AS totalRevenue
        FROM ${targetTable}
        WHERE saleDate BETWEEN ${startDate} AND ${endDate}
        GROUP BY saleDate, region, productCategory
        ORDER BY saleDate DESC, totalRevenue DESC
      `;
    }

    const result = await client.query.execute<SalesReportRow>(query);
    const data = (await result.json()) as SalesReportRow[];

    if (data.length === 0) {
      res.writeHead(404, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "No data found for the specified date range" }));
      return;
    }

    // Generate PDF buffer
    const pdfBuffer = await generatePdfBuffer(data, `${startDate} to ${endDate}`);

    // Return binary PDF response with appropriate headers
    res.writeHead(200, {
      "Content-Type": "application/pdf",
      "Content-Disposition": `attachment; filename="sales-report-${startDate}-to-${endDate}.pdf"`,
      "Content-Length": pdfBuffer.length,
    });
    res.end(pdfBuffer);
  } catch (error) {
    console.error("PDF generation error:", error);
    res.writeHead(500, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "Failed to generate PDF" }));
  }
};

// Create WebApp with custom mount path
// Note: Cannot use reserved paths like /consumption or /ingest
export const salesReportPdfApi = new WebApp("salesReportPdf", handler, {
  mountPath: "/reports",
  metadata: {
    description: "Generate sales report as PDF",
  },
});
```
Mount path restrictions
WebApp mount paths cannot use reserved paths like /consumption, /ingest, or /api. Use a custom path like /reports instead.
Step 2: Update exports
Add the new API to app/index.ts:
```typescript
export * from "./ingest/SalesTransaction";
export * from "./views/DailySalesSummary";
export * from "./apis/salesReport";
export * from "./apis/salesReportPdf";
```
Step 3: Test the PDF endpoint
With moose dev running, generate a PDF directly via curl:
```bash
curl "http://localhost:4000/reports?startDate=2025-01-15&endDate=2025-01-15" > report.pdf
```
You can also filter by region:
```bash
curl "http://localhost:4000/reports?startDate=2025-01-15&endDate=2025-01-15&region=West" > west-report.pdf
```
The endpoint returns:
- 200 OK with PDF binary on success
- 400 Bad Request if startDate or endDate is missing or invalid
- 404 Not Found if no data exists for the date range
- 500 Internal Server Error on unexpected failures
When to use each approach
With both approaches implemented, you have flexibility for different scenarios:
| Approach | Best For |
|---|---|
| Local script (Approach A) | Batch generation, scheduled jobs, local development, one-off reports |
| API endpoint (Approach B) | On-demand generation, external consumers, integration with other systems, user-triggered downloads |
Both approaches use the same pdfmake library and produce identical PDF output. The key differences are:
- Local script fetches data from the JSON API, then generates a PDF file on disk
- API endpoint queries ClickHouse directly and returns the PDF binary in the HTTP response
Having both gives you the best of both worlds: scripts for automation and batch processing, plus an API for real-time requests.
Part 9: How it works
Why ClickHouse is fast
ClickHouse is columnar. It stores data differently from row-oriented databases like PostgreSQL.
Row storage (PostgreSQL):
```
Record 1: [txn_id=1, timestamp=..., store=W001, region=West, amount=599.98]
Record 2: [txn_id=2, timestamp=..., store=W001, region=West, amount=49.99]
Record 3: [txn_id=3, timestamp=..., store=W002, region=East, amount=89.99]
```
For SELECT SUM(amount) FROM sales WHERE region='West':
- Read every row from disk
- Check region
- Add matching amounts
- Repeat millions of times
You read 100% of data to use 2% of it.
Column storage (ClickHouse):
```
txn_ids:    [1, 2, 3, ...]
timestamps: [2025-01-15T09:00, 2025-01-15T09:30, ...]
stores:     [W001, W001, W002, ...]
regions:    [West, West, East, ...]
amounts:    [599.98, 49.99, 89.99, ...]
```
Same query in ClickHouse:
- Read only the region column
- Read only the amount column
- Scan both in parallel
- Skip everything else
Read 2% of data, use 100% of it. 10-100x faster.
Why materialized views beat batch jobs
Traditional: scheduled job (cron, Airflow) aggregates data hourly/daily.
```
Raw Data → [Wait for schedule] → Batch Job → Aggregated Table
```
Problems: data goes stale between runs, the batch job rescans everything, failures leave gaps until the next run, and scaling means bigger windows or incremental logic.
ClickHouse MVs:
```
Raw Data → [INSERT triggers MV] → Aggregated Table (immediate)
```
- INSERT into sales
- MV's SELECT runs on new rows immediately
- Results go into DailySalesSummary
- Synchronous: the MV is updated by the time the INSERT returns
Data stays current, no separate batch system to manage, scales with write volume, and a failed INSERT doesn't leave the MV in a bad state.
Batch aggregation
MVs process batches, not global data. Multiple inserts for the same day/region/category create multiple rows.
Two inserts for West/Electronics:
```
Batch 1 (morning): 50 transactions → MV inserts [saleDate=2025-01-15, region=West, category=Electronics, count=50, revenue=5000]
Batch 2 (evening): 30 transactions → MV inserts [saleDate=2025-01-15, region=West, category=Electronics, count=30, revenue=3000]
```
Two rows for the same day/region/category. This is why GROUP BY is required:
```sql
-- Wrong: May return multiple rows per day/region
SELECT * FROM DailySalesSummary WHERE saleDate = '2025-01-15'

-- Correct: Consolidates all batches into final aggregates
SELECT saleDate, region, sum(transactionCount), sum(totalRevenue)
FROM DailySalesSummary
GROUP BY saleDate, region
```
SummingMergeTree auto-combines during background merges, but GROUP BY is still safe practice since merges are async.
When to use what
| Strategy | When | Example |
|---|---|---|
| Raw queries | Ad-hoc, <1M rows | Debugging |
| Materialized view | Repeated queries, predictable patterns | Dashboards |
| Pre-aggregated API | External consumers | PDF reports |
If you run the same aggregation twice, make an MV.
Time-series patterns
Order by time first:
```typescript
orderByFields: ["saleTimestamp", "transactionId"] // Good
orderByFields: ["transactionId", "saleTimestamp"] // Bad for time queries
```
Time-first ORDER BY means date-range queries skip irrelevant blocks.
Time granularity:
| Granularity | Function | Rows/year | Use |
|---|---|---|---|
| Hourly | toStartOfHour() | ~8,760 | Intraday |
| Daily | toStartOfDay() | ~365 | Standard |
| Weekly | toStartOfWeek() | ~52 | Trends |
| Monthly | toStartOfMonth() | ~12 | Executive |
Finer = more rows = slower but more detail.
Multiple levels:
```
Raw transactions → Hourly MV → Daily MV → Monthly MV
```
Each level is 10-30x smaller than the previous.
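To illustrate chaining, here is a sketch of a monthly roll-up built with the same MaterializedView API used in Part 6. The MonthlySalesSummary names are hypothetical, and it assumes selectTables can reference the daily view's target table (dailySalesMV.targetTable), which the tutorial only demonstrates for pipeline tables:

```typescript
import { MaterializedView, sql } from "@514labs/moose-lib";
import { dailySalesMV } from "./DailySalesSummary";

// Hypothetical second-level roll-up: months from the daily summary table
export interface MonthlySalesSummary {
  saleMonth: Date;
  region: string;
  transactionCount: number;
  totalRevenue: number;
}

// Assumption: an MV can select from another MV's target table
export const monthlySalesMV = new MaterializedView<MonthlySalesSummary>({
  tableName: "MonthlySalesSummary",
  materializedViewName: "MonthlySalesSummary_MV",
  orderByFields: ["saleMonth", "region"],
  selectTables: [dailySalesMV.targetTable],
  selectStatement: sql`
    SELECT
      toStartOfMonth(saleDate) AS saleMonth,
      region,
      sum(transactionCount) AS transactionCount,
      sum(totalRevenue) AS totalRevenue
    FROM ${dailySalesMV.targetTable}
    GROUP BY saleMonth, region
  `,
});
```

As with the daily view, sum the partial aggregates with GROUP BY when querying the monthly table.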
How Moose helps
The patterns above—columnar storage, materialized views, time-first ordering—are what make ClickHouse fast. But implementing them correctly requires expertise: writing DDL, managing schemas, configuring streams, and keeping everything in sync.
Moose handles this complexity so you can focus on your data model:
| ClickHouse Best Practice | Without Moose | With Moose |
|---|---|---|
| Columnar schema design | Hand-write DDL, manage migrations | Define TypeScript interface, schema auto-generated |
| Time-first ordering | Configure ORDER BY manually | Set orderByFields in pipeline config |
| Materialized views | Write CREATE MATERIALIZED VIEW DDL | Declare MaterializedView with typed query |
| Streaming ingestion | Set up Kafka/Redpanda, write consumers | Auto-configured with IngestPipeline |
| Type-safe APIs | Build REST endpoints manually | Api class with typed params and responses |
This is what we covered in Why Moose for static reports?—a developer-first approach that gives you ClickHouse performance without ClickHouse complexity.
Wrap-up
What you built
```
CSV/JSON Files → Ingest API → ClickHouse Table → Materialized View → API → PDF
```
Components
- SalesTransaction model: type-safe schema that creates the ClickHouse table
- File loader: parses CSV/JSON, sends to the ingest API
- DailySalesSummary MV: pre-aggregates at insert time
- sales-report API: date-range queries with JSON response
- PDF export (two options): local script or API endpoint returning binary PDF
Before vs after
Before: Analyst downloads 12 CSVs, copies to Excel, writes formulas, formats Word doc, emails PDF. 2+ hours daily. Breaks when analyst is out.
After: POS uploads to API, ClickHouse aggregates, API serves any date range, PDF generates in under a second. Runs unattended. Git tracks every change.
In this scenario, that's around 500 hours of manual Excel work eliminated (2 hours/day × 250 days). Your actual savings depend on your current process. The code is deterministic, so the same inputs produce the same outputs. And since it's just querying a database, adding more stores doesn't change how long reports take to generate.
Performance
| Operation | Row DB | Moose + ClickHouse |
|---|---|---|
| Transaction scan | Seconds-minutes | Milliseconds |
| Daily aggregation | 10-30+ sec | <100ms |
| Report generation | Minutes | <1 sec |
Next steps
- Add dimension tables: customer, product, and store reference data
- Create additional MVs: hourly, weekly, and monthly aggregations
- Add Excel export: use ExcelJS for spreadsheet output (see the sketch below)
- Schedule delivery: automate with Temporal workflows
- Deploy to production: use Fiveonefour hosting
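For the Excel export step, a minimal sketch using ExcelJS (install with pnpm add exceljs). The worksheet name, column layout, and output path are illustrative; the data comes from the same sales-report endpoint the PDF script uses:

```typescript
import ExcelJS from "exceljs";

// Minimal Excel export sketch reusing the JSON report API.
// Column keys match SalesReportResponse; the output path is arbitrary.
async function exportExcel(startDate: string, endDate: string): Promise<void> {
  const res = await fetch(
    `http://localhost:4000/consumption/sales-report?startDate=${startDate}&endDate=${endDate}`
  );
  const rows: Array<Record<string, unknown>> = await res.json();

  const workbook = new ExcelJS.Workbook();
  const sheet = workbook.addWorksheet("Daily Sales");
  sheet.columns = [
    { header: "Date", key: "saleDate", width: 12 },
    { header: "Region", key: "region", width: 10 },
    { header: "Category", key: "productCategory", width: 14 },
    { header: "Transactions", key: "transactionCount", width: 12 },
    { header: "Revenue", key: "totalRevenue", width: 12 },
  ];
  rows.forEach((row) => sheet.addRow(row));

  await workbook.xlsx.writeFile(`sales-${startDate}-to-${endDate}.xlsx`);
}
```

Because analysts often pivot and annotate in spreadsheets (Part 2), this pairs naturally with the PDF outputs.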
Reference
Sample queries
Daily revenue by region:
```sql
SELECT
  saleDate,
  region,
  sum(totalRevenue) AS revenue,
  sum(transactionCount) AS transactions
FROM local.DailySalesSummary
GROUP BY saleDate, region
ORDER BY saleDate DESC, revenue DESC;
```
Hourly pattern (raw data):
```sql
SELECT
  toHour(saleTimestamp) AS hour_of_day,
  count() AS transactions,
  sum(totalAmount) AS revenue
FROM local.sales
GROUP BY hour_of_day
ORDER BY hour_of_day;
```
Top categories:
```sql
SELECT
  productCategory,
  sum(totalRevenue) AS revenue,
  sum(transactionCount) AS transactions
FROM local.DailySalesSummary
WHERE saleDate BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY productCategory
ORDER BY revenue DESC
LIMIT 5;
```
Project structure
| File | Purpose |
|---|---|
| app/index.ts | Export all modules |
| app/ingest/SalesTransaction.ts | IngestPipeline definition |
| app/views/DailySalesSummary.ts | MaterializedView |
| app/apis/salesReport.ts | Consumption API (JSON) |
| app/apis/salesReportPdf.ts | PDF API (binary response) |
| scripts/load-file.ts | CSV/JSON file loader |
| scripts/generate-pdf-report.ts | PDF generator (local script) |
| sample-data/west-store-001.csv | Sample transactions |
Configuration
moose.config.toml:
```toml
[project]
name = "sales-reports"
language = "typescript"

[features]
olap = true
streaming = true

[clickhouse_config]
db_name = "local"
host = "localhost"
host_port = 18123
native_port = 9000
user = "panda"
password = "pandapass"
use_ssl = false
```
Troubleshooting
Setup
EUNSUPPORTEDPROTOCOL: Use pnpm install, not npm install.
pnpm build warnings: You'll see "Ignored build scripts" warnings—these are safe to ignore. They're security warnings about packages that want to run build scripts.
Supply the type param T: Add type parameter: new IngestPipeline<SalesTransaction>(...).
Primary key must be prefix of sorting key: First field in orderByFields becomes primary key.
Port 4000, 9000, or 18123 in use:
```bash
# Find what's using the port
lsof -i :4000
lsof -i :9000
lsof -i :18123

# Kill the process if needed
kill -9 <PID>
```
Stop other Moose projects before starting a new one. Port 9000 is the ClickHouse native port and commonly conflicts with other Moose instances.
ClickHouse connection refused: Wait 20-30 seconds for containers (longer on first startup). Verify:
```bash
curl http://localhost:18123/ping
```
Check Docker:
```bash
docker ps | grep clickhouse
```
Docker not starting: Run docker info. On Mac/Windows, check Docker Desktop is running (menu bar icon). If Docker stops mid-tutorial, all services fail with "connection refused" errors.
Model not detected: Check file saved, export in app/index.ts, TypeScript errors in terminal. Type errors block the dev server from processing changes.
TypeScript errors blocking startup: The moose dev terminal shows compilation errors. Fix all errors before proceeding—the system won't process new models or APIs until code compiles.
TypeScript errors persist after fixing code: If errors continue to appear with stale line numbers after you've fixed the code, restart moose dev. The TypeScript compiler may cache compiled code, and a full restart clears the cache.
Data
CSV date parsing fails: Use ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ).
| Wrong Format | Correct Format |
|---|---|
| 01/15/2025 (US format) | 2025-01-15T09:00:00Z |
| 15-01-2025 (EU format) | 2025-01-15T09:00:00Z |
| 2025-1-15 (no leading zeros) | 2025-01-15T09:00:00Z |
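If a store's export uses one of these non-ISO formats, you can normalize before ingestion with date-fns (already installed in Part 5). A sketch, where the toIsoTimestamp helper and its format list are assumptions to adapt to your data:

```typescript
import { parse, isValid } from "date-fns";

// Try a few common non-ISO formats and normalize to ISO 8601.
// The format list is an assumption - extend it for your stores' exports.
const KNOWN_FORMATS = ["MM/dd/yyyy", "dd-MM-yyyy", "yyyy-M-d"];

function toIsoTimestamp(raw: string): string {
  const direct = new Date(raw);
  if (!Number.isNaN(direct.getTime())) return direct.toISOString();
  for (const fmt of KNOWN_FORMATS) {
    const parsed = parse(raw, fmt, new Date());
    if (isValid(parsed)) return parsed.toISOString();
  }
  throw new Error(`Unrecognized date format: ${raw}`);
}
```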
Ingestion error: Verify JSON matches interface:
```bash
# Test with minimal valid payload
curl -X POST http://localhost:4000/ingest/sales \
  -H "Content-Type: application/json" \
  -d '{"transactionId":"TEST-001","saleTimestamp":"2025-01-15T10:00:00Z","storeId":"STORE-001","region":"West","productId":"PROD-001","productCategory":"Electronics","customerId":"","quantity":1,"unitPrice":10.00,"totalAmount":10.00}'
```
Type mismatches: quantity must be the number 1, not the string "1"; totalAmount must be 10.00, not "10.00".
Duplicate IDs: ClickHouse doesn't enforce uniqueness. Check:
```sql
SELECT transactionId, count() AS cnt
FROM local.sales
GROUP BY transactionId
HAVING cnt > 1
```
Deduplicate with GROUP BY or a subquery. FINAL only works with ReplacingMergeTree/CollapsingMergeTree.
Queries
MV returns unexpected results: MVs aggregate per-batch. Use GROUP BY:
```sql
-- Wrong (may show partial aggregates)
SELECT * FROM local.DailySalesSummary

-- Correct (consolidates partial aggregates)
SELECT
  saleDate,
  region,
  sum(transactionCount) AS transactionCount,
  sum(totalRevenue) AS totalRevenue
FROM local.DailySalesSummary
GROUP BY saleDate, region
```
API returns empty: Check date format (YYYY-MM-DD, not MM-DD-YYYY). Verify data exists for the range.
Query slow: Check you're querying the MV not raw table. Check date filter uses indexed column. Run EXPLAIN.
PDF
Not generating: Check the API is accessible, look for fetch errors in the output, and confirm pdfmake is installed. PDF generation should complete in under 2 seconds—if it hangs, the API is probably not responding.
Empty/malformed: Check API response has data, date range has transactions.
Test the API first:
curl "http://localhost:4000/consumption/sales-report?startDate=2025-01-15&endDate=2025-01-15"If this returns empty [], verify data was ingested and the date range is correct.
Cleanup
Stopping the dev server: Press Ctrl+C in the moose dev terminal. Docker containers stop automatically.
Starting over completely: Stop moose dev, delete the project directory, and run moose init again.
Reset data but keep code: Stop moose dev, then remove the project's Docker volumes. Caution: docker volume prune deletes all unused volumes on your machine—not just this project's. Use docker volume ls to list volumes, identify yours (typically containing clickhouse or redpanda), and remove them with docker volume rm <volume_name>.
Data persists across restarts: If you stop and restart moose dev, your ingested data is still there—it's stored in Docker volumes.