Modeling Tables

Code-first schema

Type-safe

Hot-reload

Overview

Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.

You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.

FirstTable.ts

interface MyFirstTable {  id: Key<string>;  name: string;  age: number;} // Create a table named "first_table"export const myTable = new OlapTable<MyFirstTable>("first_table");

Benefits:

Boilerplate CREATE/ALTER TABLE statements handled for you

Automatic type mapping to ClickHouse types

Built-in type validation on insert

Version-controlled schema management

Basic Usage

Standalone Tables

Create a table directly for custom data flows or when you need fine-grained control:

StandaloneTable.ts

import { OlapTable, Key } from "@514labs/moose-lib"; // Define your schemainterface ExampleSchema {  id: Key<string>;  dateField: Date;  numericField: number;  booleanField: boolean;  floatField: number;  integerField: number & tags.Type<"int64">; // Moose supports native tagged types so you can use Integers in typescript} // Create a standalone table named "example_table"export const exampleTable = new OlapTable<ExampleSchema>("example_table", {  orderByFields: ["id", "dateField"] // Optional when using a primary key}); // For deduplication, use the ReplacingMergeTree factoryexport const dedupTable = OlapTable.withReplacingMergeTree<ExampleSchema>("example_table", {  orderByFields: ["id", "dateField"],  ver: "updatedAt",  // Optional: version column (keeps highest value)  isDeleted: "deleted"  // Optional: soft delete flag (requires ver)}); // Now you can:// - Write to this table from streams// - Query it directly// - Use it as a source for materialized views

Use Standalone Tables When:

Use when you need to do a bulk import of data

Use when you have in-memory ETL/ELT workflows that need to write directly to a table as opposed to a streaming ingestion pipeline

Use when you have some external service that is maintaining and writing to the table, like a CDC or other external ETL service

Creating Tables in Ingestion Pipelines

For end-to-end data flows, create tables as part of an ingestion pipeline:

PipelineTable.ts

import { IngestPipeline, Key } from "@514labs/moose-lib"; // Define your schemainterface UserEvent {  id: Key<string>;  userId: string;  timestamp: Date;  eventType: string;} // Create a complete ingestion pipeline with a tableconst eventsPipeline = new IngestPipeline<UserEvent>("user_events", {  ingestApi: true,    // Creates a REST API endpoint at POST localhost:4000/ingest/user_events  stream: true,    // Creates Kafka/Redpanda topic  table: {         // Creates and configures the table named "user_events"    orderByFields: ["id", "timestamp"]  }}); // Access the table component when neededconst eventsTable = eventsPipeline.table;

Data Modeling

Special ClickHouse Types (LowCardinality, Nullable, etc)

ClickHouseTypes.ts

import { Key, Decimal, ClickHouseDecimal, LowCardinality, ClickHouseNamedTuple, tags } from "@514labs/moose-lib"; export interface ClickHouseOptimizedExample {  id: Key<string>;  stringField: string;  numberField: number;  decimalField: Decimal<10, 2>;                         // Precise decimal storage  // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works  lowCardinalityField: string & LowCardinality;         // Faster queries for enum-like data  nestedObject: {    innerString: string;    innerNumber: number;  };  namedTupleField: {    name: string;    value: number;  } & ClickHouseNamedTuple;                             // Optimized nested storage  numberArray: number[];  mapField: Record<string, number>;  literalField: "optionA" | "optionB";  optionalField?: string;                               // Nullable field  dateField: Date;}

Default values

Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.

Defaults.ts

import { OlapTable, Key, ClickHouseDefault, Decimal, ClickHouseDecimal } from "@514labs/moose-lib"; interface Event {  id: Key<string>;  // Static defaults (ClickHouse expression as a string literal)  status: string & ClickHouseDefault<"'pending'">;   // DEFAULT 'pending'  retries: number & ClickHouseDefault<"0">;          // DEFAULT 0  // Server-side timestamps  createdAt: Date & ClickHouseDefault<"now()">;      // DEFAULT now()  // Decimal with default  amount: Decimal<10, 2> & ClickHouseDefault<"0">;  // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax} export const events = new OlapTable<Event>("events", {  orderByFields: ["id", "createdAt"],});

The value passed into the ClickHouseDefault<""> tag can either be a string literal or a stringified ClickHouse SQL expression. If you run into typing issues specifically on Date fields with ClickHouseDefault, use WithDefault<Date, "now()"> as a fallback workaround.

ClickHouse defaults with optional fields

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

Optional without default (e.g., field?: number) → ClickHouse Nullable type.
Optional with default (e.g., field?: number & ClickHouseDefault<"18"> or WithDefault<number, "18">) → non-nullable column with default 18.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

Column Comments

Add documentation directly to your data model fields that propagates to ClickHouse COMMENT clauses. This makes your schema self-documenting and easier to understand for anyone querying the database.

Use TSDoc comments (/** ... */) directly above fields:

ColumnComments.ts

import { Key, DateTime, OlapTable } from "@514labs/moose-lib"; interface UserEvent {  /** Unique identifier for the user event */  id: Key<string>;  /** Timestamp when the event occurred */  timestamp: DateTime;  /** Email address of the user (must be valid format) */  email: string;  /** Total transaction amount in USD */  amount: number;  // Fields without TSDoc comments have no COMMENT in DDL  status: string;} export const userEventTable = new OlapTable<UserEvent>("user_events", {  orderByFields: ["id", "timestamp"],});

The generated ClickHouse DDL includes column comments:

CREATE TABLE user_events (  `id` String COMMENT 'Unique identifier for the user event',  `timestamp` DateTime COMMENT 'Timestamp when the event occurred',  `email` String COMMENT 'Email address of the user (must be valid format)',  `amount` Float64 COMMENT 'Total transaction amount in USD',  `status` String) ENGINE = MergeTree()ORDER BY (id, timestamp)

Viewing comments in ClickHouse

Query column comments using SELECT name, comment FROM system.columns WHERE table = 'user_events' or DESCRIBE TABLE user_events FORMAT Vertical.

Database Selection

By default, tables are created in the database specified in your moose.config.toml ClickHouse configuration. You can override this on a per-table basis using the database field:

DatabaseOverride.ts

import { OlapTable } from "@514labs/moose-lib"; interface UserData {  id: Key<string>;  name: string;  email: string;} // Table in default database (from moose.config.toml)const defaultTable = new OlapTable<UserData>("users"); // Table in specific database (e.g., "analytics")const analyticsTable = new OlapTable<UserData>("users", {  database: "analytics",  orderByFields: ["id"]});

Multi-database setup

To use custom databases, configure them in your moose.config.toml:

[clickhouse_config]db_name = "local"additional_databases = ["analytics", "staging"]

The databases in additional_databases will be created automatically when you start your Moose application.

Primary Keys and Sorting

You must configure table indexing using one of these approaches:

Define at least one Key in your table schema
Specify orderByFields in the table config
Use both (all Key fields must come first in the orderByFields array)

PrimaryKeyConfig.ts

import { OlapTable, Key } from '@514labs/moose-lib'; // Approach 1: Using primary key onlyinterface Record1 {  id: Key<string>;  // Primary key field  field1: string;  field2: number;} const table1 = new OlapTable<Record1>("table1");  // id is the primary key

Order By Fields Only

OrderByFieldsOnly.ts

// Approach 2: Using orderByFields onlyinterface SchemaWithoutPrimaryKey {  field1: string;  field2: number;  field3: Date;} const tableWithOrderByFieldsOnly = new OlapTable<SchemaWithoutPrimaryKey>("table2", {  orderByFields: ["field1", "field2"]  // Specify ordering without primary key});

Order By Expression

Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely.

OrderByExpression.ts

// Use a ClickHouse expression for ORDER BYinterface Events {  userId: string;  createdAt: Date;  eventType: string;} const tableWithOrderByExpression = new OlapTable<Events>("events", {  // Equivalent to orderByFields: ["userId", "createdAt", "eventType"]  orderByExpression: "(userId, createdAt, eventType)",}); // Advanced: functions inside expressionconst tableWithMonthBucketing = new OlapTable<Events>("events_by_month", {  orderByExpression: "(userId, toYYYYMM(createdAt))",}); // No sorting (ClickHouse tuple() means empty ORDER BY)const unsortedTable = new OlapTable<Events>("events_unsorted", {  orderByExpression: "tuple()",});

Primary Key Expression

Use a ClickHouse SQL expression to define the primary key explicitly. This is useful when:

You need functions in the primary key (e.g., cityHash64(id))
The primary key column ordering should differ from the schema definition
The primary key should differ from the ORDER BY

Important: When primaryKeyExpression is specified, any Key<T> annotations on columns are ignored for PRIMARY KEY generation.

PrimaryKeyExpression.ts

import { OlapTable, Key } from "@514labs/moose-lib"; // Example 1: Primary key with functioninterface UserEvents {  userId: string;  eventId: string;  timestamp: Date;} const eventsTable = new OlapTable<UserEvents>("user_events", {  // Use hash function in primary key for better distribution  primaryKeyExpression: "(userId, cityHash64(eventId))",  orderByExpression: "(userId, timestamp)",}); // Example 2: Different ordering in primary key vs columnsinterface Product {  category: string;  brand: string;  productId: string;  name: string;} const productsTable = new OlapTable<Product>("products", {  // Primary key order optimized for uniqueness  primaryKeyExpression: "productId",  // Order by optimized for common queries  orderByFields: ["category", "brand", "productId"],}); // Example 3: Override Key<T> annotationinterface Record {  id: Key<string>;  // This Key<T> annotation will be IGNORED  otherId: string;} const recordTable = new OlapTable<Record>("records", {  // This expression overrides the Key<T> annotation  primaryKeyExpression: "(otherId, id)",  orderByExpression: "(otherId, id)",});

Rationale for Primary Key Expression:

Function Support: Primary keys can use ClickHouse functions like cityHash64() for better data distribution, which cannot be expressed through column-level annotations.
Flexible Ordering: The ordering of columns in the primary key can be different from the ordering in the schema definition, allowing optimization for both data uniqueness and query patterns.
Separation of Concerns: PRIMARY KEY and ORDER BY serve different purposes in ClickHouse:
- PRIMARY KEY defines uniqueness and deduplication
- ORDER BY defines physical data layout and query optimization
Sometimes these need different column orderings for optimal performance.

Important Constraint:

⚠️ PRIMARY KEY must be a prefix of ORDER BY in ClickHouse. This means ORDER BY must start with all PRIMARY KEY columns in the same order.

Valid:

PRIMARY KEY (userId) with ORDER BY (userId, timestamp) ✅
PRIMARY KEY (userId, cityHash64(eventId)) with ORDER BY (userId, cityHash64(eventId), timestamp) ✅

Invalid:

PRIMARY KEY (userId, eventId) with ORDER BY (userId, timestamp) ❌ (missing eventId)
PRIMARY KEY (userId, eventId) with ORDER BY (eventId, userId) ❌ (wrong order)

Using Both Primary Key and Order By Fields

ComboKeyAndOrderByFields.ts

// Approach 3: Using both (primary key must be first)interface SchemaWithKey {  id: Key<string>;  // Primary key field  field1: string;  field2: number;} const tableWithKeyAndOrderByFields = new OlapTable<SchemaWithKey>("table3", {  orderByFields: ["id", "field1"]  // Primary key must be first});

Using Multiple Primary Keys

MultiKeyTable.ts

interface MultiKeyRecord {  key1: Key<string>;  key2: Key<number>;  field1: string;} const multiKeyTable = new OlapTable<MultiKeyRecord>("multi_key_table", {  orderByFields: ["key1", "key2", "field1"]  // Multiple keys must come first});

Table engines

By default, Moose will create tables with the MergeTree engine. You can use different engines by setting the engine in the table configuration.

TableEngine.ts

import { OlapTable } from "@514labs/moose-lib"; // Default MergeTree engineconst table = new OlapTable<Record>("table", {  orderByFields: ["id"]}); // Use engine configuration for other enginesconst dedupTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "version",  // Optional: keeps row with highest version  isDeleted: "deleted"  // Optional: soft delete when deleted=1});

Deduplication (`ReplacingMergeTree`)

Use the ReplacingMergeTree engine to keep only the latest record for your designated sort key:

DeduplicatedTable.ts

// Basic deduplicationconst table = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"]}); // With version column (keeps record with highest version)const versionedTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "updated_at"  // Column that determines which version to keep}); // With soft deletes (requires ver parameter)const softDeleteTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "updated_at",  isDeleted: "deleted"  // UInt8 column: 1 marks row for deletion});

Deduplication Caveats

ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately.

Version Column (ver): When specified, ClickHouse keeps the row with the maximum version value for each unique sort key.

Soft Deletes (is_deleted): When specified along with ver, rows where this column equals 1 are deleted during merges. This column must be UInt8 type.

For more details, see the ClickHouse documentation.

Streaming from S3 (`S3Queue`)

Use the S3Queue engine to automatically ingest data from S3 buckets as files are added:

S3Queue with Materialized Views

S3Queue tables only process new files added to S3 after table creation. When used as a source for materialized views, no backfill occurs - the MV will only start populating as new files arrive. See the Materialized Views documentation for more details.

S3StreamingTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Use direct configuration (S3Queue does not support orderByFields)export const s3Events = new OlapTable<S3Event>("s3_events", {  engine: ClickHouseEngines.S3Queue,  s3Path: "s3://my-bucket/data/*.json",  format: "JSONEachRow",  settings: {    mode: "unordered",    keeper_path: "/clickhouse/s3queue/events"  }});

S3Queue ORDER BY not supported

S3Queue is a streaming engine and does not support orderByFields or ORDER BY clauses. Configure only engine-specific parameters like s3Path, format, and settings.

⚠️ NEVER hardcode AWS credentials!

Security Risk: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities.

Solution: Use mooseRuntimeEnv for runtime credential resolution:

SecureS3Streaming.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // ✅ RECOMMENDED: Runtime environment variable resolutionexport const secureS3Events = new OlapTable<S3Event>("s3_events", {  engine: ClickHouseEngines.S3Queue,  s3Path: "s3://my-bucket/data/*.json",  format: "JSONEachRow",  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),  settings: {    mode: "unordered",    keeper_path: "/clickhouse/s3queue/events"  }});

Then set environment variables:

export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="your-secret-key"
moose prod up

Benefits:

Credentials never embedded in Docker images
Supports credential rotation (changing passwords triggers table recreation)
Different credentials per environment (dev/staging/prod)
Clear error messages if environment variables are missing

S3Queue Requirements

S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas.

Direct S3 Access (`S3`)

Use the S3 engine for direct read/write access to S3 storage without streaming semantics:

S3Table.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // S3 table with credentials (recommended with mooseRuntimeEnv)export const s3Data = new OlapTable<DataRecord>("s3_data", {  engine: ClickHouseEngines.S3,  path: "s3://my-bucket/data/file.json",  format: "JSONEachRow",  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),  compression: "gzip"}); // Public S3 bucket (no authentication)export const publicS3 = new OlapTable<DataRecord>("public_s3", {  engine: ClickHouseEngines.S3,  path: "s3://public-bucket/data/*.parquet",  format: "Parquet",  noSign: true  // Use NOSIGN for public buckets});

S3 vs S3Queue

S3: Direct read/write access to S3 files. Use for batch processing or querying static data.
S3Queue: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion.

Both engines support the same credential management and format options.

IcebergS3

The IcebergS3 engine provides read-only access to Iceberg tables stored in S3:

IcebergTable.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // Iceberg table with AWS credentials (recommended with mooseRuntimeEnv)export const icebergEvents = new OlapTable<Event>("iceberg_events", {  engine: ClickHouseEngines.IcebergS3,  path: "s3://my-bucket/warehouse/db/table/",  format: "Parquet", // or "ORC"  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),});

IcebergS3 is read-only

IcebergS3 tables are read-only and provide access to the latest state of your Iceberg table
orderByFields, orderByExpression, partitionBy, and sampleByExpression are not supported
The table automatically reflects the current state of the Iceberg table in S3
Supported formats: Parquet and ORC only

In-Memory Buffer (`Buffer`)

The Buffer engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds:

BufferTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // First create the destination tableexport const destinationTable = new OlapTable<Record>("destination", {  engine: ClickHouseEngines.MergeTree,  orderByFields: ["id", "timestamp"]}); // Then create buffer that writes to itexport const bufferTable = new OlapTable<Record>("buffer", {  engine: ClickHouseEngines.Buffer,  targetDatabase: "local",  targetTable: "destination",  numLayers: 16,  minTime: 10,        // Min 10 seconds before flush  maxTime: 100,       // Max 100 seconds before flush  minRows: 10000,     // Min 10k rows before flush  maxRows: 1000000,   // Max 1M rows before flush  minBytes: 10485760, // Min 10MB before flush  maxBytes: 104857600 // Max 100MB before flush});

Buffer Engine Considerations

Data in buffer is lost if server crashes before flush
Not suitable for critical data that must be durable
Best for high-throughput scenarios where minor data loss is acceptable
Buffer and destination table must have identical schemas
Cannot use orderByFields, partitionBy, or sampleByExpression on buffer tables

For more details, see the ClickHouse Buffer documentation.

Distributed Tables (`Distributed`)

The Distributed engine creates a distributed table across a ClickHouse cluster for horizontal scaling:

DistributedTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Distributed table across clusterexport const distributedTable = new OlapTable<Record>("distributed_data", {  engine: ClickHouseEngines.Distributed,  cluster: "my_cluster",  targetDatabase: "default",  targetTable: "local_table",  shardingKey: "cityHash64(id)"  // Optional: how to distribute data});

Distributed Table Requirements

Requires a configured ClickHouse cluster with remote_servers configuration
The local table must exist on all cluster nodes
Distributed tables are virtual - data is stored in local tables
Cannot use orderByFields, partitionBy, or sampleByExpression on distributed tables
The cluster name must match a cluster defined in your ClickHouse configuration

For more details, see the ClickHouse Distributed documentation.

Replicated Engines

Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants:

ReplicatedMergeTree - Replicated version of MergeTree
ReplicatedReplacingMergeTree - Replicated with deduplication
ReplicatedAggregatingMergeTree - Replicated with aggregation
ReplicatedSummingMergeTree - Replicated with summation

ReplicatedEngines.ts

import { OlapTable, ClickHouseEngines } from "@514labs/moose-lib"; // Basic replicated table with explicit pathsconst replicatedTable = new OlapTable<Record>("records", {  engine: ClickHouseEngines.ReplicatedMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/records",  replicaName: "{replica}",  orderByFields: ["id"]}); // Replicated with deduplicationconst replicatedDedup = new OlapTable<Record>("dedup_records", {  engine: ClickHouseEngines.ReplicatedReplacingMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records",  replicaName: "{replica}",  ver: "updated_at",  isDeleted: "deleted",  orderByFields: ["id"]}); // For ClickHouse Cloud or Boreal (no parameters needed)const cloudReplicated = new OlapTable<Record>("cloud_records", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"]});

Configuring Replication

Replicated engines support three configuration approaches. Choose the one that fits your deployment:

Default

Omit all replication parameters. Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments:

DefaultReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"]  // No keeper_path, replica_name, or cluster needed});

Moose auto-injects: /clickhouse/tables/{database}/{shard}/{table_name} and {replica} in local development. ClickHouse Cloud uses its own patterns automatically.

Cluster

For multi-node deployments, specify a cluster name to use ON CLUSTER DDL operations:

ClusterReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"],  cluster: "default"  // References cluster from moose.config.toml});

Configuration in moose.config.toml:

[[clickhouse_config.clusters]]name = "default"

Use when:

Running multi-node self-managed ClickHouse with cluster configuration
Need ON CLUSTER DDL for distributed operations

Replication Paths

For custom replication topology, specify both keeper_path and replica_name:

ExplicitReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/my_table",  replicaName: "{replica}",  orderByFields: ["id"]});

Use when:

Need custom replication paths for advanced configurations
Both parameters must be provided together

Warning:

Cannot mix approaches: Specifying both cluster and explicit keeper_path/replica_name will cause an error. Choose one approach.

Cluster is a deployment directive: Changing cluster won't recreate your table—it only affects future DDL operations.

For more details, see the ClickHouse documentation on data replication.

Kafka

Use the Kafka engine to consume data directly from Kafka compatible topics.

For more details on the Kafka engine, see the ClickHouse documentation on Kafka integration.

Kafka with Materialized Views

Kafka tables are streaming interfaces that don't persist data. Use a MaterializedView to continuously move data to a table.

KafkaTable.ts

import { OlapTable, ClickHouseEngines, MaterializedView, sql } from '@514labs/moose-lib'; interface KafkaEvent {  eventId: string;  userId: string;  timestamp: number; // Unix seconds for JSONEachRow} // Kafka table - reads from topic, doesn't persistexport const kafkaSource = new OlapTable<KafkaEvent>("kafka_events", {  engine: ClickHouseEngines.Kafka,  brokerList: "redpanda:9092",  topicList: "events",  groupName: "my_consumer_group",  format: "JSONEachRow",  settings: {    kafka_num_consumers: "1",  }}); // MaterializedView moves data to MergeTree for persistenceconst cols = kafkaSource.columns;export const eventsMV = new MaterializedView<KafkaEvent>({  tableName: "events_dest",  materializedViewName: "events_mv",  orderByFields: ["eventId"],  selectStatement: sql`SELECT ${cols.eventId}, ${cols.userId}, ${cols.timestamp} FROM ${kafkaSource}`,  selectTables: [kafkaSource],});

Kafka Engine Limitations

No ORDER BY: Kafka is a streaming engine and doesn't support orderByFields
No ALTER TABLE: Column or settings changes require DROP and CREATE (Moose handles this automatically)
No direct SELECT: Query data from the MaterializedView destination table, not the Kafka table

Externally Managed Tables

If you have a table that is managed by an external system (e.g Change Data Capture like ClickPipes), you can still use Moose to query it. You can set the config in the table config to set the lifecycle to EXTERNALLY_MANAGED.

ExternallyManagedTable.ts

import { OlapTable, LifeCycle } from "@514labs/moose-lib"; // Table managed by external systemconst externalTable = new OlapTable<UserData>("external_users", {  orderByFields: ["id", "timestamp"],  lifeCycle: LifeCycle.EXTERNALLY_MANAGED  // Moose won't create or modify this table});

Learn More About LifeCycle Management

Learn more about the different lifecycle options and how to use them in the LifeCycle Management documentation.

Invalid Configurations

InvalidConfig.ts

// Error: No primary key or orderByFieldsinterface BadRecord1 {  field1: string;  field2: number;}const badTable1 = new OlapTable<BadRecord1>("bad_table1"); // Error: Primary key not first in orderByFieldsinterface BadRecord2 {  id: Key<string>;  field1: string;}const badTable2 = new OlapTable<BadRecord2>("bad_table2", {  orderByFields: ["field1", "id"]  // Wrong order - primary key must be first}); // Error: Nullable field in orderByFieldsinterface BadRecord3 {  id: Key<string>;  field1: string;  field2?: number;}const badTable3 = new OlapTable<BadRecord3>("bad_table3", {  orderByFields: ["id", "field2"]  // Can't have nullable field in orderByFields});

Development Workflow

Local Development with Hot Reloading

One of the powerful features of Moose is its integration with the local development server:

Start your local development server with moose dev
When you define or modify an OlapTable in your code and save the file:
- The changes are automatically detected
- The TypeScript compiler plugin processes your schema definitions
- The infrastructure is updated in real-time to match your code changes
- Your tables are immediately available for testing

For example, if you add a new field to your schema:

HotReloading.ts

// Beforeinterface BasicSchema {  id: Key<string>;  name: string;} // After adding a fieldinterface BasicSchema {  id: Key<string>;  name: string;  createdAt: Date;  // New field}

The Moose framework will:

Detect the change when you save the file
Update the table schema in the local ClickHouse instance
Make the new field immediately available for use

Verifying Your Tables

You can verify your tables were created correctly using:

# List all tables in your local environment
moose ls

Connecting to your local ClickHouse instance

You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your moose.config.toml file:

moose.config.toml

[clickhouse_config]db_name = "local"user = "panda"password = "pandapass"use_ssl = falsehost = "localhost"host_port = 18123native_port = 9000

Modeling Tables

Code-first schema

Type-safe

Hot-reload

Overview

Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.

You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.

FirstTable.ts

interface MyFirstTable {  id: Key<string>;  name: string;  age: number;} // Create a table named "first_table"export const myTable = new OlapTable<MyFirstTable>("first_table");

FirstTable.ts

interface MyFirstTable {  id: Key<string>;  name: string;  age: number;} // Create a table named "first_table"export const myTable = new OlapTable<MyFirstTable>("first_table");

Benefits:

Boilerplate CREATE/ALTER TABLE statements handled for you

Automatic type mapping to ClickHouse types

Built-in type validation on insert

Version-controlled schema management

Basic Usage

Standalone Tables

Create a table directly for custom data flows or when you need fine-grained control:

StandaloneTable.ts

import { OlapTable, Key } from "@514labs/moose-lib"; // Define your schemainterface ExampleSchema {  id: Key<string>;  dateField: Date;  numericField: number;  booleanField: boolean;  floatField: number;  integerField: number & tags.Type<"int64">; // Moose supports native tagged types so you can use Integers in typescript} // Create a standalone table named "example_table"export const exampleTable = new OlapTable<ExampleSchema>("example_table", {  orderByFields: ["id", "dateField"] // Optional when using a primary key}); // For deduplication, use the ReplacingMergeTree factoryexport const dedupTable = OlapTable.withReplacingMergeTree<ExampleSchema>("example_table", {  orderByFields: ["id", "dateField"],  ver: "updatedAt",  // Optional: version column (keeps highest value)  isDeleted: "deleted"  // Optional: soft delete flag (requires ver)}); // Now you can:// - Write to this table from streams// - Query it directly// - Use it as a source for materialized views

StandaloneTable.ts

import { OlapTable, Key } from "@514labs/moose-lib"; // Define your schemainterface ExampleSchema {  id: Key<string>;  dateField: Date;  numericField: number;  booleanField: boolean;  floatField: number;  integerField: number & tags.Type<"int64">; // Moose supports native tagged types so you can use Integers in typescript} // Create a standalone table named "example_table"export const exampleTable = new OlapTable<ExampleSchema>("example_table", {  orderByFields: ["id", "dateField"] // Optional when using a primary key}); // For deduplication, use the ReplacingMergeTree factoryexport const dedupTable = OlapTable.withReplacingMergeTree<ExampleSchema>("example_table", {  orderByFields: ["id", "dateField"],  ver: "updatedAt",  // Optional: version column (keeps highest value)  isDeleted: "deleted"  // Optional: soft delete flag (requires ver)}); // Now you can:// - Write to this table from streams// - Query it directly// - Use it as a source for materialized views

Use Standalone Tables When:

Use when you need to do a bulk import of data

Use when you have in-memory ETL/ELT workflows that need to write directly to a table as opposed to a streaming ingestion pipeline

Use when you have some external service that is maintaining and writing to the table, like a CDC or other external ETL service

Creating Tables in Ingestion Pipelines

For end-to-end data flows, create tables as part of an ingestion pipeline:

PipelineTable.ts

import { IngestPipeline, Key } from "@514labs/moose-lib"; // Define your schemainterface UserEvent {  id: Key<string>;  userId: string;  timestamp: Date;  eventType: string;} // Create a complete ingestion pipeline with a tableconst eventsPipeline = new IngestPipeline<UserEvent>("user_events", {  ingestApi: true,    // Creates a REST API endpoint at POST localhost:4000/ingest/user_events  stream: true,    // Creates Kafka/Redpanda topic  table: {         // Creates and configures the table named "user_events"    orderByFields: ["id", "timestamp"]  }}); // Access the table component when neededconst eventsTable = eventsPipeline.table;

PipelineTable.ts

import { IngestPipeline, Key } from "@514labs/moose-lib"; // Define your schemainterface UserEvent {  id: Key<string>;  userId: string;  timestamp: Date;  eventType: string;} // Create a complete ingestion pipeline with a tableconst eventsPipeline = new IngestPipeline<UserEvent>("user_events", {  ingestApi: true,    // Creates a REST API endpoint at POST localhost:4000/ingest/user_events  stream: true,    // Creates Kafka/Redpanda topic  table: {         // Creates and configures the table named "user_events"    orderByFields: ["id", "timestamp"]  }}); // Access the table component when neededconst eventsTable = eventsPipeline.table;

Data Modeling

Special ClickHouse Types (LowCardinality, Nullable, etc)

ClickHouseTypes.ts

import { Key, Decimal, ClickHouseDecimal, LowCardinality, ClickHouseNamedTuple, tags } from "@514labs/moose-lib"; export interface ClickHouseOptimizedExample {  id: Key<string>;  stringField: string;  numberField: number;  decimalField: Decimal<10, 2>;                         // Precise decimal storage  // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works  lowCardinalityField: string & LowCardinality;         // Faster queries for enum-like data  nestedObject: {    innerString: string;    innerNumber: number;  };  namedTupleField: {    name: string;    value: number;  } & ClickHouseNamedTuple;                             // Optimized nested storage  numberArray: number[];  mapField: Record<string, number>;  literalField: "optionA" | "optionB";  optionalField?: string;                               // Nullable field  dateField: Date;}

ClickHouseTypes.ts

import { Key, Decimal, ClickHouseDecimal, LowCardinality, ClickHouseNamedTuple, tags } from "@514labs/moose-lib"; export interface ClickHouseOptimizedExample {  id: Key<string>;  stringField: string;  numberField: number;  decimalField: Decimal<10, 2>;                         // Precise decimal storage  // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works  lowCardinalityField: string & LowCardinality;         // Faster queries for enum-like data  nestedObject: {    innerString: string;    innerNumber: number;  };  namedTupleField: {    name: string;    value: number;  } & ClickHouseNamedTuple;                             // Optimized nested storage  numberArray: number[];  mapField: Record<string, number>;  literalField: "optionA" | "optionB";  optionalField?: string;                               // Nullable field  dateField: Date;}

Default values

Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.

Defaults.ts

import { OlapTable, Key, ClickHouseDefault, Decimal, ClickHouseDecimal } from "@514labs/moose-lib"; interface Event {  id: Key<string>;  // Static defaults (ClickHouse expression as a string literal)  status: string & ClickHouseDefault<"'pending'">;   // DEFAULT 'pending'  retries: number & ClickHouseDefault<"0">;          // DEFAULT 0  // Server-side timestamps  createdAt: Date & ClickHouseDefault<"now()">;      // DEFAULT now()  // Decimal with default  amount: Decimal<10, 2> & ClickHouseDefault<"0">;  // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax} export const events = new OlapTable<Event>("events", {  orderByFields: ["id", "createdAt"],});

Defaults.ts

import { OlapTable, Key, ClickHouseDefault, Decimal, ClickHouseDecimal } from "@514labs/moose-lib"; interface Event {  id: Key<string>;  // Static defaults (ClickHouse expression as a string literal)  status: string & ClickHouseDefault<"'pending'">;   // DEFAULT 'pending'  retries: number & ClickHouseDefault<"0">;          // DEFAULT 0  // Server-side timestamps  createdAt: Date & ClickHouseDefault<"now()">;      // DEFAULT now()  // Decimal with default  amount: Decimal<10, 2> & ClickHouseDefault<"0">;  // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax} export const events = new OlapTable<Event>("events", {  orderByFields: ["id", "createdAt"],});

ClickHouse defaults with optional fields

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

Optional without default (e.g., field?: number) → ClickHouse Nullable type.
Optional with default (e.g., field?: number & ClickHouseDefault<"18"> or WithDefault<number, "18">) → non-nullable column with default 18.

Optional without default (e.g., field?: number) → ClickHouse Nullable type.
Optional with default (e.g., field?: number & ClickHouseDefault<"18"> or WithDefault<number, "18">) → non-nullable column with default 18.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

Column Comments

Use TSDoc comments (/** ... */) directly above fields:

ColumnComments.ts

import { Key, DateTime, OlapTable } from "@514labs/moose-lib"; interface UserEvent {  /** Unique identifier for the user event */  id: Key<string>;  /** Timestamp when the event occurred */  timestamp: DateTime;  /** Email address of the user (must be valid format) */  email: string;  /** Total transaction amount in USD */  amount: number;  // Fields without TSDoc comments have no COMMENT in DDL  status: string;} export const userEventTable = new OlapTable<UserEvent>("user_events", {  orderByFields: ["id", "timestamp"],});

Use TSDoc comments (/** ... */) directly above fields:

ColumnComments.ts

import { Key, DateTime, OlapTable } from "@514labs/moose-lib"; interface UserEvent {  /** Unique identifier for the user event */  id: Key<string>;  /** Timestamp when the event occurred */  timestamp: DateTime;  /** Email address of the user (must be valid format) */  email: string;  /** Total transaction amount in USD */  amount: number;  // Fields without TSDoc comments have no COMMENT in DDL  status: string;} export const userEventTable = new OlapTable<UserEvent>("user_events", {  orderByFields: ["id", "timestamp"],});

The generated ClickHouse DDL includes column comments:

CREATE TABLE user_events (  `id` String COMMENT 'Unique identifier for the user event',  `timestamp` DateTime COMMENT 'Timestamp when the event occurred',  `email` String COMMENT 'Email address of the user (must be valid format)',  `amount` Float64 COMMENT 'Total transaction amount in USD',  `status` String) ENGINE = MergeTree()ORDER BY (id, timestamp)

Viewing comments in ClickHouse

Query column comments using SELECT name, comment FROM system.columns WHERE table = 'user_events' or DESCRIBE TABLE user_events FORMAT Vertical.

Database Selection

By default, tables are created in the database specified in your moose.config.toml ClickHouse configuration. You can override this on a per-table basis using the database field:

DatabaseOverride.ts

import { OlapTable } from "@514labs/moose-lib"; interface UserData {  id: Key<string>;  name: string;  email: string;} // Table in default database (from moose.config.toml)const defaultTable = new OlapTable<UserData>("users"); // Table in specific database (e.g., "analytics")const analyticsTable = new OlapTable<UserData>("users", {  database: "analytics",  orderByFields: ["id"]});

DatabaseOverride.ts

import { OlapTable } from "@514labs/moose-lib"; interface UserData {  id: Key<string>;  name: string;  email: string;} // Table in default database (from moose.config.toml)const defaultTable = new OlapTable<UserData>("users"); // Table in specific database (e.g., "analytics")const analyticsTable = new OlapTable<UserData>("users", {  database: "analytics",  orderByFields: ["id"]});

Multi-database setup

To use custom databases, configure them in your moose.config.toml:

[clickhouse_config]db_name = "local"additional_databases = ["analytics", "staging"]

The databases in additional_databases will be created automatically when you start your Moose application.

Primary Keys and Sorting

You must configure table indexing using one of these approaches:

Define at least one Key in your table schema
Specify orderByFields in the table config
Use both (all Key fields must come first in the orderByFields array)

PrimaryKeyConfig.ts

import { OlapTable, Key } from '@514labs/moose-lib'; // Approach 1: Using primary key onlyinterface Record1 {  id: Key<string>;  // Primary key field  field1: string;  field2: number;} const table1 = new OlapTable<Record1>("table1");  // id is the primary key

PrimaryKeyConfig.ts

import { OlapTable, Key } from '@514labs/moose-lib'; // Approach 1: Using primary key onlyinterface Record1 {  id: Key<string>;  // Primary key field  field1: string;  field2: number;} const table1 = new OlapTable<Record1>("table1");  // id is the primary key

Order By Fields Only

OrderByFieldsOnly.ts

// Approach 2: Using orderByFields onlyinterface SchemaWithoutPrimaryKey {  field1: string;  field2: number;  field3: Date;} const tableWithOrderByFieldsOnly = new OlapTable<SchemaWithoutPrimaryKey>("table2", {  orderByFields: ["field1", "field2"]  // Specify ordering without primary key});

OrderByFieldsOnly.ts

// Approach 2: Using orderByFields onlyinterface SchemaWithoutPrimaryKey {  field1: string;  field2: number;  field3: Date;} const tableWithOrderByFieldsOnly = new OlapTable<SchemaWithoutPrimaryKey>("table2", {  orderByFields: ["field1", "field2"]  // Specify ordering without primary key});

Order By Expression

Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely.

OrderByExpression.ts

// Use a ClickHouse expression for ORDER BYinterface Events {  userId: string;  createdAt: Date;  eventType: string;} const tableWithOrderByExpression = new OlapTable<Events>("events", {  // Equivalent to orderByFields: ["userId", "createdAt", "eventType"]  orderByExpression: "(userId, createdAt, eventType)",}); // Advanced: functions inside expressionconst tableWithMonthBucketing = new OlapTable<Events>("events_by_month", {  orderByExpression: "(userId, toYYYYMM(createdAt))",}); // No sorting (ClickHouse tuple() means empty ORDER BY)const unsortedTable = new OlapTable<Events>("events_unsorted", {  orderByExpression: "tuple()",});

OrderByExpression.ts

// Use a ClickHouse expression for ORDER BYinterface Events {  userId: string;  createdAt: Date;  eventType: string;} const tableWithOrderByExpression = new OlapTable<Events>("events", {  // Equivalent to orderByFields: ["userId", "createdAt", "eventType"]  orderByExpression: "(userId, createdAt, eventType)",}); // Advanced: functions inside expressionconst tableWithMonthBucketing = new OlapTable<Events>("events_by_month", {  orderByExpression: "(userId, toYYYYMM(createdAt))",}); // No sorting (ClickHouse tuple() means empty ORDER BY)const unsortedTable = new OlapTable<Events>("events_unsorted", {  orderByExpression: "tuple()",});

Primary Key Expression

Use a ClickHouse SQL expression to define the primary key explicitly. This is useful when:

You need functions in the primary key (e.g., cityHash64(id))
The primary key column ordering should differ from the schema definition
The primary key should differ from the ORDER BY

Important: When primaryKeyExpression is specified, any Key<T> annotations on columns are ignored for PRIMARY KEY generation.

PrimaryKeyExpression.ts

import { OlapTable, Key } from "@514labs/moose-lib"; // Example 1: Primary key with functioninterface UserEvents {  userId: string;  eventId: string;  timestamp: Date;} const eventsTable = new OlapTable<UserEvents>("user_events", {  // Use hash function in primary key for better distribution  primaryKeyExpression: "(userId, cityHash64(eventId))",  orderByExpression: "(userId, timestamp)",}); // Example 2: Different ordering in primary key vs columnsinterface Product {  category: string;  brand: string;  productId: string;  name: string;} const productsTable = new OlapTable<Product>("products", {  // Primary key order optimized for uniqueness  primaryKeyExpression: "productId",  // Order by optimized for common queries  orderByFields: ["category", "brand", "productId"],}); // Example 3: Override Key<T> annotationinterface Record {  id: Key<string>;  // This Key<T> annotation will be IGNORED  otherId: string;} const recordTable = new OlapTable<Record>("records", {  // This expression overrides the Key<T> annotation  primaryKeyExpression: "(otherId, id)",  orderByExpression: "(otherId, id)",});

PrimaryKeyExpression.ts

import { OlapTable, Key } from "@514labs/moose-lib"; // Example 1: Primary key with functioninterface UserEvents {  userId: string;  eventId: string;  timestamp: Date;} const eventsTable = new OlapTable<UserEvents>("user_events", {  // Use hash function in primary key for better distribution  primaryKeyExpression: "(userId, cityHash64(eventId))",  orderByExpression: "(userId, timestamp)",}); // Example 2: Different ordering in primary key vs columnsinterface Product {  category: string;  brand: string;  productId: string;  name: string;} const productsTable = new OlapTable<Product>("products", {  // Primary key order optimized for uniqueness  primaryKeyExpression: "productId",  // Order by optimized for common queries  orderByFields: ["category", "brand", "productId"],}); // Example 3: Override Key<T> annotationinterface Record {  id: Key<string>;  // This Key<T> annotation will be IGNORED  otherId: string;} const recordTable = new OlapTable<Record>("records", {  // This expression overrides the Key<T> annotation  primaryKeyExpression: "(otherId, id)",  orderByExpression: "(otherId, id)",});

Rationale for Primary Key Expression:

Function Support: Primary keys can use ClickHouse functions like cityHash64() for better data distribution, which cannot be expressed through column-level annotations.
Flexible Ordering: The ordering of columns in the primary key can be different from the ordering in the schema definition, allowing optimization for both data uniqueness and query patterns.
Separation of Concerns: PRIMARY KEY and ORDER BY serve different purposes in ClickHouse:
- PRIMARY KEY defines uniqueness and deduplication
- ORDER BY defines physical data layout and query optimization
Sometimes these need different column orderings for optimal performance.

Important Constraint:

⚠️ PRIMARY KEY must be a prefix of ORDER BY in ClickHouse. This means ORDER BY must start with all PRIMARY KEY columns in the same order.

Valid:

PRIMARY KEY (userId) with ORDER BY (userId, timestamp) ✅
PRIMARY KEY (userId, cityHash64(eventId)) with ORDER BY (userId, cityHash64(eventId), timestamp) ✅

Invalid:

PRIMARY KEY (userId, eventId) with ORDER BY (userId, timestamp) ❌ (missing eventId)
PRIMARY KEY (userId, eventId) with ORDER BY (eventId, userId) ❌ (wrong order)

Using Both Primary Key and Order By Fields

ComboKeyAndOrderByFields.ts

// Approach 3: Using both (primary key must be first)interface SchemaWithKey {  id: Key<string>;  // Primary key field  field1: string;  field2: number;} const tableWithKeyAndOrderByFields = new OlapTable<SchemaWithKey>("table3", {  orderByFields: ["id", "field1"]  // Primary key must be first});

ComboKeyAndOrderByFields.ts

// Approach 3: Using both (primary key must be first)interface SchemaWithKey {  id: Key<string>;  // Primary key field  field1: string;  field2: number;} const tableWithKeyAndOrderByFields = new OlapTable<SchemaWithKey>("table3", {  orderByFields: ["id", "field1"]  // Primary key must be first});

Using Multiple Primary Keys

MultiKeyTable.ts

interface MultiKeyRecord {  key1: Key<string>;  key2: Key<number>;  field1: string;} const multiKeyTable = new OlapTable<MultiKeyRecord>("multi_key_table", {  orderByFields: ["key1", "key2", "field1"]  // Multiple keys must come first});

MultiKeyTable.ts

interface MultiKeyRecord {  key1: Key<string>;  key2: Key<number>;  field1: string;} const multiKeyTable = new OlapTable<MultiKeyRecord>("multi_key_table", {  orderByFields: ["key1", "key2", "field1"]  // Multiple keys must come first});

Table engines

By default, Moose will create tables with the MergeTree engine. You can use different engines by setting the engine in the table configuration.

TableEngine.ts

import { OlapTable } from "@514labs/moose-lib"; // Default MergeTree engineconst table = new OlapTable<Record>("table", {  orderByFields: ["id"]}); // Use engine configuration for other enginesconst dedupTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "version",  // Optional: keeps row with highest version  isDeleted: "deleted"  // Optional: soft delete when deleted=1});

TableEngine.ts

import { OlapTable } from "@514labs/moose-lib"; // Default MergeTree engineconst table = new OlapTable<Record>("table", {  orderByFields: ["id"]}); // Use engine configuration for other enginesconst dedupTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "version",  // Optional: keeps row with highest version  isDeleted: "deleted"  // Optional: soft delete when deleted=1});

Deduplication (`ReplacingMergeTree`)

Use the ReplacingMergeTree engine to keep only the latest record for your designated sort key:

DeduplicatedTable.ts

// Basic deduplicationconst table = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"]}); // With version column (keeps record with highest version)const versionedTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "updated_at"  // Column that determines which version to keep}); // With soft deletes (requires ver parameter)const softDeleteTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "updated_at",  isDeleted: "deleted"  // UInt8 column: 1 marks row for deletion});

DeduplicatedTable.ts

// Basic deduplicationconst table = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"]}); // With version column (keeps record with highest version)const versionedTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "updated_at"  // Column that determines which version to keep}); // With soft deletes (requires ver parameter)const softDeleteTable = new OlapTable<Record>("table", {  engine: ClickHouseEngines.ReplacingMergeTree,  orderByFields: ["id"],  ver: "updated_at",  isDeleted: "deleted"  // UInt8 column: 1 marks row for deletion});

Deduplication Caveats

ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately.

Version Column (ver): When specified, ClickHouse keeps the row with the maximum version value for each unique sort key.

Soft Deletes (is_deleted): When specified along with ver, rows where this column equals 1 are deleted during merges. This column must be UInt8 type.

For more details, see the ClickHouse documentation.

Streaming from S3 (`S3Queue`)

Use the S3Queue engine to automatically ingest data from S3 buckets as files are added:

S3Queue with Materialized Views

S3StreamingTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Use direct configuration (S3Queue does not support orderByFields)export const s3Events = new OlapTable<S3Event>("s3_events", {  engine: ClickHouseEngines.S3Queue,  s3Path: "s3://my-bucket/data/*.json",  format: "JSONEachRow",  settings: {    mode: "unordered",    keeper_path: "/clickhouse/s3queue/events"  }});

S3Queue ORDER BY not supported

S3Queue is a streaming engine and does not support orderByFields or ORDER BY clauses. Configure only engine-specific parameters like s3Path, format, and settings.

S3StreamingTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Use direct configuration (S3Queue does not support orderByFields)export const s3Events = new OlapTable<S3Event>("s3_events", {  engine: ClickHouseEngines.S3Queue,  s3Path: "s3://my-bucket/data/*.json",  format: "JSONEachRow",  settings: {    mode: "unordered",    keeper_path: "/clickhouse/s3queue/events"  }});

S3Queue ORDER BY not supported

S3Queue is a streaming engine and does not support orderByFields or ORDER BY clauses. Configure only engine-specific parameters like s3Path, format, and settings.

⚠️ NEVER hardcode AWS credentials!

Security Risk: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities.

Solution: Use mooseRuntimeEnv for runtime credential resolution:

SecureS3Streaming.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // ✅ RECOMMENDED: Runtime environment variable resolutionexport const secureS3Events = new OlapTable<S3Event>("s3_events", {  engine: ClickHouseEngines.S3Queue,  s3Path: "s3://my-bucket/data/*.json",  format: "JSONEachRow",  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),  settings: {    mode: "unordered",    keeper_path: "/clickhouse/s3queue/events"  }});

SecureS3Streaming.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // ✅ RECOMMENDED: Runtime environment variable resolutionexport const secureS3Events = new OlapTable<S3Event>("s3_events", {  engine: ClickHouseEngines.S3Queue,  s3Path: "s3://my-bucket/data/*.json",  format: "JSONEachRow",  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),  settings: {    mode: "unordered",    keeper_path: "/clickhouse/s3queue/events"  }});

Then set environment variables:

export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="your-secret-key"
moose prod up

Benefits:

Credentials never embedded in Docker images
Supports credential rotation (changing passwords triggers table recreation)
Different credentials per environment (dev/staging/prod)
Clear error messages if environment variables are missing

S3Queue Requirements

S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas.

Direct S3 Access (`S3`)

Use the S3 engine for direct read/write access to S3 storage without streaming semantics:

S3Table.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // S3 table with credentials (recommended with mooseRuntimeEnv)export const s3Data = new OlapTable<DataRecord>("s3_data", {  engine: ClickHouseEngines.S3,  path: "s3://my-bucket/data/file.json",  format: "JSONEachRow",  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),  compression: "gzip"}); // Public S3 bucket (no authentication)export const publicS3 = new OlapTable<DataRecord>("public_s3", {  engine: ClickHouseEngines.S3,  path: "s3://public-bucket/data/*.parquet",  format: "Parquet",  noSign: true  // Use NOSIGN for public buckets});

S3Table.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // S3 table with credentials (recommended with mooseRuntimeEnv)export const s3Data = new OlapTable<DataRecord>("s3_data", {  engine: ClickHouseEngines.S3,  path: "s3://my-bucket/data/file.json",  format: "JSONEachRow",  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),  compression: "gzip"}); // Public S3 bucket (no authentication)export const publicS3 = new OlapTable<DataRecord>("public_s3", {  engine: ClickHouseEngines.S3,  path: "s3://public-bucket/data/*.parquet",  format: "Parquet",  noSign: true  // Use NOSIGN for public buckets});

S3 vs S3Queue

S3: Direct read/write access to S3 files. Use for batch processing or querying static data.
S3Queue: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion.

Both engines support the same credential management and format options.

IcebergS3

The IcebergS3 engine provides read-only access to Iceberg tables stored in S3:

IcebergTable.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // Iceberg table with AWS credentials (recommended with mooseRuntimeEnv)export const icebergEvents = new OlapTable<Event>("iceberg_events", {  engine: ClickHouseEngines.IcebergS3,  path: "s3://my-bucket/warehouse/db/table/",  format: "Parquet", // or "ORC"  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),});

IcebergTable.ts

import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // Iceberg table with AWS credentials (recommended with mooseRuntimeEnv)export const icebergEvents = new OlapTable<Event>("iceberg_events", {  engine: ClickHouseEngines.IcebergS3,  path: "s3://my-bucket/warehouse/db/table/",  format: "Parquet", // or "ORC"  awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"),  awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),});

IcebergS3 is read-only

IcebergS3 tables are read-only and provide access to the latest state of your Iceberg table
orderByFields, orderByExpression, partitionBy, and sampleByExpression are not supported
The table automatically reflects the current state of the Iceberg table in S3
Supported formats: Parquet and ORC only

In-Memory Buffer (`Buffer`)

The Buffer engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds:

BufferTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // First create the destination tableexport const destinationTable = new OlapTable<Record>("destination", {  engine: ClickHouseEngines.MergeTree,  orderByFields: ["id", "timestamp"]}); // Then create buffer that writes to itexport const bufferTable = new OlapTable<Record>("buffer", {  engine: ClickHouseEngines.Buffer,  targetDatabase: "local",  targetTable: "destination",  numLayers: 16,  minTime: 10,        // Min 10 seconds before flush  maxTime: 100,       // Max 100 seconds before flush  minRows: 10000,     // Min 10k rows before flush  maxRows: 1000000,   // Max 1M rows before flush  minBytes: 10485760, // Min 10MB before flush  maxBytes: 104857600 // Max 100MB before flush});

BufferTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // First create the destination tableexport const destinationTable = new OlapTable<Record>("destination", {  engine: ClickHouseEngines.MergeTree,  orderByFields: ["id", "timestamp"]}); // Then create buffer that writes to itexport const bufferTable = new OlapTable<Record>("buffer", {  engine: ClickHouseEngines.Buffer,  targetDatabase: "local",  targetTable: "destination",  numLayers: 16,  minTime: 10,        // Min 10 seconds before flush  maxTime: 100,       // Max 100 seconds before flush  minRows: 10000,     // Min 10k rows before flush  maxRows: 1000000,   // Max 1M rows before flush  minBytes: 10485760, // Min 10MB before flush  maxBytes: 104857600 // Max 100MB before flush});

Buffer Engine Considerations

Data in buffer is lost if server crashes before flush
Not suitable for critical data that must be durable
Best for high-throughput scenarios where minor data loss is acceptable
Buffer and destination table must have identical schemas
Cannot use orderByFields, partitionBy, or sampleByExpression on buffer tables

For more details, see the ClickHouse Buffer documentation.

Distributed Tables (`Distributed`)

The Distributed engine creates a distributed table across a ClickHouse cluster for horizontal scaling:

DistributedTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Distributed table across clusterexport const distributedTable = new OlapTable<Record>("distributed_data", {  engine: ClickHouseEngines.Distributed,  cluster: "my_cluster",  targetDatabase: "default",  targetTable: "local_table",  shardingKey: "cityHash64(id)"  // Optional: how to distribute data});

DistributedTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Distributed table across clusterexport const distributedTable = new OlapTable<Record>("distributed_data", {  engine: ClickHouseEngines.Distributed,  cluster: "my_cluster",  targetDatabase: "default",  targetTable: "local_table",  shardingKey: "cityHash64(id)"  // Optional: how to distribute data});

Distributed Table Requirements

Requires a configured ClickHouse cluster with remote_servers configuration
The local table must exist on all cluster nodes
Distributed tables are virtual - data is stored in local tables
Cannot use orderByFields, partitionBy, or sampleByExpression on distributed tables
The cluster name must match a cluster defined in your ClickHouse configuration

For more details, see the ClickHouse Distributed documentation.

Replicated Engines

Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants:

ReplicatedMergeTree - Replicated version of MergeTree
ReplicatedReplacingMergeTree - Replicated with deduplication
ReplicatedAggregatingMergeTree - Replicated with aggregation
ReplicatedSummingMergeTree - Replicated with summation

ReplicatedEngines.ts

import { OlapTable, ClickHouseEngines } from "@514labs/moose-lib"; // Basic replicated table with explicit pathsconst replicatedTable = new OlapTable<Record>("records", {  engine: ClickHouseEngines.ReplicatedMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/records",  replicaName: "{replica}",  orderByFields: ["id"]}); // Replicated with deduplicationconst replicatedDedup = new OlapTable<Record>("dedup_records", {  engine: ClickHouseEngines.ReplicatedReplacingMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records",  replicaName: "{replica}",  ver: "updated_at",  isDeleted: "deleted",  orderByFields: ["id"]}); // For ClickHouse Cloud or Boreal (no parameters needed)const cloudReplicated = new OlapTable<Record>("cloud_records", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"]});

ReplicatedEngines.ts

import { OlapTable, ClickHouseEngines } from "@514labs/moose-lib"; // Basic replicated table with explicit pathsconst replicatedTable = new OlapTable<Record>("records", {  engine: ClickHouseEngines.ReplicatedMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/records",  replicaName: "{replica}",  orderByFields: ["id"]}); // Replicated with deduplicationconst replicatedDedup = new OlapTable<Record>("dedup_records", {  engine: ClickHouseEngines.ReplicatedReplacingMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records",  replicaName: "{replica}",  ver: "updated_at",  isDeleted: "deleted",  orderByFields: ["id"]}); // For ClickHouse Cloud or Boreal (no parameters needed)const cloudReplicated = new OlapTable<Record>("cloud_records", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"]});

Configuring Replication

Replicated engines support three configuration approaches. Choose the one that fits your deployment:

Default

Omit all replication parameters. Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments:

DefaultReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"]  // No keeper_path, replica_name, or cluster needed});

DefaultReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"]  // No keeper_path, replica_name, or cluster needed});

Moose auto-injects: /clickhouse/tables/{database}/{shard}/{table_name} and {replica} in local development. ClickHouse Cloud uses its own patterns automatically.

Cluster

For multi-node deployments, specify a cluster name to use ON CLUSTER DDL operations:

ClusterReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"],  cluster: "default"  // References cluster from moose.config.toml});

ClusterReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  orderByFields: ["id"],  cluster: "default"  // References cluster from moose.config.toml});

Configuration in moose.config.toml:

[[clickhouse_config.clusters]]name = "default"

Use when:

Running multi-node self-managed ClickHouse with cluster configuration
Need ON CLUSTER DDL for distributed operations

Replication Paths

For custom replication topology, specify both keeper_path and replica_name:

ExplicitReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/my_table",  replicaName: "{replica}",  orderByFields: ["id"]});

ExplicitReplication.ts

const table = new OlapTable<Record>("my_table", {  engine: ClickHouseEngines.ReplicatedMergeTree,  keeperPath: "/clickhouse/tables/{database}/{shard}/my_table",  replicaName: "{replica}",  orderByFields: ["id"]});

Use when:

Need custom replication paths for advanced configurations
Both parameters must be provided together

Warning:

Cannot mix approaches: Specifying both cluster and explicit keeper_path/replica_name will cause an error. Choose one approach.

Cluster is a deployment directive: Changing cluster won't recreate your table—it only affects future DDL operations.

For more details, see the ClickHouse documentation on data replication.

Kafka

Use the Kafka engine to consume data directly from Kafka compatible topics.

For more details on the Kafka engine, see the ClickHouse documentation on Kafka integration.

Kafka with Materialized Views

Kafka tables are streaming interfaces that don't persist data. Use a MaterializedView to continuously move data to a table.

KafkaTable.ts

import { OlapTable, ClickHouseEngines, MaterializedView, sql } from '@514labs/moose-lib'; interface KafkaEvent {  eventId: string;  userId: string;  timestamp: number; // Unix seconds for JSONEachRow} // Kafka table - reads from topic, doesn't persistexport const kafkaSource = new OlapTable<KafkaEvent>("kafka_events", {  engine: ClickHouseEngines.Kafka,  brokerList: "redpanda:9092",  topicList: "events",  groupName: "my_consumer_group",  format: "JSONEachRow",  settings: {    kafka_num_consumers: "1",  }}); // MaterializedView moves data to MergeTree for persistenceconst cols = kafkaSource.columns;export const eventsMV = new MaterializedView<KafkaEvent>({  tableName: "events_dest",  materializedViewName: "events_mv",  orderByFields: ["eventId"],  selectStatement: sql`SELECT ${cols.eventId}, ${cols.userId}, ${cols.timestamp} FROM ${kafkaSource}`,  selectTables: [kafkaSource],});

KafkaTable.ts

import { OlapTable, ClickHouseEngines, MaterializedView, sql } from '@514labs/moose-lib'; interface KafkaEvent {  eventId: string;  userId: string;  timestamp: number; // Unix seconds for JSONEachRow} // Kafka table - reads from topic, doesn't persistexport const kafkaSource = new OlapTable<KafkaEvent>("kafka_events", {  engine: ClickHouseEngines.Kafka,  brokerList: "redpanda:9092",  topicList: "events",  groupName: "my_consumer_group",  format: "JSONEachRow",  settings: {    kafka_num_consumers: "1",  }}); // MaterializedView moves data to MergeTree for persistenceconst cols = kafkaSource.columns;export const eventsMV = new MaterializedView<KafkaEvent>({  tableName: "events_dest",  materializedViewName: "events_mv",  orderByFields: ["eventId"],  selectStatement: sql`SELECT ${cols.eventId}, ${cols.userId}, ${cols.timestamp} FROM ${kafkaSource}`,  selectTables: [kafkaSource],});

Kafka Engine Limitations

No ORDER BY: Kafka is a streaming engine and doesn't support orderByFields
No ALTER TABLE: Column or settings changes require DROP and CREATE (Moose handles this automatically)
No direct SELECT: Query data from the MaterializedView destination table, not the Kafka table

Externally Managed Tables

ExternallyManagedTable.ts

import { OlapTable, LifeCycle } from "@514labs/moose-lib"; // Table managed by external systemconst externalTable = new OlapTable<UserData>("external_users", {  orderByFields: ["id", "timestamp"],  lifeCycle: LifeCycle.EXTERNALLY_MANAGED  // Moose won't create or modify this table});

ExternallyManagedTable.ts

import { OlapTable, LifeCycle } from "@514labs/moose-lib"; // Table managed by external systemconst externalTable = new OlapTable<UserData>("external_users", {  orderByFields: ["id", "timestamp"],  lifeCycle: LifeCycle.EXTERNALLY_MANAGED  // Moose won't create or modify this table});

Learn More About LifeCycle Management

Learn more about the different lifecycle options and how to use them in the LifeCycle Management documentation.

Invalid Configurations

InvalidConfig.ts

// Error: No primary key or orderByFieldsinterface BadRecord1 {  field1: string;  field2: number;}const badTable1 = new OlapTable<BadRecord1>("bad_table1"); // Error: Primary key not first in orderByFieldsinterface BadRecord2 {  id: Key<string>;  field1: string;}const badTable2 = new OlapTable<BadRecord2>("bad_table2", {  orderByFields: ["field1", "id"]  // Wrong order - primary key must be first}); // Error: Nullable field in orderByFieldsinterface BadRecord3 {  id: Key<string>;  field1: string;  field2?: number;}const badTable3 = new OlapTable<BadRecord3>("bad_table3", {  orderByFields: ["id", "field2"]  // Can't have nullable field in orderByFields});

InvalidConfig.ts

// Error: No primary key or orderByFieldsinterface BadRecord1 {  field1: string;  field2: number;}const badTable1 = new OlapTable<BadRecord1>("bad_table1"); // Error: Primary key not first in orderByFieldsinterface BadRecord2 {  id: Key<string>;  field1: string;}const badTable2 = new OlapTable<BadRecord2>("bad_table2", {  orderByFields: ["field1", "id"]  // Wrong order - primary key must be first}); // Error: Nullable field in orderByFieldsinterface BadRecord3 {  id: Key<string>;  field1: string;  field2?: number;}const badTable3 = new OlapTable<BadRecord3>("bad_table3", {  orderByFields: ["id", "field2"]  // Can't have nullable field in orderByFields});

Development Workflow

Local Development with Hot Reloading

One of the powerful features of Moose is its integration with the local development server:

Start your local development server with moose dev
When you define or modify an OlapTable in your code and save the file:
- The changes are automatically detected
- The TypeScript compiler plugin processes your schema definitions
- The infrastructure is updated in real-time to match your code changes
- Your tables are immediately available for testing

For example, if you add a new field to your schema:

HotReloading.ts

// Beforeinterface BasicSchema {  id: Key<string>;  name: string;} // After adding a fieldinterface BasicSchema {  id: Key<string>;  name: string;  createdAt: Date;  // New field}

HotReloading.ts

// Beforeinterface BasicSchema {  id: Key<string>;  name: string;} // After adding a fieldinterface BasicSchema {  id: Key<string>;  name: string;  createdAt: Date;  // New field}

The Moose framework will:

Detect the change when you save the file
Update the table schema in the local ClickHouse instance
Make the new field immediately available for use

Verifying Your Tables

You can verify your tables were created correctly using:

# List all tables in your local environment
moose ls

Connecting to your local ClickHouse instance

You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your moose.config.toml file:

moose.config.toml

[clickhouse_config]db_name = "local"user = "panda"password = "pandapass"use_ssl = falsehost = "localhost"host_port = 18123native_port = 9000