Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.
You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.
interface MyFirstTable { id: Key<string>; name: string; age: number;} // Create a table named "first_table"export const myTable = new OlapTable<MyFirstTable>("first_table");Create a table directly for custom data flows or when you need fine-grained control:
import { OlapTable, Key } from "@514labs/moose-lib"; // Define your schemainterface ExampleSchema { id: Key<string>; dateField: Date; numericField: number; booleanField: boolean; floatField: number; integerField: number & tags.Type<"int64">; // Moose supports native tagged types so you can use Integers in typescript} // Create a standalone table named "example_table"export const exampleTable = new OlapTable<ExampleSchema>("example_table", { orderByFields: ["id", "dateField"] // Optional when using a primary key}); // For deduplication, use the ReplacingMergeTree factoryexport const dedupTable = OlapTable.withReplacingMergeTree<ExampleSchema>("example_table", { orderByFields: ["id", "dateField"], ver: "updatedAt", // Optional: version column (keeps highest value) isDeleted: "deleted" // Optional: soft delete flag (requires ver)}); // Now you can:// - Write to this table from streams// - Query it directly// - Use it as a source for materialized viewsFor end-to-end data flows, create tables as part of an ingestion pipeline:
import { IngestPipeline, Key } from "@514labs/moose-lib"; // Define your schemainterface UserEvent { id: Key<string>; userId: string; timestamp: Date; eventType: string;} // Create a complete ingestion pipeline with a tableconst eventsPipeline = new IngestPipeline<UserEvent>("user_events", { ingestApi: true, // Creates a REST API endpoint at POST localhost:4000/ingest/user_events stream: true, // Creates Kafka/Redpanda topic table: { // Creates and configures the table named "user_events" orderByFields: ["id", "timestamp"] }}); // Access the table component when neededconst eventsTable = eventsPipeline.table;import { Key, Decimal, ClickHouseDecimal, LowCardinality, ClickHouseNamedTuple, tags } from "@514labs/moose-lib"; export interface ClickHouseOptimizedExample { id: Key<string>; stringField: string; numberField: number; decimalField: Decimal<10, 2>; // Precise decimal storage // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works lowCardinalityField: string & LowCardinality; // Faster queries for enum-like data nestedObject: { innerString: string; innerNumber: number; }; namedTupleField: { name: string; value: number; } & ClickHouseNamedTuple; // Optimized nested storage numberArray: number[]; mapField: Record<string, number>; literalField: "optionA" | "optionB"; optionalField?: string; // Nullable field dateField: Date;}Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.
import { OlapTable, Key, ClickHouseDefault, Decimal, ClickHouseDecimal } from "@514labs/moose-lib"; interface Event { id: Key<string>; // Static defaults (ClickHouse expression as a string literal) status: string & ClickHouseDefault<"'pending'">; // DEFAULT 'pending' retries: number & ClickHouseDefault<"0">; // DEFAULT 0 // Server-side timestamps createdAt: Date & ClickHouseDefault<"now()">; // DEFAULT now() // Decimal with default amount: Decimal<10, 2> & ClickHouseDefault<"0">; // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax} export const events = new OlapTable<Event>("events", { orderByFields: ["id", "createdAt"],});The value passed into the ClickHouseDefault<""> tag can either be a string literal or a stringified ClickHouse SQL expression. If you run into typing issues specifically on Date fields with ClickHouseDefault, use WithDefault<Date, "now()"> as a fallback workaround.
If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.
field?: number) → ClickHouse Nullable type.field?: number & ClickHouseDefault<"18"> or WithDefault<number, "18">) → non-nullable column with default 18.This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.
By default, tables are created in the database specified in your moose.config.toml ClickHouse configuration. You can override this on a per-table basis using the database field:
import { OlapTable } from "@514labs/moose-lib"; interface UserData { id: Key<string>; name: string; email: string;} // Table in default database (from moose.config.toml)const defaultTable = new OlapTable<UserData>("users"); // Table in specific database (e.g., "analytics")const analyticsTable = new OlapTable<UserData>("users", { database: "analytics", orderByFields: ["id"]});To use custom databases, configure them in your moose.config.toml:
[clickhouse_config]db_name = "local"additional_databases = ["analytics", "staging"]The databases in additional_databases will be created automatically when you start your Moose application.
You must configure table indexing using one of these approaches:
Key in your table schemaorderByFields in the table configKey fields must come first in the orderByFields array)import { OlapTable, Key } from '@514labs/moose-lib'; // Approach 1: Using primary key onlyinterface Record1 { id: Key<string>; // Primary key field field1: string; field2: number;} const table1 = new OlapTable<Record1>("table1"); // id is the primary key// Approach 2: Using orderByFields onlyinterface SchemaWithoutPrimaryKey { field1: string; field2: number; field3: Date;} const tableWithOrderByFieldsOnly = new OlapTable<SchemaWithoutPrimaryKey>("table2", { orderByFields: ["field1", "field2"] // Specify ordering without primary key});Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely.
// Use a ClickHouse expression for ORDER BYinterface Events { userId: string; createdAt: Date; eventType: string;} const tableWithOrderByExpression = new OlapTable<Events>("events", { // Equivalent to orderByFields: ["userId", "createdAt", "eventType"] orderByExpression: "(userId, createdAt, eventType)",}); // Advanced: functions inside expressionconst tableWithMonthBucketing = new OlapTable<Events>("events_by_month", { orderByExpression: "(userId, toYYYYMM(createdAt))",}); // No sorting (ClickHouse tuple() means empty ORDER BY)const unsortedTable = new OlapTable<Events>("events_unsorted", { orderByExpression: "tuple()",});Use a ClickHouse SQL expression to define the primary key explicitly. This is useful when:
cityHash64(id))Important: When primaryKeyExpression is specified, any Key<T> annotations on columns are ignored for PRIMARY KEY generation.
import { OlapTable, Key } from "@514labs/moose-lib"; // Example 1: Primary key with functioninterface UserEvents { userId: string; eventId: string; timestamp: Date;} const eventsTable = new OlapTable<UserEvents>("user_events", { // Use hash function in primary key for better distribution primaryKeyExpression: "(userId, cityHash64(eventId))", orderByExpression: "(userId, timestamp)",}); // Example 2: Different ordering in primary key vs columnsinterface Product { category: string; brand: string; productId: string; name: string;} const productsTable = new OlapTable<Product>("products", { // Primary key order optimized for uniqueness primaryKeyExpression: "productId", // Order by optimized for common queries orderByFields: ["category", "brand", "productId"],}); // Example 3: Override Key<T> annotationinterface Record { id: Key<string>; // This Key<T> annotation will be IGNORED otherId: string;} const recordTable = new OlapTable<Record>("records", { // This expression overrides the Key<T> annotation primaryKeyExpression: "(otherId, id)", orderByExpression: "(otherId, id)",});Rationale for Primary Key Expression:
Function Support: Primary keys can use ClickHouse functions like cityHash64() for better data distribution, which cannot be expressed through column-level annotations.
Flexible Ordering: The ordering of columns in the primary key can be different from the ordering in the schema definition, allowing optimization for both data uniqueness and query patterns.
Separation of Concerns: PRIMARY KEY and ORDER BY serve different purposes in ClickHouse:
Sometimes these need different column orderings for optimal performance.
Important Constraint:
⚠️ PRIMARY KEY must be a prefix of ORDER BY in ClickHouse. This means ORDER BY must start with all PRIMARY KEY columns in the same order.
Valid:
(userId) with ORDER BY (userId, timestamp) ✅(userId, cityHash64(eventId)) with ORDER BY (userId, cityHash64(eventId), timestamp) ✅Invalid:
(userId, eventId) with ORDER BY (userId, timestamp) ❌ (missing eventId)(userId, eventId) with ORDER BY (eventId, userId) ❌ (wrong order)// Approach 3: Using both (primary key must be first)interface SchemaWithKey { id: Key<string>; // Primary key field field1: string; field2: number;} const tableWithKeyAndOrderByFields = new OlapTable<SchemaWithKey>("table3", { orderByFields: ["id", "field1"] // Primary key must be first});interface MultiKeyRecord { key1: Key<string>; key2: Key<number>; field1: string;} const multiKeyTable = new OlapTable<MultiKeyRecord>("multi_key_table", { orderByFields: ["key1", "key2", "field1"] // Multiple keys must come first});By default, Moose will create tables with the MergeTree engine. You can use different engines by setting the engine in the table configuration.
import { OlapTable } from "@514labs/moose-lib"; // Default MergeTree engineconst table = new OlapTable<Record>("table", { orderByFields: ["id"]}); // Use engine configuration for other enginesconst dedupTable = new OlapTable<Record>("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "version", // Optional: keeps row with highest version isDeleted: "deleted" // Optional: soft delete when deleted=1});Use the ReplacingMergeTree engine to keep only the latest record for your designated sort key:
// Basic deduplicationconst table = new OlapTable<Record>("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"]}); // With version column (keeps record with highest version)const versionedTable = new OlapTable<Record>("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at" // Column that determines which version to keep}); // With soft deletes (requires ver parameter)const softDeleteTable = new OlapTable<Record>("table", { engine: ClickHouseEngines.ReplacingMergeTree, orderByFields: ["id"], ver: "updated_at", isDeleted: "deleted" // UInt8 column: 1 marks row for deletion});ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately.
Version Column (ver): When specified, ClickHouse keeps the row with the maximum version value for each unique sort key.
Soft Deletes (is_deleted): When specified along with ver, rows where this column equals 1 are deleted during merges. This column must be UInt8 type.
For more details, see the ClickHouse documentation.
Use the S3Queue engine to automatically ingest data from S3 buckets as files are added:
S3Queue tables only process new files added to S3 after table creation. When used as a source for materialized views, no backfill occurs - the MV will only start populating as new files arrive. See the Materialized Views documentation for more details.
import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Use direct configuration (S3Queue does not support orderByFields)export const s3Events = new OlapTable<S3Event>("s3_events", { engine: ClickHouseEngines.S3Queue, s3Path: "s3://my-bucket/data/*.json", format: "JSONEachRow", settings: { mode: "unordered", keeper_path: "/clickhouse/s3queue/events" }});S3Queue is a streaming engine and does not support orderByFields or ORDER BY clauses. Configure only engine-specific parameters like s3Path, format, and settings.
Security Risk: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities.
Solution: Use mooseRuntimeEnv for runtime credential resolution:
import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // ✅ RECOMMENDED: Runtime environment variable resolutionexport const secureS3Events = new OlapTable<S3Event>("s3_events", { engine: ClickHouseEngines.S3Queue, s3Path: "s3://my-bucket/data/*.json", format: "JSONEachRow", awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"), awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"), settings: { mode: "unordered", keeper_path: "/clickhouse/s3queue/events" }});Then set environment variables:
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="your-secret-key"
moose prod upBenefits:
S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas.
Use the S3 engine for direct read/write access to S3 storage without streaming semantics:
import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // S3 table with credentials (recommended with mooseRuntimeEnv)export const s3Data = new OlapTable<DataRecord>("s3_data", { engine: ClickHouseEngines.S3, path: "s3://my-bucket/data/file.json", format: "JSONEachRow", awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"), awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"), compression: "gzip"}); // Public S3 bucket (no authentication)export const publicS3 = new OlapTable<DataRecord>("public_s3", { engine: ClickHouseEngines.S3, path: "s3://public-bucket/data/*.parquet", format: "Parquet", noSign: true // Use NOSIGN for public buckets});Both engines support the same credential management and format options.
The IcebergS3 engine provides read-only access to Iceberg tables stored in S3:
import { OlapTable, ClickHouseEngines, mooseRuntimeEnv } from '@514labs/moose-lib'; // Iceberg table with AWS credentials (recommended with mooseRuntimeEnv)export const icebergEvents = new OlapTable<Event>("iceberg_events", { engine: ClickHouseEngines.IcebergS3, path: "s3://my-bucket/warehouse/db/table/", format: "Parquet", // or "ORC" awsAccessKeyId: mooseRuntimeEnv.get("AWS_ACCESS_KEY_ID"), awsSecretAccessKey: mooseRuntimeEnv.get("AWS_SECRET_ACCESS_KEY"),});orderByFields, orderByExpression, partitionBy, and sampleByExpression are not supportedThe Buffer engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds:
import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // First create the destination tableexport const destinationTable = new OlapTable<Record>("destination", { engine: ClickHouseEngines.MergeTree, orderByFields: ["id", "timestamp"]}); // Then create buffer that writes to itexport const bufferTable = new OlapTable<Record>("buffer", { engine: ClickHouseEngines.Buffer, targetDatabase: "local", targetTable: "destination", numLayers: 16, minTime: 10, // Min 10 seconds before flush maxTime: 100, // Max 100 seconds before flush minRows: 10000, // Min 10k rows before flush maxRows: 1000000, // Max 1M rows before flush minBytes: 10485760, // Min 10MB before flush maxBytes: 104857600 // Max 100MB before flush});orderByFields, partitionBy, or sampleByExpression on buffer tablesFor more details, see the ClickHouse Buffer documentation.
The Distributed engine creates a distributed table across a ClickHouse cluster for horizontal scaling:
import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib'; // Distributed table across clusterexport const distributedTable = new OlapTable<Record>("distributed_data", { engine: ClickHouseEngines.Distributed, cluster: "my_cluster", targetDatabase: "default", targetTable: "local_table", shardingKey: "cityHash64(id)" // Optional: how to distribute data});orderByFields, partitionBy, or sampleByExpression on distributed tablescluster name must match a cluster defined in your ClickHouse configurationFor more details, see the ClickHouse Distributed documentation.
Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants:
ReplicatedMergeTree - Replicated version of MergeTreeReplicatedReplacingMergeTree - Replicated with deduplicationReplicatedAggregatingMergeTree - Replicated with aggregationReplicatedSummingMergeTree - Replicated with summationimport { OlapTable, ClickHouseEngines } from "@514labs/moose-lib"; // Basic replicated table with explicit pathsconst replicatedTable = new OlapTable<Record>("records", { engine: ClickHouseEngines.ReplicatedMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/records", replicaName: "{replica}", orderByFields: ["id"]}); // Replicated with deduplicationconst replicatedDedup = new OlapTable<Record>("dedup_records", { engine: ClickHouseEngines.ReplicatedReplacingMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/dedup_records", replicaName: "{replica}", ver: "updated_at", isDeleted: "deleted", orderByFields: ["id"]}); // For ClickHouse Cloud or Boreal (no parameters needed)const cloudReplicated = new OlapTable<Record>("cloud_records", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"]});Replicated engines support three configuration approaches. Choose the one that fits your deployment:
Omit all replication parameters. Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments:
const table = new OlapTable<Record>("my_table", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"] // No keeper_path, replica_name, or cluster needed});Moose auto-injects: /clickhouse/tables/{database}/{shard}/{table_name} and {replica} in local development. ClickHouse Cloud uses its own patterns automatically.
For multi-node deployments, specify a cluster name to use ON CLUSTER DDL operations:
const table = new OlapTable<Record>("my_table", { engine: ClickHouseEngines.ReplicatedMergeTree, orderByFields: ["id"], cluster: "default" // References cluster from moose.config.toml});Configuration in moose.config.toml:
[[clickhouse_config.clusters]]name = "default"Use when:
ON CLUSTER DDL for distributed operationsFor custom replication topology, specify both keeper_path and replica_name:
const table = new OlapTable<Record>("my_table", { engine: ClickHouseEngines.ReplicatedMergeTree, keeperPath: "/clickhouse/tables/{database}/{shard}/my_table", replicaName: "{replica}", orderByFields: ["id"]});Use when:
Cannot mix approaches: Specifying both cluster and explicit keeper_path/replica_name will cause an error. Choose one approach.
Cluster is a deployment directive: Changing cluster won't recreate your table—it only affects future DDL operations.
For more details, see the ClickHouse documentation on data replication.
Use the Kafka engine to consume data directly from Kafka compatible topics.
For more details on the Kafka engine, see the ClickHouse documentation on Kafka integration.
Kafka tables are streaming interfaces that don't persist data. Use a MaterializedView to continuously move data to a table.
import { OlapTable, ClickHouseEngines, MaterializedView, sql } from '@514labs/moose-lib'; interface KafkaEvent { eventId: string; userId: string; timestamp: number; // Unix seconds for JSONEachRow} // Kafka table - reads from topic, doesn't persistexport const kafkaSource = new OlapTable<KafkaEvent>("kafka_events", { engine: ClickHouseEngines.Kafka, brokerList: "redpanda:9092", topicList: "events", groupName: "my_consumer_group", format: "JSONEachRow", settings: { kafka_num_consumers: "1", }}); // MaterializedView moves data to MergeTree for persistenceconst cols = kafkaSource.columns;export const eventsMV = new MaterializedView<KafkaEvent>({ tableName: "events_dest", materializedViewName: "events_mv", orderByFields: ["eventId"], selectStatement: sql`SELECT ${cols.eventId}, ${cols.userId}, ${cols.timestamp} FROM ${kafkaSource}`, selectTables: [kafkaSource],});orderByFieldsIf you have a table that is managed by an external system (e.g Change Data Capture like ClickPipes), you can still use Moose to query it. You can set the config in the table config to set the lifecycle to EXTERNALLY_MANAGED.
import { OlapTable, LifeCycle } from "@514labs/moose-lib"; // Table managed by external systemconst externalTable = new OlapTable<UserData>("external_users", { orderByFields: ["id", "timestamp"], lifeCycle: LifeCycle.EXTERNALLY_MANAGED // Moose won't create or modify this table});Learn more about the different lifecycle options and how to use them in the LifeCycle Management documentation.
// Error: No primary key or orderByFieldsinterface BadRecord1 { field1: string; field2: number;}const badTable1 = new OlapTable<BadRecord1>("bad_table1"); // Error: Primary key not first in orderByFieldsinterface BadRecord2 { id: Key<string>; field1: string;}const badTable2 = new OlapTable<BadRecord2>("bad_table2", { orderByFields: ["field1", "id"] // Wrong order - primary key must be first}); // Error: Nullable field in orderByFieldsinterface BadRecord3 { id: Key<string>; field1: string; field2?: number;}const badTable3 = new OlapTable<BadRecord3>("bad_table3", { orderByFields: ["id", "field2"] // Can't have nullable field in orderByFields});One of the powerful features of Moose is its integration with the local development server:
moose devOlapTable in your code and save the file:
For example, if you add a new field to your schema:
// Beforeinterface BasicSchema { id: Key<string>; name: string;} // After adding a fieldinterface BasicSchema { id: Key<string>; name: string; createdAt: Date; // New field}The Moose framework will:
You can verify your tables were created correctly using:
# List all tables in your local environment
moose lsYou can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your moose.config.toml file:
[clickhouse_config]db_name = "local"user = "panda"password = "pandapass"use_ssl = falsehost = "localhost"host_port = 18123native_port = 9000// Beforeinterface BasicSchema { id: Key<string>; name: string;} // After adding a fieldinterface BasicSchema { id: Key<string>; name: string; createdAt: Date; // New field}