Modeling Tables

Viewing:

Overview

Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.

You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.

FirstTable.ts

interface MyFirstTable {
  id: Key<string>;
  name: string;
  age: number;
}
 
// Create a table named "first_table"
export const myTable = new OlapTable<MyFirstTable>("first_table");

FirstTable.py

from pydantic import BaseModel
from moose_lib import Key, OlapTable
from pydantic import BaseModel
 
class MyFirstTable(BaseModel):
    id: Key[str]
    name: str
    age: int
 
# Create a table named "first_table"
my_table = OlapTable[MyFirstTable]("first_table")
 
# No export needed - Python modules are automatically discovered

Benefits:

Boilerplate CREATE/ALTER TABLE statements handled for you

Automatic type mapping to ClickHouse types

Built-in type validation on insert

Version-controlled schema management

Basic Usage

Standalone Tables

Create a table directly for custom data flows or when you need fine-grained control:

StandaloneTable.ts

import { OlapTable, Key } from "@514labs/moose-lib";
 
// Define your schema
interface ExampleSchema {
  id: Key<string>;
  dateField: Date;
  numericField: number;
  booleanField: boolean;
  floatField: number;
  integerField: number & tags.Type<"int64">; // Moose supports native tagged types so you can use Integers in typescript
}
 
// Create a standalone table named "example_table"
export const exampleTable = new OlapTable<ExampleSchema>("example_table", {
  orderByFields: ["id", "dateField"] // Optional when using a primary key
});
 
// For deduplication, use the ReplacingMergeTree factory
export const dedupTable = OlapTable.withReplacingMergeTree<ExampleSchema>("example_table", {
  orderByFields: ["id", "dateField"],
  ver: "updatedAt",  // Optional: version column (keeps highest value)
  isDeleted: "deleted"  // Optional: soft delete flag (requires ver)
});
 
// Now you can:
// - Write to this table from streams
// - Query it directly
// - Use it as a source for materialized views

StandaloneTable.py

from moose_lib import Key, OlapTable
from pydantic import BaseModel
 
class ExampleSchema(BaseModel):
    id: Key[str]
    date_field: Date
    numeric_field: float
    boolean_field: bool
 
# Create a standalone table named "example_table"
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
 
example_table = OlapTable[ExampleSchema]("example_table", OlapConfig(
  order_by_fields=["id", "date_field"],
  engine=ReplacingMergeTreeEngine()
))

Use Standalone Tables When:

Use when you need to do a bulk import of data

Use when you have in-memory ETL/ELT workflows that need to write directly to a table as opposed to a streaming ingestion pipeline

Use when you have some external service that is maintaining and writing to the table, like a CDC or other external ETL service

Creating Tables in Ingestion Pipelines

For end-to-end data flows, create tables as part of an ingestion pipeline:

PipelineTable.ts

import { IngestPipeline, Key } from "@514labs/moose-lib";
 
// Define your schema
interface UserEvent {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}
 
// Create a complete ingestion pipeline with a table
const eventsPipeline = new IngestPipeline<UserEvent>("user_events", {
  ingest: true,    // Creates a REST API endpoint at POST localhost:4000/ingest/user_events
  stream: true,    // Creates Kafka/Redpanda topic
  table: {         // Creates and configures the table named "user_events"
    orderByFields: ["id", "timestamp"]
  }
});
 
// Access the table component when needed
const eventsTable = eventsPipeline.table;

PipelineTable.py

from moose_lib import IngestPipeline, Key, OlapTable
from pydantic import BaseModel
 
class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: Date
    event_type: str
 
from moose_lib import IngestPipeline, IngestPipelineConfig, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
 
events_pipeline = IngestPipeline[UserEvent]("user_events", IngestPipelineConfig(
  ingest=True, # Creates a REST API endpoint at POST localhost:4000/ingest/user_events
  stream=True, # Creates a Kafka/Redpanda topic
  table=OlapConfig(      # Creates and configures the table named "user_events"
    order_by_fields=["id", "timestamp"],
    engine=ReplacingMergeTreeEngine()
  )
))
 
# Access the table component when needed:
events_table = events_pipeline.get_table()

Data Modeling

Special ClickHouse Types (LowCardinality, Nullable, etc)

ClickHouseTypes.ts

import { Key, Decimal, ClickHouseDecimal, LowCardinality, ClickHouseNamedTuple, tags } from "@514labs/moose-lib";
 
export interface ClickHouseOptimizedExample {
  id: Key<string>;
  stringField: string;
  numberField: number;
  decimalField: Decimal<10, 2>;                         // Precise decimal storage
  // Alternative: decimalField: string & ClickHouseDecimal<10, 2>; // Verbose syntax still works
  lowCardinalityField: string & LowCardinality;         // Faster queries for enum-like data
  nestedObject: {
    innerString: string;
    innerNumber: number;
  };
  namedTupleField: {
    name: string;
    value: number;
  } & ClickHouseNamedTuple;                             // Optimized nested storage
  numberArray: number[];
  mapField: Record<string, number>;
  literalField: "optionA" | "optionB";
  optionalField?: string;
  dateField: Date;
}

ClickHouseTypes.py

from moose_lib import Key, clickhouse_decimal, ClickHouseNamedTuple
from typing import Annotated
from pydantic import BaseModel
from datetime import datetime
 
class Customer(BaseModel):
    name: str
    address: str
 
class Order(BaseModel):
    order_id: Key[str]
    amount: clickhouse_decimal(10, 2)
    status: Literal["Paid", "Shipped", "Delivered"] # translated to LowCardinality(String) in ClickHouse
    created_at: datetime
    customer: Annotated[Customer, "ClickHouseNamedTuple"]

Default values

Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.

Defaults.ts

import { OlapTable, Key, ClickHouseDefault, Decimal, ClickHouseDecimal } from "@514labs/moose-lib";
 
interface Event {
  id: Key<string>;
  // Static defaults (ClickHouse expression as a string literal)
  status: string & ClickHouseDefault<"'pending'">;   // DEFAULT 'pending'
  retries: number & ClickHouseDefault<"0">;          // DEFAULT 0
  // Server-side timestamps
  createdAt: Date & ClickHouseDefault<"now()">;      // DEFAULT now()
  // Decimal with default
  amount: Decimal<10, 2> & ClickHouseDefault<"0">;
  // Alternative: amount: (string & ClickHouseDecimal<10, 2> & ClickHouseDefault<"0">); // Verbose syntax
}
 
export const events = new OlapTable<Event>("events", {
  orderByFields: ["id", "createdAt"],
});

The value passed into the ClickHouseDefault<""> tag can either be a string literal or a stringified ClickHouse SQL expression. If you run into typing issues specifically on Date fields with ClickHouseDefault, use WithDefault<Date, "now()"> as a fallback workaround.

Defaults.py

from typing import Annotated
from pydantic import BaseModel
from moose_lib import OlapTable, Key, clickhouse_default, clickhouse_decimal
from datetime import datetime
 
class Event(BaseModel):
    id: Key[str]
    # Static defaults
    status: Annotated[str, clickhouse_default("'pending'")]  # DEFAULT 'pending'
    retries: Annotated[int, clickhouse_default("0")]         # DEFAULT 0
    # Server-side timestamps
    created_at: Annotated[datetime, clickhouse_default("now()")]
    # Decimal with default
    amount: Annotated[float, clickhouse_decimal(10, 2)] = 0
 
events = OlapTable[Event]("events", {
    "orderByFields": ["id", "created_at"],
})

The value passed into the clickhouse_default function can either be a string literal or a stringified ClickHouse SQL expression.

ClickHouse defaults with optional fields

If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.

Optional without default (e.g., field?: number) → ClickHouse Nullable type.
Optional with default (e.g., field?: number & ClickHouseDefault<"18"> or WithDefault<number, "18">) → non-nullable column with default 18.

This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.

Primary Keys and Sorting

You must configure table indexing using one of these approaches:

Define at least one Key in your table schema
Specify orderByFields in the table config
Use both (all Key fields must come first in the orderByFields array)

PrimaryKeyConfig.ts

import { OlapTable, Key } from '@514labs/moose-lib';
 
// Approach 1: Using primary key only
interface Record1 {
  id: Key<string>;  // Primary key field
  field1: string;
  field2: number;
}
 
const table1 = new OlapTable<Record1>("table1");  // id is the primary key

PrimaryKeyConfig.py

from moose_lib import Key, OlapTable
from pydantic import BaseModel
 
class Record1(BaseModel):
    id: Key[str] # Primary key field
    field1: str
    field2: int
 
table1 = OlapTable[Record1]("table1") # id is the primary key

Order By Fields Only

OrderByFieldsOnly.ts

// Approach 2: Using orderByFields only
interface SchemaWithoutPrimaryKey {
  field1: string;
  field2: number;
  field3: Date;
}
 
const tableWithOrderByFieldsOnly = new OlapTable<SchemaWithoutPrimaryKey>("table2", {
  orderByFields: ["field1", "field2"]  // Specify ordering without primary key
});

Leverage the OlapTableConfig class to configure your table:

OrderByFieldsOnly.py

from moose_lib import Key, OlapTable, OlapTableConfig
from pydantic import BaseModel
from datetime import datetime
 
class SchemaWithoutPrimaryKey(BaseModel):
    field1: str
    field2: int
    field3: datetime
 
table2 = OlapTable[SchemaWithoutPrimaryKey]("table2", OlapTableConfig(
  orderByFields=["field1", "field2"] # Specify ordering without primary key
))

Using Both Primary Key and Order By Fields

ComboKeyAndOrderByFields.ts

// Approach 3: Using both (primary key must be first)
interface SchemaWithKey {
  id: Key<string>;  // Primary key field
  field1: string;
  field2: number;
}
 
const tableWithKeyAndOrderByFields = new OlapTable<SchemaWithKey>("table3", {
  orderByFields: ["id", "field1"]  // Primary key must be first
});

ComboKeyAndOrderByFields.py

from moose_lib import Key, OlapTable, OlapTableConfig
from pydantic import BaseModel
 
class SchemaWithKey(BaseModel):
    id: Key[str]
    field1: str
    field2: int
 
table3 = OlapTable[SchemaWithKey]("table3", OlapTableConfig(
  orderByFields=["id", "field1"] # Primary key must be first
))

Using Multiple Primary Keys

MultiKeyTable.ts

interface MultiKeyRecord {
  key1: Key<string>;
  key2: Key<number>;
  field1: string;
}
 
const multiKeyTable = new OlapTable<MultiKeyRecord>("multi_key_table", {
  orderByFields: ["key1", "key2", "field1"]  // Multiple keys must come first
});

MultiKeyTable.py

from moose_lib import Key, OlapTable, OlapTableConfig
from pydantic import BaseModel
 
class MultiKeyRecord(BaseModel):
    key1: Key[str]
    key2: Key[int]
    field1: str
 
multi_key_table = OlapTable[MultiKeyRecord]("multi_key_table", OlapTableConfig(
  orderByFields=["key1", "key2", "field1"] # Multiple keys must come first
))

Table engines

By default, Moose will create tables with the MergeTree engine. You can use different engines by setting the engine in the table configuration.

TableEngine.ts

import { OlapTable } from "@514labs/moose-lib";
 
// Default MergeTree engine
const table = new OlapTable<Record>("table", {
  orderByFields: ["id"]
});
 
// Use engine configuration for other engines
const dedupTable = new OlapTable<Record>("table", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],
  ver: "version",  // Optional: keeps row with highest version
  isDeleted: "deleted"  // Optional: soft delete when deleted=1
});

TableEngine.py

from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import MergeTreeEngine, ReplacingMergeTreeEngine
 
# Default MergeTree engine
table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"]
))
 
# Explicitly specify engine
dedup_table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine()
))

Deduplication (`ReplacingMergeTree`)

Use the ReplacingMergeTree engine to keep only the latest record for your designated sort key:

DeduplicatedTable.ts

// Basic deduplication
const table = new OlapTable<Record>("table", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"]
});
 
// With version column (keeps record with highest version)
const versionedTable = new OlapTable<Record>("table", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],
  ver: "updated_at"  // Column that determines which version to keep
});
 
// With soft deletes (requires ver parameter)
const softDeleteTable = new OlapTable<Record>("table", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],
  ver: "updated_at",
  isDeleted: "deleted"  // UInt8 column: 1 marks row for deletion
});

DeduplicatedTable.py

from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
 
class Record(BaseModel):
    id: Key[str]
    updated_at: str  # Version column
    deleted: int = 0  # Soft delete marker (UInt8)
 
# Basic deduplication
table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine()
))
 
# With version column (keeps record with highest version)
versioned_table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine(ver="updated_at")
))
 
# With soft deletes (requires ver parameter)
soft_delete_table = OlapTable[Record]("table", OlapConfig(
  order_by_fields=["id"],
  engine=ReplacingMergeTreeEngine(
    ver="updated_at",
    is_deleted="deleted"  # UInt8 column: 1 marks row for deletion
  )
))

Deduplication Caveats

ClickHouse’s ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately.

Version Column (ver): When specified, ClickHouse keeps the row with the maximum version value for each unique sort key.

Soft Deletes (is_deleted): When specified along with ver, rows where this column equals 1 are deleted during merges. This column must be UInt8 type.

For more details, see the ClickHouse documentation.

Streaming from S3 (`S3Queue`)

Use the S3Queue engine to automatically ingest data from S3 buckets as files are added:

S3StreamingTable.ts

import { OlapTable, ClickHouseEngines } from '@514labs/moose-lib';
 
// Use direct configuration (S3Queue does not support orderByFields)
export const s3Events = new OlapTable<S3Event>("s3_events", {
  engine: ClickHouseEngines.S3Queue,
  s3Path: "s3://my-bucket/data/*.json",
  format: "JSONEachRow",
  settings: {
    mode: "unordered",
    keeper_path: "/clickhouse/s3queue/events"
  }
});

S3Queue ORDER BY not supported

S3Queue is a streaming engine and does not support orderByFields or ORDER BY clauses. Configure only engine-specific parameters like s3Path, format, and settings.

S3StreamingTable.py

from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import S3QueueEngine
 
class S3Event(BaseModel):
    id: str
    timestamp: datetime
    data: dict
 
# Modern API using engine configuration
s3_events = OlapTable[S3Event]("s3_events", OlapConfig(
    engine=S3QueueEngine(
        s3_path="s3://my-bucket/data/*.json",
        format="JSONEachRow",
        aws_access_key_id="AKIA...",
        aws_secret_access_key="secret..."
    ),
    settings={
        "mode": "unordered",
        "keeper_path": "/clickhouse/s3queue/events"
    }
))

S3Queue Requirements

S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas.

Irregular column names and Python Aliases

If a ClickHouse column name isn’t a valid Python identifier or starts with an underscore, you can use a safe Python field name and set a Pydantic alias to the real column name. MooseOLAP then uses the alias for ClickHouse DDL and data mapping, so your model remains valid while preserving the true column name.

from pydantic import BaseModel, Field
 
class CHUser(BaseModel):
    # ClickHouse: "_id" → safe Python attribute with alias
    UNDERSCORE_PREFIXED_id: str = Field(alias="_id")
 
    # ClickHouse: "user name" → replace spaces, keep alias
    user_name: str = Field(alias="user name")

Externally Managed Tables

If you have a table that is managed by an external system (e.g Change Data Capture like ClickPipes), you can still use Moose to query it. You can set the config in the table config to set the lifecycle to EXTERNALLY_MANAGED.

ExternallyManagedTable.ts

import { OlapTable, LifeCycle } from "@514labs/moose-lib";
 
// Table managed by external system
const externalTable = new OlapTable<UserData>("external_users", {
  orderByFields: ["id", "timestamp"],
  lifeCycle: LifeCycle.EXTERNALLY_MANAGED  // Moose won't create or modify this table
});

ExternallyManagedTable.py

from moose_lib import OlapTable, OlapConfig, LifeCycle
 
# Table managed by external system
external_table = OlapTable[UserData]("external_users", OlapConfig(
    order_by_fields=["id", "timestamp"],
    life_cycle=LifeCycle.EXTERNALLY_MANAGED  # Moose won't create or modify this table in prod mode
))

Learn More About LifeCycle Management

Learn more about the different lifecycle options and how to use them in the LifeCycle Management documentation.

Invalid Configurations

InvalidConfig.ts

// Error: No primary key or orderByFields
interface BadRecord1 {
  field1: string;
  field2: number;
}
const badTable1 = new OlapTable<BadRecord1>("bad_table1");
 
// Error: Primary key not first in orderByFields
interface BadRecord2 {
  id: Key<string>;
  field1: string;
}
const badTable2 = new OlapTable<BadRecord2>("bad_table2", {
  orderByFields: ["field1", "id"]  // Wrong order - primary key must be first
});
 
// Error: Nullable field in orderByFields
interface BadRecord3 {
  id: Key<string>;
  field1: string;
  field2?: number;
}
const badTable3 = new OlapTable<BadRecord3>("bad_table3", {
  orderByFields: ["id", "field2"]  // Can't have nullable field in orderByFields
});

InvalidConfig.py

from moose_lib import Key, OlapTable, OlapTableConfig
from typing import Optional
 
class BadRecord1(BaseModel):
    field1: str
    field2: int
 
bad_table1 = OlapTable[BadRecord1]("bad_table1") ## No primary key or orderByFields
 
class BadRecord2(BaseModel):
    id: Key[str]
    field1: str
 
bad_table2 = OlapTable[BadRecord2]("bad_table2", OlapTableConfig(
  orderByFields=["field1", "id"] # Wrong order - primary key must be first
))
 
class BadRecord3(BaseModel):
    id: Key[str]
    field1: str
    field2: Optional[int]
 
bad_table3 = OlapTable[BadRecord3]("bad_table3", OlapTableConfig(
  orderByFields=["id", "field2"] # Can't have nullable field in orderByFields
))

Development Workflow

Local Development with Hot Reloading

One of the powerful features of Moose is its integration with the local development server:

Start your local development server with moose dev
When you define or modify an OlapTable in your code and save the file:
- The changes are automatically detected
- The TypeScript compiler plugin processes your schema definitions
- The infrastructure is updated in real-time to match your code changes
- Your tables are immediately available for testing

For example, if you add a new field to your schema:

HotReloading.ts

// Before
interface BasicSchema {
  id: Key<string>;
  name: string;
}
 
// After adding a field
interface BasicSchema {
  id: Key<string>;
  name: string;
  createdAt: Date;  // New field
}

The Moose framework will:

Detect the change when you save the file
Update the table schema in the local ClickHouse instance
Make the new field immediately available for use

Verifying Your Tables

You can verify your tables were created correctly using:

Terminal

# List all tables in your local environment
moose ls

Connecting to your local ClickHouse instance

You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your moose.config.toml file:

moose.config.toml

[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000

Moose OLAP

Modeling Materialized Views