Modeling Tables
Viewing:
Overview
Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.
You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.
interface MyFirstTable {
id: Key<string>;
name: string;
age: number;
}
// Create a table named "first_table"
const myTable = new OlapTable<MyFirstTable>("first_table");
from pydantic import BaseModel
from moose_lib import Key, OlapTable
from pydantic import BaseModel
class MyFirstTable(BaseModel):
id: Key[str]
name: str
age: int
# Create a table named "first_table"
my_table = OlapTable[MyFirstTable]("first_table")
Benefits:
Boilerplate CREATE/ALTER TABLE statements handled for you
Automatic type mapping to ClickHouse types
Built-in type validation on insert
Version-controlled schema management
Basic Usage
Standalone Tables
Create a table directly for custom data flows or when you need fine-grained control:
import { OlapTable, Key } from "@514labs/moose-lib";
// Define your schema
interface ExampleSchema {
id: Key<string>;
dateField: Date;
numericField: number;
booleanField: boolean;
floatField: number;
integerField: number & tags.Type<"int64">; // Moose supports native tagged types so you can use Integers in typescript
}
// Create a standalone table named "example_table"
const exampleTable = new OlapTable<ExampleSchema>("example_table", {
orderByFields: ["id", "dateField"], // Optional when using a primary key
deduplicate: true // Optional: keep only the latest version of each record
});
// Now you can:
// - Write to this table from streams
// - Query it directly
// - Use it as a source for materialized views
from moose_lib import Key, OlapTable
from pydantic import BaseModel
class ExampleSchema(BaseModel):
id: Key[str]
date_field: Date
numeric_field: float
boolean_field: bool
# Create a standalone table named "example_table"
example_table = OlapTable[ExampleSchema]("example_table", {
orderByFields: ["id", "date_field"],
deduplicate: True
})
Use Standalone Tables When:
Use when you need to do a bulk import of data
Use when you have in-memory ETL/ELT workflows that need to write directly to a table as opposed to a streaming ingestion pipeline
Use when you have some external service that is maintaining and writing to the table, like a CDC or other external ETL service
Creating Tables in Ingestion Pipelines
For end-to-end data flows, create tables as part of an ingestion pipeline:
import { IngestPipeline, Key } from "@514labs/moose-lib";
// Define your schema
interface UserEvent {
id: Key<string>;
userId: string;
timestamp: Date;
eventType: string;
}
// Create a complete ingestion pipeline with a table
const eventsPipeline = new IngestPipeline<UserEvent>("user_events", {
ingest: true, // Creates a REST API endpoint at POST localhost:4000/ingest/user_events
stream: true, // Creates Kafka/Redpanda topic
table: { // Creates and configures the table named "user_events"
orderByFields: ["id", "timestamp"],
deduplicate: true
}
});
// Access the table component when needed
const eventsTable = eventsPipeline.table;
from moose_lib import IngestPipeline, Key, OlapTable
from pydantic import BaseModel
class UserEvent(BaseModel):
id: Key[str]
user_id: str
timestamp: Date
event_type: str
events_pipeline = IngestPipeline[UserEvent]("user_events", {
ingest: True, # Creates a REST API endpoint at POST localhost:4000/ingest/user_events
stream: True, # Creates a Kafka/Redpanda topic
table: { # Creates and configures the table named "user_events"
orderByFields: ["id", "timestamp"],
deduplicate: True
}
})
# Access the table component when needed:
events_table = events_pipeline.get_table()
Data Modeling
Special ClickHouse Types (LowCardinality, Nullable, etc)
import { Key, ClickHouseDecimal, LowCardinality, ClickHouseNamedTuple, tags } from "@514labs/moose-lib";
export interface ClickHouseOptimizedExample {
id: Key<string>;
stringField: string;
numberField: number;
decimalField: string & ClickHouseDecimal<10, 2>; // Precise decimal storage
lowCardinalityField: string & LowCardinality; // Faster queries for enum-like data
nestedObject: {
innerString: string;
innerNumber: number;
};
namedTupleField: {
name: string;
value: number;
} & ClickHouseNamedTuple; // Optimized nested storage
numberArray: number[];
mapField: Record<string, number>;
literalField: "optionA" | "optionB";
optionalField?: string;
dateField: Date;
}
from moose_lib import Key, clickhouse_decimal, ClickHouseNamedTuple
from typing import Annotated
from pydantic import BaseModel
from datetime import datetime
class Customer(BaseModel):
name: str
address: str
class Order(BaseModel):
order_id: Key[str]
amount: clickhouse_decimal(10, 2)
status: str # Use Literal[...] for LowCardinality
created_at: datetime
customer: Annotated[Customer, "ClickHouseNamedTuple"]
Primary Keys and Sorting
You must configure table indexing using one of these approaches:
- Define at least one
Key
in your table schema - Specify
orderByFields
in the table config - Use both (all
Key
fields must come first in theorderByFields
array)
import { OlapTable, Key } from '@514labs/moose-lib';
// Approach 1: Using primary key only
interface Record1 {
id: Key<string>; // Primary key field
field1: string;
field2: number;
}
const table1 = new OlapTable<Record1>("table1"); // id is the primary key
from moose_lib import Key, OlapTable
from pydantic import BaseModel
class Record1(BaseModel):
id: Key[str] # Primary key field
field1: str
field2: int
table1 = OlapTable[Record1]("table1") # id is the primary key
Order By Fields Only
// Approach 2: Using orderByFields only
interface SchemaWithoutPrimaryKey {
field1: string;
field2: number;
field3: Date;
}
const tableWithOrderByFieldsOnly = new OlapTable<SchemaWithoutPrimaryKey>("table2", {
orderByFields: ["field1", "field2"] // Specify ordering without primary key
});
Leverage the OlapTableConfig
class to configure your table:
from moose_lib import Key, OlapTable, OlapTableConfig
from pydantic import BaseModel
from datetime import datetime
class SchemaWithoutPrimaryKey(BaseModel):
field1: str
field2: int
field3: datetime
table2 = OlapTable[SchemaWithoutPrimaryKey]("table2", OlapTableConfig(
orderByFields=["field1", "field2"] # Specify ordering without primary key
))
Using Both Primary Key and Order By Fields
// Approach 3: Using both (primary key must be first)
interface SchemaWithKey {
id: Key<string>; // Primary key field
field1: string;
field2: number;
}
const tableWithKeyAndOrderByFields = new OlapTable<SchemaWithKey>("table3", {
orderByFields: ["id", "field1"] // Primary key must be first
});
from moose_lib import Key, OlapTable, OlapTableConfig
from pydantic import BaseModel
class SchemaWithKey(BaseModel):
id: Key[str]
field1: str
field2: int
table3 = OlapTable[SchemaWithKey]("table3", OlapTableConfig(
orderByFields=["id", "field1"] # Primary key must be first
))
Using Multiple Primary Keys
interface MultiKeyRecord {
key1: Key<string>;
key2: Key<number>;
field1: string;
}
const multiKeyTable = new OlapTable<MultiKeyRecord>("multi_key_table", {
orderByFields: ["key1", "key2", "field1"] // Multiple keys must come first
});
from moose_lib import Key, OlapTable, OlapTableConfig
from pydantic import BaseModel
class MultiKeyRecord(BaseModel):
key1: Key[str]
key2: Key[int]
field1: str
multi_key_table = OlapTable[MultiKeyRecord]("multi_key_table", OlapTableConfig(
orderByFields=["key1", "key2", "field1"] # Multiple keys must come first
))
Table engines
By default, Moose will create tables with the MergeTree
engine. You can change the engine by setting the engine
property in the table config. To use a different engine, you can set the engine
property in the table config.
import { OlapTable, ClickHouseEngines } from "@514labs/moose-lib";
const table = new OlapTable<Record>("table", {
orderByFields: ["id"],
engine: ClickHouseEngines.MergeTree // Default engine
});
from moose_lib import OlapTable, OlapTableConfig, ClickHouseEngines
table = OlapTable[Record]("table", OlapTableConfig(
orderByFields=["id"],
engine=ClickHouseEngines.MergeTree # Default engine
))
Deduplication (ReplacingMergeTree
)
Use the ReplacingMergeTree
engine to keep only the latest record for your designated sort key:
const table = new OlapTable<Record>("table", {
orderByFields: ["id"],
engine: ClickHouseEngines.ReplacingMergeTree
});
from moose_lib import Key, OlapTable, OlapTableConfig
class Record(BaseModel):
id: Key[str]
table = OlapTable[Record]("table", OlapTableConfig(
orderByFields=["id"],
engine=ClickHouseEngines.ReplacingMergeTree
))
Deduplication Caveats
ClickHouse’s ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately. For more details, see the ClickHouse documentation.
Externally Managed Tables
If you have a table that is managed by an external system (e.g Change Data Capture like ClickPipes), you can still use Moose to query it. You can set the config in the table config to set the lifecycle to EXTERNALLY_MANAGED
.
import { OlapTable, LifeCycle } from "@514labs/moose-lib";
// Table managed by external system
const externalTable = new OlapTable<UserData>("external_users", {
orderByFields: ["id", "timestamp"],
lifeCycle: LifeCycle.EXTERNALLY_MANAGED // Moose won't create or modify this table
});
from moose_lib import OlapTable, OlapConfig, LifeCycle
# Table managed by external system
external_table = OlapTable[UserData]("external_users", OlapConfig(
order_by_fields=["id", "timestamp"],
life_cycle=LifeCycle.EXTERNALLY_MANAGED # Moose won't create or modify this table
))
Learn More About LifeCycle Management
Learn more about the different lifecycle options and how to use them in the LifeCycle Management documentation.
Invalid Configurations
// Error: No primary key or orderByFields
interface BadRecord1 {
field1: string;
field2: number;
}
const badTable1 = new OlapTable<BadRecord1>("bad_table1");
// Error: Primary key not first in orderByFields
interface BadRecord2 {
id: Key<string>;
field1: string;
}
const badTable2 = new OlapTable<BadRecord2>("bad_table2", {
orderByFields: ["field1", "id"] // Wrong order - primary key must be first
});
// Error: Nullable field in orderByFields
interface BadRecord3 {
id: Key<string>;
field1: string;
field2?: number;
}
const badTable3 = new OlapTable<BadRecord3>("bad_table3", {
orderByFields: ["id", "field2"] // Can't have nullable field in orderByFields
});
from moose_lib import Key, OlapTable, OlapTableConfig
from typing import Optional
class BadRecord1(BaseModel):
field1: str
field2: int
bad_table1 = OlapTable[BadRecord1]("bad_table1") ## No primary key or orderByFields
class BadRecord2(BaseModel):
id: Key[str]
field1: str
bad_table2 = OlapTable[BadRecord2]("bad_table2", OlapTableConfig(
orderByFields=["field1", "id"] # Wrong order - primary key must be first
))
class BadRecord3(BaseModel):
id: Key[str]
field1: str
field2: Optional[int]
bad_table3 = OlapTable[BadRecord3]("bad_table3", OlapTableConfig(
orderByFields=["id", "field2"] # Can't have nullable field in orderByFields
))
Development Workflow
Local Development with Hot Reloading
One of the powerful features of Moose is its integration with the local development server:
- Start your local development server with
moose dev
- When you define or modify an
OlapTable
in your code and save the file:- The changes are automatically detected
- The TypeScript compiler plugin processes your schema definitions
- The infrastructure is updated in real-time to match your code changes
- Your tables are immediately available for testing
For example, if you add a new field to your schema:
// Before
interface BasicSchema {
id: Key<string>;
name: string;
}
// After adding a field
interface BasicSchema {
id: Key<string>;
name: string;
createdAt: Date; // New field
}
# Before
class BasicSchema(BaseModel):
id: Key[str]
name: str
# After adding a field
class BasicSchema(BaseModel):
id: Key[str]
name: str
created_at: datetime
The Moose framework will:
- Detect the change when you save the file
- Update the table schema in the local ClickHouse instance
- Make the new field immediately available for use
Verifying Your Tables
You can verify your tables were created correctly using:
# List all tables in your local environment
moose ls
Connecting to your local ClickHouse instance
You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your moose.config.toml
file:
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000