Data Modeling
Overview
Data models in Moose are type definitions that generate your data infrastructure. You write them as TypeScript interfaces or Pydantic models, export them from your app's root index.ts or main.py file, and Moose creates the corresponding database tables, APIs, and streams from your code.
Working with Data Models
Define your schema
Create a type definition with typed fields and a primary key using your language's type system
Create infrastructure
Use your model as a type parameter to create tables, APIs, streams, or views
Export from root
Export all models and infrastructure from your app's root file (index.ts/main.py) - Moose reads these exports to generate infrastructure (see the sketch after these steps)
Run dev server
Start the dev server; while it is running, whenever you create or modify data models, Moose automatically creates or updates all dependent infrastructure components with your latest code changes.
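A minimal sketch of these steps in a TypeScript project (the model and field names here are illustrative, not part of the Moose API):

// index.ts (app root) - Moose reads these exports to generate infrastructure
import { Key, IngestPipeline } from "@514labs/moose-lib";

// 1. Define your schema
interface PageView {
  id: Key<string>;   // primary key
  path: string;
  viewedAt: Date;
}

// 2-3. Create infrastructure from the model and export it from the root file
export const pageViews = new IngestPipeline<PageView>("page_views", {
  ingest: true,
  stream: true,
  table: true,
});

// 4. With the dev server running, saving this file creates or updates the
//    corresponding API endpoint, topic, and table automatically.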
Quick Start
import { Key, IngestPipeline } from "@514labs/moose-lib";

// 1. Define your schema (WHAT your data looks like)
interface MyDataModel {
  primaryKey: Key<string>;
  someString: string;
  someNumber: number;
  someDate: Date;
}

// 2. YOU control which infrastructure to create (HOW to handle your data)
const pipeline = new IngestPipeline<MyDataModel>("MyDataPipeline", {
  ingest: true,  // Optional: Create API endpoint
  stream: true,  // Optional: Create topic
  table: {       // Optional: Create and configure table
    orderByFields: ["primaryKey", "someDate"],
    deduplicate: true
  }
});
from datetime import datetime
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel, Field

# 1. Define your schema (WHAT your data looks like)
class MyFirstDataModel(BaseModel):
    id: Key[str] = Field(..., description="Primary key")
    some_string: str
    some_number: int
    some_boolean: bool
    some_date: datetime

# 2. YOU control which infrastructure to create (HOW to handle your data)
my_first_pipeline = IngestPipeline[MyFirstDataModel]("my_first_pipeline", IngestPipelineConfig(
    ingest=True,   # Create API endpoint
    stream=True,   # Create stream topic
    table=True     # Create database table
))
Benefits:
End-to-end type safety across your code and infrastructure
Full control over your infrastructure with code
Zero schema drift - change your types in one place and your infrastructure updates automatically
Schema Definition
The WHAT
This section covers how to define your data models - the structure and types of your data.
Basic Types
import { Key } from "@514labs/moose-lib";
import { tags } from "typia";
import { ClickHouseDecimal, ClickHousePrecision, LowCardinality } from "@514labs/moose-lib";

export interface BasicDataModel {
  // Required: Primary key for your data model
  primaryKey: Key<string>;   // string key
  // or
  numericKey: Key<number>;   // numeric key

  // Common types
  someString: string;                  // Text
  someNumber: number;                  // Numbers (Float64 by default)
  someBoolean: boolean;                // Boolean
  someDate: Date;                      // Timestamps (DateTime)
  someArray: string[];                 // Arrays
  someJsonObject: Record<string, any>; // JSON object

  // Explicit integer types
  intField: number & tags.Type<"int64">;    // Int64
  uintField: number & tags.Type<"uint32">;  // UInt32

  // Decimal with precision/scale
  price: ClickHouseDecimal<10, 2>; // Decimal(10,2)

  // DateTime64 with precision
  eventTime: Date & ClickHousePrecision<6>; // DateTime64(6)

  // UUID
  userId: string & tags.Type<"uuid">; // UUID

  // Nullable fields
  nullableField?: string; // Optional field

  // LowCardinality
  lowCardinalityField: string & LowCardinality; // LowCardinality(String)
}
In Python, you use Pydantic models to define your schemas:
from moose_lib import Key, clickhouse_decimal, clickhouse_datetime64
from datetime import datetime, date
from typing import List, Optional, Any, Annotated
from uuid import UUID
from pydantic import BaseModel, Field
from decimal import Decimal

class BasicDataModel(BaseModel):
    # Required: Primary key for your data model
    primary_key: Key[str]
    # or
    numeric_key: Key[int]

    # Common types
    some_string: str       # Text
    some_number: int       # Numbers (Int64 by default)
    some_boolean: bool     # Boolean
    some_date: datetime    # Timestamps (DateTime)
    some_array: List[str]  # Arrays
    some_json_object: Any  # JSON object

    # Explicit integer types
    small_int: Annotated[int, "int16"]  # Int16
    big_uint: Annotated[int, "uint64"]  # UInt64

    # Decimal with precision/scale
    price: clickhouse_decimal(10, 2)  # Decimal(10,2)

    # DateTime64 with precision
    event_time: clickhouse_datetime64(6)  # DateTime64(6)

    # UUID
    user_id: UUID  # UUID

    # Optional fields
    optional_field: Optional[str] = None  # May not be present in all records

    # LowCardinality
    low_cardinality_field: Annotated[str, "LowCardinality"]
Advanced Schema Patterns
Nested Objects
import { Key } from "@514labs/moose-lib";

// Define nested object separately
interface NestedObject {
  nestedNumber: number;
  nestedBoolean: boolean;
  nestedArray: number[];
}

export interface DataModelWithNested {
  primaryKey: Key<string>;
  // Reference nested object
  nestedData: NestedObject;
  // Or define inline
  inlineNested: {
    someValue: string;
    someOtherValue: number;
  };
}
from moose_lib import Key
from typing import List
from pydantic import BaseModel

class NestedObject(BaseModel):
    nested_number: int
    nested_boolean: bool
    nested_array: List[int]

class DataModelWithNested(BaseModel):
    primary_key: Key[str]
    nested_data: NestedObject
Using Enums
import { Key } from "@514labs/moose-lib";

enum OrderStatus {
  PENDING = "pending",
  PROCESSING = "processing",
  COMPLETED = "completed"
}

export interface Order {
  orderId: Key<string>;
  status: OrderStatus; // Type-safe status values
  createdAt: Date;
}
from enum import Enum
from moose_lib import Key
from datetime import datetime
from pydantic import BaseModel

class OrderStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"

class Order(BaseModel):
    order_id: Key[str]
    status: OrderStatus  # Type-safe status values
    created_at: datetime
Type Mapping
| TypeScript | ClickHouse | Description |
|---|---|---|
| string | String | Text values |
| number | Float64 | Numeric values |
| number & tags.Type<"int64"> | Int64 | 64-bit integer |
| number & tags.Type<"uint32"> | UInt32 | 32-bit unsigned integer |
| boolean | Boolean | True/false values |
| Date | DateTime | Timestamp values |
| Date & ClickHousePrecision<P> | DateTime64(P) | Timestamp with precision |
| ClickHouseDecimal<P, S> | Decimal(P, S) | Decimal with precision/scale |
| string & tags.Type<"uuid"> | UUID | Universally unique identifier |
| Array | Array | Lists of values |
| object | Nested | Nested structures |
| Enum | Enum | Enumerated values |
| any | JSON | JSON object (discouraged, see below) |
| field?: T | Nullable | Optional/nullable fields |
| Python | ClickHouse | Description |
|---|---|---|
| str | String | Text values |
| int | Int64 | Integer values |
| Annotated[int, "int16"] | Int16 | 16-bit integer |
| Annotated[int, "uint64"] | UInt64 | 64-bit unsigned integer |
| float | Float64 | Floating-point values |
| Decimal | Decimal | Arbitrary-precision decimal |
| clickhouse_decimal(p, s) | Decimal(p, s) | Decimal with precision/scale |
| bool | Boolean | True/false values |
| datetime | DateTime | Timestamp values |
| clickhouse_datetime64(p) | DateTime64(p) | Timestamp with precision |
| date | Date | Date values |
| List[T] | Array | Lists of values |
| UUID | UUID | Universally unique identifier |
| Any | JSON | Any value, use with caution |
| Enum | Enum | Enumerated values |
| Optional[T] | Nullable | Optional/nullable fields |
| Nested Pydantic model | Nested | Nested structures |
Data Modeling Dos and Don’ts
Do not use unknown types for flexible fields (prefer explicit types, or any if you truly need arbitrary JSON):
// DO NOT -> Use unknown for flexible fields
interface BadDataModel {
  unknownField: unknown;
}
// DO -> Use a specific type, or a JSON type if truly needed
interface GoodDataModel {
  jsonField: any; // Only if you need to store arbitrary JSON
}
Do not use union types for flexible or nullable fields:
// DO NOT -> Use union types for conditional fields
interface BadDataModel {
  conditionalField: string | number;
}
// DO -> Break out into multiple optional fields
interface GoodDataModel {
  conditionalString?: string; // Optional field
  conditionalNumber?: number; // Optional field
}

// DO NOT -> Use union types for nullable fields
interface BadDataModel {
  nullableField: string | null;
}
// DO -> Use an optional field
interface GoodDataModel {
  nullableField?: string;
}
# DO NOT -> Use union types for flexible fields
class BadDataModel(BaseModel):
    flexible_field: Union[str, int]

# DO -> Use Any type for flexible fields
class FlexibleDataModel(BaseModel):
    flexible_field: Any  # Maps to JSON type in ClickHouse

# DO NOT -> Use union types for nullable fields
class BadDataModel(BaseModel):
    nullable_field: Union[str, None]

# DO -> Use Optional type
class GoodDataModel(BaseModel):
    nullable_field: Optional[str]

# DO NOT -> Use dict types for nested fields
class BadDataModel(BaseModel):
    dict_field: dict[str, Any]

# DO -> Use a nested Pydantic model
class NestedObject(BaseModel):
    nested_field: str

class GoodDataModel(BaseModel):
    dict_field: NestedObject
Infrastructure Configuration
The HOW
This section covers how to apply your data models to infrastructure components.
Getting Data Into Your Database
IngestPipeline
The most common pattern - combines ingestion, streaming, and storage into a single component:
import { IngestPipeline } from "@514labs/moose-lib";
const myPipeline = new IngestPipeline<MyDataModel>("my_pipeline", {
  ingest: true,
  stream: true,
  table: true
});
from moose_lib import IngestPipeline, IngestPipelineConfig
my_pipeline = IngestPipeline[MyDataModel]("my_pipeline", IngestPipelineConfig(
    ingest=True,
    stream=True,
    table=True
))
What gets created?
An HTTP POST endpoint that accepts and validates incoming data against your data model
A typed Redpanda topic that buffers validated data from the ingest API
A ClickHouse table with the same schema as your data model
A Rust process that syncs data from the stream to the table
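For example, once the dev server is running, you can send a record to the generated endpoint. This is a hypothetical usage sketch: the port (4000) and exact route are assumptions based on the "/ingest/api-route-name" pattern described below, so check your dev server output for the real values:

// Sketch: POST a record matching MyDataModel to the pipeline's ingest endpoint.
// Port and route are assumptions - verify against your dev server output.
const response = await fetch("http://localhost:4000/ingest/my_pipeline", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    primaryKey: "abc-123",
    someString: "hello",
    someNumber: 42,
    someDate: new Date().toISOString(),
  }),
});
console.log(response.status); // Validated records flow to the stream, then the table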
Standalone Components
If you don’t need all the components, you can create them individually and wire them together yourself:
OlapTable
Creates a ClickHouse table with the same schema as your data model:
import { OlapTable } from "@514labs/moose-lib";
// Basic table
const myTable = new OlapTable<MyDataModel>("TableName");
from moose_lib import OlapTable
# Basic table
my_table = OlapTable[MyDataModel]("TableName")
Olap Tables
You might use an OlapTable if you do not need streaming ingest capabilities for your data. Learn more about Olap Tables
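Assuming OlapTable accepts the same table configuration shown in the IngestPipeline Quick Start above (the config shape here is an assumption based on that example, not a confirmed API), a configured table might look like:

import { OlapTable } from "@514labs/moose-lib";

// Sketch: table configuration mirroring the IngestPipeline table options above.
const myConfiguredTable = new OlapTable<MyDataModel>("TableName", {
  orderByFields: ["primaryKey", "someDate"], // Sort key for the ClickHouse table
  deduplicate: true,                         // Collapse rows with duplicate sort keys
});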
Stream
Creates a Redpanda topic that can be configured to sync data to a ClickHouse table with the same schema:
import { Stream } from "@514labs/moose-lib";
// Basic stream
const myStream = new Stream<MyDataModel>("TopicName");
from moose_lib import Stream
# Basic stream
my_stream = Stream[MyDataModel]("TopicName")
Streams
Standalone streams may make sense if you want to transform data on the fly before it is written to the table. Learn more about stream processing
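As a sketch of that pattern (the destination option and addTransform signature here are assumptions; see the stream processing docs for the actual API):

import { Key, OlapTable, Stream } from "@514labs/moose-lib";

interface RawEvent {
  id: Key<string>;
  path: string;
}

interface CleanEvent {
  id: Key<string>;
  path: string;
  receivedAt: Date;
}

const cleanEventsTable = new OlapTable<CleanEvent>("clean_events");

// Assumed: a stream can be pointed at a table via a destination option
const cleanEvents = new Stream<CleanEvent>("clean_events", {
  destination: cleanEventsTable,
});
const rawEvents = new Stream<RawEvent>("raw_events");

// Assumed addTransform signature: transform each record before it lands in the table
rawEvents.addTransform(cleanEvents, (event) => ({
  id: event.id,
  path: event.path.toLowerCase(), // normalize paths on the fly
  receivedAt: new Date(),
}));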
IngestAPI
Creates an HTTP POST endpoint at "/ingest/api-route-name" that accepts and validates incoming data against your data model:
import { IngestAPI } from "@514labs/moose-lib";
const myIngestAPI = new IngestAPI<MyDataModel>("api-route-name"); // Creates an HTTP `POST` endpoint at "/ingest/api-route-name"
from moose_lib import IngestAPI
my_ingest_api = IngestAPI[MyDataModel]("api-route-name")
Ingest APIs
Ingest APIs are almost always preferred as part of an IngestPipeline instead of being used standalone. Learn more about Ingestion APIs
Getting Data Out of Your Database
Data models also power your downstream data processing workflows after data is stored, enabling you to create materialized views and typed APIs that prepare and expose your data for consumption:
MaterializedView
Materialized views are a way to pre-compute and store the results of complex queries on your data. This allows you to query the materialized view directly for faster results, or use it as the source for another derived table for cascading transformations:
import { MaterializedView, sql } from "@514labs/moose-lib";

const myMaterializedView = new MaterializedView<MyDataModel>({
  selectStatement: sql`SELECT * FROM my_table`,
  tableName: "my_table",
  materializedViewName: "my_materialized_view"
});
from moose_lib import MaterializedView

my_materialized_view = MaterializedView[MyDataModel](
    select_statement="SELECT * FROM my_table",
    table_name="my_table",
    materialized_view_name="my_materialized_view"
)
Materialized Views
Learn more about Materialized Views
ConsumptionAPI
Consumption APIs are a way to expose your data to external consumers. They are typed and validated against your data models, ensuring that the client request parameters and response types are correct at runtime:
import { ConsumptionAPI } from "@514labs/moose-lib";

const myConsumptionAPI = new ConsumptionAPI<RequestDataModel, ResponseDataModel>(
  "MyConsumptionAPI",
  async (request: RequestDataModel, { client, sql }) => {
    // Do something with the request
    return new ResponseDataModel();
  }
);
from moose_lib import ConsumptionAPI

def handle(request: RequestDataModel) -> ResponseDataModel:
    # Do something with the request
    return ResponseDataModel()

my_consumption_api = ConsumptionAPI[RequestDataModel, ResponseDataModel]("api_route_name", query_function=handle)
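A hypothetical usage sketch for calling the endpoint from a client (the port, the "/consumption/<name>" route, and the limit query parameter are all assumptions; check your dev server output and your RequestDataModel fields):

// Sketch: query the consumption endpoint with typed query parameters.
const res = await fetch(
  "http://localhost:4000/consumption/api_route_name?limit=10"
);
const data = await res.json(); // Response shape follows ResponseDataModel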
Consumption APIs
Learn more about Consumption APIs
Validation
Validation at runtime takes place at the following points:
- The body of a POST request to an IngestAPI endpoint - Learn more about Ingestion APIs
- The query parameters sent in a GET request to a Consumption API - Learn more about Consumption APIs
- The data being synced from a streaming topic to an OlapTable - Learn more about Olap Tables
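As an illustrative sketch of the first point (the route and port are assumptions, and the exact error response format is not specified here), a payload that fails validation at the ingest endpoint is rejected rather than written downstream:

// Sketch: a record missing its required primary key should fail ingest validation.
const bad = await fetch("http://localhost:4000/ingest/my_pipeline", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ someString: "missing the primary key" }),
});
if (!bad.ok) {
  // Invalid records are rejected here instead of reaching the stream or table
  console.error("Ingest rejected payload:", bad.status);
}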