Data Modeling
Overview
Data models in Moose are type definitions that generate your data infrastructure. You write them as TypeScript interfaces or Pydantic models, export them from your app's root index.ts or main.py file, and Moose creates the corresponding database tables, APIs, and streams from your code.
Working with Data Models
Define your schema
Create a type definition with typed fields and a primary key using your language's type system
Create infrastructure
Use your model as a type parameter to create tables, APIs, streams, or views
Export from root
Export all models and infrastructure from your app's root file (index.ts/main.py) - Moose reads these exports to generate infrastructure (see the sketch after these steps)
Run dev server
Start the dev server; while it is running, whenever you create or modify data models, Moose automatically creates or updates all dependent infrastructure components with your latest code changes.
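A minimal sketch of these steps in a TypeScript project (the model and field names here are illustrative, not part of the Moose API):

// index.ts (app root) - Moose reads these exports to generate infrastructure
import { Key, IngestPipeline } from "@514labs/moose-lib";

// 1. Define your schema
interface PageView {
  id: Key<string>;   // primary key
  path: string;
  viewedAt: Date;
}

// 2-3. Create infrastructure from the model and export it from the root file
export const pageViews = new IngestPipeline<PageView>("page_views", {
  ingest: true,
  stream: true,
  table: true,
});

// 4. With the dev server running, saving this file creates or updates the
//    corresponding API endpoint, topic, and table automatically.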
Quick Start
import { Key, IngestPipeline } from "@514labs/moose-lib";

// 1. Define your schema (WHAT your data looks like)
interface MyDataModel {
  primaryKey: Key<string>;
  someString: string;
  someNumber: number;
  someDate: Date;
}

// 2. YOU control which infrastructure to create (HOW to handle your data)
const pipeline = new IngestPipeline<MyDataModel>("MyDataPipeline", {
  ingest: true,  // Optional: Create API endpoint
  stream: true,  // Optional: Create topic
  table: {       // Optional: Create and configure table
    orderByFields: ["primaryKey", "someDate"],
    deduplicate: true
  }
});
from datetime import datetime
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel, Field

# 1. Define your schema (WHAT your data looks like)
class MyFirstDataModel(BaseModel):
    id: Key[str] = Field(..., description="Primary key")
    some_string: str
    some_number: int
    some_boolean: bool
    some_date: datetime

# 2. YOU control which infrastructure to create (HOW to handle your data)
my_first_pipeline = IngestPipeline[MyFirstDataModel]("my_first_pipeline", IngestPipelineConfig(
    ingest=True,   # Create API endpoint
    stream=True,   # Create stream topic
    table=True     # Create database table
))
Benefits:
End-to-end type safety across your code and infrastructure
Full control over your infrastructure with code
Zero schema drift - change your types in one place and your infrastructure updates automatically
Schema Definition
The WHAT
This section covers how to define your data models - the structure and types of your data.
Basic Types
import { Key } from "@514labs/moose-lib";
import { tags } from "typia";
import { ClickHouseDecimal, ClickHousePrecision, LowCardinality } from "@514labs/moose-lib";

export interface BasicDataModel {
  // Required: Primary key for your data model
  primaryKey: Key<string>;   // string key
  // or
  numericKey: Key<number>;   // numeric key

  // Common types
  someString: string;                  // Text
  someNumber: number;                  // Numbers (Float64 by default)
  someBoolean: boolean;                // Boolean
  someDate: Date;                      // Timestamps (DateTime)
  someArray: string[];                 // Arrays
  someJsonObject: Record<string, any>; // JSON object

  // Explicit integer types
  intField: number & tags.Type<"int64">;    // Int64
  uintField: number & tags.Type<"uint32">;  // UInt32

  // Decimal with precision/scale
  price: ClickHouseDecimal<10, 2>; // Decimal(10,2)

  // DateTime64 with precision
  eventTime: Date & ClickHousePrecision<6>; // DateTime64(6)

  // UUID
  userId: string & tags.Type<"uuid">; // UUID

  // Nullable fields
  nullableField?: string; // Optional field

  // LowCardinality
  lowCardinalityField: string & LowCardinality; // LowCardinality(String)
}
In Python, you use Pydantic models to define your schemas:
from moose_lib import Key, clickhouse_decimal, clickhouse_datetime64
from datetime import datetime, date
from typing import List, Optional, Any, Annotated
from uuid import UUID
from pydantic import BaseModel, Field
from decimal import Decimal

class BasicDataModel(BaseModel):
    # Required: Primary key for your data model
    primary_key: Key[str]
    # or
    numeric_key: Key[int]

    # Common types
    some_string: str       # Text
    some_number: int       # Numbers (Int64 by default)
    some_boolean: bool     # Boolean
    some_date: datetime    # Timestamps (DateTime)
    some_array: List[str]  # Arrays
    some_json_object: Any  # JSON object

    # Explicit integer types
    small_int: Annotated[int, "int16"]  # Int16
    big_uint: Annotated[int, "uint64"]  # UInt64

    # Decimal with precision/scale
    price: clickhouse_decimal(10, 2)  # Decimal(10,2)

    # DateTime64 with precision
    event_time: clickhouse_datetime64(6)  # DateTime64(6)

    # UUID
    user_id: UUID  # UUID

    # Optional fields
    optional_field: Optional[str] = None  # May not be present in all records

    # LowCardinality
    low_cardinality_field: Annotated[str, "LowCardinality"]
Advanced Schema Patterns
Nested Objects
import { Key } from "@514labs/moose-lib";

// Define nested object separately
interface NestedObject {
  nestedNumber: number;
  nestedBoolean: boolean;
  nestedArray: number[];
}

export interface DataModelWithNested {
  primaryKey: Key<string>;
  // Reference nested object
  nestedData: NestedObject;
  // Or define inline
  inlineNested: {
    someValue: string;
    someOtherValue: number;
  };
}
from moose_lib import Key
from typing import List
from pydantic import BaseModel

class NestedObject(BaseModel):
    nested_number: int
    nested_boolean: bool
    nested_array: List[int]

class DataModelWithNested(BaseModel):
    primary_key: Key[str]
    nested_data: NestedObject
Using Enums
import { Key } from "@514labs/moose-lib";

enum OrderStatus {
  PENDING = "pending",
  PROCESSING = "processing",
  COMPLETED = "completed"
}

export interface Order {
  orderId: Key<string>;
  status: OrderStatus; // Type-safe status values
  createdAt: Date;
}
from enum import Enum
from moose_lib import Key
from datetime import datetime
from pydantic import BaseModel

class OrderStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"

class Order(BaseModel):
    order_id: Key[str]
    status: OrderStatus  # Type-safe status values
    created_at: datetime
Type Mapping
| TypeScript | ClickHouse | Description |
|---|---|---|
| string | String | Text values |
| number | Float64 | Numeric values |
| number & tags.Type<"int64"> | Int64 | 64-bit integer |
| number & tags.Type<"uint32"> | UInt32 | 32-bit unsigned integer |
| boolean | Boolean | True/false values |
| Date | DateTime | Timestamp values |
| Date & ClickHousePrecision<P> | DateTime64(P) | Timestamp with precision |
| ClickHouseDecimal<P, S> | Decimal(P, S) | Decimal with precision/scale |
| string & tags.Type<"uuid"> | UUID | Universally unique identifier |
| Array | Array | Lists of values |
| object | Nested | Nested structures |
| Enum | Enum | Enumerated values |
| any | JSON | JSON object (discouraged, see below) |
| field?: T | Nullable | Optional/nullable fields |
| Python | ClickHouse | Description |
|---|---|---|
| str | String | Text values |
| int | Int64 | Integer values |
| Annotated[int, "int16"] | Int16 | 16-bit integer |
| Annotated[int, "uint64"] | UInt64 | 64-bit unsigned integer |
| float | Float64 | Floating-point values |
| Decimal | Decimal | Arbitrary-precision decimal |
| clickhouse_decimal(p, s) | Decimal(p, s) | Decimal with precision/scale |
| bool | Boolean | True/false values |
| datetime | DateTime | Timestamp values |
| clickhouse_datetime64(p) | DateTime64(p) | Timestamp with precision |
| date | Date | Date values |
| List[T] | Array | Lists of values |
| UUID | UUID | Universally unique identifier |
| Any | JSON | Any value, use with caution |
| Enum | Enum | Enumerated values |
| Optional[T] | Nullable | Optional/nullable fields |
| Nested Pydantic model | Nested | Nested structures |
Data Modeling Dos and Don’ts
Do not use unknown types for flexible fields (prefer explicit types, or any if you truly need arbitrary JSON):
// DO NOT -> Use unknown for flexible fields
interface BadDataModel {
  unknownField: unknown;
}
// DO -> Use a specific type, or a JSON type if truly needed
interface GoodDataModel {
  jsonField: any; // Only if you need to store arbitrary JSON
}
Do not use union types for flexible or nullable fields:
// DO NOT -> Use union types for conditional fields
interface BadDataModel {
  conditionalField: string | number;
}
// DO -> Break out into multiple optional fields
interface GoodDataModel {
  conditionalString?: string; // Optional field
  conditionalNumber?: number; // Optional field
}

// DO NOT -> Use union types for nullable fields
interface BadDataModel {
  nullableField: string | null;
}
// DO -> Use an optional field
interface GoodDataModel {
  nullableField?: string;
}
# DO NOT -> Use union types for flexible fields
class BadDataModel(BaseModel):
    flexible_field: Union[str, int]

# DO -> Use Any type for flexible fields
class FlexibleDataModel(BaseModel):
    flexible_field: Any  # Maps to JSON type in ClickHouse

# DO NOT -> Use union types for nullable fields
class BadDataModel(BaseModel):
    nullable_field: Union[str, None]

# DO -> Use Optional type
class GoodDataModel(BaseModel):
    nullable_field: Optional[str]

# DO NOT -> Use dict types for nested fields
class BadDataModel(BaseModel):
    dict_field: dict[str, Any]

# DO -> Use a nested Pydantic model
class NestedObject(BaseModel):
    nested_field: str

class GoodDataModel(BaseModel):
    dict_field: NestedObject
Infrastructure Configuration
The HOW
This section covers how to apply your data models to infrastructure components.
Getting Data Into Your Database
IngestPipeline
The most common pattern - combines ingestion, streaming, and storage into a single component:
import { IngestPipeline } from "@514labs/moose-lib";
const myPipeline = new IngestPipeline<MyDataModel>("my_pipeline", {
  ingest: true,
  stream: true,
  table: true
});
from moose_lib import IngestPipeline, IngestPipelineConfig
my_pipeline = IngestPipeline[MyDataModel]("my_pipeline", IngestPipelineConfig(
    ingest=True,
    stream=True,
    table=True
))
What gets created?
An HTTP POST endpoint that accepts and validates incoming data against your data model
A typed Redpanda topic that buffers validated data from the ingest API
A ClickHouse table with the same schema as your data model
A Rust process that syncs data from the stream to the table
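For example, once the dev server is running, you can send a record to the generated endpoint. This is a hypothetical usage sketch: the port (4000) and exact route are assumptions based on the "/ingest/api-route-name" pattern described below, so check your dev server output for the real values:

// Sketch: POST a record matching MyDataModel to the pipeline's ingest endpoint.
// Port and route are assumptions - verify against your dev server output.
const response = await fetch("http://localhost:4000/ingest/my_pipeline", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    primaryKey: "abc-123",
    someString: "hello",
    someNumber: 42,
    someDate: new Date().toISOString(),
  }),
});
console.log(response.status); // Validated records flow to the stream, then the table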
Standalone Components
If you don’t need all the components, you can create them individually and wire them together yourself:
OlapTable
Creates a ClickHouse table with the same schema as your data model:
import { OlapTable } from "@514labs/moose-lib";
// Basic table
const myTable = new OlapTable<MyDataModel>("TableName");
from moose_lib import OlapTable
# Basic table
my_table = OlapTable[MyDataModel]("TableName")
Olap Tables
You might use an OlapTable if you do not need streaming ingest capabilities for your data. Learn more about Olap Tables
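Assuming OlapTable accepts the same table configuration shown in the IngestPipeline Quick Start above (the config shape here is an assumption based on that example, not a confirmed API), a configured table might look like:

import { OlapTable } from "@514labs/moose-lib";

// Sketch: table configuration mirroring the IngestPipeline table options above.
const myConfiguredTable = new OlapTable<MyDataModel>("TableName", {
  orderByFields: ["primaryKey", "someDate"], // Sort key for the ClickHouse table
  deduplicate: true,                         // Collapse rows with duplicate sort keys
});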
Stream
Creates a Redpanda topic that can be configured to sync data to a ClickHouse table with the same schema:
import { Stream } from "@514labs/moose-lib";
// Basic stream
const myStream = new Stream<MyDataModel>("TopicName");
from moose_lib import Stream
# Basic stream
my_stream = Stream[MyDataModel]("TopicName")
Streams
Standalone streams may make sense if you want to transform data on the fly before it is written to the table. Learn more about stream processing
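As a sketch of that pattern (the destination option and addTransform signature here are assumptions; see the stream processing docs for the actual API):

import { Key, OlapTable, Stream } from "@514labs/moose-lib";

interface RawEvent {
  id: Key<string>;
  path: string;
}

interface CleanEvent {
  id: Key<string>;
  path: string;
  receivedAt: Date;
}

const cleanEventsTable = new OlapTable<CleanEvent>("clean_events");

// Assumed: a stream can be pointed at a table via a destination option
const cleanEvents = new Stream<CleanEvent>("clean_events", {
  destination: cleanEventsTable,
});
const rawEvents = new Stream<RawEvent>("raw_events");

// Assumed addTransform signature: transform each record before it lands in the table
rawEvents.addTransform(cleanEvents, (event) => ({
  id: event.id,
  path: event.path.toLowerCase(), // normalize paths on the fly
  receivedAt: new Date(),
}));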
IngestAPI
Creates an HTTP POST endpoint at "/ingest/api-route-name" that accepts and validates incoming data against your data model:
import { IngestAPI } from "@514labs/moose-lib";
const myIngestAPI = new IngestAPI<MyDataModel>("api-route-name"); // Creates an HTTP `POST` endpoint at "/ingest/api-route-name"
from moose_lib import IngestAPI
my_ingest_api = IngestAPI[MyDataModel]("api-route-name")
Ingest APIs
Ingest APIs are almost always preferred as part of an IngestPipeline instead of being used standalone. Learn more about Ingestion APIs
Getting Data Out of Your Database
Data models also power your downstream data processing workflows after data is stored, enabling you to create materialized views and typed APIs that prepare and expose your data for consumption:
MaterializedView
Materialized views are a way to pre-compute and store the results of complex queries on your data. This allows you to query the materialized view directly for faster results, or use it as the source for another derived table for cascading transformations:
import { MaterializedView, sql } from "@514labs/moose-lib";

const myMaterializedView = new MaterializedView<MyDataModel>({
  selectStatement: sql`SELECT * FROM my_table`,
  tableName: "my_table",
  materializedViewName: "my_materialized_view"
});
from moose_lib import MaterializedView

my_materialized_view = MaterializedView[MyDataModel](
    select_statement="SELECT * FROM my_table",
    table_name="my_table",
    materialized_view_name="my_materialized_view"
)
Materialized Views
Learn more about Materialized Views
ConsumptionAPI
Consumption APIs are a way to expose your data to external consumers. They are typed and validated against your data models, ensuring that the client request parameters and response types are correct at runtime:
import { ConsumptionAPI } from "@514labs/moose-lib";

const myConsumptionAPI = new ConsumptionAPI<RequestDataModel, ResponseDataModel>(
  "MyConsumptionAPI",
  async (request: RequestDataModel, { client, sql }) => {
    // Do something with the request
    return new ResponseDataModel();
  }
);
from moose_lib import ConsumptionAPI

def handle(request: RequestDataModel) -> ResponseDataModel:
    # Do something with the request
    return ResponseDataModel()

my_consumption_api = ConsumptionAPI[RequestDataModel, ResponseDataModel]("api_route_name", query_function=handle)
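A hypothetical usage sketch for calling the endpoint from a client (the port, the "/consumption/<name>" route, and the limit query parameter are all assumptions; check your dev server output and your RequestDataModel fields):

// Sketch: query the consumption endpoint with typed query parameters.
const res = await fetch(
  "http://localhost:4000/consumption/api_route_name?limit=10"
);
const data = await res.json(); // Response shape follows ResponseDataModel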
Consumption APIs
Learn more about Consumption APIs
Validation
Validation at runtime takes place at the following points:
- The body of a POST request to an IngestAPI endpoint - Learn more about Ingestion APIs
- The query parameters sent in a GET request to a Consumption API - Learn more about Consumption APIs
- The data being synced from a streaming topic to an OlapTable - Learn more about Olap Tables
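As an illustrative sketch of the first point (the route and port are assumptions, and the exact error response format is not specified here), a payload that fails validation at the ingest endpoint is rejected rather than written downstream:

// Sketch: a record missing its required primary key should fail ingest validation.
const bad = await fetch("http://localhost:4000/ingest/my_pipeline", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ someString: "missing the primary key" }),
});
if (!bad.ok) {
  // Invalid records are rejected here instead of reaching the stream or table
  console.error("Ingest rejected payload:", bad.status);
}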