Inserting Data

MooseStack provides several ways to insert data into your ClickHouse tables: from a stream, from a workflow, from a client app via the REST API, or by calling insert directly on an OlapTable.

Optional fields with ClickHouse defaults

If a table column is modeled as optional in your app type but has a ClickHouse default, Moose treats the field as optional in incoming records at the API/stream boundary, while the ClickHouse table stores the column as required with a DEFAULT clause. If you omit the field in the payload, ClickHouse fills it with the default at insert time.

Annotated[int, clickhouse_default("18")]
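
To make the omit-and-fill behavior described above concrete, here is a minimal plain-Python sketch. It does not call Moose or ClickHouse; `COLUMN_DEFAULTS`, the `age` column, and `fill_defaults` are hypothetical names chosen for illustration, and the default is written as a Python value rather than the SQL expression string that `clickhouse_default` takes.

```python
# Illustrative stand-in only: mimics the described behavior in plain Python.
# Hypothetical column mirroring a field declared with clickhouse_default("18").
COLUMN_DEFAULTS = {"age": 18}


def fill_defaults(record: dict, defaults: dict = COLUMN_DEFAULTS) -> dict:
    """Return a copy of `record` with omitted columns set to their defaults."""
    stored = dict(defaults)  # start from the defaults...
    stored.update(record)    # ...then let supplied values win
    return stored


print(fill_defaults({"id": "u1"}))             # age omitted, default applies
print(fill_defaults({"id": "u2", "age": 42}))  # explicit value is kept
```

In the real system this substitution happens inside ClickHouse at insert time, so the application never has to supply the value itself.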

From a Stream (Streaming Ingest)

To stream data into a ClickHouse table, set the stream's destination to the OlapTable you want to insert into. Moose then automatically provisions a synchronization process that batches records and inserts them into the table.

StreamInsert.ts

import { Key, OlapTable, Stream } from "@514labs/moose-lib";

interface Event {
    id: Key<string>;
    userId: string;
    timestamp: Date;
    eventType: string;
}

const eventsTable = new OlapTable<Event>("Events");

const stream = new Stream<Event>("Events", {
    // Automatically syncs the stream to the table in ClickHouse-optimized batches
    destination: eventsTable,
});

StreamInsert.py

from moose_lib import Key, OlapTable, Stream, StreamConfig
from pydantic import BaseModel
from datetime import datetime


class Event(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str


events_table = OlapTable[Event]("user_events")

events_pipeline = Stream[Event]("user_events", StreamConfig(
    # Automatically syncs the stream to the table in ClickHouse-optimized batches
    destination=events_table,
))
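
The synchronization process itself is managed by Moose, and the sketch below is not its implementation. It only illustrates, in plain Python, the general idea of buffering incoming records and handing them to a sink in batches. `BatchBuffer`, `max_size`, and the `sink` callback are hypothetical names.

```python
from typing import Callable, List


class BatchBuffer:
    """Collects records and hands them to `sink` in batches of `max_size`."""

    def __init__(self, sink: Callable[[List[dict]], None], max_size: int = 3):
        self.sink = sink
        self.max_size = max_size
        self._pending: List[dict] = []

    def add(self, record: dict) -> None:
        """Buffer one record and flush once the batch is full."""
        self._pending.append(record)
        if len(self._pending) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, then start a fresh batch."""
        if self._pending:
            self.sink(self._pending)
            self._pending = []


batches: List[List[dict]] = []
buffer = BatchBuffer(sink=batches.append, max_size=3)
for i in range(7):
    buffer.add({"id": f"evt_{i}"})
buffer.flush()  # flush the final partial batch

print([len(b) for b in batches])
```

Seven records with a batch size of three reach the sink as two full batches and one partial batch, instead of seven separate writes.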

From a Workflow (Batch Insert)

WorkflowInsert.py

from moose_lib import OlapTable, Key, InsertOptions
from pydantic import BaseModel
from datetime import datetime


class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str


events_table = OlapTable[UserEvent]("user_events")

# Direct insertion for ETL workflows
result = events_table.insert([
    {"id": "evt_1", "user_id": "user_123", "timestamp": datetime.now(), "event_type": "click"},
    {"id": "evt_2", "user_id": "user_456", "timestamp": datetime.now(), "event_type": "view"}
])

print(f"Successfully inserted: {result.successful} records")
print(f"Failed: {result.failed} records")

Direct Data Insertion

Inserting Arrays of Records

DirectInsert.py

from moose_lib import OlapTable, Key, InsertOptions
from pydantic import BaseModel
from datetime import datetime


class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str


events_table = OlapTable[UserEvent]("user_events")

# Insert single record or array of records
result = events_table.insert([
    {"id": "evt_1", "user_id": "user_123", "timestamp": datetime.now(), "event_type": "click"},
    {"id": "evt_2", "user_id": "user_456", "timestamp": datetime.now(), "event_type": "view"}
])

print(f"Successfully inserted: {result.successful} records")
print(f"Failed: {result.failed} records")

Handling Large Batch Inserts

For large datasets, use Python generators so that records are produced on demand rather than held in memory all at once:

StreamInsert.py

# Continues from DirectInsert.py: events_table, InsertOptions, and datetime are defined there.

def user_event_generator():
    """Generate user events for memory-efficient processing."""
    for i in range(10000):
        yield {
            "id": f"evt_{i}",
            "user_id": f"user_{i % 100}",
            "timestamp": datetime.now(),
            "event_type": "click" if i % 2 == 0 else "view"
        }


# Insert from generator (validation not available for streams)
result = events_table.insert(user_event_generator(), InsertOptions(strategy="fail-fast"))
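
The memory benefit comes from how generators work in Python generally, independent of Moose. The following stdlib-only sketch compares a fully built list with a generator over the same values; `event_ids` is a hypothetical helper used only for this comparison.

```python
import sys


def event_ids(n: int):
    """Yield ids one at a time instead of building them all up front."""
    for i in range(n):
        yield f"evt_{i}"


as_list = list(event_ids(10_000))  # every id is held in memory at once
as_generator = event_ids(10_000)   # nothing is produced until iterated

# sys.getsizeof reports only the container itself, not the strings it holds,
# so the real gap is larger than these numbers suggest.
print("list container bytes:", sys.getsizeof(as_list))
print("generator object bytes:", sys.getsizeof(as_generator))
print("ids produced:", sum(1 for _ in as_generator))
```

The generator object stays small regardless of how many records it will eventually yield, which is why it suits datasets that would not fit comfortably in memory as a list.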

Validation Methods

ValidationMethods.py

from moose_lib import OlapTable, Key
from pydantic import BaseModel


class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    event_type: str


events_table = OlapTable[UserEvent]("user_events")

# Validate a single record (unknown_data is a placeholder for your unvalidated input)
validated_data, error = events_table.validate_record(unknown_data)
if validated_data is not None:
    print("Valid data:", validated_data)
else:
    print("Validation error:", error)

# Validate multiple records with detailed error reporting
# (data_array is a placeholder for a list of unvalidated records)
validation_result = events_table.validate_records(data_array)
print(f"Valid records: {len(validation_result.valid)}")
print(f"Invalid records: {len(validation_result.invalid)}")
for error in validation_result.invalid:
    print(f"Record {error.index} failed: {error.error}")

Error Handling Strategies

Three strategies are available: fail-fast (the default), discard, and isolate.

Discard Strategy

Discard.py

from moose_lib import InsertOptions

# Discards invalid records, continues with valid ones
result = events_table.insert(data, InsertOptions(
    strategy="discard",
    allow_errors=10,           # Allow up to 10 failed records
    allow_errors_ratio=0.05    # Allow up to 5% failure rate
))
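
The comments in Discard.py describe two limits: an absolute count of failed records and a failure ratio. The sketch below shows one way such an error budget could be evaluated. It assumes that when both limits are set, both must hold; the source does not say how Moose combines them. `within_error_budget` is a hypothetical helper, not part of moose_lib.

```python
from typing import Optional


def within_error_budget(
    failed: int,
    total: int,
    allow_errors: Optional[int] = None,
    allow_errors_ratio: Optional[float] = None,
) -> bool:
    """Return True if the failure count stays inside every limit that is set.

    Assumption: when both limits are given, both must hold.
    """
    if allow_errors is not None and failed > allow_errors:
        return False
    if allow_errors_ratio is not None and total > 0:
        if failed / total > allow_errors_ratio:
            return False
    return True


# 8 failures out of 1,000: under 10 records and under 5%
print(within_error_budget(8, 1000, allow_errors=10, allow_errors_ratio=0.05))
# 8 failures out of 100: under 10 records, but 8% exceeds the 5% ratio
print(within_error_budget(8, 100, allow_errors=10, allow_errors_ratio=0.05))
```

Setting both limits guards small batches with the ratio and large batches with the absolute count.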

Isolate Strategy

Isolate.py

from moose_lib import InsertOptions

# Retries individual records to isolate failures
result = events_table.insert(data, InsertOptions(
    strategy="isolate",
    allow_errors_ratio=0.1
))

# Access detailed failure information
if result.failed_records:
    for failed in result.failed_records:
        print(f"Record {failed.index} failed: {failed.error}")
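
The idea of retrying records individually to isolate failures can be sketched in plain Python as follows. This is an illustration of the general technique under simplifying assumptions, not Moose's implementation: `insert_with_isolation` and `fake_insert` are hypothetical, and the fake sink rejects a whole batch without writing anything, which a real database may not guarantee.

```python
from typing import Callable, List, Tuple


def insert_with_isolation(
    records: List[dict],
    insert_batch: Callable[[List[dict]], None],
) -> Tuple[List[dict], List[Tuple[int, str]]]:
    """Try the whole batch first; on failure, retry records one by one."""
    try:
        insert_batch(records)
        return list(records), []
    except Exception:
        pass  # fall through to per-record retries

    succeeded, failed = [], []
    for index, record in enumerate(records):
        try:
            insert_batch([record])
            succeeded.append(record)
        except Exception as exc:
            failed.append((index, str(exc)))  # keep the position and the reason
    return succeeded, failed


def fake_insert(batch: List[dict]) -> None:
    """Hypothetical sink that rejects any record without a user_id."""
    for record in batch:
        if "user_id" not in record:
            raise ValueError(f"missing user_id in {record['id']}")


data = [
    {"id": "evt_1", "user_id": "user_123"},
    {"id": "evt_2"},
    {"id": "evt_3", "user_id": "user_456"},
]
ok, bad = insert_with_isolation(data, fake_insert)
print(len(ok), bad)
```

The trade-off is extra round trips on failure in exchange for knowing exactly which records were rejected and why.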

Performance Optimization

Performance.py

from moose_lib import InsertOptions

# For high-throughput scenarios
result = events_table.insert(large_dataset, InsertOptions(
    validate=False,  # Skip validation for performance
    strategy="discard"
))

# Clean up when completely done (optional)
events_table.close_client()

Best Practices

Insert Best Practices

Use streams for real-time data

Use direct insertion for batch processing

Validate data before insertion

Choose appropriate error handling