Moose automatically handles batch writes between streams and OLAP tables through a destination configuration. When you specify a destination OLAP table for a stream, Moose provisions a background synchronization process that batches and writes data from the stream to the table.
```python
from moose_lib import Stream, OlapTable, StreamConfig, Key
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

events_table = OlapTable[Event]("events")

events_stream = Stream[Event]("events", StreamConfig(
    destination=events_table  # This configures automatic batching
))
```

This setup:

- Automatically batches inserts according to ClickHouse-recommended best practices
- Guarantees that data is delivered at least once, even in the face of transient errors
- Works automatically once `destination` is set to a valid OLAP table reference
The simplest way to set up automatic syncing is with an IngestPipeline, which creates all components and wires them together:
```python
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

# Creates stream, table, API, and automatic sync
events_pipeline = IngestPipeline[Event]("events", IngestPipelineConfig(
    ingest_api=True,
    stream=True,  # Creates stream
    table=True    # Creates destination table + auto-sync process
))
```

For more granular control, you can configure components individually:
```python
from moose_lib import Stream, OlapTable, IngestApi, StreamConfig, Key
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

# Create table first
events_table = OlapTable[Event]("events")

# Create stream with destination table (enables auto-sync)
events_stream = Stream[Event]("events", StreamConfig(
    destination=events_table  # This configures automatic batching
))

# Create API that writes to the stream
events_api = IngestApi[Event]("events", {
    "destination": events_stream
})
```

When you configure a stream with a destination table, Moose handles the synchronization automatically by managing a Rust process in the background.
ClickHouse inserts need to be batched for optimal performance. Moose automatically handles this optimization internally, ensuring your data is efficiently written to ClickHouse without any configuration required.
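As a rough mental model, the background sync behaves like a size-or-time flush policy. The sketch below is illustrative only: the real process runs in Rust inside the Moose runtime, and the names (`MAX_BATCH_SIZE`, `FLUSH_INTERVAL_SECS`, `sync_loop`, and the reader/writer callbacks) are hypothetical, not Moose APIs. The thresholds match the flow described in the next example.

```python
import time

# Illustrative flush policy, not Moose's actual implementation.
MAX_BATCH_SIZE = 100_000   # flush when the batch reaches 100,000 records
FLUSH_INTERVAL_SECS = 1.0  # ...or when 1 second has passed since the last flush

def sync_loop(read_from_stream, write_batch_to_clickhouse):
    """Accumulate stream records and flush them to ClickHouse in batches."""
    batch = []
    last_flush = time.monotonic()
    while True:
        record = read_from_stream(timeout=0.1)  # hypothetical stream reader
        if record is not None:
            batch.append(record)
        size_reached = len(batch) >= MAX_BATCH_SIZE
        interval_elapsed = time.monotonic() - last_flush >= FLUSH_INTERVAL_SECS
        if batch and (size_reached or interval_elapsed):
            write_batch_to_clickhouse(batch)  # one batched INSERT per flush
            batch = []
            last_flush = time.monotonic()
```

Batching like this keeps the number of ClickHouse INSERT statements low while bounding how long a record waits before it becomes queryable; Moose applies this policy for you, along with the retry handling that backs the at-least-once guarantee.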
Here's how data flows through the automatic sync process:
```python
import requests

# 1. Data sent to ingestion API
requests.post('http://localhost:4000/ingest/events', json={
    "id": "evt_123",
    "user_id": "user_456",
    "timestamp": "2024-01-15T10:30:00Z",
    "event_type": "click"
})

# 2. API validates and writes to stream
# 3. Background sync process batches stream data
# 4. Batch automatically written to ClickHouse table when:
#    - Batch reaches 100,000 records, OR
#    - 1 second has elapsed since last flush

# 5. Data available for queries in events table
# SELECT * FROM events WHERE user_id = 'user_456';
```

The sync process provides built-in observability within the Moose runtime: