# Moose / Streaming / Sync To Table Documentation – TypeScript

## Included Files

1. moose/streaming/sync-to-table/sync-to-table.mdx

## Sync to Table

Source: moose/streaming/sync-to-table/sync-to-table.mdx

Automatically sync stream data to OLAP tables with intelligent batching

# Sync to Table

## Overview

Moose automatically handles batch writes between streams and OLAP tables through a **destination configuration**. When you specify a `destination` OLAP table for a stream, Moose provisions a background synchronization process that batches and writes data from the stream to the table.

### Basic Usage

```ts filename="SyncToTable.ts" copy {13}
import { Key, OlapTable, Stream } from "@514labs/moose-lib";

interface Event {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

const eventsTable = new OlapTable<Event>("events");

const eventsStream = new Stream<Event>("events", {
  destination: eventsTable // This configures automatic batching
});
```

## Setting Up Automatic Sync

### Using IngestPipeline (Easiest)

The simplest way to set up automatic syncing is with an `IngestPipeline`, which creates all components and wires them together:

```ts filename="AutoSync.ts" copy
import { IngestPipeline, Key } from "@514labs/moose-lib";

interface Event {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

// Creates stream, table, API, and automatic sync
const eventsPipeline = new IngestPipeline<Event>("events", {
  ingestApi: true, // Creates HTTP endpoint at POST /ingest/events
  stream: true,    // Creates buffering stream
  table: true      // Creates destination table + auto-sync process
});
```

### Standalone Components

For more granular control, you can configure components individually:

```ts filename="ManualSync.ts" copy
import { IngestApi, Key, OlapTable, Stream } from "@514labs/moose-lib";

interface Event {
  id: Key<string>;
  userId: string;
  timestamp: Date;
  eventType: string;
}

// Create table first
const eventsTable = new OlapTable<Event>("events");

// Create stream with destination table (enables auto-sync)
const eventsStream = new Stream<Event>("events", {
  destination: eventsTable // This configures automatic batching
});

// Create API that writes to the stream
const eventsApi = new IngestApi<Event>("events", {
  destination: eventsStream
});
```

## How Automatic Syncing Works

When you configure a stream with a `destination` table, Moose manages the synchronization with a **Rust background process** that:

1. **Consumes** messages from the stream (Kafka/Redpanda topic)
2. **Batches** records up to 100,000 or flushes every second (whichever comes first)
3. **Executes** optimized ClickHouse `INSERT` statements
4. **Commits** stream offsets after successful writes
5. **Retries** failed batches with exponential backoff

Default batching parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `MAX_BATCH_SIZE` | 100,000 records | Maximum records per batch insert |
| `FLUSH_INTERVAL` | 1 second | Automatic flush regardless of batch size |

Currently you cannot configure the batching parameters, but we're interested in adding this feature. If you need this capability, let us know on Slack!

[ClickHouse inserts need to be batched for optimal performance](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance). Moose handles this optimization internally, ensuring your data is efficiently written to ClickHouse without any configuration required.
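To make the flush policy concrete, here is a minimal TypeScript sketch of the size-or-time batching loop described above. It is an illustrative model only, not Moose's actual Rust implementation; `consumeMessage`, `insertBatch`, and `commitOffsets` are hypothetical stand-ins for the stream consumer, the ClickHouse insert, and the offset commit, and the retry cap and delays are illustrative:

```ts filename="BatchingSketch.ts"
const MAX_BATCH_SIZE = 100_000; // records per batch insert
const FLUSH_INTERVAL_MS = 1_000; // flush at least once per second

// Hypothetical stand-ins (not Moose APIs) so the sketch is self-contained.
async function consumeMessage(): Promise<Record<string, unknown>> {
  return { id: `evt_${Date.now()}`, eventType: "click" };
}
async function insertBatch(rows: Record<string, unknown>[]): Promise<void> {
  console.log(`INSERT ... -- ${rows.length} rows`);
}
async function commitOffsets(): Promise<void> {}

async function syncLoop(): Promise<void> {
  let batch: Record<string, unknown>[] = [];
  let lastFlush = Date.now();

  while (true) {
    batch.push(await consumeMessage());

    // Flush when the batch is full OR the interval has elapsed, whichever
    // comes first. (For simplicity this sketch only checks the clock when
    // a message arrives; the actual process flushes on the interval
    // regardless of batch size.)
    const full = batch.length >= MAX_BATCH_SIZE;
    const stale = Date.now() - lastFlush >= FLUSH_INTERVAL_MS;
    if (!full && !stale) continue;

    // Retry failed inserts with exponential backoff before giving up.
    for (let attempt = 0, delayMs = 100; ; attempt++, delayMs *= 2) {
      try {
        await insertBatch(batch);
        break;
      } catch (err) {
        if (attempt >= 5) throw err; // surface persistent failures
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }

    // Offsets are committed only after a successful write, so a crash
    // mid-batch replays the uncommitted records instead of losing them.
    await commitOffsets();
    batch = [];
    lastFlush = Date.now();
  }
}
```

Note the ordering: the insert happens before the offset commit, which is what makes a failed batch safe to retry.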
## Data Flow Example

Here's how data flows through the automatic sync process:

```ts filename="DataFlow.ts" copy
// 1. Data sent to ingestion API
fetch('http://localhost:4000/ingest/events', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    id: 'evt_123',
    userId: 'user_456',
    timestamp: '2024-01-15T10:30:00Z',
    eventType: 'click'
  })
})

// 2. API validates and writes to stream

// 3. Background sync process batches stream data

// 4. Batch automatically written to ClickHouse table when:
//    - Batch reaches 100,000 records, OR
//    - 1 second has elapsed since last flush

// 5. Data available for queries in events table
sql`SELECT * FROM events WHERE userId = 'user_456';`
```

## Monitoring and Observability

The sync process provides built-in observability within the Moose runtime:

- **Batch Insert Logs**: Records successful batch insertions with sizes and offsets
- **Error Handling**: Logs transient failures with retry information
- **Metrics**: Tracks throughput, batch sizes, and error rates
- **Offset Tracking**: Maintains Kafka consumer group offsets for reliability
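To watch the batching behavior end to end, you can fire a quick burst of events at the ingest endpoint from the Data Flow example. This sketch assumes a Moose instance running locally on port 4000 as above; the `sendBurst` helper is purely illustrative:

```ts filename="BurstIngest.ts"
// Send many events in quick succession. Because they all arrive well
// inside the 1-second flush window (and far below the 100,000-record
// cap), the sync process should write them as a single batch insert.
async function sendBurst(count: number): Promise<void> {
  const requests = Array.from({ length: count }, (_, i) =>
    fetch('http://localhost:4000/ingest/events', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        id: `evt_${i}`,
        userId: 'user_456',
        timestamp: new Date().toISOString(),
        eventType: 'click'
      })
    })
  );
  await Promise.all(requests);
  console.log(`Sent ${count} events`);
}

sendBurst(500).catch(console.error);
```

Within about a second of the last request, the rows should be queryable in the `events` table, and the batch insert logs described above should reflect a handful of large inserts rather than hundreds of single-row writes.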