# Moose Documentation – Python
## Included Files
1. moose/apis.mdx
2. moose/apis/admin-api.mdx
3. moose/apis/analytics-api.mdx
4. moose/apis/auth.mdx
5. moose/apis/ingest-api.mdx
6. moose/apis/openapi-sdk.mdx
7. moose/apis/trigger-api.mdx
8. moose/changelog.mdx
9. moose/configuration.mdx
10. moose/data-modeling.mdx
11. moose/deploying.mdx
12. moose/deploying/configuring-moose-for-cloud.mdx
13. moose/deploying/deploying-on-an-offline-server.mdx
14. moose/deploying/deploying-on-ecs.mdx
15. moose/deploying/deploying-on-kubernetes.mdx
16. moose/deploying/deploying-with-docker-compose.mdx
17. moose/deploying/monitoring.mdx
18. moose/deploying/packaging-moose-for-deployment.mdx
19. moose/deploying/preparing-clickhouse-redpanda.mdx
20. moose/getting-started/from-clickhouse.mdx
21. moose/getting-started/quickstart.mdx
22. moose/help/minimum-requirements.mdx
23. moose/help/troubleshooting.mdx
24. moose/in-your-stack.mdx
25. moose/index.mdx
26. moose/llm-docs.mdx
27. moose/local-dev.mdx
28. moose/mcp-dev-server.mdx
29. moose/metrics.mdx
30. moose/migrate.mdx
31. moose/migrate/lifecycle.mdx
32. moose/migrate/migration-types.mdx
33. moose/moose-cli.mdx
34. moose/olap.mdx
35. moose/olap/apply-migrations.mdx
36. moose/olap/db-pull.mdx
37. moose/olap/external-tables.mdx
38. moose/olap/indexes.mdx
39. moose/olap/insert-data.mdx
40. moose/olap/model-materialized-view.mdx
41. moose/olap/model-table.mdx
42. moose/olap/model-view.mdx
43. moose/olap/planned-migrations.mdx
44. moose/olap/read-data.mdx
45. moose/olap/schema-change.mdx
46. moose/olap/schema-optimization.mdx
47. moose/olap/schema-versioning.mdx
48. moose/olap/supported-types.mdx
49. moose/olap/ttl.mdx
50. moose/reference/py-moose-lib.mdx
51. moose/reference/ts-moose-lib.mdx
52. moose/streaming.mdx
53. moose/streaming/connect-cdc.mdx
54. moose/streaming/consumer-functions.mdx
55. moose/streaming/create-stream.mdx
56. moose/streaming/dead-letter-queues.mdx
57. moose/streaming/from-your-code.mdx
58. moose/streaming/schema-registry.mdx
59. moose/streaming/sync-to-table.mdx
60. moose/streaming/transform-functions.mdx
61. moose/workflows.mdx
62. moose/workflows/cancel-workflow.mdx
63. moose/workflows/define-workflow.mdx
64. moose/workflows/retries-and-timeouts.mdx
65. moose/workflows/schedule-workflow.mdx
66. moose/workflows/trigger-workflow.mdx
## Moose APIs
Source: moose/apis.mdx
Create type-safe ingestion and analytics APIs for data access and integration
# Moose APIs
## Overview
The APIs module provides standalone HTTP endpoints for data ingestion and analytics. While APIs can run on their own, they are designed to be paired with other MooseStack modules like OLAP tables and streams.
## Core Capabilities
## Basic Examples
### Ingestion API
```py filename="IngestApi.py" copy
from moose_lib import IngestApi, IngestConfig, Stream
from pydantic import BaseModel
from datetime import datetime

class UserEvent(BaseModel):
    id: str
    user_id: str
    timestamp: datetime
    event_type: str

# Destination stream for ingested events
event_stream = Stream[UserEvent]("user-events")

# Create a standalone ingestion API
user_events_api = IngestApi[UserEvent]("user-events", IngestConfig(destination=event_stream))

# No export needed - Python modules are automatically discovered
```
### Analytics API
```py filename="AnalyticsApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel

class Params(BaseModel):
    user_id: str
    limit: int

class ResultData(BaseModel):
    id: str
    name: str
    email: str

def query_function(client: MooseClient, params: Params) -> list[ResultData]:
    # Query external service or custom logic using parameter binding
    query = "SELECT * FROM user_data WHERE user_id = {user_id} LIMIT {limit}"
    return client.query.execute(query, {"user_id": params.user_id, "limit": params.limit})

user_data_api = Api[Params, ResultData]("get-data", query_function)

# No export needed - Python modules are automatically discovered
```
---
## apis/admin-api
Source: moose/apis/admin-api.mdx
# Coming Soon
---
## APIs
Source: moose/apis/analytics-api.mdx
APIs for Moose
# APIs
## Overview
APIs are functions that run on your server and are automatically exposed as HTTP `GET` endpoints.
They are designed to read data from your OLAP database. Out of the box, these APIs provide:
- Automatic type validation and conversion for your query parameters (sent in the URL) and your response body
- Managed database client connection
- Automatic OpenAPI documentation generation
Common use cases include:
- Powering user-facing analytics, dashboards and other front-end components
- Enabling AI tools to interact with your data
- Building custom APIs for your internal tools
### Enabling APIs
Analytics APIs are enabled by default. To explicitly control this feature in your `moose.config.toml`:
```toml filename="moose.config.toml" copy
[features]
apis = true
```
### Basic Usage
`execute` is the recommended way to execute queries. It provides a thin wrapper around the ClickHouse Python client so that you can safely pass `OlapTable` and `Column` objects to your query without needing to worry about ClickHouse identifiers:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel

# Import the source pipeline
from app.path.to.SourcePipeline import SourcePipeline

# Define the query parameters
class QueryParams(BaseModel):
    filter_field: str
    max_results: int

# Define the response body
class ResponseBody(BaseModel):
    id: int
    name: str
    value: float

SourceTable = SourcePipeline.get_table()

# Define the route handler function (parameterized)
def run(client: MooseClient, params: QueryParams) -> list[ResponseBody]:
    query = """
        SELECT
            id,
            name,
            value
        FROM {table}
        WHERE category = {category}
        LIMIT {limit}
    """
    return client.query.execute(query, {"table": SourceTable, "category": params.filter_field, "limit": params.max_results})

# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
Use `execute_raw` with parameter binding for safe, typed queries:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel

# Import the source pipeline
from app.path.to.SourcePipeline import SourcePipeline

# Define the query parameters
class QueryParams(BaseModel):
    filterField: str
    maxResults: int

# Define the response body
class ResponseBody(BaseModel):
    id: int
    name: str
    value: float

SourceTable = SourcePipeline.get_table()

# Define the route handler function (using execute_raw with typed parameters)
def run(client: MooseClient, params: QueryParams) -> list[ResponseBody]:
    query = """
        SELECT
            id,
            name,
            value
        FROM Source
        WHERE category = {category:String}
        LIMIT {limit:UInt32}
    """
    return client.query.execute_raw(query, {"category": params.filterField, "limit": params.maxResults})

# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
```python filename="SourcePipeline.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel

class SourceSchema(BaseModel):
    id: Key[int]
    name: str
    value: float

SourcePipeline = IngestPipeline[SourceSchema]("Source", IngestPipelineConfig(
    ingest_api=False,
    stream=True,
    table=True,
))
```
The `Api` class takes:
- Route name: The URL path to access your API (e.g., `"example_endpoint"`)
- Handler function: Processes requests with typed parameters and returns the result
The generic type parameters specify:
- `QueryParams`: The structure of accepted URL parameters
- `ResponseBody`: The exact shape of your API's response data
You can name these types anything you want. The first type generates validation for query parameters, while the second defines the response structure for OpenAPI documentation.
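For example, a request to the `example_endpoint` route defined above might look like this (a hypothetical local call; query parameters map directly to the fields of `QueryParams`):
```python filename="CallExampleApi.py" copy
import requests

# Query parameters are validated against QueryParams before the handler runs;
# the JSON response is a list of objects shaped like ResponseBody.
response = requests.get(
    "http://localhost:4000/api/example_endpoint",
    params={"filterField": "electronics", "maxResults": 10},
)
rows = response.json()
```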
## Type Validation
Model your query parameters and response body as Pydantic models; Moose uses these models to provide automatic type validation and conversion for query parameters (sent in the URL) and for the response body.
### Modeling Query Parameters
Define your API's parameters as a Pydantic model:
```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel, Field
from typing import Optional

class QueryParams(BaseModel):
    filterField: str = Field(..., description="The field to filter by")
    maxResults: int = Field(..., description="The maximum number of results to return")
    optionalParam: Optional[str] = Field(None, description="An optional parameter")
```
Moose automatically handles:
- Runtime validation
- Clear error messages for invalid parameters
- OpenAPI documentation generation
Complex nested objects and arrays are not supported. Analytics APIs are `GET` endpoints designed to be simple and lightweight.
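For illustration, a request that fails parameter validation on the `example_endpoint` API above is rejected before your handler runs (a hypothetical call; the exact error message depends on the failing field):
```python filename="ValidationErrorExample.py" copy
import requests

# maxResults must be an integer, so this request fails validation before the
# handler runs and returns HTTP 400 with an error body (exact message may vary).
response = requests.get(
    "http://localhost:4000/api/example_endpoint",
    params={"filterField": "name", "maxResults": "not_a_number"},
)
print(response.status_code)  # 400
```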
### Adding Advanced Type Validation
Moose uses Pydantic for runtime validation. Use Pydantic's `Field` class for more complex validation:
```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    filterField: str = Field(pattern=r"^(id|name|email)$", description="The field to filter by")  # Only allow valid column names from the UserTable
    maxResults: int = Field(gt=0, description="The maximum number of results to return")  # Positive integer
```
### Common Validation Options
```python filename="ValidationExamples.py" copy
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    # Numeric validations
    id: int = Field(..., gt=0)
    age: int = Field(..., gt=0, lt=120)
    price: float = Field(..., gt=0, lt=1000)
    discount: float = Field(..., gt=0, multiple_of=0.5)

    # String validations
    username: str = Field(..., min_length=3, max_length=20)
    email: str = Field(..., format="email")
    zipCode: str = Field(..., pattern=r"^[0-9]{5}$")
    uuid: str = Field(..., format="uuid")
    ipAddress: str = Field(..., format="ipv4")

    # Date validations
    startDate: str = Field(..., format="date")

    # Enum validation
    status: str = Field(..., enum=["active", "pending", "inactive"])

    # Optional parameters
    limit: int = Field(None, gt=0, lt=100)
```
For a full list of validation options, see the [Pydantic documentation](https://docs.pydantic.dev/latest/concepts/types/#customizing-validation-with-fields).
### Setting Default Values
You can set default values for parameters by setting values for each parameter in your Pydantic model:
```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel

class QueryParams(BaseModel):
    filterField: str = "example"
    maxResults: int = 10
    optionalParam: str | None = "default"
```
## Implementing Route Handler
API route handlers are regular functions, so you can implement whatever arbitrary logic you want inside them. Most of the time you will use APIs to expose your data to your front-end applications or other tools:
### Connecting to the Database
Moose provides a managed `MooseClient` to your function execution context. This client provides access to the database and other Moose resources, and handles connection pooling/lifecycle management for you:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from app.UserTable import UserTable

def run(client: MooseClient, params: QueryParams):
    # You can use a formatted string for a simple static query
    query = """
        SELECT COUNT(*) FROM {table}
    """
    # You can optionally pass the table object to the query
    return client.query.execute(query, {"table": UserTable})

# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
Use `execute_raw` with parameter binding:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from app.UserTable import UserTable

def run(client: MooseClient, params: QueryParams):
    # Using execute_raw for safe queries
    query = """
        SELECT COUNT(*) FROM {table:Identifier}
    """
    # Must be the name of the table, not the table object
    return client.query.execute_raw(query, {"table": UserTable.name})

# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
### Constructing Safe SQL Queries
#### Basic Query Parameter Interpolation
```python filename="SafeQueries.py" copy
from moose_lib import MooseClient
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    min_age: int = Field(ge=0, le=150)
    status: str = Field(pattern=r"^(active|inactive)$")
    limit: int = Field(default=10, ge=1, le=1000)
    search_text: str = Field(pattern=r'^[a-zA-Z0-9\s]*$')

def run(client: MooseClient, params: QueryParams):
    query = """
        SELECT *
        FROM users
        WHERE age >= {min_age}
          AND status = '{status}'
          AND name ILIKE '%{search_text}%'
        LIMIT {limit}
    """
    return client.query.execute(query, {"min_age": params.min_age, "status": params.status, "search_text": params.search_text, "limit": params.limit})
```
```python filename="SafeQueries.py" copy
from moose_lib import MooseClient
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    min_age: int = Field(ge=0, le=150)
    status: str = Field(pattern=r"^(active|inactive)$")
    limit: int = Field(default=10, ge=1, le=1000)
    search_text: str = Field(pattern=r'^[a-zA-Z0-9\s]*$')

def run(client: MooseClient, params: QueryParams):
    query = """
        SELECT *
        FROM users
        WHERE age >= {minAge:UInt32}
          AND status = {status:String}
          AND name ILIKE {searchPattern:String}
        LIMIT {limit:UInt32}
    """
    return client.query.execute_raw(query, {
        "minAge": params.min_age,
        "status": params.status,
        "searchPattern": f"%{params.search_text}%",
        "limit": params.limit
    })
```
#### Table and Column References
```python filename="ValidatedQueries.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field, constr
from typing import Literal, Optional
from enum import Enum
from app.UserTable import UserTable

class QueryParams(BaseModel):
    # When using f-strings, we need extremely strict validation
    column: str = Field(pattern=r"^(id|name|email)$", description="Uses a regex pattern to only allow valid column names")
    search_term: str = Field(
        pattern=r'^[\w\s\'-]{1,50}$',  # Allows letters, numbers, spaces, hyphens, apostrophes; does not allow special characters that could be used in SQL injection
        strip_whitespace=True,
        min_length=1,
        max_length=50
    )
    limit: int = Field(
        default=10,
        ge=1,
        le=100,
        description="Number of results to return"
    )

def run(client: MooseClient, params: QueryParams):
    query = """
        SELECT {column}
        FROM {table}
        WHERE name ILIKE '%{search_term}%'
        LIMIT {limit}
    """
    return client.query.execute(query, {"column": UserTable.cols[params.column], "table": UserTable, "search_term": params.search_term, "limit": params.limit})
```
```python filename="UserTable.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: Key[int]
    name: str
    email: str

UserTable = OlapTable[UserSchema]("users")
```
### Advanced Query Patterns
#### Dynamic Column & Table Selection
```python filename="DynamicColumns.py" copy
from typing import Optional
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field
from app.UserTable import UserTable

class QueryParams(BaseModel):
    colName: str = Field(pattern=r"^(id|name|email)$", description="Uses a regex pattern to only allow valid column names from the UserTable")

class QueryResult(BaseModel):
    id: Optional[int]
    name: Optional[str]
    email: Optional[str]

def run(client: MooseClient, params: QueryParams):
    # Put column and table in the dict for variables
    query = "SELECT {column} FROM {table}"
    return client.query.execute(query, {"column": UserTable.cols[params.colName], "table": UserTable})

# Create the API
bar = Api[QueryParams, QueryResult](name="bar", query_function=run)

# Call the API
# HTTP Request: GET http://localhost:4000/api/bar?colName=id
# EXECUTED QUERY: SELECT id FROM users
```
```python filename="UserTable.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: Key[int]
    name: str
    email: str

UserTable = OlapTable[UserSchema]("users")
```
#### Conditional `WHERE` Clauses
Build `WHERE` clauses based on provided parameters:
```python filename="ConditionalColumns.py" copy
from typing import Optional
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field

class FilterParams(BaseModel):
    min_age: Optional[int]
    status: Optional[str] = Field(pattern=r"^(active|inactive)$")
    search_text: Optional[str] = Field(pattern=r"^[a-zA-Z0-9\s]+$", description="Alphanumeric search text without special characters to prevent SQL injection")

class QueryResult(BaseModel):
    id: int
    name: str
    email: str

def build_query(client: MooseClient, params: FilterParams) -> list[QueryResult]:
    # Build conditions and bound parameters from the validated inputs
    conditions = []
    parameters = {}
    if params.min_age:
        conditions.append("age >= {min_age}")
        parameters["min_age"] = params.min_age
    if params.status:
        conditions.append("status = {status}")
        parameters["status"] = params.status
    if params.search_text:
        conditions.append("(name ILIKE {search_text} OR email ILIKE {search_text})")
        parameters["search_text"] = params.search_text
    where_clause = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    query = f"""SELECT * FROM users {where_clause} ORDER BY created_at DESC"""
    return client.query.execute(query, parameters)

# Create the API
bar = Api[FilterParams, QueryResult](name="bar", query_function=build_query)

# Call the API
# HTTP Request: GET http://localhost:4000/api/bar?min_age=20&status=active&search_text=John
# EXECUTED QUERY: SELECT * FROM users WHERE age >= 20 AND status = 'active' AND (name ILIKE '%John%' OR email ILIKE '%John%') ORDER BY created_at DESC
```
### Adding Authentication
Moose supports authentication via JSON web tokens (JWTs). When your client makes a request to your Analytics API, Moose will automatically parse the JWT and pass the **authenticated** payload to your handler function as the `jwt` object:
```python filename="Authentication.py" copy
def run(client: MooseClient, params: QueryParams, jwt: dict):
    # Use parameter binding with JWT data
    query = """SELECT * FROM userReports WHERE user_id = {user_id} LIMIT 5"""
    return client.query.execute(query, {"user_id": jwt["userId"]})
```
Moose validates the JWT signature and ensures the JWT is properly formatted. If JWT authentication fails, Moose returns a `401 Unauthorized` error.
## Understanding Response Codes
Moose automatically provides standard HTTP responses:
| Status Code | Meaning | Response Body |
|-------------|-------------------------|---------------------------------|
| 200 | Success | Your API's result data |
| 400 | Validation error | `{ "error": "Detailed message"}`|
| 401 | Unauthorized | `{ "error": "Unauthorized"}` |
| 500 | Internal server error | `{ "error": "Internal server error"}` |
## Post-Processing Query Results
After executing your database query, you can transform the data before returning it to the client. This allows you to rename fields, compute derived values, and format data such as dates and currency in application code:
```python filename="PostProcessingExample.py" copy
from datetime import datetime
from moose_lib import Api, MooseClient
from pydantic import BaseModel

class QueryParams(BaseModel):
    category: str
    max_results: int = 10

class ResponseItem(BaseModel):
    itemId: int
    displayName: str
    formattedValue: str
    isHighValue: bool
    date: str

def run(client: MooseClient, params: QueryParams):
    # 1. Fetch raw data using parameter binding
    query = """
        SELECT id, name, value, timestamp
        FROM data_table
        WHERE category = {category}
        LIMIT {limit}
    """
    raw_results = client.query.execute(query, {"category": params.category, "limit": params.max_results})

    # 2. Post-process the results
    processed_results = []
    for row in raw_results:
        processed_results.append(ResponseItem(
            # Transform field names
            itemId=row['id'],
            displayName=row['name'].upper(),
            # Add derived fields
            formattedValue=f"${row['value']:.2f}",
            isHighValue=row['value'] > 1000,
            # Format dates
            date=datetime.fromisoformat(row['timestamp']).date().isoformat()
        ))
    return processed_results

# Create the API
process_data_api = Api[QueryParams, ResponseItem](name="process_data_endpoint", query_function=run)
```
### Best Practices
While post-processing gives you flexibility, remember that database operations are typically more efficient for heavy data manipulation. Reserve post-processing for transformations that are difficult to express in SQL or that involve application-specific logic.
## Client Integration
By default, all API endpoints are automatically integrated with OpenAPI/Swagger documentation. You can integrate your OpenAPI SDK generator of choice to generate client libraries for your APIs.
Please refer to the [OpenAPI](/moose/apis/openapi-sdk) page for more information on how to integrate your APIs with OpenAPI.
---
## API Authentication & Security
Source: moose/apis/auth.mdx
Secure your Moose API endpoints with JWT tokens or API keys
# API Authentication & Security
Moose supports two authentication mechanisms for securing your API endpoints:
- **[API Keys](#api-key-authentication)** - Simple, static authentication for internal applications and getting started
- **[JWT (JSON Web Tokens)](#jwt-authentication)** - Token-based authentication for integration with existing identity providers
Choose the method that fits your use case, or use both together with custom configuration.
## Do you want to use API Keys?
API keys are the simplest way to secure your Moose endpoints. They're ideal for:
- Internal applications and microservices
- Getting started quickly with authentication
- Scenarios where you control both client and server
### How API Keys Work
API keys use PBKDF2 HMAC SHA256 hashing for secure storage. You generate a token pair (plain-text and hashed) using the Moose CLI, store the hashed version in environment variables, and send the plain-text version in your request headers.
### Step 1: Generate API Keys
Generate tokens and hashed keys using the Moose CLI:
```bash
moose generate hash-token
```
**Output:**
- **ENV API Keys**: Hashed key for environment variables (use this in your server configuration)
- **Bearer Token**: Plain-text token for client applications (use this in `Authorization` headers)
Use the **hashed key** for environment variables and `moose.config.toml`. Use the **plain-text token** in your `Authorization: Bearer token` headers.
### Step 2: Configure API Keys with Environment Variables
Set environment variables with the keys you generated (**hashed** keys for ingest and analytics endpoints, the plain-text token for admin):
```bash
# For ingest endpoints
export MOOSE_INGEST_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
# For analytics endpoints
export MOOSE_CONSUMPTION_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
# For admin endpoints
export MOOSE_ADMIN_TOKEN='your_plain_text_token'
```
Or set the admin API key in `moose.config.toml`:
```toml filename="moose.config.toml"
[authentication]
admin_api_key = "your_pbkdf2_hmac_sha256_hashed_key"
```
Storing the `admin_api_key` (which is a PBKDF2 HMAC SHA256 hash) in your `moose.config.toml` file is an acceptable practice, even if the file is version-controlled. This is because the actual plain-text Bearer token (the secret) is not stored. The hash is computationally expensive to reverse, ensuring that your secret is not exposed in the codebase.
### Step 3: Make Authenticated Requests
All authenticated requests require the `Authorization` header with the **plain-text token**:
```bash
# Using curl
curl -H "Authorization: Bearer your_plain_text_token_here" \
https://your-moose-instance.com/ingest/YourDataModel
# Using JavaScript
fetch('https://your-moose-instance.com/api/endpoint', {
headers: {
'Authorization': 'Bearer your_plain_text_token_here'
}
})
```
## Do you want to use JWTs?
JWT authentication integrates with existing identity providers and follows standard token-based authentication patterns. Use JWTs when:
- You have an existing identity provider (Auth0, Okta, etc.)
- You need user-specific authentication and authorization
- You want standard OAuth 2.0 / OpenID Connect flows
### How JWT Works
Moose validates JWT tokens using RS256 algorithm with your identity provider's public key. You configure the expected issuer and audience, and Moose handles token verification automatically.
### Step 1: Configure JWT Settings
#### Option A: Configure in `moose.config.toml`
```toml filename="moose.config.toml"
[jwt]
# Your JWT public key (PEM-formatted RSA public key)
secret = """
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAy...
-----END PUBLIC KEY-----
"""
# Expected JWT issuer
issuer = "https://my-auth-server.com/"
# Expected JWT audience
audience = "my-moose-app"
```
The `secret` field should contain your JWT **public key** used to verify signatures using RS256 algorithm.
#### Option B: Configure with Environment Variables
You can also set these values as environment variables:
```bash filename=".env" copy
MOOSE_JWT_PUBLIC_KEY=your_jwt_public_key # PEM-formatted RSA public key (overrides `secret` in `moose.config.toml`)
MOOSE_JWT_ISSUER=your_jwt_issuer # Expected JWT issuer (overrides `issuer` in `moose.config.toml`)
MOOSE_JWT_AUDIENCE=your_jwt_audience # Expected JWT audience (overrides `audience` in `moose.config.toml`)
```
### Step 2: Make Authenticated Requests
Send requests with the JWT token in the `Authorization` header:
```bash
# Using curl
curl -H "Authorization: Bearer your_jwt_token_here" \
https://your-moose-instance.com/ingest/YourDataModel
# Using JavaScript
fetch('https://your-moose-instance.com/api/endpoint', {
headers: {
'Authorization': 'Bearer your_jwt_token_here'
}
})
```
## Want to use both? Here are the caveats
You can configure both JWT and API Key authentication simultaneously. When both are configured, Moose's authentication behavior depends on the `enforce_on_all_*` flags.
### Understanding Authentication Priority
#### Default Behavior (No Enforcement)
By default, when both JWT and API Keys are configured, Moose tries JWT validation first, then falls back to API Key validation:
```toml filename="moose.config.toml"
[jwt]
# JWT configuration
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# enforce flags default to false
```
```bash filename=".env"
# API Key configuration
MOOSE_INGEST_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
MOOSE_CONSUMPTION_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
```
**For Ingest Endpoints (`/ingest/*`)**:
- Attempts JWT validation first (RS256 signature check)
- Falls back to API Key validation (PBKDF2 HMAC SHA256) if JWT fails
**For Analytics Endpoints (`/api/*`)**:
- Same fallback behavior as ingest endpoints
This allows you to use either authentication method for your clients.
#### Enforcing JWT Only
If you want to **only** accept JWT tokens (no API key fallback), set the enforcement flags:
```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# Only accept JWT, no API key fallback
enforce_on_all_ingest_apis = true
enforce_on_all_consumptions_apis = true
```
**Result**: When enforcement is enabled, API Key authentication is disabled even if the environment variables are set. Only valid JWT tokens will be accepted.
### Common Use Cases
#### Use Case 1: Different Auth for Different Endpoints
Configure JWT for user-facing analytics endpoints, API keys for internal ingestion:
```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
enforce_on_all_consumptions_apis = true # JWT only for /api/*
enforce_on_all_ingest_apis = false # Allow fallback for /ingest/*
```
```bash filename=".env"
MOOSE_INGEST_API_KEY='hashed_key_for_internal_services'
```
#### Use Case 2: Migration from API Keys to JWT
Start with both configured, no enforcement. Gradually migrate clients to JWT. Once complete, enable enforcement:
```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# Start with both allowed during migration
enforce_on_all_ingest_apis = false
enforce_on_all_consumptions_apis = false
# Later, enable to complete migration
# enforce_on_all_ingest_apis = true
# enforce_on_all_consumptions_apis = true
```
### Admin Endpoints
Admin endpoints use API key authentication exclusively (configured separately from ingest/analytics endpoints).
**Configuration precedence** (highest to lowest):
1. `--token` CLI parameter (plain-text token)
2. `MOOSE_ADMIN_TOKEN` environment variable (plain-text token)
3. `admin_api_key` in `moose.config.toml` (hashed token)
**Example:**
```bash
# Option 1: CLI parameter
moose remote plan --token your_plain_text_token
# Option 2: Environment variable
export MOOSE_ADMIN_TOKEN='your_plain_text_token'
moose remote plan
# Option 3: Config file
# In moose.config.toml:
# [authentication]
# admin_api_key = "your_pbkdf2_hmac_sha256_hashed_key"
```
## Security Best Practices
- **Never commit plain-text tokens to version control** - Always use hashed keys in configuration files
- **Use environment variables for production** - Keep secrets out of your codebase
- **Generate unique tokens for different environments** - Separate development, staging, and production credentials
- **Rotate tokens regularly** - Especially for long-running production deployments
- **Choose the right method for your use case**:
- Use **API Keys** for internal services and getting started
- Use **JWT** when integrating with identity providers or need user-level auth
- **Store hashed keys safely** - The PBKDF2 HMAC SHA256 hash in `moose.config.toml` is safe to version control, but the plain-text token should only exist in secure environment variables or secret management systems
Never commit plain-text tokens to version control. Use hashed keys in configuration files and environment variables for production.
---
## Ingestion APIs
Source: moose/apis/ingest-api.mdx
Ingestion APIs for Moose
# Ingestion APIs
## Overview
Moose Ingestion APIs are the entry point for getting data into your Moose application. They provide a fast, reliable, and type-safe way to move data from your sources into streams and tables for analytics and processing.
## When to Use Ingestion APIs
Ingestion APIs are most useful when you want to implement a push-based pattern for getting data from your data sources into your streams and tables. Common use cases include:
- Instrumenting external client applications
- Receiving webhooks from third-party services
- Integrating with ETL or data pipeline tools that push data
## Why Use Moose's APIs Over Your Own?
Moose's ingestion APIs are purpose-built for high-throughput data pipelines, offering key advantages over other more general-purpose frameworks:
- **Built-in schema validation:** Ensures only valid data enters your pipeline.
- **Direct connection to streams/tables:** Instantly link HTTP endpoints to Moose data infrastructure to route incoming data to your streams and tables without any glue code.
- **Dead Letter Queue (DLQ) support:** Invalid records are automatically captured for review and recovery.
- **OpenAPI auto-generation:** Instantly generate client SDKs and docs for all endpoints, including example data.
- **Rust-powered performance:** Far higher throughput and lower latency than typical Node.js or Python APIs.
## Validation
Moose validates all incoming data against your Pydantic model. If a record fails validation, Moose can automatically route it to a Dead Letter Queue (DLQ) for later inspection and recovery.
```python filename="ValidationExample.py" copy
from typing import Optional
from datetime import datetime
from moose_lib import IngestApi, IngestConfig, Stream, DeadLetterQueue
from pydantic import BaseModel

class Properties(BaseModel):
    device: Optional[str]
    version: Optional[int]

class ExampleModel(BaseModel):
    id: str
    userId: str
    timestamp: datetime
    properties: Properties

api = IngestApi[ExampleModel]("your-api-route", IngestConfig(
    destination=Stream[ExampleModel]("your-stream-name"),
    dead_letter_queue=DeadLetterQueue[ExampleModel]("your-dlq-name")
))
```
If your IngestPipeline's schema marks a field as optional but annotates a ClickHouse default, Moose treats it as follows:
- API request and Stream message: field is optional (you may omit it)
- ClickHouse table storage: field is required with a DEFAULT clause
Behavior: When the API/stream inserts into ClickHouse and the field is missing, ClickHouse sets it to the configured default value. This keeps request payloads simple while avoiding Nullable columns in storage.
Example:
`Annotated[int, clickhouse_default("18")]` (or equivalent annotation)
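A minimal sketch of such a field (assuming `clickhouse_default` is importable from `moose_lib`, as the annotation above suggests):
```python filename="DefaultedField.py" copy
from typing import Annotated, Optional
from datetime import datetime
from pydantic import BaseModel
from moose_lib import clickhouse_default  # assumed import location

class ExampleModel(BaseModel):
    id: str
    userId: str
    timestamp: datetime
    # Optional in the API request and stream message; stored in ClickHouse as a
    # non-Nullable column with DEFAULT 18 when the field is omitted.
    age: Annotated[Optional[int], clickhouse_default("18")] = None
```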
Send a valid event - routed to the destination stream
```python filename="ValidEvent.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
    "id": "event1",
    "userId": "user1",
    "timestamp": "2023-05-10T15:30:00Z",
    "properties": {"device": "mobile", "version": 1}
})
# ✅ Accepted and routed to the destination stream
# API returns 200 and { success: true }
```
Send an invalid event (missing required field) - routed to the DLQ
```python filename="InvalidEventMissingField.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
    "id": "event1",
})
# ❌ Routed to DLQ, because it's missing required fields
# API returns 400 response
```
Send an invalid event (bad date format) - routed to the DLQ
```python filename="InvalidEventBadDate.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
    "id": "event1",
    "userId": "user1",
    "timestamp": "not-a-date",
    "properties": {"device": "mobile", "version": 1}
})
# ❌ Routed to DLQ, because the timestamp is not a valid date
# API returns 400 response
```
## Creating Ingestion APIs
You can create ingestion APIs in two ways:
- **High-level:** Using the `IngestPipeline` class (recommended for most use cases)
- **Low-level:** Manually configuring the `IngestApi` component for more granular control
### High-level: IngestPipeline (Recommended)
The `IngestPipeline` class provides a convenient way to set up ingestion endpoints, streams, and tables with a single declaration:
```python filename="IngestPipeline.py" copy
from datetime import datetime
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel

class ExampleSchema(BaseModel):
    id: Key[str]
    name: str
    value: int
    timestamp: datetime

example_pipeline = IngestPipeline[ExampleSchema](
    name="example-name",
    config=IngestPipelineConfig(
        ingest_api=True,
        stream=True,
        table=True
    )
)
```
### Low-level: Standalone IngestApi
For more granular control, you can manually configure the `IngestApi` component:
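Below is a minimal sketch (the names `example-ingest`, `example-stream`, `example-table`, and the `StreamConfig(destination=...)` wiring are illustrative assumptions based on patterns shown elsewhere in these docs):
```python filename="StandaloneIngestApi.py" copy
from datetime import datetime
from moose_lib import IngestApi, IngestConfig, Stream, StreamConfig, OlapTable, Key
from pydantic import BaseModel

class ExampleSchema(BaseModel):
    id: Key[str]
    name: str
    value: int
    timestamp: datetime

# Destination table and stream must share the IngestApi's schema type
example_table = OlapTable[ExampleSchema]("example-table")
example_stream = Stream[ExampleSchema]("example-stream", StreamConfig(destination=example_table))
example_api = IngestApi[ExampleSchema]("example-ingest", IngestConfig(destination=example_stream))
```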
The types of the destination `Stream` and `Table` must match the type of the `IngestApi`.
## Configuration Reference
Configuration options for both high-level and low-level ingestion APIs are provided below.
```python filename="IngestPipelineConfig.py" copy
class IngestPipelineConfig(BaseModel):
    table: bool | OlapConfig = True
    stream: bool | StreamConfig = True
    ingest_api: bool | IngestConfig = True
    dead_letter_queue: bool | StreamConfig = True
    version: Optional[str] = None
    metadata: Optional[dict] = None
    life_cycle: Optional[LifeCycle] = None
```
```python filename="IngestConfig.py" copy
@dataclass
class IngestConfigWithDestination[T: BaseModel]:
    destination: Stream[T]
    dead_letter_queue: Optional[DeadLetterQueue[T]] = None
    version: Optional[str] = None
    metadata: Optional[dict] = None
```
---
## OpenAPI SDK Generation
Source: moose/apis/openapi-sdk.mdx
Generate type-safe client SDKs from your Moose APIs
# OpenAPI SDK Generation
Moose automatically generates OpenAPI specifications for all your APIs, enabling you to create type-safe client SDKs in any language. This allows you to integrate your Moose APIs into any application with full type safety and IntelliSense support.
## Overview
While `moose dev` is running, Moose emits an OpenAPI spec at `.moose/openapi.yaml` covering:
- **Ingestion endpoints** with request/response schemas
- **Analytics APIs** with query parameters and response types
Every time you make a change to your Moose APIs, the OpenAPI spec is updated automatically.
## Generating Typed SDKs from OpenAPI
You can use your preferred generator to create a client from that spec. Below are minimal, tool-agnostic examples you can copy into your project scripts.
### Setup
The following example uses `openapi-python-client` to generate the SDK. Follow the setup instructions here: [openapi-python-client on PyPI](https://pypi.org/project/openapi-python-client/).
Add a generation script in your repository:
```bash filename="scripts/generate_python_sdk.sh" copy
#!/usr/bin/env bash
set -euo pipefail
openapi-python-client generate --path .moose/openapi.yaml --output ./generated/python --overwrite
```
Then configure Moose to run it after each dev reload:
```toml filename="moose.config.toml" copy
[http_server_config]
on_reload_complete_script = "bash scripts/generate_python_sdk.sh"
```
This will regenerate the Python client from the live spec on every reload.
### Hooks for automatic SDK generation
The `on_reload_complete_script` hook is available in your `moose.config.toml` file. It runs after each dev server reload when code/infra changes have been fully applied. This allows you to keep your SDKs continuously up to date as you make changes to your Moose APIs.
Notes:
- The script runs in your project root using your `$SHELL` (falls back to `/bin/sh`).
- Paths like `.moose/openapi.yaml` and `./generated/...` are relative to the project root.
- You can combine multiple generators with `&&` in the hook command, or split them into a shell script if preferred.
These hooks only affect local development (`moose dev`). The reload hook runs after Moose finishes applying your changes, ensuring `.moose/openapi.yaml` is fresh before regeneration.
## Integration
Import from the output path your generator writes to (see your chosen example repo). The Moose side is unchanged: the spec lives at `.moose/openapi.yaml` during `moose dev`.
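For example, a hypothetical usage of a client generated by `openapi-python-client` (the package name and module layout below are assumptions; they depend on your spec title and generator settings):
```python filename="use_generated_client.py" copy
# Hypothetical package name; substitute whatever openapi-python-client produced
# in ./generated/python for your project.
from example_moose_client import Client

client = Client(base_url="http://localhost:4000")
# Generated operation modules live under the package's `api/` directory; call
# their sync()/asyncio() helpers with this client instance.
```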
## Generators
Use any OpenAPI-compatible generator:
### TypeScript projects
- [OpenAPI Generator (typescript-fetch)](https://github.com/OpenAPITools/openapi-generator) — mature, broad options; generates Fetch-based client
- [Kubb](https://github.com/kubb-project/kubb) — generates types + fetch client with simple config
- [Orval](https://orval.dev/) — flexible output (client + schemas), good DX
- [openapi-typescript](https://github.com/openapi-ts/openapi-typescript) — generates types only (pair with your own client)
- [swagger-typescript-api](https://github.com/acacode/swagger-typescript-api) — codegen for TS clients from OpenAPI
- [openapi-typescript-codegen](https://github.com/ferdikoomen/openapi-typescript-codegen) — TS client + models
- [oazapfts](https://github.com/oazapfts/oazapfts) — minimal TS client based on fetch
- [openapi-zod-client](https://github.com/astahmer/openapi-zod-client) — Zod schema-first client generation
### Python projects
- [openapi-python-client](https://pypi.org/project/openapi-python-client/) — modern typed client for OpenAPI 3.0/3.1
- [OpenAPI Generator (python)](https://github.com/OpenAPITools/openapi-generator) — multiple Python generators (python, python-nextgen)
---
## Trigger APIs
Source: moose/apis/trigger-api.mdx
Create APIs that trigger workflows and other processes
# Trigger APIs
## Overview
You can create APIs to initiate workflows, data processing jobs, or other automated processes.
## Basic Usage
```python filename="app/apis/trigger_workflow.py" copy
from moose_lib import MooseClient, Api
from pydantic import BaseModel, Field
from datetime import datetime

class WorkflowParams(BaseModel):
    input_value: str
    priority: str = Field(default="normal")

class WorkflowResponse(BaseModel):
    workflow_id: str
    status: str

def run(params: WorkflowParams, client: MooseClient) -> WorkflowResponse:
    # Trigger the workflow with input parameters
    workflow_execution = client.workflow.execute(
        workflow="data-processing",
        params={
            "input_value": params.input_value,
            "priority": params.priority,
            "triggered_at": datetime.now().isoformat()
        }
    )
    return WorkflowResponse(
        workflow_id=workflow_execution.id,
        status="started"
    )

api = Api[WorkflowParams, WorkflowResponse]("trigger-workflow", run)
```
## Using the Trigger API
Once deployed, you can trigger workflows via HTTP requests:
```bash filename="Terminal" copy
curl "http://localhost:4000/api/trigger-workflow?inputValue=process-user-data&priority=high"
```
Response:
```json
{
  "workflow_id": "workflow-12345",
  "status": "started"
}
```
---
## changelog
Source: moose/changelog.mdx
# Moose Changelog
## What is this page?
This changelog tracks all meaningful changes to Moose. Each entry includes the PR link and contributor credit, organized by date (newest first). Use this page to stay informed about new features, fixes, and breaking changes that might affect your projects.
## How to read this changelog:
- **Release Highlights** — Key features, enhancements, or fixes for each release.
- **Added** — New features and capabilities.
- **Changed** — Updates to existing functionality or improvements.
- **Deprecated** — Features that are no longer recommended for use and may be removed in the future.
- **Fixed** — Bug fixes and reliability improvements.
- **Breaking Changes** — Changes that require user action or may break existing usage.
---
## 2025-08-21
- **Analytics API Standardization** — Standardized naming of classes and utilities for analytics APIs
- **Complete S3Queue Engine Support** — Full implementation of ClickHouse S3Queue engine with comprehensive parameter support, modular architecture, and generic settings framework.
- **S3Queue Engine Configuration** — Added complete support for all ClickHouse S3Queue engine parameters including authentication (`aws_access_key_id`, `aws_secret_access_key`), compression, custom headers, and NOSIGN for public buckets. [PR #2674](https://github.com/514-labs/moosestack/pull/2674)
- **Generic Settings Framework** — Introduced a flexible settings system that allows any engine to use configuration settings, laying groundwork for future engine implementations.
- **Enhanced Documentation** — Added comprehensive documentation for OlapTable S3Queue configuration in both TypeScript and Python SDKs.
- **Improved Architecture** — Moved ClickHouse-specific types from core infrastructure to ClickHouse module for better separation of concerns.
- **Settings Location** — Engine-specific settings are now properly encapsulated within their respective engine configurations (e.g., `s3QueueEngineConfig.settings` for S3Queue).
- **API Consistency** — Unified configuration APIs across TypeScript and Python SDKs for S3Queue engine.
- **Compilation Issues** — Fixed struct patterns to handle new S3Queue parameter structure correctly.
- **Diff Strategy** — Enhanced diff strategy to properly handle S3Queue parameter changes.
- `ConsumptionApi` renamed to `Api`
- `EgressConfig` renamed to `ApiConfig`
- `ConsumptionUtil` renamed to `ApiUtil`
- `ConsumptionHelpers` renamed to `ApiHelpers`
*[#2676](https://github.com/514-labs/moosestack/pull/2676) by [camelCasedAditya](https://github.com/camelCasedAditya)*
---
## 2025-08-20
- **Improved IngestPipeline API Clarity** — The confusing `ingest` parameter has been renamed to `ingestApi` (TypeScript) and `ingest_api` (Python) for better clarity. The old parameter names are still supported with deprecation warnings.
- **IngestPipeline Parameter Renamed** — The `ingest` parameter in IngestPipeline configurations has been renamed for clarity:
- **TypeScript**: `ingest: true` → `ingestApi: true`
- **Python**: `ingest=True` → `ingest_api=True`
The old parameter names continue to work with deprecation warnings to ensure backwards compatibility. *[Current PR]*
- **IngestPipeline `ingest` parameter** — The `ingest` parameter in IngestPipeline configurations is deprecated:
- **TypeScript**: Use `ingestApi` instead of `ingest`
- **Python**: Use `ingest_api` instead of `ingest`
The old parameter will be removed in a future major version. Please update your code to use the new parameter names. *[Current PR]*
None - Full backwards compatibility maintained
---
## 2025-06-12
- **Enhanced TypeScript Workflow Types** — Improved type safety for Tasks with optional input/output parameters, supporting `null` types for better flexibility.
- TypeScript workflow Task types now properly support optional input/output with `null` types, enabling more flexible task definitions.
*[#2442](https://github.com/514-labs/moose/pull/2442) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*
None
---
## 2025-06-10
- **OlapTable Direct Insert API** — New comprehensive insert API with advanced error handling, typia validation, and multiple failure strategies. Enables direct data insertion into ClickHouse tables with production-ready reliability features.
- **Python Workflows V2** — Replaces static file-based routing with explicit `Task` and `Workflow` classes, enabling dynamic task composition and programmatic workflow orchestration. No more reliance on `@task` decorators or file naming conventions.
- OlapTable direct insert API with `insert()` method supporting arrays and Node.js streams. Features comprehensive typia-based validation, three error handling strategies (`fail-fast`, `discard`, `isolate`), configurable error thresholds, memoized ClickHouse connections, and detailed insertion results with failed record tracking.
*[#2437](https://github.com/514-labs/moose/pull/2437) by [callicles](https://github.com/callicles)*
- Enhanced typia validation integration for OlapTable and IngestPipeline with `validateRecord()`, `isValidRecord()`, and `assertValidRecord()` methods providing compile-time type safety and runtime validation.
*[#2437](https://github.com/514-labs/moose/pull/2437) by [callicles](https://github.com/callicles)*
- Python Workflows V2 with `Task[InputType, OutputType]` and `Workflow` classes for dynamic workflow orchestration. Replaces the legacy `@task` decorator approach with explicit task definitions, enabling flexible task composition, type-safe chaining via `on_complete`, retries, timeouts, and scheduling with cron expressions.
*[#2439](https://github.com/514-labs/moose/pull/2439) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*
None
---
## 2025-06-06
- **TypeScript Workflows V2** — Replaces static file-based routing with explicit `Task` and `Workflow` classes, enabling dynamic task composition and programmatic workflow orchestration. No more reliance on file naming conventions for task execution order.
- TypeScript Workflows V2 with `Task` and `Workflow` classes for dynamic workflow orchestration. Replaces the legacy file-based routing approach with explicit task definitions, enabling flexible task composition, type-safe chaining via `onComplete`, configurable retries and timeouts, and flexible scheduling with cron expressions.
*[#2421](https://github.com/514-labs/moose/pull/2421) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*
None
---
## 2025-05-23
- **TypeScript `DeadLetterQueue` support** — Handle failed streaming function messages with type-safe dead letter queues in TypeScript.
- **Improved Python `DeadLetterModel` API** — Renamed `as_t` to `as_typed` for better clarity.
- TypeScript `DeadLetterQueue` class with type guards and transform methods for handling failed streaming function messages with full type safety.
*[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
- Renamed `DeadLetterModel.as_t()` to `DeadLetterModel.as_typed()` in Python for better API clarity and consistency.
*[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
- `DeadLetterModel.as_t()` method renamed to `as_typed()` in Python. Update your code to use the new method name.
*[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
---
## 2025-05-22
- **Refactored CLI 'peek' command** — Now supports peeking into both tables and streams with unified parameters.
- **Simplified CLI experience** — Removed unused commands and routines for a cleaner interface.
- Updated CLI 'peek' command to use a unified 'name' parameter and new flags (`--table`, `--stream`) to specify resource type. Default is table. Documentation updated to match.
*[#2361](https://github.com/514-labs/moose/pull/2361) by [callicles](https://github.com/callicles)*
- Removed unused CLI commands and routines including `Function`, `Block`, `Consumption`, `DataModel`, and `Import`. CLI is now simpler and easier to maintain.
*[#2360](https://github.com/514-labs/moose/pull/2360) by [callicles](https://github.com/callicles)*
None
---
## 2025-05-21
- **Infrastructure state sync** — Auto-syncs DB state before changes, handling manual modifications and failed DDL runs.
- **Fixed nested data type support** — Use objects and arrays in your Moose models.
- State reconciliation for infrastructure planning — Moose now checks and updates its in-memory infra map to match the real database state before planning changes. Makes infra planning robust to manual DB changes and failed runs.
*[#2341](https://github.com/514-labs/moose/pull/2341) by [callicles](https://github.com/callicles)*
- Handling of nested data structures in Moose models for correct support of complex objects and arrays.
*[#2357](https://github.com/514-labs/moose/pull/2357) by [georgevanderson](https://github.com/georgevanderson)*
None
---
## 2025-05-27
- **IPv4 and IPv6 Type Support** — Added native support for IP address types in ClickHouse data models, enabling efficient storage and querying of network data.
- IPv4 and IPv6 data types for ClickHouse integration, supporting native IP address storage and operations.
*[#2373](https://github.com/514-labs/moose/pull/2373) by [phiSgr](https://github.com/phiSgr)*
- Enhanced type parser to handle IP address types across the Moose ecosystem.
*[#2374](https://github.com/514-labs/moose/pull/2374) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-20
- **ClickHouse `Date` type support** — Store and query native date values in your schemas.
- ClickHouse `Date` column support for native date types in Moose schemas and ingestion.
*[#2352](https://github.com/514-labs/moose/pull/2352), [#2351](https://github.com/514-labs/moose/pull/2351) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-19
- **Metadata map propagation** — Metadata is now tracked and available in the infra map for both Python and TypeScript. Improves LLM accuracy and reliability when working with Moose objects.
- Metadata map propagation to infra map for consistent tracking and availability in both Python and TypeScript.
*[#2326](https://github.com/514-labs/moose/pull/2326) by [georgevanderson](https://github.com/georgevanderson)*
None
---
## 2025-05-16
- **New `list[str]` support to Python `AggregateFunction`** — Enables more flexible aggregation logic in Materialized Views.
- **Python `DeadLetterQueue[T]` alpha release** — Automatically route exceptions to a dead letter queue in streaming functions.
- AggregateFunction in Python now accepts `list[str]` for more expressive and type-safe aggregations.
*[#2321](https://github.com/514-labs/moose/pull/2321) by [phiSgr](https://github.com/phiSgr)*
- Python dead letter queues for handling and retrying failed messages in Python streaming functions.
*[#2324](https://github.com/514-labs/moose/pull/2324) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-15
- **Hotfix** — Casing fix for `JSON` columns in TypeScript.
- TypeScript JSON columns to have consistent casing, avoiding confusion and errors in your code.
*[#2320](https://github.com/514-labs/moose/pull/2320) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-14
- **Introduced TypeScript JSON columns** — Use `Record` for type-safe JSON fields.
- **Ingestion config simplified** — Less config needed for ingestion setup.
- **Python `enum` support improved** — More robust data models.
- TypeScript ClickHouse JSON columns to use `Record` for type-safe JSON fields.
*[#2317](https://github.com/514-labs/moose/pull/2317) by [phiSgr](https://github.com/phiSgr)*
- Pydantic mixin for parsing integer enums by name for more robust Python data models.
*[#2316](https://github.com/514-labs/moose/pull/2316) by [phiSgr](https://github.com/phiSgr)*
- Better Python enum handling in data models for easier enum usage.
*[#2315](https://github.com/514-labs/moose/pull/2315) by [phiSgr](https://github.com/phiSgr)*
- `IngestionFormat` from `IngestApi` config for simpler ingestion setup.
*[#2306](https://github.com/514-labs/moose/pull/2306) by [georgevanderson](https://github.com/georgevanderson)*
None
---
## 2025-05-13
- **New `refresh` CLI command** — Quickly reload data and schema changes from changes applied directly to your database outside of Moose.
- **Python: `LowCardinality` type support** — Better performance for categorical data.
- `refresh` command to reload data and schema with a single command.
*[#2309](https://github.com/514-labs/moose/pull/2309) by [phiSgr](https://github.com/phiSgr)*
- Python support for `LowCardinality(T)` to improve performance for categorical columns.
*[#2313](https://github.com/514-labs/moose/pull/2313) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-10
- **Dependency-based execution order for Materialized Views** — Reduces migration errors and improves reliability.
- Order changes for materialized views based on dependency to ensure correct execution order for dependent changes.
*[#2294](https://github.com/514-labs/moose/pull/2294) by [callicles](https://github.com/callicles)*
None
---
## 2025-05-07
- **Python `datetime64` support** - Enables more precise datetime handling in Python data models.
- **Type mapping in Python `QueryClient`** - Automatically maps ClickHouse query result rows to the correct Pydantic model types.
- Row parsing in QueryClient with type mapping for Python.
*[#2299](https://github.com/514-labs/moose/pull/2299) by [phiSgr](https://github.com/phiSgr)*
- `datetime64` parsing and row parsing in QueryClient for more reliable data handling in Python.
*[#2299](https://github.com/514-labs/moose/pull/2299) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-06
- **`uint` type support in TypeScript** — Enables type safety for unsigned integer fields in TypeScript data models.
- uint type support in TypeScript for unsigned integers in Moose models.
*[#2295](https://github.com/514-labs/moose/pull/2295) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-01
- **Explicit dependency tracking for materialized views** — Improves data lineage, migration reliability, and documentation.
- Explicit dependency tracking for materialized views to make migrations and data lineage more robust and easier to understand.
*[#2282](https://github.com/514-labs/moose/pull/2282) by [callicles](https://github.com/callicles)*
- Required `selectTables` field in `MaterializedView` config that must specify an array of `OlapTable` objects for the source tables.
*[#2282](https://github.com/514-labs/moose/pull/2282) by [callicles](https://github.com/callicles)*
---
## 2025-04-30
- **More flexible `JSON_ARRAY` configuration for `IngestApi`** — Now accepts both arrays and single elements. Default config is now `JSON_ARRAY`.
- **Python rich ClickHouse type support** — Added support for advanced types in Python models:
- `Decimal`: `clickhouse_decimal(precision, scale)`
- `datetime` with precision: `clickhouse_datetime64(precision)`
- `date`: `date`
- `int` with size annotations: `Annotated[int, 'int8']`, `Annotated[int, 'int32']`, etc.
- `UUID`: `UUID`
- `JSON_ARRAY` to allow both array and single element ingestion for more flexible data handling.
*[#2285](https://github.com/514-labs/moose/pull/2285) by [phiSgr](https://github.com/phiSgr)*
- Python rich ClickHouse type support with:
- `Decimal`: `clickhouse_decimal(precision, scale)`
- `datetime` with precision: `clickhouse_datetime64(precision)`
- `date`: `date`
- `int` with size annotations: `Annotated[int, 'int8']`, `Annotated[int, 'int32']`, etc.
- `UUID`: `UUID`
for more expressive data modeling.
*[#2284](https://github.com/514-labs/moose/pull/2284) by [phiSgr](https://github.com/phiSgr)*
None
---
## Configuration
Source: moose/configuration.mdx
Configuration for Moose
# Project Configuration
The `moose.config.toml` file is the primary way to configure all Moose infrastructure including ClickHouse, Redpanda, Redis, Temporal, and HTTP servers.
Do not use docker-compose overrides to modify Moose-managed services. See [Development Mode](/moose/local-dev#extending-docker-infrastructure) for guidelines on when to use docker-compose extensions.
```toml
# Programming language used in the project (`Typescript` or `Python`)
language = "Typescript"
# Map of supported old versions and their locations (Default: {})
# supported_old_versions = { "0.1.0" = "path/to/old/version" }
#Telemetry configuration for usage tracking and metrics
[telemetry]
# Whether telemetry collection is enabled
enabled = true
# Whether to export metrics to external systems
export_metrics = true
# Flag indicating if the user is a Moose developer
is_moose_developer = false
# Redpanda streaming configuration (also aliased as `kafka_config`)
[redpanda_config]
# Broker connection string (e.g., "host:port") (Default: "localhost:19092")
broker = "localhost:19092"
# Confluent Schema Registry URL (optional)
# schema_registry_url = "http://localhost:8081"
# Message timeout in milliseconds (Default: 1000)
message_timeout_ms = 1000
# Default retention period in milliseconds (Default: 30000)
retention_ms = 30000
# Replication factor for topics (Default: 1)
replication_factor = 1
# SASL username for authentication (Default: None)
# sasl_username = "user"
# SASL password for authentication (Default: None)
# sasl_password = "password"
# SASL mechanism (e.g., "PLAIN", "SCRAM-SHA-256") (Default: None)
# sasl_mechanism = "PLAIN"
# Security protocol (e.g., "SASL_SSL", "PLAINTEXT") (Default: None)
# security_protocol = "SASL_SSL"
# Namespace for topic isolation (Default: None)
# namespace = "my_namespace"
# ClickHouse database configuration
[clickhouse_config]
# Database name (Default: "local")
db_name = "local"
# ClickHouse user (Default: "panda")
user = "panda"
# ClickHouse password (Default: "pandapass")
password = "pandapass"
# Whether to use SSL for connection (Default: false)
use_ssl = false
# ClickHouse host (Default: "localhost")
host = "localhost"
# ClickHouse HTTP port (Default: 18123)
host_port = 18123
# ClickHouse native protocol port (Default: 9000)
native_port = 9000
# Optional host path to mount as the ClickHouse data volume (uses Docker volume if None) (Default: None)
# host_data_path = "/path/on/host/clickhouse_data"
# Optional list of additional databases to create on startup (Default: [])
# additional_databases = ["analytics", "staging"]
# HTTP server configuration for local development
[http_server_config]
# Host to bind the webserver to (Default: "localhost")
host = "localhost"
# Port for the main API server (Default: 4000)
port = 4000
# Port for the management server (Default: 5001)
management_port = 5001
# Optional path prefix for all routes (Default: None)
# path_prefix = "api"
# Number of worker processes for consumption API cluster (TypeScript only) (Default: Auto-calculated - 70% of CPU cores)
# Python projects always use 1 worker regardless of this setting
# api_workers = 2
# Redis configuration
[redis_config]
# Redis connection URL (Default: "redis://127.0.0.1:6379")
url = "redis://127.0.0.1:6379"
# Namespace prefix for all Redis keys (Default: "MS")
key_prefix = "MS"
# Git configuration
[git_config]
# Name of the main branch (Default: "main")
main_branch_name = "main"
# Temporal workflow configuration
[temporal_config]
# Temporal database user (Default: "temporal")
db_user = "temporal"
# Temporal database password (Default: "temporal")
db_password = "temporal"
# Temporal database port (Default: 5432)
db_port = 5432
# Temporal server host (Default: "localhost")
temporal_host = "localhost"
# Temporal server port (Default: 7233)
temporal_port = 7233
# Temporal server scheme - "http" or "https" (Default: auto-detect based on host)
# temporal_scheme = "https"
# Temporal server version (Default: "1.22.3")
temporal_version = "1.22.3"
# Temporal admin tools version (Default: "1.22.3")
admin_tools_version = "1.22.3"
# Temporal UI version (Default: "2.21.3")
ui_version = "2.21.3"
# Temporal UI port (Default: 8080)
ui_port = 8080
# Temporal UI CORS origins (Default: "http://localhost:3000")
ui_cors_origins = "http://localhost:3000"
# Temporal dynamic config path (Default: "config/dynamicconfig/development-sql.yaml")
config_path = "config/dynamicconfig/development-sql.yaml"
# PostgreSQL version for Temporal database (Default: "13")
postgresql_version = "13"
# Path to Temporal client certificate (mTLS) (Default: "")
client_cert = ""
# Path to Temporal client key (mTLS) (Default: "")
client_key = ""
# Path to Temporal CA certificate (mTLS) (Default: "")
ca_cert = ""
# API key for Temporal Cloud connection (Default: "")
api_key = ""
# JWT (JSON Web Token) authentication configuration (Optional)
[jwt]
# Enforce JWT on all consumption APIs (Default: false)
enforce_on_all_consumptions_apis = false
# Enforce JWT on all ingestion APIs (Default: false)
enforce_on_all_ingest_apis = false
# Secret key for JWT signing (Required if jwt section is present)
# secret = "your-jwt-secret"
# JWT issuer (Required if jwt section is present)
# issuer = "your-issuer-name"
# JWT audience (Required if jwt section is present)
# audience = "your-audience-name"
# General authentication configuration
[authentication]
# Optional hashed admin API key for auth (Default: None)
# admin_api_key = "hashed_api_key"
# Migration configuration
[migration_config]
# Operations to ignore during migration plan generation and drift detection
# Useful for managing TTL changes outside of Moose or when you don't want
# migration failures due to TTL drift
# ignore_operations = ["ModifyTableTtl", "ModifyColumnTtl"]
# Feature flags
[features]
# Enable the streaming engine (Default: true)
streaming_engine = true
# Enable Temporal workflows (Default: false)
workflows = false
# Enable OLAP database (Default: true)
olap = true
# Enable Analytics APIs server (Default: true)
apis = true
```
---
## Data Modeling
Source: moose/data-modeling.mdx
Data Modeling for Moose
# Data Modeling
## Overview
In Moose, data models are just Pydantic models that become the authoritative source for your infrastructure schemas.
Data Models are used to define:
- [OLAP Tables and Materialized Views](/moose/olap) (automatically generated DDL)
- [Redpanda/Kafka Streams](/moose/streaming) (schema registry and topic validation)
- [API Contracts](/moose/apis) (request/response validation and OpenAPI specs)
- [Workflow Task Input and Output Types](/moose/workflows) (typed function inputs/outputs)
## Philosophy
### Problem: Analytical Backends are Prone to Schema Drift
Analytical backends are unique in that they typically have to coordinate schemas across multiple systems that each have their own type systems and constraints.
Consider a typical pipeline for ingesting events into a ClickHouse table.
```python
# What you're building:
# API endpoint → Kafka topic → ClickHouse table → Analytics API
# Traditional approach: Define schema 4 times
from datetime import datetime
from typing import Literal
from pydantic import BaseModel
# 1. API validation with Pydantic
class APIEvent(BaseModel):
user_id: str
event_type: Literal["click", "view", "purchase"]
timestamp: datetime
# 2. Kafka schema registration
kafka_schema = {
"type": "record",
"fields": [
{"name": "user_id", "type": "string"},
{"name": "event_type", "type": "string"},
{"name": "timestamp", "type": "string"}
]
}
# 3. ClickHouse DDL
# CREATE TABLE events (
# user_id String,
# event_type LowCardinality(String),
# timestamp DateTime
# ) ENGINE = MergeTree()
# 4. Analytics API response
class EventsResponse(BaseModel):
user_id: str
event_type: str
timestamp: datetime
```
**The Problem:** When you add a field or change a type, you must update it in multiple places. Miss one, and you get:
- Silent data loss (Kafka → ClickHouse sync fails)
- Runtime errors
- Data quality issues (validation gaps)
### Solution: Model In Code, Reuse Everywhere
With Moose you define your schemas in native language types with optional metadata. This lets you reuse your schemas across multiple systems:
```python filename="app/main.py"
from datetime import datetime
from typing import Annotated, Any
from pydantic import BaseModel
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
# 1. Define your schema (WHAT your data looks like)
class MyFirstDataModel(BaseModel):
id: Key[str]
some_string: Annotated[str, "LowCardinality"]
some_number: int
some_date: datetime
some_json: Any
# This single model can be reused across multiple systems:
my_first_pipeline = IngestPipeline[MyFirstDataModel]("my_first_pipeline", IngestPipelineConfig(
ingest_api=True, # POST API endpoint
stream=True, # Kafka topic
table=True # ClickHouse table
))
```
## How It Works
The key idea is leveraging Annotated types to extend base Python types with "metadata" that represents specific optimizations and details on how to map that type in ClickHouse:
```python
from datetime import datetime
from decimal import Decimal
from typing import Annotated
from pydantic import BaseModel
from moose_lib import Key, OlapTable, clickhouse_decimal
class Event(BaseModel):
# Base type: str
# ClickHouse: String with primary key
id: Key[str]
# Base type: Decimal
# ClickHouse: Decimal(10,2) for precise money
amount: clickhouse_decimal(10, 2)
# Base type: str
# ClickHouse: LowCardinality(String) for enums
status: Annotated[str, "LowCardinality"]
# Base type: datetime
# ClickHouse: DateTime
created_at: datetime
events = OlapTable[Event]("events")
# In your application code:
tx = Event(
id="id_123",
amount=Decimal("99.99"), # Regular Decimal in Python
status="completed", # Regular string in Python
created_at=datetime.now()
)
# In ClickHouse:
# CREATE TABLE events (
# id String,
# amount Decimal(10,2),
# status LowCardinality(String),
# created_at DateTime
# ) ENGINE = MergeTree()
# ORDER BY id
```
**The metadata annotations are compile-time only** - they don't affect your runtime code. Your application works with regular strings and numbers, while Moose uses the metadata to generate optimized infrastructure.
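For example, this quick check (using a made-up `OrderStatus` model, not part of the examples above) shows that an annotated field is still an ordinary `str` at runtime:
```python
from typing import Annotated
from pydantic import BaseModel

class OrderStatus(BaseModel):
    # The annotation is ClickHouse metadata only; at runtime this is a plain str
    status: Annotated[str, "LowCardinality"]

order = OrderStatus(status="completed")
print(type(order.status))  # <class 'str'>
```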
## Building Data Models: From Simple to Complex
Let's walk through how to model data for different infrastructure components and see how types behave across them.
### Simple Data Model Shared Across Infrastructure
A basic data model that works identically across all infrastructure components:
```python filename="app/datamodels/simple_shared.py"
from pydantic import BaseModel
from datetime import datetime
from moose_lib import IngestPipeline, IngestPipelineConfig
class SimpleShared(BaseModel):
id: str
name: str
value: float
timestamp: datetime
# This SAME model creates all infrastructure
pipeline = IngestPipeline[SimpleShared]("simple_shared", IngestPipelineConfig(
ingest_api=True, # Creates: POST /ingest/simple_shared
stream=True, # Creates: Kafka topic
table=True # Creates: ClickHouse table
))
# The exact same types work everywhere:
# - API validates: { "id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z" }
# - Kafka stores: { "id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z" }
# - ClickHouse table: id String, name String, value Float64, timestamp DateTime
```
**Key Point**: One model definition creates consistent schemas across all systems.
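As a quick sanity check, you can post the example payload to the generated ingest endpoint from a small script. This is just a sketch, assuming the local dev server is running on the default `localhost:4000`:
```python
import json
import urllib.request

payload = {"id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z"}

req = urllib.request.Request(
    "http://localhost:4000/ingest/simple_shared",  # endpoint created by the pipeline above
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```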
### Composite Types Shared Across Infrastructure
Complex types including nested objects, arrays, and enums work seamlessly across all components:
```python filename="app/datamodels/composite_shared.py"
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel
from typing import List, Dict, Any, Optional, Literal
from datetime import datetime
class Metadata(BaseModel):
category: str
priority: float
tags: List[str]
class CompositeShared(BaseModel):
id: Key[str] # Primary key
status: Literal["active", "pending", "completed"] # Enum
# Nested object
metadata: Metadata
# Arrays and maps
values: List[float]
attributes: Dict[str, Any]
# Optional field
description: Optional[str] = None
created_at: datetime
# Using in IngestPipeline - all types preserved
pipeline = IngestPipeline[CompositeShared]("composite_shared", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True
))
# How the types map:
# - API validates nested structure and enum values
# - Kafka preserves the exact JSON structure
# - ClickHouse creates:
# - id String (with PRIMARY KEY)
# - status Enum8('active', 'pending', 'completed')
# - metadata.category String, metadata.priority Float64, metadata.tags Array(String)
# - values Array(Float64)
# - attributes String (JSON)
# - description Nullable(String)
# - created_at DateTime
```
**Key Point**: Complex types including nested objects and arrays work consistently across all infrastructure.
### ClickHouse-Specific Types (Standalone vs IngestPipeline)
ClickHouse type annotations optimize database performance but are **transparent to other infrastructure**:
```python filename="app/datamodels/clickhouse_optimized.py"
from moose_lib import Key, clickhouse_decimal, OlapTable, OlapConfig, IngestPipeline, IngestPipelineConfig
from typing import Annotated
from pydantic import BaseModel
from datetime import datetime
class Details(BaseModel):
name: str
value: float
class ClickHouseOptimized(BaseModel):
id: Key[str]
# ClickHouse-specific type annotations
amount: clickhouse_decimal(10, 2) # Decimal(10,2) in ClickHouse
category: Annotated[str, "LowCardinality"] # LowCardinality(String) in ClickHouse
# Optimized nested type
details: Annotated[Details, "ClickHouseNamedTuple"] # NamedTuple in ClickHouse
timestamp: datetime
# SCENARIO 1: Standalone OlapTable - gets all optimizations
table = OlapTable[ClickHouseOptimized]("optimized_table", OlapConfig(
    order_by_fields=["id", "timestamp"]
))
# Creates ClickHouse table with:
# - amount Decimal(10,2)
# - category LowCardinality(String)
# - details Tuple(name String, value Float64)
# SCENARIO 2: IngestPipeline - optimizations ONLY in ClickHouse
pipeline = IngestPipeline[ClickHouseOptimized]("optimized_pipeline", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True
))
# What happens at each layer:
# 1. API receives/validates: { "amount": "123.45", "category": "electronics", ... }
# - Sees amount as str, category as str (annotations ignored)
# 2. Kafka stores: { "amount": "123.45", "category": "electronics", ... }
# - Plain JSON, no ClickHouse types
# 3. ClickHouse table gets optimizations:
# - amount stored as Decimal(10,2)
# - category stored as LowCardinality(String)
# - details stored as NamedTuple
```
**Key Point**: ClickHouse annotations are metadata that ONLY affect the database schema. Your application code and other infrastructure components see regular Python types.
### API Contracts with Runtime Validators
APIs use runtime validation to ensure query parameters meet your requirements:
```python filename="app/apis/consumption_with_validation.py"
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional, List
# Query parameters with runtime validation
class SearchParams(BaseModel):
# Date range validation
start_date: str = Field(..., regex="^\\d{4}-\\d{2}-\\d{2}$") # Must be YYYY-MM-DD
end_date: str = Field(..., regex="^\\d{4}-\\d{2}-\\d{2}$")
# Numeric constraints
min_value: Optional[float] = Field(None, ge=0) # Optional, but if provided >= 0
max_value: Optional[float] = Field(None, le=1000) # Optional, but if provided <= 1000
# String validation
category: Optional[str] = Field(None, min_length=2, max_length=50)
# Pagination
page: Optional[int] = Field(None, ge=1)
limit: Optional[int] = Field(None, ge=1, le=100)
# Response data model
class SearchResult(BaseModel):
id: str
name: str
value: float
category: str
timestamp: datetime
# Create validated API endpoint
async def search_handler(params: SearchParams, client: MooseClient) -> List[SearchResult]:
# Params are already validated when this runs
# Build a parameterized query safely
clauses = [
"timestamp >= {startDate}",
"timestamp <= {endDate}"
]
params_dict = {
"startDate": params.start_date,
"endDate": params.end_date,
"limit": params.limit or 10,
"offset": ((params.page or 1) - 1) * (params.limit or 10)
}
if params.min_value is not None:
clauses.append("value >= {minValue}")
params_dict["minValue"] = params.min_value
if params.max_value is not None:
clauses.append("value <= {maxValue}")
params_dict["maxValue"] = params.max_value
if params.category is not None:
clauses.append("category = {category}")
params_dict["category"] = params.category
where_clause = " AND ".join(clauses)
query = f"""
SELECT * FROM data_table
WHERE {where_clause}
        LIMIT {{limit}}
        OFFSET {{offset}}
"""
results = await client.query.execute(query, params=params_dict)
return [SearchResult(**row) for row in results]
search_api = Api[SearchParams, List[SearchResult]](
"search",
handler=search_handler
)
# API Usage Examples:
# ✅ Valid: GET /api/search?start_date=2024-01-01&end_date=2024-01-31
# ✅ Valid: GET /api/search?start_date=2024-01-01&end_date=2024-01-31&min_value=100&limit=50
# ❌ Invalid: GET /api/search?start_date=Jan-1-2024 (wrong date format)
# ❌ Invalid: GET /api/search?start_date=2024-01-01&end_date=2024-01-31&limit=200 (exceeds max)
```
**Key Point**: Runtime validators ensure API consumers provide valid data, returning clear error messages for invalid requests before any database queries run.
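You can exercise this validation directly against the `SearchParams` model defined above, without going through HTTP; a minimal sketch:
```python
from pydantic import ValidationError

try:
    # Same as the invalid request above: start_date is not YYYY-MM-DD
    SearchParams(start_date="Jan-1-2024", end_date="2024-01-31")
except ValidationError as exc:
    print(exc)  # Explains which fields failed validation and why
```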
## Additional Data Modeling Patterns
### Modeling for Stream Processing
When you need to process data in real-time before it hits the database:
```python filename="app/datamodels/stream_example.py"
import json
from moose_lib import Key, Stream, StreamConfig, OlapTable, OlapConfig
from pydantic import BaseModel
from typing import Dict, Any, Annotated
from datetime import datetime
# Raw data from external source
class RawData(BaseModel):
id: Key[str]
timestamp: datetime
raw_payload: str
source_type: Annotated[str, "LowCardinality"]
# Processed data after transformation
class ProcessedData(BaseModel):
id: Key[str]
timestamp: datetime
field1: str
field2: Annotated[str, "LowCardinality"]
numeric_value: float
attributes: Dict[str, Any]
# Create streams
raw_stream = Stream[RawData]("raw-stream")
processed_table = OlapTable[ProcessedData]("processed_data", OlapConfig(
order_by_fields = ["id", "timestamp"]
))
processed_stream = Stream[ProcessedData]("processed-stream", StreamConfig(
destination=processed_table
))
# Transform raw data
async def process_data(raw: RawData):
parsed = json.loads(raw.raw_payload)
processed = ProcessedData(
id=raw.id,
timestamp=raw.timestamp,
field1=parsed["field_1"],
field2=parsed["field_2"],
numeric_value=float(parsed.get("value", 0)),
attributes=parsed.get("attributes", {})
    )
    return processed
raw_stream.add_transform(processed_stream, process_data)
```
### Modeling for Workflow Tasks
Define strongly-typed inputs and outputs for async jobs:
```python filename="app/workflows/task_example.py"
from moose_lib import Task, TaskContext
from pydantic import BaseModel, Field
from typing import Optional, List, Literal, Dict, Any
from datetime import datetime
# Input validation with constraints
class TaskOptions(BaseModel):
include_metadata: bool
max_items: Optional[int] = Field(None, ge=1, le=100)
class TaskInput(BaseModel):
id: str = Field(..., regex="^[0-9a-f-]{36}$")
items: List[str]
task_type: Literal["typeA", "typeB", "typeC"]
options: Optional[TaskOptions] = None
# Structured output
class ResultA(BaseModel):
category: str
score: float
details: Dict[str, Any]
class ResultB(BaseModel):
values: List[str]
metrics: List[float]
class ResultC(BaseModel):
field1: str
field2: str
field3: float
class TaskOutput(BaseModel):
id: str
processed_at: datetime
result_a: Optional[ResultA] = None
result_b: Optional[ResultB] = None
result_c: Optional[ResultC] = None
# Create workflow task
async def run_task(ctx: TaskContext[TaskInput]) -> TaskOutput:
# Process data based on task type
output = TaskOutput(
id=ctx.input.id,
processed_at=datetime.now()
)
if ctx.input.task_type == "typeA":
output.result_a = await process_type_a(ctx.input)
return output
example_task = Task[TaskInput, TaskOutput](
"example-task",
run_function=run_task,
retries=3,
timeout=30 # seconds
)
```
---
## Summary
Source: moose/deploying.mdx
Summary of deploying Moose into production
# Moose Deploy
## Overview
Once you've finished developing your Moose application locally, the next step is to deploy your Moose app into production. You have two options:
- Self-host your Moose application on your own servers
- Use the [Boreal Cloud hosting platform](https://www.fiveonefour.com/boreal) (from the makers of the Moose Stack)
Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).
## Getting Started With Self-Hosting
Moose makes it easy to package and deploy your applications, whether you're deploying to a server with or without internet access. The deployment process is designed to be flexible and can accommodate both containerized and non-containerized environments.
### Deployment Options
1. **Kubernetes Deployment**: Deploy your application to Kubernetes clusters (GKE, EKS, AKS, or on-premises)
2. **Standard Server Deployment**: Deploy your application to a server with internet access
3. **Containerized Cloud Deployment**: Deploy to cloud services like AWS ECS or Google Cloud Run
4. **Offline Server Deployment**: Deploy to an environment without internet access
### Key Deployment Steps
There are three main aspects to deploying a Moose application:
1. Setting up your build environment with Python and the Moose CLI
2. Building your application using `moose build`
3. Setting up your deployment environment with the necessary runtime dependencies (Python, Docker) and configuration
## Configuring Your Deployment
Based on our production experience, we recommend the following best practices for deploying Moose applications:
### Health Monitoring
Configure comprehensive health checks to ensure your application remains available:
- Startup probes to handle initialization
- Readiness probes for traffic management
- Liveness probes to detect and recover from deadlocks
### Zero-Downtime Deployments
Implement graceful termination and rolling updates:
- Pre-stop hooks to handle in-flight requests
- Appropriate termination grace periods
- Rolling update strategies that maintain service availability
### Resource Allocation
Properly size your deployments based on workload:
- CPU and memory requests tailored to your application
- Replicas scaled according to traffic patterns
- Horizontal scaling for high availability
### Environment Configuration
For any deployment type, you'll need to configure:
1. Runtime environment variables for logging, telemetry, and application settings
2. External service connections (ClickHouse, Redpanda, Redis)
3. Network settings and security configurations
4. Application-specific configurations
## Detailed Guides
The following pages provide detailed guides for each deployment scenario, including step-by-step instructions for both Python and TypeScript applications and production-ready configuration templates.
---
## Configuring Moose for cloud environments
Source: moose/deploying/configuring-moose-for-cloud.mdx
Configuring Moose for cloud environments
# Configuring Moose for cloud environments
In the [Packaging Moose for deployment](packaging-moose-for-deployment.mdx) page, we looked at how to package your Moose
application into Docker containers (using the `moose build --docker` command) and push them to your container repository.
Now we'll connect and configure your container image to work with hosted ClickHouse and Redis services. You can also optionally
use Redpanda for event streaming and Temporal for workflow orchestration.
The methods used to accomplish this are generally similar, but the specific details depend on your target cloud infrastructure.
So, we'll look at the overarching concepts and provide some common examples.
## Specifying your repository container
Earlier, we created two local containers and pushed them to a docker repository.
```txt filename="Terminal" copy
>docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
moose-df-deployment-aarch64-unknown-linux-gnu 0.3.175 c50674c7a68a About a minute ago 155MB
moose-df-deployment-x86_64-unknown-linux-gnu 0.3.175 e5b449d3dea3 About a minute ago 163MB
```
We pushed the containers to the `514labs` Docker Hub account. So, we have these two containers available for use:
```
514labs/moose-df-deployment-aarch64-unknown-linux-gnu:0.3.175
514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175
```
In later examples, we'll use an AMD64 (x86_64) based machine, so we'll stick to using the following container image: `514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175`
We'll also examine how the container image name can be used in various cloud providers and scenarios.
## General overview
The general approach is to use a cloud provider that supports specifying a container image to launch your application. Examples include the Google Kubernetes Engine (GKE), Amazon's Elastic Kubernetes Service (EKS), and Elastic Container Service (ECS). Each provider also offers a way of configuring container environment variables that your container application will have access to.
## Essential Environment Variables
Based on our production deployments, here are the essential environment variables you'll need to configure for your Moose application in cloud environments:
### Logging and Telemetry
```
# Logger configuration
MOOSE_LOGGER__LEVEL=Info
MOOSE_LOGGER__STDOUT=true
MOOSE_LOGGER__FORMAT=Json
# Telemetry configuration
MOOSE_TELEMETRY__ENABLED=false
MOOSE_TELEMETRY__EXPORT_METRICS=true
# For debugging
RUST_BACKTRACE=1
```
### HTTP Server Configuration
```
# HTTP server settings
MOOSE_HTTP_SERVER_CONFIG__HOST=0.0.0.0
MOOSE_HTTP_SERVER_CONFIG__PORT=4000
```
### External Service Connections
For detailed configuration of the external services, refer to the [Preparing ClickHouse and Redpanda](preparing-clickhouse-redpanda.mdx) page.
#### ClickHouse
```
MOOSE_CLICKHOUSE_CONFIG__DB_NAME=
MOOSE_CLICKHOUSE_CONFIG__USER=
MOOSE_CLICKHOUSE_CONFIG__PASSWORD=
MOOSE_CLICKHOUSE_CONFIG__HOST=
MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443
MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1
MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT=9440
```
#### Redis
Moose requires Redis for caching and message passing:
```
MOOSE_REDIS_CONFIG__URL=
MOOSE_REDIS_CONFIG__KEY_PREFIX=
```
#### Redpanda (Optional)
If you choose to use Redpanda for event streaming:
```
MOOSE_REDPANDA_CONFIG__BROKER=
MOOSE_REDPANDA_CONFIG__NAMESPACE=
MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS=10043
MOOSE_REDPANDA_CONFIG__SASL_USERNAME=
MOOSE_REDPANDA_CONFIG__SASL_PASSWORD=
MOOSE_REDPANDA_CONFIG__SASL_MECHANISM=SCRAM-SHA-256
MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL=SASL_SSL
MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR=3
```
#### Temporal (Optional)
If you choose to use Temporal for workflow orchestration:
```
MOOSE_TEMPORAL_CONFIG__CA_CERT=/etc/ssl/certs/ca-certificates.crt
MOOSE_TEMPORAL_CONFIG__API_KEY=
MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST=.tmprl.cloud
```
## Securing Sensitive Information
When deploying to cloud environments, it's important to handle sensitive information like passwords and API keys securely. Each cloud provider offers mechanisms for this:
- **Kubernetes**: Use Secrets to store sensitive data. See our [Kubernetes deployment guide](deploying-on-kubernetes.mdx) for examples.
- **Amazon ECS**: Use AWS Secrets Manager or Parameter Store to securely inject environment variables.
- **Other platforms**: Use the platform's recommended secrets management approach.
Never hardcode sensitive values directly in your deployment configuration files.
---
## Deploying on an offline server
Source: moose/deploying/deploying-on-an-offline-server.mdx
Deploying on an offline server
# Building and Deploying Moose Applications
This guide will walk you through the process of building a Moose application and deploying it to a server that does not have internet access.
We'll cover both the build environment setup and the deployment environment requirements.
## Build Environment Setup
### Prerequisites
Before you can build a Moose application, you need to set up your build environment with the following dependencies:
OS:
- Debian 10+
- Ubuntu 18.10+
- Fedora 29+
- CentOS/RHEL 8+
- Amazon Linux 2023+
- Mac OS 13+
Common CLI utilities:
- zip
- curl (optional, for installing the Moose CLI)
Python build environment requirements:
1. Python 3.12 or later (we recommend using pyenv for Python version management)
2. pip
### Setting up the Build Environment
First, install the required system dependencies:
```bash
sudo apt update
sudo apt install build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev curl git libncursesw5-dev xz-utils \
tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
```
Install pyenv and configure your shell:
```bash
curl -fsSL https://pyenv.run | bash
```
Add the following to your `~/.bashrc` or `~/.zshrc`:
```bash
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
```
Install and set Python 3.12:
```bash
pyenv install 3.12
pyenv global 3.12
```
Verify the installation:
```bash
python --version
```
### Installing Moose CLI (Optional)
You can install the Moose CLI using the official installer:
```bash
curl -SfsL https://fiveonefour.com/install.sh | bash -s -- moose
source ~/.bashrc # Or restart your terminal
```
or
```bash
pip install moose-cli
```
## Building Your Application
### 1. Initialize a New Project (Optional)
This step is optional if you already have a Moose project.
Create a new Moose project:
```bash
moose init your-project-name py
cd your-project-name
```
### 2. Build the Application
Make sure you have the `zip` utility installed (`sudo apt install zip`) before building your application.
Whether you installed the Moose CLI globally (via the installer script) or locally (via pip), build the application with:
```bash
moose build
```
This will create a zip file in your project directory with a timestamp, for example: `your-project-name-YYYY-MM-DD.zip`
## Deployment Environment Setup
### Prerequisites
The deployment server requires:
1. Python 3.12 or later
2. Unzip utility
### Setting up the Deployment Environment
1. Install the runtime environment:
Follow the Python installation steps from the build environment setup section.
2. Install the unzip utility:
```bash
sudo apt install unzip
```
## Deploying Your Application
1. Copy your built application package to the deployment server
2. Extract the application:
```bash
unzip your-project-name-YYYY-MM-DD.zip -d ./app
cd ./app/packager
```
3. Start your application:
```bash
moose prod
```
Ensure all required environment variables and configurations are properly set before starting your application.
## Troubleshooting
- Verify that Python is properly installed using `python --version`
- Check that your application's dependencies are properly listed in `requirements.txt`
- If you encounter Python import errors, ensure your `PYTHONPATH` is properly set
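If you want a single quick check, a throwaway script like this (not part of Moose) prints the relevant environment details:
```python
import os
import sys
from pathlib import Path

print("Python version:", sys.version)              # expect 3.12 or later
print("PYTHONPATH:", os.environ.get("PYTHONPATH")) # verify the module search path
print("requirements.txt found:", Path("requirements.txt").exists())
```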
---
## Deploying on Amazon ECS
Source: moose/deploying/deploying-on-ecs.mdx
Deploying on Amazon ECS
# Deploying on Amazon ECS
Moose can be deployed to Amazon's Elastic Container Service (ECS). ECS offers a managed container orchestrator at a fraction of the complexity of managing a Kubernetes cluster.
If you're relatively new to ECS we recommend the following resources:
- [Amazon Elastic Container Service (ECS) with a Load Balancer | AWS Tutorial with New ECS Experience](https://www.youtube.com/watch?v=rUgZNXKbsrY)
- [Tutorial: Deploy NGINX Containers On ECS Fargate with Load Balancer](https://bhaveshmuleva.hashnode.dev/tutorial-deploy-nginx-containers-on-ecs-fargate-with-load-balancer)
- [How to configure target groups ports with listeners and tasks](https://stackoverflow.com/questions/66275574/how-to-configure-target-groups-ports-with-listeners-and-tasks)
The first step is deciding whether you'll host your Moose container on Docker Hub or Amazon's Elastic Container Registry (ECR).
Amazon ECR is straightforward and is designed to work out of the box with ECS. Using Docker Hub works if your moose container is publicly available; however,
if your container is private, you'll need to do a bit more work to provide ECS with your Docker credentials.
> See: [Authenticating with Docker Hub for AWS Container Services](https://aws.amazon.com/blogs/containers/authenticating-with-docker-hub-for-aws-container-services/)
Here is an overview of the steps required:
1. You'll first need to create or use an existing ECS cluster.
2. Then, you'll need to create an ECS `Task definition`. This is where you'll specify whether you want to use AWS Fargate or AWS EC2 instances.
You'll also have options for selecting your OS and Architecture. Specify `Linux/X86-64` or `Linux/ARM-64`. This is important as you'll also need to
specify a matching moose container image, such as `moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175` or `moose-df-deployment-aarch64-unknown-linux-gnu:0.3.175`
3. As with all AWS services, if you're using secrets to store credentials, you will need to specify an IAM role with an `AmazonECSTaskExecutionRolePolicy` and `SecretsManagerReadWrite`
policy.
4. Under the Container section, specify the name of your moose deployment and provide the container image name you're using.
5. Next, specify the Container Port as 4000.
## Configuring container environment variables
While still in the Amazon ECS Task definition section, you'll need to provide the environment variables on which your Moose application depends.
Scroll down to the Environment variables section and fill in each of the following variables.
ClickHouse and Redis are required components for Moose. Redpanda and Temporal are optional - configure them only if you're using these components in your application.
> Note: if you prefer, you can provide the environment variables below via an env file hosted on S3 or using AWS Secrets Manager for sensitive values.
### Core Configuration
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_LOGGER__LEVEL | Log level | Info |
| MOOSE_LOGGER__STDOUT | Enable stdout logging | true |
| MOOSE_LOGGER__FORMAT | Log format | Json |
| RUST_BACKTRACE | Enable backtraces for debugging | 1 |
### HTTP Server Configuration
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_HTTP_SERVER_CONFIG__HOST | Your moose network binding address | 0.0.0.0 |
| MOOSE_HTTP_SERVER_CONFIG__PORT | The network port your moose server is using | 4000 |
### ClickHouse Configuration (Required)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_CLICKHOUSE_CONFIG__DB_NAME | The name of your ClickHouse database | moose_production |
| MOOSE_CLICKHOUSE_CONFIG__USER | The database user name | clickhouse_user |
| MOOSE_CLICKHOUSE_CONFIG__PASSWORD | The password to your ClickHouse database | (use AWS Secrets Manager) |
| MOOSE_CLICKHOUSE_CONFIG__HOST | The hostname for your ClickHouse database | your-clickhouse.cloud.example.com |
| MOOSE_CLICKHOUSE_CONFIG__HOST_PORT | The HTTPS port for your ClickHouse database | 8443 |
| MOOSE_CLICKHOUSE_CONFIG__USE_SSL | Whether your database connection requires SSL | 1 |
| MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT | The native port for your ClickHouse database | 9440 |
### Redis Configuration (Required)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_REDIS_CONFIG__URL | Redis connection URL | redis://user:password@redis.example.com:6379 |
| MOOSE_REDIS_CONFIG__KEY_PREFIX | Prefix for Redis keys to isolate namespaces | moose_production |
### Redpanda Configuration (Optional)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_REDPANDA_CONFIG__BROKER | The hostname for your Redpanda instance | seed-5fbcae97.example.redpanda.com:9092 |
| MOOSE_REDPANDA_CONFIG__NAMESPACE | Namespace for isolation | moose_production |
| MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS | The message timeout delay in milliseconds | 10043 |
| MOOSE_REDPANDA_CONFIG__SASL_USERNAME | Your Redpanda user name | redpanda_user |
| MOOSE_REDPANDA_CONFIG__SASL_PASSWORD | Your Redpanda password | (use AWS Secrets Manager) |
| MOOSE_REDPANDA_CONFIG__SASL_MECHANISM | SASL mechanism | SCRAM-SHA-256 |
| MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL | The Redpanda security protocol | SASL_SSL |
| MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR | Topic replication factor | 3 |
### Temporal Configuration (Optional)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_TEMPORAL_CONFIG__CA_CERT | Path to CA certificate | /etc/ssl/certs/ca-certificates.crt |
| MOOSE_TEMPORAL_CONFIG__API_KEY | Temporal Cloud API key | (use AWS Secrets Manager) |
| MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST | Temporal Cloud namespace host | your-namespace.tmprl.cloud |
Consider using a value of greater than 1000ms (1 second) for the Redpanda message timeout delay if you're using a hosted Redpanda cloud service.
Review other options on the Task Creation page and press the `Create` button when ready.
## Using AWS Secrets Manager
For sensitive information like passwords and API keys, we recommend using AWS Secrets Manager. To configure a secret:
1. Go to AWS Secrets Manager and create a new secret
2. Choose "Other type of secret" and add key-value pairs for your secrets
3. Name your secret appropriately (e.g., `moose/production/credentials`)
4. In your ECS task definition, reference the secret:
- For environment variables, select "ValueFrom" and enter the ARN of your secret with the key name
- Example: `arn:aws:secretsmanager:region:account:secret:moose/production/credentials:MOOSE_CLICKHOUSE_CONFIG__PASSWORD::`
## Building an ECS Service
Once you've completed creating an ECS Task, you're ready to create an ECS Service. An ECS Service is a definition that allows you to specify how your cluster will be managed.
Navigate to your cluster's Service page and press the `Create` button to create your new Moose service.
The section we're interested in is the `Deployment configuration` section. There, you'll specify the Task Definition you created earlier. You can also specify the name
of your service—perhaps something creative like `moose-service`—and the number of tasks to launch.
Note: at this time, we recommend launching only a single instance of
Moose in your cluster. We're currently developing support for multi-instance
concurrent usage.
The remaining sections on the create service page allow you to specify networking considerations and whether you'll use a load balancer.
You can press the `Create` button to launch an instance of your new ECS Moose service.
## Setting up health checks
Your generated Moose containers include a health check endpoint at `/health` that should be configured in your ECS service. We recommend configuring the following health check settings:
### Container-level Health Check
In your task definition's container configuration:
```yaml
healthCheck:
command: ["CMD-SHELL", "curl -f http://localhost:4000/health || exit 1"]
interval: 30
timeout: 5
retries: 3
startPeriod: 60
```
### Load Balancer Health Check
If you're using an Application Load Balancer:
1. Create a target group for your service
2. Set the health check path to `/health`
3. Configure appropriate health check settings:
- Health check protocol: HTTP
- Health check port: 4000
- Health check path: /health
- Healthy threshold: 2
- Unhealthy threshold: 2
- Timeout: 5 seconds
- Interval: 15 seconds
- Success codes: 200
These health check configurations ensure that your Moose service is properly monitored and that traffic is only routed to healthy containers.
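To sanity-check the endpoint before relying on the ECS health checks, you can hit it directly; a minimal sketch, assuming the container is reachable on port 4000 from where you run it:
```python
import urllib.request

with urllib.request.urlopen("http://localhost:4000/health", timeout=5) as resp:
    print(resp.status)  # expect 200 when the Moose service is healthy
```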
---
## Deploying on Kubernetes
Source: moose/deploying/deploying-on-kubernetes.mdx
Deploying on Kubernetes
# Deploying on Kubernetes
Moose applications can be deployed to Kubernetes clusters, whether it's your own on-prem
cluster or through a cloud service like Google's Kubernetes Engine (GKE) or Amazon's
Elastic Kubernetes Service (EKS).
Note: at this time, we recommend launching only a single instance of
Moose per cluster. We're currently developing support for multi-instance concurrent
usage.
Essentially you'll need to create a moose-deployment YAML file. Here is an example:
```yaml filename="moose-deployment.yaml-fragment" copy
apiVersion: apps/v1
kind: Deployment
metadata:
name: moosedeployment
spec:
replicas: 1
selector:
matchLabels:
app: moosedeploy
template:
metadata:
labels:
app: moosedeploy
spec:
containers:
- name: moosedeploy
image: 514labs/moose-df-deployment-x86_64-unknown-linux-gnu:latest
ports:
- containerPort: 4000
```
> Make sure to update the image key above with the location of your repository and image tag.
You may also need to configure a load balancer to route external traffic to your moose ingest points.
```yaml filename="moose-lb-service.yaml" copy
apiVersion: v1
kind: Service
metadata:
name: moose-service
spec:
selector:
app: moosedeploy
ports:
- protocol: TCP
port: 4000
targetPort: 4000
type: LoadBalancer
```
Another approach would be to use a service type of `ClusterIP`:
```yaml filename="moose-service.yaml" copy
apiVersion: v1
kind: Service
metadata:
name: moose-service
spec:
selector:
app: moosedeploy
type: ClusterIP
ports:
- protocol: TCP
port: 4000
targetPort: 4000
```
The approach you decide on will depend on your specific Kubernetes networking requirements.
## Setting up health checks and probes
Your generated Moose docker containers feature a health check endpoint at `/health` that can be used by Kubernetes to monitor the health of your application. Based on our production deployment, we recommend configuring the following probes:
```yaml
# Startup probe - gives Moose time to initialize before accepting traffic
startupProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 60
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 30
successThreshold: 3
# Readiness probe - determines when the pod is ready to receive traffic
readinessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 3
failureThreshold: 2
successThreshold: 5
# Liveness probe - restarts the pod if it becomes unresponsive
livenessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 5
successThreshold: 1
```
## Zero-downtime deployments with lifecycle hooks
For production deployments, we recommend configuring a preStop lifecycle hook to ensure graceful pod termination during updates:
```yaml
lifecycle:
preStop:
exec:
command: ["/bin/sleep", "60"]
```
This gives the pod time to finish processing in-flight requests before termination. You should also set an appropriate
`terminationGracePeriodSeconds` value (we recommend 70 seconds) to work with this hook.
## Resource requirements
Based on our production deployments, we recommend the following resource allocation for a standard Moose deployment:
```yaml
resources:
requests:
cpu: "1000m"
memory: "8Gi"
```
You can adjust these values based on your application's specific needs and workload.
## Configuring container environment variables
Inside your `moose-deployment.yaml` file, you will need to add an `env` section for environment variables.
The example below includes actual sample values for clarity. In production deployments, you should use Kubernetes Secrets for sensitive information rather than hardcoding it.
Note that both Redpanda and Temporal are optional. If you're not using these components, you can omit their respective configuration sections.
### Example with hardcoded values (for development/testing only):
```yaml filename="moose-deployment-dev.yaml" copy
apiVersion: apps/v1
kind: Deployment
metadata:
name: moosedeployment
spec:
# For zero-downtime deployments
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
replicas: 1
selector:
matchLabels:
app: moosedeploy
template:
metadata:
labels:
app: moosedeploy
spec:
# For graceful shutdowns
terminationGracePeriodSeconds: 70
containers:
- name: moosedeploy
image: 514labs/moose-df-deployment-x86_64-unknown-linux-gnu:latest
ports:
- containerPort: 4000
# Lifecycle hook to delay pod shutdown
lifecycle:
preStop:
exec:
command: ["/bin/sleep", "60"]
# Startup probe
startupProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 60
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 30
successThreshold: 3
# Readiness probe
readinessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 3
failureThreshold: 2
successThreshold: 5
# Liveness probe
livenessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 5
successThreshold: 1
# Resource requirements
resources:
requests:
cpu: "1000m"
memory: "8Gi"
env:
# Logger configuration
- name: MOOSE_LOGGER__LEVEL
value: "Info"
- name: MOOSE_LOGGER__STDOUT
value: "true"
- name: MOOSE_LOGGER__FORMAT
value: "Json"
# Telemetry configuration
- name: MOOSE_TELEMETRY__ENABLED
value: "true"
- name: MOOSE_TELEMETRY__EXPORT_METRICS
value: "true"
# Debugging
- name: RUST_BACKTRACE
value: "1"
# HTTP server configuration
- name: MOOSE_HTTP_SERVER_CONFIG__HOST
value: "0.0.0.0"
- name: MOOSE_HTTP_SERVER_CONFIG__PORT
value: "4000"
# ClickHouse configuration
- name: MOOSE_CLICKHOUSE_CONFIG__DB_NAME
value: "moose_production"
- name: MOOSE_CLICKHOUSE_CONFIG__USER
value: "clickhouse_user"
- name: MOOSE_CLICKHOUSE_CONFIG__PASSWORD
value: "clickhouse_password_example"
- name: MOOSE_CLICKHOUSE_CONFIG__HOST
value: "your-clickhouse.cloud.example.com"
- name: MOOSE_CLICKHOUSE_CONFIG__HOST_PORT
value: "8443"
- name: MOOSE_CLICKHOUSE_CONFIG__USE_SSL
value: "1"
- name: MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT
value: "9440"
# Redis configuration
- name: MOOSE_REDIS_CONFIG__URL
value: "redis://redis_user:redis_password_example@redis.example.com:6379"
- name: MOOSE_REDIS_CONFIG__KEY_PREFIX
value: "moose_production"
# Redpanda configuration (Optional)
- name: MOOSE_REDPANDA_CONFIG__BROKER
value: "seed-5fbcae97.example.redpanda.com:9092"
- name: MOOSE_REDPANDA_CONFIG__NAMESPACE
value: "moose_production"
- name: MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS
value: "10043"
- name: MOOSE_REDPANDA_CONFIG__SASL_USERNAME
value: "redpanda_user"
- name: MOOSE_REDPANDA_CONFIG__SASL_PASSWORD
value: "redpanda_password_example"
- name: MOOSE_REDPANDA_CONFIG__SASL_MECHANISM
value: "SCRAM-SHA-256"
- name: MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL
value: "SASL_SSL"
- name: MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR
value: "3"
# Temporal configuration (Optional)
- name: MOOSE_TEMPORAL_CONFIG__CA_CERT
value: "/etc/ssl/certs/ca-certificates.crt"
- name: MOOSE_TEMPORAL_CONFIG__API_KEY
value: "temporal_api_key_example"
- name: MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST
value: "your-namespace.tmprl.cloud"
imagePullSecrets:
- name: moose-docker-repo-credentials
```
---
## Deploying with Docker Compose
Source: moose/deploying/deploying-with-docker-compose.mdx
# Deploying with Docker Compose
Deploying a Moose application with all its dependencies can be challenging and time-consuming. You need to properly configure multiple services,
ensure they communicate with each other, and manage their lifecycle.
Docker Compose solves this problem by allowing you to deploy your entire stack with a single command.
This guide shows you how to set up a production-ready Moose environment on a single server using Docker Compose, with proper security,
monitoring, and maintenance practices.
This guide describes a single-server deployment. For high availability (HA) deployments, you'll need to:
- Deploy services across multiple servers
- Configure service replication and redundancy
- Set up load balancing
- Implement proper failover mechanisms
We are also offering an HA managed deployment option for Moose called [Boreal](https://fiveonefour.com/boreal).
## Prerequisites
Before you begin, you'll need:
- Ubuntu 24 or above (for this guide)
- Docker and Docker Compose (minimum version 2.23.1)
- Access to a server with at least 8GB RAM and 4 CPU cores
The Moose stack consists of:
- Your Moose Application
- [ClickHouse](https://clickhouse.com) (required)
- [Redis](https://redis.io) (required)
- [Redpanda](https://redpanda.com) (optional for event streaming)
- [Temporal](https://temporal.io) (optional for workflow orchestration)
## Setting Up a Production Server
### Installing Required Software
First, install Docker on your Ubuntu server:
```bash
# Update the apt package index
sudo apt-get update
# Install packages to allow apt to use a repository over HTTPS
sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Set up the stable repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Update apt package index again
sudo apt-get update
# Install Docker Engine
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
```
Next, install Node.js or Python depending on your Moose application:
```bash
# For Node.js applications
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# OR for Python applications
sudo apt-get install -y python3.12 python3-pip
```
### Configure Docker Log Size Limits
To prevent Docker logs from filling up your disk space, configure log rotation:
```bash
sudo mkdir -p /etc/docker
sudo vim /etc/docker/daemon.json
```
Add the following configuration:
```json
{
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
}
}
```
Restart Docker to apply the changes:
```bash
sudo systemctl restart docker
```
### Enable Docker Non-Root Access
To run Docker commands without sudo:
```bash
# Add your user to the docker group
sudo usermod -aG docker $USER
# Apply the changes (log out and back in, or run this)
newgrp docker
```
### Setting Up GitHub Actions Runner (Optional)
If you want to set up CI/CD automation, you can install a GitHub Actions runner:
1. Navigate to your GitHub repository
2. Go to Settings > Actions > Runners
3. Click "New self-hosted runner"
4. Select Linux and follow the instructions shown
To configure the runner as a service (to run automatically):
```bash
cd actions-runner
sudo ./svc.sh install
sudo ./svc.sh start
```
## Setting up a Sample Moose Application (Optional)
If you already have a Moose application, you can skip this section.
In that case, copy your existing Moose project to the server, build the application with the `--docker` flag, and make the built image available on that server.
### Install Moose CLI
```bash
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
### Create a new Moose Application
Please follow the initialization instructions for your language.
```bash
moose init test-ts typescript
cd test-ts
npm install
```
or
```bash
moose init test-py python
cd test-py
pip install -r requirements.txt
```
### Build the application on AMD64
```bash
moose build --docker --amd64
```
### Build the application on ARM64
```bash
moose build --docker --arm64
```
### Confirm the image was built
```bash
docker images
```
For more information on packaging Moose for deployment, see the full packaging guide.
## Preparing for Deployment
### Create Environment Configuration
First, create a file called `.env` in your project directory to specify component versions:
```bash
# Create and open the .env file
vim .env
```
Add the following content to the `.env` file:
```
# Version configuration for components
POSTGRESQL_VERSION=14.0
TEMPORAL_VERSION=1.22.0
TEMPORAL_UI_VERSION=2.20.0
REDIS_VERSION=7
CLICKHOUSE_VERSION=25.4
REDPANDA_VERSION=v24.3.13
REDPANDA_CONSOLE_VERSION=v3.1.0
```
Additionally, create a `.env.prod` file for your Moose application-specific secrets and configuration:
```bash
# Create and open the .env.prod file
vim .env.prod
```
Add your application-specific environment variables:
```
# Application-specific environment variables
APP_SECRET=your_app_secret
# Add other application variables here
```
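The compose file below injects this file into the Moose container via `env_file`, so your application code can read these values with standard environment access; for example (using the `APP_SECRET` placeholder above):
```python
import os

app_secret = os.environ.get("APP_SECRET")
if app_secret is None:
    raise RuntimeError("APP_SECRET is not set; check .env.prod and the env_file entry in docker-compose.yml")
```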
## Deploying with Docker Compose
Create a file called `docker-compose.yml` in the same directory:
```bash
# Create and open the docker-compose.yml file
vim docker-compose.yml
```
Add the following content to the file:
```yaml file=./docker-compose.yml
name: moose-stack
volumes:
# Required volumes
clickhouse-0-data: null
clickhouse-0-logs: null
redis-0: null
# Optional volumes
redpanda-0: null
postgresql-data: null
configs:
temporal-config:
# Using the "content" property to inline the config
content: |
limit.maxIDLength:
- value: 255
constraints: {}
system.forceSearchAttributesCacheRefreshOnRead:
- value: true # Dev setup only. Please don't turn this on in production.
constraints: {}
services:
# REQUIRED SERVICES
# ClickHouse - Required analytics database
clickhouse-0:
container_name: clickhouse-0
restart: always
image: clickhouse/clickhouse-server:${CLICKHOUSE_VERSION}
volumes:
- clickhouse-0-data:/var/lib/clickhouse/
- clickhouse-0-logs:/var/log/clickhouse-server/
environment:
# Enable SQL-driven access control and user management
CLICKHOUSE_ALLOW_INTROSPECTION_FUNCTIONS: 1
# Default admin credentials
CLICKHOUSE_USER: admin
CLICKHOUSE_PASSWORD: adminpassword
# Disable default user
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
# Database setup
CLICKHOUSE_DB: moose
# Uncomment this if you want to access clickhouse from outside the docker network
# ports:
# - 8123:8123
# - 9000:9000
healthcheck:
test: wget --no-verbose --tries=1 --spider http://localhost:8123/ping || exit 1
interval: 30s
timeout: 5s
retries: 3
start_period: 30s
ulimits:
nofile:
soft: 262144
hard: 262144
networks:
- moose-network
# Redis - Required for caching and pub/sub
redis-0:
restart: always
image: redis:${REDIS_VERSION}
volumes:
- redis-0:/data
command: redis-server --save 20 1 --loglevel warning
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
networks:
- moose-network
# OPTIONAL SERVICES
# --- BEGIN REDPANDA SERVICES (OPTIONAL) ---
# Remove this section if you don't need event streaming
redpanda-0:
restart: always
command:
- redpanda
- start
- --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
# Address the broker advertises to clients that connect to the Kafka API.
# Use the internal addresses to connect to the Redpanda brokers'
# from inside the same Docker network.
# Use the external addresses to connect to the Redpanda brokers'
# from outside the Docker network.
- --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092
- --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
# Address the broker advertises to clients that connect to the HTTP Proxy.
- --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082
- --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
# Redpanda brokers use the RPC API to communicate with each other internally.
- --rpc-addr redpanda-0:33145
- --advertise-rpc-addr redpanda-0:33145
# Mode dev-container uses well-known configuration properties for development in containers.
- --mode dev-container
# Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system.
- --smp 1
- --default-log-level=info
image: docker.redpanda.com/redpandadata/redpanda:${REDPANDA_VERSION}
container_name: redpanda-0
volumes:
- redpanda-0:/var/lib/redpanda/data
networks:
- moose-network
healthcheck:
test: ["CMD-SHELL", "rpk cluster health | grep -q 'Healthy:.*true'"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
# Optional Redpanda Console for visualizing the cluster
redpanda-console:
restart: always
container_name: redpanda-console
image: docker.redpanda.com/redpandadata/console:${REDPANDA_CONSOLE_VERSION}
entrypoint: /bin/sh
command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
environment:
CONFIG_FILEPATH: /tmp/config.yml
CONSOLE_CONFIG_FILE: |
kafka:
brokers: ["redpanda-0:9092"]
# Schema registry config moved outside of kafka section
schemaRegistry:
enabled: true
urls: ["http://redpanda-0:8081"]
redpanda:
adminApi:
enabled: true
urls: ["http://redpanda-0:9644"]
ports:
- 8080:8080
depends_on:
- redpanda-0
healthcheck:
test: ["CMD", "wget", "--spider", "--quiet", "http://localhost:8080/admin/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
networks:
- moose-network
# --- END REDPANDA SERVICES ---
# --- BEGIN TEMPORAL SERVICES (OPTIONAL) ---
# Remove this section if you don't need workflow orchestration
# Temporal PostgreSQL database
postgresql:
container_name: temporal-postgresql
environment:
POSTGRES_PASSWORD: temporal
POSTGRES_USER: temporal
image: postgres:${POSTGRESQL_VERSION}
restart: always
volumes:
- postgresql-data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U temporal"]
interval: 10s
timeout: 5s
retries: 3
networks:
- moose-network
# Temporal server
# For initial setup, use temporalio/auto-setup image
# For production, switch to temporalio/server after first run
temporal:
container_name: temporal
depends_on:
postgresql:
condition: service_healthy
environment:
# Database configuration
- DB=postgres12
- DB_PORT=5432
- POSTGRES_USER=temporal
- POSTGRES_PWD=temporal
- POSTGRES_SEEDS=postgresql
# Namespace configuration
- DEFAULT_NAMESPACE=moose-workflows
- DEFAULT_NAMESPACE_RETENTION=72h
# Auto-setup options - set to false after initial setup
- AUTO_SETUP=true
- SKIP_SCHEMA_SETUP=false
# Service configuration - all services by default
# For high-scale deployments, run these as separate containers
# - SERVICES=history,matching,frontend,worker
# Logging and metrics
- LOG_LEVEL=info
# Addresses
- TEMPORAL_ADDRESS=temporal:7233
- DYNAMIC_CONFIG_FILE_PATH=/etc/temporal/config/dynamicconfig/development-sql.yaml
# For initial deployment, use the auto-setup image
image: temporalio/auto-setup:${TEMPORAL_VERSION}
# For production, after initial setup, switch to server image:
# image: temporalio/server:${TEMPORAL_VERSION}
restart: always
ports:
- 7233:7233
# Volume for dynamic configuration - essential for production
configs:
- source: temporal-config
target: /etc/temporal/config/dynamicconfig/development-sql.yaml
mode: 0444
networks:
- moose-network
healthcheck:
test: ["CMD", "tctl", "--ad", "temporal:7233", "cluster", "health", "|", "grep", "-q", "SERVING"]
interval: 30s
timeout: 5s
retries: 3
start_period: 30s
# Temporal Admin Tools - useful for maintenance and debugging
temporal-admin-tools:
container_name: temporal-admin-tools
depends_on:
- temporal
environment:
- TEMPORAL_ADDRESS=temporal:7233
- TEMPORAL_CLI_ADDRESS=temporal:7233
image: temporalio/admin-tools:${TEMPORAL_VERSION}
restart: "no"
networks:
- moose-network
stdin_open: true
tty: true
# Temporal Web UI
temporal-ui:
container_name: temporal-ui
depends_on:
- temporal
environment:
- TEMPORAL_ADDRESS=temporal:7233
- TEMPORAL_CORS_ORIGINS=http://localhost:3000
image: temporalio/ui:${TEMPORAL_UI_VERSION}
restart: always
ports:
- 8081:8080
networks:
- moose-network
healthcheck:
test: ["CMD", "wget", "--spider", "--quiet", "http://localhost:8080/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
# --- END TEMPORAL SERVICES ---
# Your Moose application
moose:
image: moose-df-deployment-x86_64-unknown-linux-gnu:latest # Update with your image name
depends_on:
# Required dependencies
- clickhouse-0
- redis-0
# Optional dependencies - remove if not using
- redpanda-0
- temporal
restart: always
environment:
# Logging and debugging
RUST_BACKTRACE: "1"
MOOSE_LOGGER__LEVEL: "Info"
MOOSE_LOGGER__STDOUT: "true"
# Required services configuration
# ClickHouse configuration
MOOSE_CLICKHOUSE_CONFIG__DB_NAME: "moose"
MOOSE_CLICKHOUSE_CONFIG__USER: "moose"
MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "your_moose_password"
MOOSE_CLICKHOUSE_CONFIG__HOST: "clickhouse-0"
MOOSE_CLICKHOUSE_CONFIG__HOST_PORT: "8123"
# Redis configuration
MOOSE_REDIS_CONFIG__URL: "redis://redis-0:6379"
MOOSE_REDIS_CONFIG__KEY_PREFIX: "moose"
# Optional services configuration
# Redpanda configuration (remove if not using Redpanda)
MOOSE_REDPANDA_CONFIG__BROKER: "redpanda-0:9092"
MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS: "1000"
MOOSE_REDPANDA_CONFIG__RETENTION_MS: "30000"
MOOSE_REDPANDA_CONFIG__NAMESPACE: "moose"
# Temporal configuration (remove if not using Temporal)
MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST: "temporal:7233"
MOOSE_TEMPORAL_CONFIG__NAMESPACE: "moose-workflows"
# HTTP Server configuration
MOOSE_HTTP_SERVER_CONFIG__HOST: 0.0.0.0
ports:
- 4000:4000
env_file:
- path: ./.env.prod
required: true
networks:
- moose-network
healthcheck:
test: ["CMD-SHELL", "curl -s http://localhost:4000/health | grep -q '\"unhealthy\": \\[\\]' && echo 'Healthy'"]
interval: 30s
timeout: 5s
retries: 10
start_period: 60s
# Define the network for all services
networks:
moose-network:
driver: bridge
```
At this point, don't start the services yet. First, we need to configure the individual services for production use as described in the following sections.
## Configuring Services for Production
### Configuring ClickHouse Securely (Required)
For production ClickHouse deployment, we'll use environment variables to configure users and access control
(as recommended in the [official Docker image documentation](https://hub.docker.com/r/clickhouse/clickhouse-server)):
1. First, start the ClickHouse container:
```bash
# Start just the ClickHouse container
docker compose up -d clickhouse-0
```
2. After ClickHouse has started, connect to create additional users:
```bash
# Connect to ClickHouse with the admin user
docker exec -it clickhouse-0 clickhouse-client --user admin --password adminpassword
# Create moose application user
CREATE USER moose IDENTIFIED BY 'your_moose_password';
GRANT ALL ON moose.* TO moose;
# Create read-only user for BI tools (optional)
CREATE USER power_bi IDENTIFIED BY 'your_powerbi_password' SETTINGS PROFILE 'readonly';
GRANT SHOW TABLES, SELECT ON moose.* TO power_bi;
```
3. To exit the ClickHouse client, type `\q` and press Enter.
4. Update your Moose environment variables to use the new moose user:
```bash
vim docker-compose.yml
```
```yaml
MOOSE_CLICKHOUSE_CONFIG__USER: "moose"
MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "your_moose_password"
```
5. Remove the following environment variables from the clickhouse service in the docker-compose.yml file:
```yaml
MOOSE_CLICKHOUSE_CONFIG__USER: "admin"
MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "adminpassword"
```
6. For additional security in production, consider using Docker secrets for passwords.
7. Restart the ClickHouse container to apply the changes:
```bash
docker compose restart clickhouse-0
```
8. Verify that the new configuration works by connecting with the newly created user:
```bash
# Connect with the new moose user
docker exec -it moose-stack-clickhouse-0-1 clickhouse-client --user moose --password your_moose_password
# Test access by listing tables
SHOW TABLES FROM moose;
# Exit the clickhouse client
\q
```
If you can connect successfully and run commands with the new user, your ClickHouse configuration is working properly.
### Securing Redpanda (Optional)
For production, it's recommended to restrict external access to Redpanda:
1. Modify your Docker Compose file to remove external access:
- Use only internal network access for production
- If needed, use a reverse proxy with authentication for external access
2. For this simple deployment, we'll keep Redpanda closed to the external world with no authentication required,
as it's only accessible from within the Docker network.
### Configuring Temporal (Optional)
If your Moose application uses Temporal for workflow orchestration, the configuration above includes all necessary services based on the
[official Temporal Docker Compose examples](https://github.com/temporalio/docker-compose).
If you're not using Temporal, simply remove the Temporal-related services (postgresql, temporal, temporal-admin-tools, temporal-ui)
and environment variables from the docker-compose.yml file.
#### Temporal Deployment Process: From Setup to Production
Deploying Temporal involves a two-phase process: initial setup followed by production operation. Here are step-by-step instructions for each phase:
##### Phase 1: Initial Setup
1. **Start the PostgreSQL database**:
```bash
docker compose up -d postgresql
```
2. **Wait for PostgreSQL to be healthy** (check the status):
```bash
docker compose ps postgresql
```
Look for `healthy` in the output before proceeding.
3. **Start Temporal with auto-setup**:
```bash
docker compose up -d temporal
```
During this phase, Temporal's auto-setup will:
- Create the necessary PostgreSQL databases
- Initialize the schema tables
- Register the default namespace (moose-workflows)
4. **Verify Temporal server is running**:
```bash
docker compose ps temporal
```
5. **Start the Admin Tools and UI**:
```bash
docker compose up -d temporal-admin-tools temporal-ui
```
6. **Create the namespace manually**:
```bash
# Register the moose-workflows namespace with a 3-day retention period
docker compose exec temporal-admin-tools tctl namespace register --retention 72h moose-workflows
```
Verify that the namespace was created:
```bash
# List all namespaces
docker compose exec temporal-admin-tools tctl namespace list
# Describe your namespace
docker compose exec temporal-admin-tools tctl namespace describe moose-workflows
```
You should see details about the namespace including its retention policy.
##### Phase 2: Transition to Production
After successful initialization, modify your configuration for production use:
1. **Stop Temporal services**:
```bash
docker compose stop temporal temporal-ui temporal-admin-tools
```
2. **Edit your docker-compose.yml file** to:
- Change image from `temporalio/auto-setup` to `temporalio/server`
- Set `SKIP_SCHEMA_SETUP=true`
Example change:
```yaml
# From:
image: temporalio/auto-setup:${TEMPORAL_VERSION}
# To:
image: temporalio/server:${TEMPORAL_VERSION}
# And change:
- AUTO_SETUP=true
- SKIP_SCHEMA_SETUP=false
# To:
- AUTO_SETUP=false
- SKIP_SCHEMA_SETUP=true
```
3. **Restart services with production settings**:
```bash
docker compose up -d temporal temporal-ui temporal-admin-tools
```
4. **Verify services are running with new configuration**:
```bash
docker compose ps
```
## Starting and Managing the Service
### Starting the Services
Start all services with Docker Compose:
```bash
docker compose up -d
```
### Setting Up Systemd Service for Docker Compose
For production, create a systemd service to ensure Docker Compose starts automatically on system boot:
1. Create a systemd service file:
```bash
sudo vim /etc/systemd/system/moose-stack.service
```
2. Add the following configuration (adjust paths as needed):
```
[Unit]
Description=Moose Stack
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/path/to/your/compose/directory
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
```
3. Enable and start the service:
```bash
sudo systemctl enable moose-stack.service
sudo systemctl start moose-stack.service
```
## Deployment Workflow
There are two common ways to roll out updates:
### Automated Deployment with CI/CD
1. Set up a CI/CD pipeline using GitHub Actions (if a self-hosted runner is configured on the server); a minimal workflow sketch follows this list
2. When code is pushed to your repository:
- The GitHub Actions runner builds your Moose application
- Updates the Docker image
- Deploys using Docker Compose
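A minimal sketch of such a workflow, assuming a self-hosted runner on the deployment machine with Docker and the Moose CLI installed; the branch name, working directory, and steps are placeholders to adapt to your setup:
```yaml filename=".github/workflows/deploy.yml" copy
# Hypothetical sketch - adjust branch, paths, and any registry steps for your setup
name: deploy-moose-stack
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Build Moose Docker images
        run: moose build --docker
      - name: Restart the stack
        working-directory: /path/to/your/compose/directory
        run: docker compose up -d
```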
### Manual Deployment
Alternatively, for manual deployment:
1. Copy the latest version of the code to the machine
2. Run `moose build`
3. Update the Docker image tag in your docker-compose.yml
4. Restart the stack with `docker compose up -d`
## Monitoring and Maintenance
Set up monitoring and routine maintenance to catch outages and performance issues before they affect users (a volume backup sketch follows this list):
- Set up log monitoring with a tool like [Loki](https://grafana.com/oss/loki/)
- Regularly backup your volumes (especially ClickHouse data)
- Monitor disk space usage
- Set up alerting for service health
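As one example, you could archive a Docker volume from a throwaway container. The volume name below is a placeholder for your actual ClickHouse data volume (`docker volume ls` will show it), and for a consistent snapshot you should stop writes first or use ClickHouse's native `BACKUP` command instead:
```bash filename="Terminal" copy
# Hypothetical example: archive a ClickHouse data volume into ./backups
docker run --rm \
  -v clickhouse-0-data:/data:ro \
  -v "$(pwd)/backups":/backup \
  alpine tar czf /backup/clickhouse-$(date +%F).tar.gz -C /data .
```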
---
## Monitoring your Moose App
Source: moose/deploying/monitoring.mdx
This content has moved to the unified Observability page
> This page has moved. See the unified [/moose/metrics](/moose/metrics) page for observability across development and production.
---
## Packaging Moose for deployment
Source: moose/deploying/packaging-moose-for-deployment.mdx
Packaging Moose for deployment
# Packaging Moose for Deployment
Once you've developed your Moose application locally, you can package it for deployment to your on-prem or cloud infrastructure.
The first step is to navigate (`cd`) to your moose project in your terminal.
```txt filename="Terminal" copy
cd my-moose-project
```
The Moose CLI you've used to build your Moose project also has a handy flag that will automate the packaging and building of your project into docker images.
```txt filename="Terminal" copy
moose build --docker
```
After the above command completes, you can view your newly created Docker images by running the `docker images` command:
```txt filename="Terminal" copy
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
moose-df-deployment-aarch64-unknown-linux-gnu latest c50674c7a68a About a minute ago 155MB
moose-df-deployment-x86_64-unknown-linux-gnu latest e5b449d3dea3 About a minute ago 163MB
```
> Notice that you get two `moose-df-deployment` images, one for the `aarch64` (ARM64) architecture and another for the `x86_64` architecture. This lets you choose the version that matches your cloud or on-prem machine architecture.
You can then use standard docker commands to push your new project images to your container repository of choice.
First tag your local images:
```txt filename="Terminal" copy
docker tag moose-df-deployment-aarch64-unknown-linux-gnu:latest {your-repo-user-name}/moose-df-deployment-aarch64-unknown-linux-gnu:latest
docker tag moose-df-deployment-x86_64-unknown-linux-gnu:latest {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest
```
Then `push` your images to your container repository.
```txt filename="Terminal" copy
docker push {your-repo-user-name}/moose-df-deployment-aarch64-unknown-linux-gnu:latest
docker push {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest
```
You can also use the following handy shell script to automate the steps above.
```bash filename="push.sh" copy
#!/bin/bash
version=$2
if [ -z "$1" ]
then
echo "You must specify the dockerhub repository as an argument. Example: ./push.sh container-repo-name"
echo "Note: you can also provide a second argument to supply a specific version tag - otherwise this script will use the same version as the latest moose-cli on Github."
exit 1
fi
if [ -z "$2" ]
then
output=$(npx @514labs/moose-cli -V)
version=$(echo "$output" | sed -n '2p' | awk '{print $2}')
fi
echo "Using version: $version"
arch="moose-df-deployment-aarch64-unknown-linux-gnu"
docker tag $arch:$version $1/$arch:$version
docker push $1/$arch:$version
arch="moose-df-deployment-x86_64-unknown-linux-gnu"
docker tag $arch:$version $1/$arch:$version
docker push $1/$arch:$version
```
---
## Preparing access to ClickHouse, Redis, Temporal and Redpanda
Source: moose/deploying/preparing-clickhouse-redpanda.mdx
Preparing access to ClickHouse, Redis, Temporal and Redpanda
# Preparing access to ClickHouse, Redis, Temporal and Redpanda
Your hosted Moose application requires access to hosted ClickHouse and Redis service instances. You can also optionally use Redpanda for event streaming.
You can stand up open source versions of these applications within your environments or opt to use cloud-hosted versions available at:
- [ClickHouse Cloud](https://clickhouse.com)
- [Redis Cloud](https://redis.com)
- [Redpanda Cloud](https://redpanda.com)
- [Temporal Cloud](https://temporal.io)
## ClickHouse Configuration
If you're using `state_config.storage = "clickhouse"` in your config (serverless mode without Redis), your ClickHouse instance must support the **KeeperMap** table engine. This is used for migration state storage and distributed locking.
✅ **ClickHouse Cloud**: Supported by default
✅ **`moose dev` / `moose prod`**: Already configured in our Docker setup
⚠️ **Self-hosted ClickHouse**: See [ClickHouse KeeperMap documentation](https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map) for setup requirements
If you're using Redis for state storage (`state_config.storage = "redis"`), you don't need KeeperMap.
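For reference, a minimal sketch of that setting; we assume it lives under a `[state_config]` section of your Moose config, so double-check against your own `moose.config.toml`:
```toml copy
[state_config]
# "clickhouse" stores migration state and locks in a KeeperMap-backed table;
# switch to "redis" if you run Moose alongside a Redis instance instead
storage = "clickhouse"
```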
For ClickHouse, you'll need the following information:
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| DB_NAME | Database name to use | Your branch or application ID |
| USER | Username for authentication | - |
| PASSWORD | Password for authentication | - |
| HOST | Hostname or IP address | - |
| HOST_PORT | HTTPS port | 8443 |
| USE_SSL | Whether to use SSL (1 for true, 0 for false) | 1 |
| NATIVE_PORT | Native protocol port | 9440 |
These values are used to configure the Moose application's connection to ClickHouse through environment variables following this pattern:
```
MOOSE_CLICKHOUSE_CONFIG__<PARAMETER>=<VALUE>
```
For example:
```
MOOSE_CLICKHOUSE_CONFIG__DB_NAME=myappdb
MOOSE_CLICKHOUSE_CONFIG__HOST=myclickhouse.example.com
MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1
MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443
MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT=9440
```
## Redis Configuration
Moose requires Redis for caching and as a message broker. You'll need the following configuration:
| Parameter | Description |
|-----------|-------------|
| URL | Redis connection URL |
| KEY_PREFIX | Prefix for Redis keys to isolate namespaces |
These values are configured through:
```
MOOSE_REDIS_CONFIG__URL=redis://username:password@redis.example.com:6379
MOOSE_REDIS_CONFIG__KEY_PREFIX=myapp
```
## Temporal Configuration (Optional)
Temporal is an optional workflow orchestration platform that can be used with Moose. If you choose to use Temporal, you'll need the following configuration:
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| CA_CERT | Path to CA certificate | /etc/ssl/certs/ca-certificates.crt |
| API_KEY | Temporal Cloud API key | - |
| TEMPORAL_HOST | Temporal Cloud namespace host | Your namespace + .tmprl.cloud |
These values are configured through:
```
MOOSE_TEMPORAL_CONFIG__CA_CERT=/etc/ssl/certs/ca-certificates.crt
MOOSE_TEMPORAL_CONFIG__API_KEY=your-temporal-api-key
MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST=your-namespace.tmprl.cloud
```
## Redpanda Configuration (Optional)
Redpanda is an optional component that can be used for event streaming. If you choose to use Redpanda, you'll need the following information:
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| BROKER | Bootstrap server address | - |
| NAMESPACE | Namespace for isolation (often same as branch or app ID) | - |
| MESSAGE_TIMEOUT_MS | Message timeout in milliseconds | 10043 |
| SASL_USERNAME | SASL username for authentication | - |
| SASL_PASSWORD | SASL password for authentication | - |
| SASL_MECHANISM | SASL mechanism | SCRAM-SHA-256 |
| SECURITY_PROTOCOL | Security protocol | SASL_SSL |
| REPLICATION_FACTOR | Topic replication factor | 3 |
These values are used to configure the Moose application's connection to Redpanda through environment variables following this pattern:
```
MOOSE_REDPANDA_CONFIG__<PARAMETER>=<VALUE>
```
For example:
```
MOOSE_REDPANDA_CONFIG__BROKER=seed-5fbcae97.example.redpanda.com:9092
MOOSE_REDPANDA_CONFIG__NAMESPACE=myapp
MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL=SASL_SSL
MOOSE_REDPANDA_CONFIG__SASL_MECHANISM=SCRAM-SHA-256
MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR=3
```
## Using Environment Variables in Deployment
When deploying your Moose application, you'll need to pass these configurations as environment variables.
Refer to the deployment guides for your specific platform (Kubernetes, ECS, etc.) for details on how to securely
provide these values to your application.
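As a rough illustration (not a platform-specific recipe), the same variables can be passed directly to the packaged container with `docker run`; every value and the image name below are placeholders:
```bash filename="Terminal" copy
docker run -p 4000:4000 \
  -e MOOSE_CLICKHOUSE_CONFIG__DB_NAME=myappdb \
  -e MOOSE_CLICKHOUSE_CONFIG__HOST=myclickhouse.example.com \
  -e MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443 \
  -e MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1 \
  -e MOOSE_CLICKHOUSE_CONFIG__USER=moose \
  -e MOOSE_CLICKHOUSE_CONFIG__PASSWORD=your_moose_password \
  -e MOOSE_REDIS_CONFIG__URL=redis://username:password@redis.example.com:6379 \
  -e MOOSE_REDIS_CONFIG__KEY_PREFIX=myapp \
  {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest
```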
---
## Use Moose with Your Existing ClickHouse
Source: moose/getting-started/from-clickhouse.mdx
# Use Moose with Your Existing ClickHouse
## What This Guide Does
This guide sets up a local ClickHouse development environment that mirrors your production database and enables code-first schema management:
1. **Introspect** your remote ClickHouse tables and generate TypeScript/Python data models
2. **Create** a local ClickHouse instance with your exact table schemas
3. **Seed** your local database with production data (optional)
4. **Build** APIs and pipelines on top of your ClickHouse data in code
## How It Works
**Local Development:**
- Your production ClickHouse remains untouched
- You get a local ClickHouse instance that copies your remote table schemas
- All development happens locally with hot-reload
**Production Deployment:**
- When you deploy your code, it connects to your remote ClickHouse
- Any new tables, materialized views, or schema changes you create in code are automatically migrated to your target database
- Your existing data and tables remain intact
## Prerequisites
## Step 1: Install Moose
Install the Moose CLI globally to your system:
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
After installation, you'll use `moose init` to create a new project that automatically connects to your ClickHouse and generates all the code you need.
## Step 2: Create Your Project
Use the ClickHouse Playground tab to try it out!
```bash filename="Initialize new project" copy
# Option 1: Provide connection string directly
moose init my-project --from-remote <YOUR_CLICKHOUSE_CONNECTION_STRING> --language python
# Option 2: Run without connection string for interactive setup
moose init my-project --from-remote --language python
```
**Connection String Format:**
```
https://username:password@host:port/?database=database_name
```
If you don't provide a connection string, Moose will guide you through an interactive setup process where you'll be prompted to enter:
- **Host and port** (e.g., `https://your-service-id.region.clickhouse.cloud:8443`)
- **Username** (usually `default`)
- **Password** (your ClickHouse password)
- **Database name** (optional, defaults to `default`)
This is perfect if you're not sure about your connection details or prefer a guided experience.
Moose will create a complete project structure with:
- **Data models**: TypeScript/Python classes for every table in your ClickHouse
- **Type definitions**: Full type safety for all your data
- **Development environment**: Local ClickHouse instance that mirrors your production schema
- **Build tools**: Everything configured and ready to go
- Make sure you are using the `HTTPS` connection string, not the `HTTP` connection string.
- Make sure the port is correct. For `HTTPS` the default is `8443`
- The default username is `default`
See the section: Connect to your remote ClickHouse.
```bash filename="Initialize new project" copy
# Generate code models from your existing ClickHouse tables
moose init my-project --from-remote https://explorer:@play.clickhouse.com:443/?database=default --language python
```
```bash filename="Install dependencies" copy
cd my-project
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
You should see: `Successfully generated X models from ClickHouse tables`
### Explore Your Generated Models
Check what Moose created from your tables in the `app/main.py` file:
Your generated table models are imported here so Moose can detect them.
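The exact contents depend on your schema, but the root file typically just imports the generated models so the CLI can pick them up; a purely illustrative sketch (the module and model names will differ in your project):
```python filename="app/main.py"
# Illustrative only - the generated import will reference the module Moose created for your tables
from app.your_generated_models import *
```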
### If your database includes ClickPipes/PeerDB (CDC) tables
As noted above, when you use `moose init --from-remote`, Moose introspects your database. If it detects CDC‑managed tables (e.g., PeerDB/ClickPipes with fields like `_peerdb_synced_at`, `_peerdb_is_deleted`, `_peerdb_version`), it marks those as `EXTERNALLY_MANAGED` and writes them into a dedicated external models file. Your root file is updated to load these models automatically.
This separation is a best‑effort by the CLI to keep clearly CDC‑owned tables external. For other tables you don’t want Moose to manage, set the lifecycle to external and move them into the external file. See:
- [External Tables](/moose/olap/external-tables) documentation for more information on how external tables work.
- [DB Pull](/moose/olap/db-pull) for keeping models in sync with the remote schema.
## Step 3: Start Development
Start your development server. This spins up a local ClickHouse instance that perfectly mirrors your production schema:
```bash filename="Start your dev server" copy
moose dev
```
**What happens when you run `moose dev`:**
- 🏗️ Creates a local ClickHouse instance with your exact table schemas in your project code
- 🔄 Hot-reloads migrations to your local infrastructure as you save code changes
- 🚀 Starts a web server for building APIs
Your production ClickHouse remains completely untouched. This is a separate, local development environment.
```txt
Created docker compose file
⡗ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
Node Id: my-analytics-app::b15efaca-0c23-42b2-9b0c-642105f9c437
Starting development mode
Watching "/path/to/my-analytics-app/app"
Started Webserver.
Next Steps
💻 Run the moose 👉 `ls` 👈 command for a bird's eye view of your application and infrastructure
📥 Send Data to Moose
Your local development server is running at: http://localhost:4000/ingest
```
Don't see this output? [Check out the troubleshooting section](#troubleshooting)
### Seed Your Local Database (Optional)
Copy real data from your production ClickHouse to your local development environment. This gives you realistic data to work with during development.
**Why seed?** Your local database starts empty. Seeding copies real data so you can:
- Test with realistic data volumes
- Debug with actual production data patterns
- Develop features that work with real data structures
```bash filename="Terminal" copy
moose seed clickhouse --connection-string <YOUR_CLICKHOUSE_CONNECTION_STRING> --limit 100000
```
**Connection String Format:**
The connection string must use ClickHouse native protocol:
```bash
# ClickHouse native protocol (secure connection)
clickhouse://username:password@host:9440/database
```
**Note:** Data transfer uses ClickHouse's native TCP protocol via `remoteSecure()`. The remote server must have the native TCP port accessible. The command automatically handles table mismatches gracefully.
```bash filename="Terminal" copy
moose seed clickhouse --connection-string clickhouse://explorer:@play.clickhouse.com:9440/default --limit 100000
```
```bash filename="Terminal" copy
# You can omit --connection-string by setting an env var
export MOOSE_SEED_CLICKHOUSE_URL='clickhouse://username:password@host:9440/database'
# copy a limited number of rows (batched under the hood)
moose seed clickhouse --limit 100000
```
- `--limit` and `--all` are mutually exclusive
- `--all` can be used to copy the entire table(s), use with caution as it can be very slow and computationally intensive.
- Large copies are automatically batched to avoid remote limits; you’ll see per-batch progress.
- If you stop with Ctrl+C, the current batch finishes and the command exits gracefully.
**Expected Output:**
```bash
✓ Database seeding completed
Seeded 'local_db' from 'remote_db'
✓ table1: copied from remote
⚠️ table2: skipped (not found on remote)
✓ table3: copied from remote
```
**Troubleshooting:**
- Tables that don't exist on remote are automatically skipped with warnings
- Use `--table <table_name>` to seed a specific table that exists in both databases
- Check `moose ls table` to see your local tables
## Step 4: Build Your First API
Now that you have your data models, let's build something useful! You can create APIs, materialized views, and applications with full type safety.
- **REST APIs** that expose your ClickHouse data to frontend applications
- **Materialized Views** for faster queries and aggregations
- **Streaming pipelines** for real-time data processing
- **Full-stack applications** with your ClickHouse data as the backend
### Add APIs
Build REST APIs on top of your existing tables to expose your data to your user-facing apps. This is a great way to get started with Moose without changing any of your existing pipelines.
Check out the MooseAPI module for more information on building APIs with Moose.
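As a minimal sketch, here's what an analytics API over one of your generated tables could look like, following the same `Api` pattern used throughout these docs; the table name, columns, and route are placeholders for your own schema:
```python filename="app/apis/my_table_api.py"
from moose_lib import Api, MooseClient
from pydantic import BaseModel

class QueryParams(BaseModel):
    limit: int = 10

class Row(BaseModel):
    id: str
    name: str

def handler(client: MooseClient, params: QueryParams) -> list[Row]:
    # Parameter binding keeps values out of the SQL string itself
    return client.query.execute(
        "SELECT id, name FROM my_table LIMIT {limit}",
        {"limit": params.limit},
    )

my_table_api = Api[QueryParams, Row]("my-table", query_function=handler)
```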
### Build Materialized Views
Build materialized views on top of your existing tables to improve query performance. If you have Materialized Views in your ClickHouse, you can use Moose to build new Materialized Views on top of your existing tables, or to migrate your existing Materialized Views to Moose.
Check out the MooseOLAP module for more information on building Materialized Views with Moose.
## Known Limitations
Some advanced ClickHouse features may not be fully supported yet. Join the Moose Slack and let us know if you have any issues, feedback, or requests.
**What we're working on:**
- **Selective table import** (currently imports all tables)
- **Default value annotations**
## Troubleshooting
### Error: Failed to connect to ClickHouse
This guide shows exactly where to find your host, port, username, and password, and how to construct a valid HTTPS connection string.
1. Log into your [ClickHouse Cloud console](https://clickhouse.cloud/)
2. Open your service details page
3. Click "Connect" in the sidebar
4. Select the `HTTPS` tab and copy the values shown
- **Host**: e.g. `your-service-id.region.clickhouse.cloud`
- **Port**: usually `8443`
- **Username**: usually `default`
- **Password**: the password you configured
5. Build your connection string:
```txt
https://USERNAME:PASSWORD@HOST:PORT/?database=DATABASE_NAME
```
6. Example (with placeholders):
```txt
https://default:your_password@your-service-id.region.clickhouse.cloud:8443/?database=default
```
7. Optional: Test with curl
```bash
curl --user "USERNAME:PASSWORD" --data-binary "SELECT 1" https://HOST:PORT
```
### Self-hosted or Docker
- Check your server config (usually `/etc/clickhouse-server/config.xml`)
- `<http_port>` default: `8123`
- `<https_port>` default: `8443`
- Check users in `/etc/clickhouse-server/users.xml` or `users.d/`
- For Docker, check environment variables in your compose/run config:
- `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CLICKHOUSE_DB`
Build the HTTPS connection string with your values:
```txt
https://USERNAME:PASSWORD@HOST:8443/?database=DB
```
If you only have HTTP enabled, enable HTTPS or use an HTTPS proxy; Moose init expects an HTTPS URL for remote introspection.
### `moose dev` fails to start
Double check Docker is running and you do not have any port conflicts.
- ClickHouse local runs on port `18123`
- Your local webserver runs on port `4000`
- Your local management API runs on port `5001`
## What's Next?
---
## 5-Minute Quickstart
Source: moose/getting-started/quickstart.mdx
Build your first analytical backend with Moose in 5 minutes
# 5-Minute Quickstart
## Prerequisites
Check that your pre-requisites are installed by running the following commands:
```bash filename="Terminal" copy
python --version
```
```bash filename="Terminal" copy
docker ps
```
Skip the tutorial and add Moose as a layer on top of your existing database
## Step 1: Install Moose (30 seconds)
### Run the installation script
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
You should see this message: `Moose vX.X.X installed successfully!` (note that X.X.X is the actual version number)
If you see an error instead, check [Troubleshooting](#need-help) below.
### Reload your shell configuration
**This step is required.** Your current terminal doesn't know about the `moose` command yet. Check which shell you use with `echo $SHELL`. If it shows `/bin/zsh` (the macOS default):
```bash filename="Terminal" copy
source ~/.zshrc
```
If `echo $SHELL` showed `/bin/bash` or `/usr/bin/bash`:
```bash filename="Terminal" copy
source ~/.bashrc
```
### Verify moose command works
```bash filename="Terminal" copy
moose --version
```
You should see:
```txt
moose X.X.X
```
**Try these steps in order:**
1. Re-run the correct `source` command for your shell
2. Close this terminal completely and open a new terminal window
3. Run `moose --version` again
4. If still failing, see [Troubleshooting](#need-help)
You should see the moose version number. Do not proceed to Step 2 until `moose --version` works.
## Step 2: Create Your Project (1 minute)
### Initialize your project
```bash filename="Terminal" copy
moose init my-analytics-app python
```
You should see output like:
```txt
✓ Created my-analytics-app
✓ Initialized Python project
```
### Navigate to your project directory
```bash filename="Terminal" copy
cd my-analytics-app
```
A virtual environment isolates your project's dependencies. We recommend creating one for your project.
**Create a virtual environment (Recommended)**
```bash filename="Terminal" copy
python3 -m venv .venv
```
**Activate your virtual environment (Recommended)**
```bash filename="Terminal" copy
source .venv/bin/activate
```
This creates a `.venv` folder and activates it. Your terminal prompt should now look something like this:
```txt
(.venv) username@computer my-analytics-app %
```
### Install dependencies
```bash filename="Terminal" copy
pip install -r requirements.txt
```
**Wait for installation to complete.** You should see successful installation messages ending with:
```txt
Successfully installed [list of packages]
```
You should see `(.venv)` in your prompt and dependencies installed with no errors.
### Start your development environment
```bash filename="Terminal" copy
moose dev
```
Moose is:
- Downloading Docker images for ClickHouse, Redpanda, and Temporal
- Starting containers
- Initializing databases
- Starting the development server
Do not proceed until you see the "Started Webserver" message.
```txt
Created docker compose file
⡗ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
Node Id: my-analytics-app::b15efaca-0c23-42b2-9b0c-642105f9c437
Starting development mode
Watching "/path/to/my-analytics-app/app"
Started Webserver. 👈 WAIT FOR THIS
Next Steps
💻 Run the moose 👉 `ls` 👈 command for a bird's eye view of your application and infrastructure
📥 Send Data to Moose
Your local development server is running at: http://localhost:4000/ingest
```
Keep this terminal running. This is your Moose development server. You'll open a new terminal for the next step.
## Step 3: Understand Your Project (1 minute)
Your project includes a complete example pipeline:
**Important:** While your pipeline objects are defined in the child folders, they **must be imported** into the root `main.py` file for the Moose CLI to discover and use them.
```python filename="app/main.py"
from app.ingest.models import * # Data models & pipelines
from app.ingest.transform import * # Transformation logic
from app.apis.bar import * # API endpoints
from app.views.bar_aggregated import * # Materialized views
from app.workflows.generator import * # Background workflows
```
## Step 4: Test Your Pipeline (2 minutes)
**Keep your `moose dev` terminal running.** You need a second terminal for the next commands.
**macOS Terminal:**
- Press `Cmd+N` for a new window, or
- Right-click Terminal icon in dock → New Window
**VSCode:**
- Click the `+` button in the terminal panel, or
- Press ``Ctrl+Shift+` `` (backtick)
**Linux Terminal:**
- Press `Ctrl+Shift+N`, or
- Use your terminal's File → New Window menu
### Navigate to your project in the new terminal
In your **new terminal window** (not the one running `moose dev`):
```bash filename="Terminal 2 (New Window)" copy
cd my-analytics-app
```
If not automatically activated, activate the virtual environment:
```bash filename="Terminal 2 (New Window)" copy
source .venv/bin/activate
```
### Run the data generator workflow
Your project comes with a pre-built [Workflow](../workflows) called `generator` that acts as a **data simulator**:
```bash filename="Terminal 2 (New Window)" copy
moose workflow run generator
```
You should see:
```txt
Workflow 'generator' triggered successfully
```
- Generates 1000 fake records with realistic data (using the Faker library)
- Sends each record to your ingestion API via HTTP POST
- Runs as a background task managed by Temporal
- Helps you test your entire pipeline without needing real data
You can see the code in the `app/workflows/generator.py` file.
### Watch for data processing logs
**Switch to your first terminal** (where `moose dev` is running). You should see new logs streaming:
```txt
POST ingest/Foo
[POST] Data received at ingest API sink for Foo
Received Foo_0_0 -> Bar_0_0 1 message(s)
[DB] 17 row(s) successfully written to DB table (Bar)
```
These logs show your pipeline working: Workflow generates data → Ingestion API receives it → Data transforms → Writes to ClickHouse
**If you don't see logs after 30 seconds:**
- Verify `moose dev` is still running in Terminal 1
- Check Terminal 2 for error messages from the workflow command
- Run `docker ps` to verify containers are running
The workflow runs in the background, powered by [Temporal](https://temporal.io). You can see workflow status at `http://localhost:8080`.
```bash filename="Terminal" copy
moose peek Bar --limit 5 # Queries your ClickHouse database to show raw rows; useful for debugging/verification
```
You should see output like:
```txt
┌─primaryKey─────────────────────────┬─utcTimestamp────────┬─hasText─┬─textLength─┐
│ 123e4567-e89b-12d3-a456-426614174000 │ 2024-01-15 10:30:00 │ 1 │ 42 │
│ 987fcdeb-51a2-43d1-b789-123456789abc │ 2024-01-15 10:31:00 │ 0 │ 0 │
└────────────────────────────────────┴─────────────────────┴─────────┴────────────┘
```
If you see 0 rows, wait a few seconds for the workflow to process data, then try again.
### Query your data
Your application has a pre-built [API](../apis) that reads from your database. The API runs on `localhost:4000`.
**In Terminal 2**, call the API with `curl`:
```bash filename="Terminal 2 (New Window)" copy
curl "http://localhost:4000/api/bar"
```
You should see JSON data like:
```json
[
{
"dayOfMonth": 15,
"totalRows": 67,
"rowsWithText": 34,
"maxTextLength": 142,
"totalTextLength": 2847
},
{
"dayOfMonth": 14,
"totalRows": 43,
"rowsWithText": 21,
"maxTextLength": 98,
"totalTextLength": 1923
}
]
```
You should see JSON data with analytics results. Your complete data pipeline is working!
**Try query parameters:**
```bash filename="Terminal 2 - Add filters and limits" copy
curl "http://localhost:4000/api/bar?limit=5&orderBy=totalRows"
```
- **Port 4000**: Your Moose application webserver (all APIs are running on this port)
- **Port 8080**: Temporal UI dashboard (workflow management)
- **Port 18123**: ClickHouse HTTP interface (direct database access)
**If the workflow command doesn't work:**
- Make sure you're in the project directory (`cd my-analytics-app`)
- Verify `moose dev` is still running in your first terminal
- Check that Docker containers are running: `docker ps`
**If curl returns an error:**
- Verify the URL is `http://localhost:4000` (not 8080)
- Make sure the workflow has had time to generate data (wait 30-60 seconds)
- Check your `moose dev` terminal for error messages
**If you get HTML instead of JSON:**
- You might be hitting the wrong port - use 4000, not 8080
- Port 8080 serves the Temporal UI (workflow dashboard), not your API
**If `moose peek Bar` shows 0 rows:**
- Wait for the workflow to complete (it processes 1000 records)
- Check the workflow is running: look for "Ingested X records..." messages
- Verify no errors in your `moose dev` terminal logs
**If you see connection refused:**
- Restart `moose dev` and wait for "Started Webserver" message
- Check if another process is using port 4000: `lsof -i :4000`
1. Install the [OpenAPI (Swagger) Viewer extension](https://marketplace.cursorapi.com/items?itemName=42Crunch.vscode-openapi) in your IDE
2. Open `.moose/openapi.yaml` in your IDE
3. Click the "Preview" icon to launch the interactive API explorer
4. Test the `POST /ingest/Foo` and `GET /api/bar` endpoints
## Step 5: Hot Reload Schema Changes (1 minute)
1. Open `app/ingest/models.py`
2. Add a new field to your data model:
```python filename="app/ingest/models.py" {16} copy
from moose_lib import Key, StringToEnumMixin
from typing import Optional, Annotated
from datetime import datetime
from enum import IntEnum, auto
from pydantic import BaseModel
class Baz(StringToEnumMixin, IntEnum):
QUX = auto()
QUUX = auto()
class Bar(BaseModel):
primary_key: Key[str]
utc_timestamp: datetime
baz: Baz
has_text: bool
text_length: int
new_field: Optional[str] = None # New field
```
3. Save the file and watch your terminal
**Switch to Terminal 1** (where `moose dev` is running). You should see Moose automatically update your infrastructure:
```txt
⠋ Processing Infrastructure changes from file watcher
~ Table Bar:
Column changes:
+ new_field: String
```
You should see the column change logged. Your API, database schema, and streaming topic all updated automatically!
**Try it yourself:** Add another field with a different data type and watch the infrastructure update in real-time.
## Recap
You've built a complete analytical backend with:
## Need Help?
**Docker not running:**
```bash filename="Terminal" copy
# macOS
open -a Docker
# Linux
sudo systemctl start docker
# Verify Docker is running
docker ps
```
**Docker out of space:**
```bash filename="Terminal" copy
docker system prune -a
```
**Python version too old:**
```bash filename="Terminal" copy
# Check version
python3 --version
# Install Python 3.12+ with pyenv
curl https://pyenv.run | bash
pyenv install 3.12
pyenv local 3.12
```
**Port 4000 already in use:**
```bash filename="Terminal" copy
# Find what's using port 4000
lsof -i :4000
# Kill the process (replace PID)
kill -9 <PID>
# Or use a different port
moose dev --port 4001
```
**Permission denied:**
```bash filename="Terminal" copy
# Fix Docker permissions (Linux)
sudo usermod -aG docker $USER
newgrp docker
# Fix file permissions
chmod +x ~/.moose/bin/moose
```
**Still stuck?** Join our [Slack community](https://join.slack.com/t/moose-community/shared_invite/zt-2fjh5n3wz-cnOmM9Xe9DYAgQrNu8xKxg) or [open an issue](https://github.com/514-labs/moose/issues).
---
## Minimum Requirements
Source: moose/help/minimum-requirements.mdx
Minimum Requirements for Moose
## Development Setup
The development setup has higher requirements because Moose runs locally along with all its dependencies (Redpanda, ClickHouse, Temporal, Redis).
- **CPU:** 12 cores
- **Memory:** 18GB
- **Disk:** >500GB SSD
- **OS:**
- Windows with Linux subsystem (Ubuntu preferred)
- Linux (Debian 10+, Ubuntu 18.10+, Fedora 29+, CentOS/RHEL 8+)
- Mac OS 13+
- **Runtime:** Python 3.12+ or Node.js 20+, Docker 24.0.0+, and Docker Compose 2.23.1+
## Production Setup
The production setup has lower requirements, as external components (Redpanda, ClickHouse, Redis, and Temporal) are assumed to be deployed separately.
- **CPU:** 1vCPU
- **Memory:** 6GB
- **Disk:** >30GB SSD
- **OS:**
- Windows with Linux subsystem (Ubuntu preferred)
- Linux (Debian 10+, Ubuntu 18.10+, Fedora 29+, CentOS/RHEL 8+)
- Mac OS 13+
- **Runtime:** Python 3.12+ or Node.js 20+
---
## Troubleshooting
Source: moose/help/troubleshooting.mdx
Troubleshooting for Moose
# Troubleshooting
Common issues and their solutions when working with Moose.
## Development Environment
### Issue: `moose dev` fails to start
**Possible causes and solutions:**
1. **Port conflicts**
- Check if ports 4000-4002 are already in use
- Solution: Kill the conflicting processes or configure different ports
```bash
# Find processes using ports
lsof -i :4000-4002
# Kill process by PID
kill <PID>
```
2. **Missing dependencies**
- Solution: Ensure all dependencies are installed
```bash
pip install .
```
3. **Docker not running**
- Solution: Start Docker Desktop or Docker daemon
```bash
# Check Docker status
docker info
# Start Docker on Linux
sudo systemctl start docker
```
## Data Ingestion
### Issue: Data not appearing in tables
1. **Validation errors**
- Check logs for validation failures
- Solution: Ensure data matches schema
```bash filename="Terminal" copy
moose logs
```
2. **Stream processing errors**
- Solution: Check transform functions for errors
```bash filename="Terminal" copy
moose logs --filter functions
```
3. **Database connectivity**
- Solution: Verify database credentials in `.moose/config.toml`
```toml filename=".moose/config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```
## Stream Processing
### Issue: High processing latency
1. **Insufficient parallelism**
- Solution: Increase stream parallelism
```python
from moose_lib import Stream, StreamConfig
stream = Stream[Data]("high_volume", StreamConfig(parallelism=8) )
```
### Issue: Data transformations not working
1. **Transform function errors**
- Solution: Debug transformation logic
```python
# Add logging to transform
def transform(record: Data) -> Data:
print(f"Processing record: {record.id}")
try:
# Your transformation logic
return transformed_record
except Exception as e:
print(f"Transform error: {e}")
return None # Skip record on error
```
## Database Issues
### Issue: Slow queries
1. **Missing or improper indexes**
- Solution: Check the `order_by_fields` configuration on your table
```python
from moose_lib import OlapTable, OlapConfig

table = OlapTable[SlowTableModel]("slow_table", OlapConfig(
    order_by_fields=["frequently_queried_field", "timestamp"]
))
```
2. **Large result sets**
- Solution: Add limits and pagination
```python
# In query API
results = client.query.execute(
# not an f-string, the values are provided in the dict
"""
SELECT * FROM large_table
WHERE category = {category}
LIMIT {limit}
""", {"category": "example", "limit": 100}
)
```
## Deployment Issues
### Issue: Deployment fails
1. **Configuration errors**
- Solution: Check deployment configuration
```bash
# Validate configuration
moose validate --config
```
2. **Resource limitations**
- Solution: Increase resource allocation
```yaml
# In kubernetes manifest
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
```
3. **Permission issues**
- Solution: Verify service account permissions
```bash
# Check permissions
moose auth check
```
### Issue: Migration stuck with "Migration already in progress"
**Cause:** A previous migration was interrupted without releasing its lock.
**Solution:**
1. **Wait 5 minutes** - locks expire automatically
2. **Or manually clear the lock:**
```sql
DELETE FROM _MOOSE_STATE WHERE key = 'migration_lock';
```
3. **Verify it worked:**
```sql
SELECT * FROM _MOOSE_STATE WHERE key = 'migration_lock';
-- Should return no rows
```
The `_MOOSE_STATE` table uses ClickHouse's KeeperMap engine for distributed locking, ensuring only one migration runs at a time across multiple deployments.
## Getting Help
If you can't resolve an issue:
1. Ask for help on the [Moose community Slack channel](https://join.slack.com/t/moose-community/shared_invite/zt-2fjh5n3wz-cnOmM9Xe9DYAgQrNu8xKxg)
2. Search existing [GitHub issues](https://github.com/514-labs/moose/issues)
3. Open a new issue with:
- Moose version (`moose --version`)
- Error messages and logs
- Steps to reproduce
- Expected vs. actual behavior
---
## Moose In Your Dev Stack
Source: moose/in-your-stack.mdx
# Moose In Your Dev Stack
Moose handles the analytical layer of your application stack. The [Area Code](https://github.com/514-labs/area-code) repository contains two working implementations that show how to integrate Moose with existing applications.
## User Facing Analytics (UFA)
UFA shows how to add a dedicated analytics microservice to an existing application without impacting your primary database.
View the open source repository to see the full implementation and clone it on your own machine.
### Data Flow
1. Application writes to Supabase (transactional backend)
2. Supabase Realtime streams changes to Analytical Backend and Retrieval Backend
3. Moose ingest pipeline syncs change events from Redpanda into ClickHouse
4. Frontend queries analytics APIs for dashboards
### Architecture Components
The UFA template demonstrates a microservices architecture with specialized components for different data access patterns:
The user interface for dashboards and application interactions
Technologies: [Vite](https://vite.dev), [React](https://react.dev), [TanStack Query](https://tanstack.com/query), [TanStack Router](https://tanstack.com/router), [Tailwind CSS](https://tailwindcss.com)
Handles CRUD operations and maintains application state
Technologies: [Supabase](https://supabase.com), [Fastify](https://fastify.dev), [Drizzle ORM](https://orm.drizzle.team/)
Fast text search and complex queries across large datasets
Technologies: [Elasticsearch](https://www.elastic.co/) + [Fastify](https://fastify.dev)
High-performance analytical queries and aggregations
Technologies: [ClickHouse](https://clickhouse.com/) + [Moose OLAP](/moose/olap), [Redpanda](https://redpanda.com/) + [Moose Streaming](/moose/streaming), [Moose APIs](/moose/apis)
Keep data synchronized between transactional, retrieval, and analytics systems
Technologies: [Supabase Realtime](https://supabase.com/docs/guides/realtime), [Temporal](https://temporal.io/) + [Moose Workflows](/moose/workflows)
## Operational Data Warehouse (ODW)
ODW shows how to build a centralized data platform that ingests from multiple sources for business intelligence and reporting.
View the open source repository to see the full implementation and clone it on your own machine.
### Data Flow
1. Sources send data to Moose ingestion endpoints
2. Streaming functions validate and transform data
3. Data lands in ClickHouse tables
4. BI tools query via generated APIs or direct SQL
### Architecture Components
Handles incoming data from push-based sources (webhooks, application logs) with validation and transformation
Technologies: [Moose APIs](/moose/apis), [Redpanda](https://redpanda.com/) + [Moose Streaming](/moose/streaming)
Connects to your existing databases, object storage, or third-party APIs
Technologies: [Temporal](https://temporal.io/) + [Moose Workflows](/moose/workflows)
Centralized analytical database for raw and transformed data
Technologies: [ClickHouse](https://clickhouse.com/) + [Moose OLAP](/moose/olap)
Query interface for business intelligence and reporting
Technologies: [Streamlit](https://streamlit.io/) dashboards, [Moose APIs](/moose/apis), [ClickHouse Connect](https://clickhouse.com/docs/en/interfaces/http/connect)
---
## Overview
Source: moose/index.mdx
Modular toolkit for building real-time analytical backends
# MooseStack
Type-safe, code-first tooling for building real-time analytical backends--OLAP Databases, Data Streaming, ETL Workflows, Query APIs, and more.
## Get Started
```bash filename="Install Moose" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
## Everything as Code
Declare all infrastructure (e.g. ClickHouse tables, Redpanda streams, APIs, etc.) and pipelines in pure TypeScript or Python. Your code auto-wires everything together, so no integration boilerplate needed.
```ts filename="Complete Analytical Backend in 1 TS file" copy
interface DataModel {
  primaryKey: Key<string>;
  name: string;
}
// Create a ClickHouse table
const clickhouseTable = new OlapTable<DataModel>("TableName");
// Create a Redpanda streaming topic
const redpandaTopic = new Stream<DataModel>("TopicName", {
  destination: clickhouseTable,
});
// Create an ingest API endpoint
const ingestApi = new IngestApi<DataModel>("post-api-route", {
  destination: redpandaTopic,
});
// Create an analytics API endpoint
interface QueryParams {
  limit?: number;
}
const analyticsApi = new Api<QueryParams, DataModel[]>(
  "get-api-route",
  async ({ limit = 10 }: QueryParams, { client, sql }) => {
    const result = await client.query.execute(sql`SELECT * FROM ${clickhouseTable} LIMIT ${limit}`);
    return await result.json();
  }
);
```
```python filename="Complete Analytical Backend in 1 Python file" copy
from moose_lib import Key, OlapTable, Stream, StreamConfig, IngestApi, IngestConfig, Api
from pydantic import BaseModel
class DataModel(BaseModel):
primary_key: Key[str]
name: str
# Create a ClickHouse table
clickhouse_table = OlapTable[DataModel]("TableName")
# Create a Redpanda streaming topic
redpanda_topic = Stream[DataModel]("TopicName", StreamConfig(
destination=clickhouse_table,
))
# Create an ingest API endpoint
ingest_api = IngestApi[DataModel]("post-api-route", IngestConfig(
destination=redpanda_topic,
))
# Create an analytics API endpoint
class QueryParams(BaseModel):
limit: int = 10
def handler(client, params: QueryParams):
return client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", {
"table": clickhouse_table.name,
"limit": params.limit,
})
analytics_api = Api[QueryParams, DataModel]("get-api-route", query_function=handler)
```
## Core Concepts
```ts
interface Event {
id: Key<string>;
name: string;
createdAt: Date;
}
interface AggregatedEvent {
count: number;
name: string;
}
```
```bash
# Start local dev server
moose dev
⡏ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
```
## Modules
```ts
const table = new OlapTable<Event>("events");
const mv = new MaterializedView<AggregatedEvent>({
selectStatement: sql`
SELECT count(*) as count, name
FROM ${table}
GROUP BY name
`,
selectTables: [table],
tableName: "events",
materializedViewName: "aggregated_events"
});
```
```ts
const stream = new Stream<Event>("events", {
destination: table,
});
stream.addConsumer((event) => {
console.log(event);
});
```
```ts
const etl = new Workflow("my_etl", {
startingTask: startEtl,
schedule: "@every 1h",
retries: 3,
});
```
```ts
const postEvent = new IngestApi<Event>("post-event", {
destination: stream,
});
const getEvents = new Api("get-events", {
async handler({limit = 10}, {client, sql}) {
// query database and return results
return await client.query.execute(sql`
SELECT * FROM events LIMIT ${limit}
`);
}
});
```
Each module is independent and can be used on its own. You can start with one capability and incrementally adopt more over time.
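For instance, a minimal sketch of adopting only the OLAP module, with no streams, workflows, or APIs (the model and table name are placeholders):
```python
from moose_lib import Key, OlapTable
from pydantic import BaseModel

class Event(BaseModel):
    id: Key[str]
    name: str

# Declares just a ClickHouse table; other modules can be layered on later
events_table = OlapTable[Event]("events")
```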
## Tooling
```bash
# Build for production
moose build
```
```bash
# Create a plan
moose plan
# Example plan output:
~ Table events with column changes: [
Added(
Column {
name: "status",
data_type: String,
required: true,
unique: false,
primary_key: false,
default: None
})]
and order by changes: OrderByChange {
before: [], after: []
}
```
## Technology Partners
- [ClickHouse](https://clickhouse.com/) (Online Analytical Processing (OLAP) Database)
- [Redpanda](https://redpanda.com/) (Streaming)
- [Temporal](https://temporal.io/) (Workflow Orchestration)
- [Redis](https://redis.io/) (Internal State Management)
Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).
---
## LLM-Optimized Documentation
Source: moose/llm-docs.mdx
Language-scoped documentation feeds for AI assistants
# LLM-Optimized Documentation
Moose now publishes lightweight documentation bundles so AI assistants can reason about your project without scraping the entire site. Each docs page includes **LLM View** links for TypeScript and Python, and the CLI exposes HTTP endpoints that deliver pre-compiled reference text.
## Quick links
- TypeScript bundle: `/llm-ts.txt`
- Python bundle: `/llm-py.txt`
- Scoped bundle: append `?path=relative/docs/section` to either endpoint to fetch a specific subsection
You can open these URLs in a browser, pipe them into tooling, or share them with agents such as Claude, Cursor, and Windsurf.
```bash filename="Terminal"
# Fetch the TypeScript bundle for the OLAP docs from the hosted site
curl "https://docs.fiveonefour.com/llm-ts.txt?path=moose/olap/model-table"
```
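The Python bundle works the same way; for example, to pull only the streaming section:
```bash filename="Terminal"
# Fetch the Python bundle scoped to the streaming docs
curl "https://docs.fiveonefour.com/llm-py.txt?path=moose/streaming"
```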
For project-specific knowledge, combine these static bundles with live context from the [MCP server](/moose/mcp-dev-server).
---
## Development Mode
Source: moose/local-dev.mdx
Local development environment with hot reload and automatic infrastructure management
# Setting Up Your Development Environment
Development mode (`moose dev`) provides a full-featured local environment optimized for rapid iteration and debugging. It automatically manages Docker containers, provides hot reload capabilities, and includes enhanced debugging features.
## Getting Started
```bash
# Start development environment
moose dev
# View your running infrastructure
moose ls
```
## Container Management
Development mode automatically manages Docker containers for your infrastructure:
- **ClickHouse** (when `olap` feature is enabled)
- **Redpanda** (when `streaming_engine` feature is enabled)
- **Temporal** (when `workflows` feature is enabled)
- **Analytics APIs Server** (when `apis` feature is enabled)
- **Redis** (always enabled)
- **MCP Server** (always enabled) - Enables AI-assisted development. [Learn more](/moose/mcp-dev-server)
### Container Configuration
Control which containers start with feature flags:
```toml copy
# moose.config.toml
[features]
olap = true # Enables ClickHouse
streaming_engine = true # Enables Redpanda
workflows = false # Controls Temporal startup
apis = true # Enables Analytics APIs server
```
### Extending Docker Infrastructure
You can extend Moose's Docker Compose configuration with custom services by creating a `docker-compose.dev.override.yaml` file in your project root. This allows you to add additional infrastructure (databases, monitoring tools, etc.) that runs alongside your Moose development environment.
**Do not use docker-compose.dev.override.yaml to modify Moose-managed services** (ClickHouse, Redpanda, Redis, Temporal). The Docker Compose merge behavior makes it difficult to override existing configuration correctly, often leading to conflicts.
Instead, use `moose.config.toml` to configure Moose infrastructure. See [Configuration](/moose/configuration) for all available options including database connections, ports, volumes, and service-specific settings.
Use the override file **only for adding new services** that complement your Moose environment (e.g., PostgreSQL for application data, monitoring tools).
**How it works:**
When you run `moose dev`, Moose automatically detects and merges your override file with the generated Docker Compose configuration. The files are merged using Docker Compose's [standard merge behavior](https://docs.docker.com/compose/how-tos/multiple-compose-files/merge/).
**Example: Adding PostgreSQL for Application Data**
Create a `docker-compose.dev.override.yaml` file in your project root:
```yaml copy filename="docker-compose.dev.override.yaml"
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: mypassword
      POSTGRES_DB: myapp_db
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
volumes:
  postgres-data:
```
Now when you run `moose dev`, PostgreSQL will start alongside your other infrastructure. You'll see a message confirming the override file is being used:
```
[moose] Using docker-compose.dev.override.yaml for custom infrastructure
```
**Recommended Use Cases:**
- **Add databases**: PostgreSQL, MySQL, MongoDB for application data
- **Add monitoring**: Grafana, Prometheus for metrics visualization
- **Add custom services**: Additional message queues, caching layers, or development tools
**Not Recommended:**
- Modifying Moose-managed services (ClickHouse, Redpanda, Redis, Temporal)
- Overriding ports, volumes, or environment variables for Moose infrastructure
- Attempting to change database credentials or connection settings
For any Moose infrastructure configuration, use `moose.config.toml` instead. See [Configuration](/moose/configuration).
**Example: Adding Grafana for Monitoring**
```yaml copy filename="docker-compose.dev.override.yaml"
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
volumes:
  grafana-data:
```
When merging files, Docker Compose follows these rules:
- **Services**: Merged by name with values from the override file taking precedence
- **Environment variables**: Appended (both files' values are used)
- **Volumes**: Appended
- **Ports**: Appended (use `!override` tag to replace instead of merge)
See [Docker's merge documentation](https://docs.docker.com/reference/compose-file/merge/) for complete details.
The override file is only used in development mode (`moose dev`). For production deployments, configure your infrastructure separately using your deployment platform's tools.
## Hot Reloading Development
The development runtime includes a file watcher that provides near-instantaneous feedback when you save code changes.
### Watched Files
The file watcher recursively monitors your entire `app/` directory structure and only rebuilds the components that actually changed.
Only the root file in your `app/` directory (`main.py`) is executed when changes are detected. For your tables, streams, APIs, and workflows to be picked up, you must import them in that root file. If you change a file in your `app/` directory that is a dependency (direct or indirect) of the root file, those changes WILL be detected.
## Quick Example
**❌ Doesn't work - Not imported in the root file:**
```py file="app/tables/users.py"
from moose_lib import OlapTable
from schemas.user import UserSchema

users_table = OlapTable[UserSchema]("Users")
# Moose can't see this - not imported in main.py
```
**✅ Works - Import in main file:**
```py file="app/tables/users.py"
from moose_lib import OlapTable
from schemas.user import UserSchema

users_table = OlapTable[UserSchema]("Users")  # Defined here; imported in main.py below
```
```py file="app/main.py"
from tables.users import users_table # Moose sees this
```
Now because we imported the table in the main file, Moose will detect the change and rebuild the table.
**✅ Works - Change dependency:**
```py file="app/schemas/user.py" {7}
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    age: int  # Adding this triggers migration
```
*Moose detects this because `UserSchema` is imported in the root file via the dependency chain.*
Learn more about how Moose handles migrations in [Moose Migrate](/moose/migrate).
## Script Execution Hooks
You can configure your dev server to run your own shell commands automatically during development. Use these hooks to keep generated artifacts in sync (e.g., refreshing external models, regenerating OpenAPI SDKs).
### Available hooks
- `on_first_start_script`: runs once when the dev server first starts in this process
- `on_reload_complete_script`: runs after each dev server reload when code/infra changes have been fully applied
Configure these in `moose.config.toml` under the `http_server_config` section:
```toml copy
# moose.config.toml
[http_server_config]
# One-time on first start
on_first_start_script = "echo 'dev started'"
# After every code/infra reload completes
on_reload_complete_script = "echo 'reload complete'"
```
Notes:
- Scripts run from your project root using your `$SHELL` (falls back to `/bin/sh`).
- Use `&&` to chain multiple commands or point to a custom script (see the sketch below).
- Prefer passing credentials via environment variables or your secret manager.
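For example, you could point a hook at a small Python script by setting `on_reload_complete_script = "python scripts/post_reload.py"`; the script path and behavior below are purely illustrative, not something Moose provides.
```py filename="scripts/post_reload.py" copy
# Hypothetical post-reload hook: record when the dev server finished applying changes.
from datetime import datetime, timezone
from pathlib import Path

marker = Path(".moose-last-reload")
marker.write_text(datetime.now(timezone.utc).isoformat())
print(f"[post_reload] reload completed at {marker.read_text()}")
```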
### Use case: keep external models in sync (DB Pull)
Refresh `EXTERNALLY_MANAGED` table models from a remote ClickHouse on dev start so your local code matches the live schema.
```bash filename="Terminal" copy
export REMOTE_CLICKHOUSE_URL="https://username:password@host:8443/?database=default"
```
```toml copy
# moose.config.toml
[http_server_config]
on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL"
```
See the full guide: [/moose/olap/db-pull](/moose/olap/db-pull)
### Use case: regenerate OpenAPI SDKs on reload
Automatically regenerate client SDKs after Moose finishes applying code/infra changes so `.moose/openapi.yaml` is fresh.
```toml copy
# moose.config.toml
[http_server_config]
on_first_start_script = "command -v openapi-generator-cli >/dev/null 2>&1 || npm i -g @openapitools/openapi-generator-cli"
on_reload_complete_script = "openapi-generator-cli generate -i .moose/openapi.yaml -g typescript-fetch -o ./generated/ts"
```
More examples: [/moose/apis/openapi-sdk](/moose/apis/openapi-sdk)
## Local Infrastructure
### Port Allocation
Development mode uses the following default ports:
- **4000**: Main API server
- **5001**: Management API (health checks, metrics, admin, OpenAPI docs)
### Service URLs
Access your development services at:
```bash
# Main application
http://localhost:4000
# Management interface
curl http://localhost:5001/metrics
# OpenAPI documentation
http://localhost:5001/openapi.yaml
```
### Container Networking
All containers run in an isolated Docker network with automatic service discovery:
- Containers communicate using service names
- Port mapping only for external access
- Automatic DNS resolution between services
### MCP Server for AI-Assisted Development
Development mode includes a built-in Model Context Protocol (MCP) server that lets AI assistants interact with your local infrastructure through natural language.
**What you can do:**
- Query your ClickHouse database with natural language
- Inspect streaming topics and messages
- Search and filter development logs
- Explore your infrastructure map
**Quick setup:**
The MCP server runs automatically at `http://localhost:4000/mcp`. For Claude Code, just run:
```bash copy
claude mcp add --transport http moose-dev http://localhost:4000/mcp
```
For other AI clients (Windsurf, VS Code, Cursor, Claude Desktop), see the [full setup guide](/moose/mcp-dev-server).
**Example prompts:**
- *"What errors are in the logs?"*
- *"What tables exist in my project?"*
- *"Show me the schema of all tables"*
- *"Sample 5 messages from the Foo stream"*
See the complete guide for all available tools, detailed configuration for each AI client, and example workflows.
## Troubleshooting
### Common Issues
**Container Startup Failures**
```bash
# Check Docker is running
docker info
# View container logs
moose logs
```
**Port Conflicts**
```bash
# Check what's using your ports
lsof -i :4000
lsof -i :5001
# Use custom ports
export MOOSE_HTTP_PORT=4040
export MOOSE_MANAGEMENT_PORT=5010
moose dev
```
---
## MCP Server
Source: moose/mcp-dev-server.mdx
Built-in Model Context Protocol server for AI-assisted development
# MCP Server for AI-Assisted Development
The Moose development server includes a built-in Model Context Protocol (MCP) server that enables AI agents and IDEs to interact directly with your local development infrastructure. This allows you to use natural language to query data, inspect logs, explore infrastructure, and debug your Moose project.
## What is MCP?
[Model Context Protocol (MCP)](https://modelcontextprotocol.io) is an open protocol that standardizes how AI assistants communicate with development tools and services. Moose's MCP server exposes your local development environment—including ClickHouse, Redpanda, logs, and infrastructure state—through a set of tools that AI agents can use.
## Quick Start
The MCP server runs automatically when you start development mode:
```bash
moose dev
```
The MCP server is available at: `http://localhost:4000/mcp`
The MCP server is enabled by default. To disable it, use `moose dev --mcp=false`.
## Configure Your AI Client
Connect your AI assistant to the Moose MCP server. Most clients now support native HTTP transport for easier setup.
**Setup**: Use the Claude Code CLI (easiest method)
```bash copy
claude mcp add --transport http moose-dev http://localhost:4000/mcp
```
That's it! Claude Code will automatically connect to your Moose dev server.
**Scope**: This command adds the MCP server to Claude Code's project configuration, making it available to your project when using Claude Code. Other AI clients (Cursor, Windsurf, etc.) require separate configuration - see the client-specific configurations below.
Make sure `moose dev` is running before adding the server. The CLI will verify the connection.
**Alternative**: Manual configuration at `~/.claude/config.json`
```json filename="config.json" copy
{
"mcpServers": {
"moose-dev": {
"transport": "http",
"url": "http://localhost:4000/mcp"
}
}
}
```
**Location**: `~/.codeium/windsurf/mcp_config.json`
Windsurf supports native Streamable HTTP transport:
```json filename="mcp_config.json" copy
{
"mcpServers": {
"moose-dev": {
"serverUrl": "http://localhost:4000/mcp"
}
}
}
```
**Prerequisites**:
- VS Code 1.102+ (built-in MCP support)
- Or install the [Cline extension](https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev)
**Option 1: Native HTTP Support (VS Code 1.102+)**
Add to `.vscode/settings.json` or User Settings:
```json filename=".vscode/settings.json" copy
{
"mcp.servers": {
"moose-dev": {
"transport": "http",
"url": "http://localhost:4000/mcp"
}
}
}
```
**Option 2: Cline Extension**
Configure in Cline's MCP settings:
```json copy
{
"moose-dev": {
"transport": "sse",
"url": "http://localhost:4000/mcp"
}
}
```
**Location**: `.cursor/mcp.json` (project-level) or `~/.cursor/settings/mcp.json` (global)
Cursor currently uses stdio transport. Use `mcp-remote` to bridge to HTTP servers:
```json filename=".cursor/mcp.json" copy
{
"mcpServers": {
"moose-dev": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://localhost:4000/mcp"
]
}
}
}
```
**Location**: `~/Library/Application Support/Claude/claude_desktop_config.json`
Access via: Claude > Settings > Developer > Edit Config
```json filename="claude_desktop_config.json" copy
{
"servers": {
"moose-dev": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://localhost:4000/mcp"
]
}
}
}
```
The `-y` flag automatically installs `mcp-remote` if not already installed.
Make sure `moose dev` is running before using the MCP tools. The AI client will connect to `http://localhost:4000/mcp`.
## Available Tools
The Moose MCP server provides five tools for interacting with your local development environment:
### `get_logs`
Retrieve and filter Moose development server logs for debugging and monitoring.
**What you can ask for:**
- Filter by log level (ERROR, WARN, INFO, DEBUG, TRACE)
- Limit the number of log lines returned
- Search for specific text patterns in logs
**Example prompts:**
*"Show me the last 10 ERROR logs"*
```
Showing 10 most recent log entries from /Users/user/.moose/2025-10-10-cli.log
Filters applied:
- Level: ERROR
[2025-10-10T17:44:42Z ERROR] Foo -> Bar (worker 1): Unsupported SASL mechanism: undefined
[2025-10-10T17:44:43Z ERROR] FooDeadLetterQueue (consumer) (worker 1): Unsupported SASL mechanism
[2025-10-10T17:51:48Z ERROR] server error on API server (port 4000): connection closed
...
```
*"What WARN level logs do I have?"*
```
Showing 6 most recent log entries
Filters applied:
- Level: WARN
[2025-10-10T16:45:04Z WARN] HTTP client not configured - missing API_KEY
[2025-10-10T16:50:05Z WARN] HTTP client not configured - missing API_KEY
...
```
**Tip**: Combine filters for better results. For example: "Show me ERROR logs with 'ClickHouse' in them" combines level filtering with search.
**Use cases:**
- Debugging application errors
- Monitoring infrastructure health
- Tracking data processing issues
- Finding specific events or patterns
---
### `get_infra_map`
Retrieve and explore the infrastructure map showing all components in your Moose project.
**What you can ask for:**
- List specific component types (tables, topics, API endpoints, workflows, etc.)
- Get a complete overview of all infrastructure
- Search for components by name
- See detailed configuration or just a summary
**Example prompts:**
*"What tables exist in my project?"*
```
# Moose Infrastructure Map (Summary)
## Tables (28)
- MergeTreeTest
- ReplacingMergeTreeVersion
- Bar
- BasicTypes
- UserEvents_1_0
- UserEvents_2_0
- FooDeadLetter
- BarAggregated
- FooWorkflow
...
```
*"Give me an overview of my Moose infrastructure"*
```
# Moose Infrastructure Map (Summary)
## Topics (11)
- Bar, BasicTypes, Foo, FooDeadLetterQueue, SimpleArrays...
## API Endpoints (11)
- INGRESS_Foo (INGRESS -> topic: Foo)
- INGRESS_BasicTypes (INGRESS -> topic: BasicTypes)
- EGRESS_bar (EGRESS (4 params))
...
## Tables (28)
- MixedComplexTypes, Bar, UserEvents_1_0...
## Topic-to-Table Sync Processes (10)
- Bar_Bar, BasicTypes_BasicTypes...
## Function Processes (3)
- Foo__Bar_Foo_Bar, Foo_Foo...
```
*"Find all components with 'User' in the name"*
```
## Tables (2)
- UserEvents_1_0
- UserEvents_2_0
```
**Tip**: Search is case-sensitive by default. Use capital letters to match your component names, or ask the AI to search case-insensitively.
**Use cases:**
- Understanding project structure
- Discovering available components
- Debugging infrastructure issues
- Documenting your data pipeline
---
### `query_olap`
Execute read-only SQL queries against your local ClickHouse database.
**What you can ask for:**
- Query table data with filters, sorting, and aggregations
- Inspect table schemas and column information
- Count rows and calculate statistics
- List all tables in your database
- Results in table or JSON format
**Example prompts:**
*"What columns are in the UserEvents_1_0 table?"*
```
Query executed successfully. Rows returned: 4
| name | type | default_type | default_expression | comment | ...
|-----------|-------------------|--------------|-------------------|---------|
| userId | String | | | |
| eventType | String | | | |
| timestamp | Float64 | | | |
| metadata | Nullable(String) | | | |
```
*"List all tables and their engines"*
```
Query executed successfully. Rows returned: 29
| name | engine |
|-----------------------------|------------------------------|
| Bar | MergeTree |
| BasicTypes | MergeTree |
| UserEvents_1_0 | MergeTree |
| UserEvents_2_0 | ReplacingMergeTree |
| ReplicatedMergeTreeTest | ReplicatedMergeTree |
| BarAggregated_MV | MaterializedView |
...
```
*"Count the number of rows in Bar"*
```
Query executed successfully. Rows returned: 1
| total_rows |
|------------|
| 0 |
```
**Tip**: Ask the AI to discover table names first using "What tables exist in my project?" before querying them. Table names are case-sensitive in ClickHouse.
**Use cases:**
- Exploring data during development
- Validating data transformations
- Checking table schemas
- Debugging SQL queries
- Analyzing data patterns
**Safety:**
Only read-only operations are permitted (SELECT, SHOW, DESCRIBE, EXPLAIN). Write operations (INSERT, UPDATE, DELETE) and DDL statements (CREATE, ALTER, DROP) are blocked.
---
### `get_stream_sample`
Sample recent messages from Kafka/Redpanda streaming topics.
**What you can ask for:**
- View recent messages from any stream/topic
- Specify how many messages to sample
- Get results in JSON or pretty-printed format
- Inspect message structure and content
**Example prompts:**
*"Sample 5 messages from the Bar topic"*
```json
{
"stream_name": "Bar",
"message_count": 5,
"partition_count": 1,
"messages": [
{
"primaryKey": "e90c93be-d28b-47d6-b783-5725655c044f",
"utcTimestamp": "+057480-11-24T20:39:59.000Z",
"hasText": true,
"textLength": 107
},
{
"primaryKey": "b974f830-f28a-4a95-b61c-f65bfc607795",
"utcTimestamp": "+057370-11-04T17:11:51.000Z",
"hasText": true,
"textLength": 166
},
...
]
}
```
*"What data is flowing through the BasicTypes stream?"* (pretty format)
```markdown
# Stream Sample: BasicTypes
Retrieved 3 message(s) from 1 partition(s)
## Message 1
{
"id": "bt-001",
"timestamp": "2024-10-09T12:00:00Z",
"stringField": "hello world",
"numberField": 42,
"booleanField": true
}
## Message 2
{
"id": "bt-002",
"timestamp": "2024-10-09T12:05:00Z",
"stringField": "test",
"numberField": 100,
"booleanField": false
}
...
```
**Tip**: Use "List all streaming topics" first to discover available streams in your project.
**Use cases:**
- Debugging data flow issues
- Validating streaming transformations
- Inspecting message formats
- Troubleshooting pipeline failures
---
### `get_source`
Look up the file path where a Moose component (table, stream, API, workflow, etc.) is defined. The tool searches the infrastructure map and returns the relative path plus snippet metadata.
**What you can ask for:**
- "Where is the `UserActivity` table defined?"
- "Open the TypeScript file for the `orderEventPipeline` stream."
- "Show me the FastAPI app behind the `analytics_api` web route."
**Example prompt and response:**
```
You: "Find the source for the OrderItem stream"
AI:
Component: stream/OrderItem (TypeScript)
File: src/ingest/order-items.ts
Snippet:
);
```
**Use cases:**
- Jump from CLI output or infra map entries to the actual code
- Let AI assistants open the correct file before editing
- Generate documentation references that include file paths
The tool only reads paths within your workspace and never executes code. Pair it with `get_infra_map` for quick “find-then-open” workflows.
---
## Example Workflows
### Debugging a Data Pipeline
```
You: "My users table is empty, can you help me debug?"
AI: Let me check the infrastructure and data flow.
[Uses get_infra_map to see the pipeline]
[Uses get_logs to check for errors]
[Uses query_olap to verify table is empty]
[Uses get_stream_sample to check if data is in the source stream]
AI: I found the issue - there are errors in the logs showing the streaming
function is failing. The stream has data, but it's not reaching the table.
```
### Exploring a New Project
```
You: "Can you give me an overview of this Moose project?"
AI: [Uses get_infra_map with format=summary]
AI: This project has 3 tables (users, events, aggregated_metrics),
2 streaming topics, 4 API endpoints, and 1 workflow. The data flow
appears to be: raw events → transformation → aggregated metrics.
```
### Validating Data Transformations
```
You: "Did my streaming function correctly transform the user data?"
AI: [Uses get_stream_sample on source stream]
[Uses query_olap on destination table]
AI: Yes, the transformation is working. I compared the source stream
messages with the destination table records, and the enrichment
fields are being added correctly.
```
## Advanced Configuration
### Custom Port
If you're running Moose on a non-default port, update the MCP configuration:
```bash
export MOOSE_HTTP_PORT=8080
moose dev
```
Then update your MCP client configuration to use port 8080 instead of 4000.
### Disabling the MCP Server
To run development mode without the MCP server:
```bash
moose dev --mcp=false
```
### Production Considerations
The MCP server is designed for local development only. It provides direct access to your infrastructure and should **never** be exposed in production environments.
The MCP server:
- Runs only in development mode (`moose dev`)
- Does not run in production mode (`moose prod`)
- Provides read-only access to sensitive infrastructure
- Should not be exposed over networks or proxied externally
## LLM-Optimized Documentation Feeds
Before handing control to an AI assistant, prime it with a compact doc bundle so it understands Moose primitives and terminology. We publish TypeScript and Python versions at `/llm-ts.txt` and `/llm-py.txt`, with optional `?path=` filters for specific sections.
See [LLM-optimized docs](/moose/llm-docs) for instructions on embedding these feeds into Claude, Cursor, Windsurf, or MCP clients alongside the live tools described above.
## Troubleshooting
### MCP Tools Not Appearing
1. Verify `moose dev` is running: `curl http://localhost:4000/mcp`
2. Check your AI client's MCP configuration is correct
3. Restart your AI client after updating configuration
4. Check the Moose logs for MCP-related errors: `moose logs --filter mcp`
### Connection Errors
If your AI client can't connect to the MCP server:
```bash
# Check if the dev server is running
curl http://localhost:4000/health
# Check MCP endpoint specifically
curl -X POST http://localhost:4000/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize"}'
```
### Empty Results
If tools return no data:
- Verify your dev server has been running long enough to generate data
- Check that infrastructure has been created: `moose ls`
- Try ingesting test data, then check it with `moose peek <table-name>`
## Related Documentation
- [Local Development](/moose/local-dev) - Development mode overview
- [Moose CLI Reference](/moose/moose-cli) - CLI commands and flags
- [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification
---
## Observability
Source: moose/metrics.mdx
Unified observability for Moose across development and production—metrics console, health checks, Prometheus, OpenTelemetry, logging, and error tracking
# Observability
This page consolidates Moose observability for both local development and production environments.
## Local Development
### Metrics Console
Moose provides a console to view live metrics from your Moose application. To launch the console, run:
```bash filename="Terminal" copy
moose metrics
```
Use the arrow keys to move up and down rows in the endpoint table and press Enter to view more details about that endpoint.
#### Endpoint Metrics
Aggregated metrics for all endpoints:
| Metric | Description |
| :-------------------- | :---------------------------------------------------------------------------------- |
| `AVERAGE LATENCY` | Average time in milliseconds it takes for a request to be processed by the endpoint |
| `TOTAL # OF REQUESTS` | Total number of requests made to the endpoint |
| `REQUESTS PER SECOND` | Average number of requests made per second to the endpoint |
| `DATA IN` | Average number of bytes of data sent to all `/ingest` endpoints per second |
| `DATA OUT` | Average number of bytes of data returned by all `/api` endpoints per second |
Individual endpoint metrics:
| Metric | Description |
| :---------------------------- | :---------------------------------------------------------------------------------- |
| `LATENCY` | Average time in milliseconds it takes for a request to be processed by the endpoint |
| `# OF REQUESTS RECEIVED` | Total number of requests made to the endpoint |
| `# OF MESSAGES SENT TO KAFKA` | Total number of messages sent to the Kafka topic |
#### Stream → Table Sync Metrics
| Metric | Description |
| :---------- | :-------------------------------------------------------------------------------------------------- |
| `MSG READ` | Total number of messages read from the Kafka topic by the sync process |
| `LAG` | The number of messages that have been sent to the consumer but not yet received |
| `MSG/SEC` | Average number of messages read from the Kafka topic per second |
| `BYTES/SEC` | Average number of bytes of data received by the ClickHouse consumer from the Kafka topic per second |
#### Streaming Transformation Metrics
For each streaming transformation:
| Metric | Description |
| :------------ | :---------------------------------------------------------------------------- |
| `MSG IN` | Total number of messages passed into the streaming function |
| `MSG IN/SEC` | Average number of messages passed into the streaming function per second |
| `MSG OUT` | Total number of messages returned by the streaming function |
| `MSG OUT/SEC` | Average number of messages returned by the streaming function per second |
| `BYTES/SEC` | Average number of bytes of data returned by the streaming function per second |
---
## Production
### Health Monitoring
Moose applications expose a health check endpoint at `/health` that returns a 200 OK response when the application is operational. This endpoint is used by container orchestration systems like Kubernetes to determine the health of your application.
In production environments, we recommend configuring three types of probes:
1. Startup Probe: Gives Moose time to initialize before receiving traffic
2. Readiness Probe: Determines when the application is ready to receive traffic
3. Liveness Probe: Detects when the application is in a deadlocked state and needs to be restarted
Learn more about how to configure health checks in the [Kubernetes deployment guide](/moose/deploying/deploying-on-kubernetes).
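For a quick manual check, a minimal sketch that polls the health endpoint with the Python standard library (assuming the default API port 4000) could look like this:
```py filename="health_check.py" copy
import time
from urllib.error import URLError
from urllib.request import urlopen

def wait_for_healthy(url: str = "http://localhost:4000/health", retries: int = 10) -> bool:
    """Poll the Moose health endpoint until it returns 200 or retries run out."""
    for _ in range(retries):
        try:
            with urlopen(url) as response:
                if response.status == 200:
                    return True
        except URLError:
            pass
        time.sleep(2)
    return False

if __name__ == "__main__":
    print("healthy" if wait_for_healthy() else "not healthy")
```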
### Prometheus Metrics
Moose applications expose metrics in Prometheus format at the `/metrics` endpoint. These metrics include:
- HTTP request latency histograms for each endpoint
- Request counts and error rates
- System metrics for the Moose process
Example metrics output:
```
# HELP latency Latency of HTTP requests.
# TYPE latency histogram
latency_sum{method="POST",path="ingest/UserActivity"} 0.025
latency_count{method="POST",path="ingest/UserActivity"} 2
latency_bucket{le="0.001",method="POST",path="ingest/UserActivity"} 0
latency_bucket{le="0.01",method="POST",path="ingest/UserActivity"} 0
latency_bucket{le="0.02",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.05",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.1",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.25",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.5",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="1.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="5.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="10.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="30.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="60.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="120.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="240.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="+Inf",method="POST",path="ingest/UserActivity"} 1
```
You can scrape these metrics using a Prometheus server or any compatible monitoring system.
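As an illustration, a minimal sketch that fetches and filters these metrics with the Python standard library (assuming the local development management endpoint on port 5001 shown earlier; in production, point it at wherever `/metrics` is exposed):
```py filename="scrape_metrics.py" copy
from urllib.request import urlopen

# Fetch the Prometheus-format metrics and print only the latency series
with urlopen("http://localhost:5001/metrics") as response:
    metrics_text = response.read().decode("utf-8")

for line in metrics_text.splitlines():
    if line.startswith("latency_"):
        print(line)
```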
### OpenTelemetry Integration
In production deployments, Moose can export telemetry data using OpenTelemetry. Enable via environment variables:
```
MOOSE_TELEMETRY__ENABLED=true
MOOSE_TELEMETRY__EXPORT_METRICS=true
```
When running in Kubernetes with an OpenTelemetry operator, you can configure automatic sidecar injection by adding annotations to your deployment:
```yaml
metadata:
annotations:
"sidecar.opentelemetry.io/inject": "true"
```
### Logging
Configure structured logging via environment variables:
```
MOOSE_LOGGER__LEVEL=Info
MOOSE_LOGGER__STDOUT=true
MOOSE_LOGGER__FORMAT=Json
```
The JSON format is ideal for log aggregation systems (ELK Stack, Graylog, Loki, or cloud logging solutions).
### Production Monitoring Stack
Recommended components:
1. Metrics Collection: Prometheus or cloud-native monitoring services
2. Log Aggregation: ELK Stack, Loki, or cloud logging solutions
3. Distributed Tracing: Jaeger or other OpenTelemetry-compatible backends
4. Alerting: Alertmanager or cloud provider alerting
### Error Tracking
Integrate with systems like Sentry via environment variables:
```
SENTRY_DSN=https://your-sentry-dsn
RUST_BACKTRACE=1
```
Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).
## Feedback
Join our Slack community to share feedback and get help with Moose.
---
## Migrations & Planning
Source: moose/migrate.mdx
How Moose handles infrastructure migrations and planning
# Moose Migrate
Moose's migration system works like version control for your infrastructure. It automatically detects changes in your code and applies them to your data infrastructure with confidence.
Moose tracks changes across:
- OLAP Tables and Materialized Views
- Streaming Topics
- API Endpoints
- Workflows
## How It Works
Moose collects all objects defined in your main file (main.py) and automatically generates infrastructure operations to match your code:
```python file="app/main.py"
from pydantic import BaseModel
from moose_lib import OlapTable, Stream

class UserSchema(BaseModel):
    id: str
    name: str
    email: str

users_table = OlapTable[UserSchema]("Users")
user_events = Stream[UserSchema]("Users")
```
When you add these objects, Moose automatically creates:
- A ClickHouse table named `Users` with the `UserSchema`
- A Redpanda topic named `Users` with the `UserSchema`
## Development Workflow
When running your code in development mode, Moose will automatically hot-reload migrations to your local infrastructure as you save code changes.
### Quick Start
Start your development environment:
```bash filename="Terminal" copy
moose dev
```
This automatically:
1. Recursively watches your `/app` directory for code changes
2. Parses objects defined in your main file
3. Compares the new objects with the current infrastructure state Moose stores internally
4. Generates and applies migrations in real-time based on the differences
5. Provides immediate feedback on any errors or warnings
6. Updates the internal state of your infrastructure to reflect the new state
### Example: Adding a New Table
```python file="app/main.py" {6} copy
# Before
users_table = OlapTable[UserSchema]("Users")

# After (add analytics table)
users_table = OlapTable[UserSchema]("Users")
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```
**What happens:**
- Moose detects the new `analytics_table` object
- Compares: "No Analytics table exists"
- Generates migration: "Create Analytics table"
- Applies migration automatically
- Updates internal state
In your terminal, you will see a log that shows the new table being created:
```bash
⠋ Processing Infrastructure changes from file watcher
+ Table: Analytics Version None - id: String, number: Int64, status: String - - deduplicate: false
```
### Example: Schema Changes
```python file="app/main.py" {8} copy
from moose_lib import Key

# After (add age field)
class UserSchema(BaseModel):
    id: Key[str]
    name: str
    email: str
    age: int  # New field
```
**What happens:**
- Moose detects the new `age` field
- Generates migration: "Add age column to Users table"
- Applies migration
- Existing rows get NULL/default values
## Production Workflow
Moose supports two deployment patterns: **Moose Server** and **Serverless**.
### Moose Server Deployments
For deployments with a running Moose server, preview changes before applying:
```bash filename="Terminal" copy
moose plan --url https://your-production-instance --token <token>
```
Remote planning requires authentication:
1. Generate a token: `moose generate hash-token`
2. Configure your server:
```toml filename="moose.config.toml" copy
[authentication]
admin_api_key = "your-hashed-token"
```
3. Use the token with the `--token` flag
**Deployment Flow:**
1. **Develop locally** with `moose dev`
2. **Test changes** in local environment
3. **Plan against production**: `moose plan --url <url> --token <token>`
4. **Review changes** carefully
5. **Deploy** - Moose applies migrations automatically on startup
### Serverless Deployments
For serverless deployments (no Moose server), use the ClickHouse connection directly:
```bash filename="Terminal" copy
# Step 1: Generate migration files
moose generate migration --clickhouse-url <connection-string> --save
# Step 2: Preview changes in PR
moose plan --clickhouse-url clickhouse://user:pass@host:port/database
# Step 3: Execute migration after merge
moose migrate --clickhouse-url <connection-string>
```
**Deployment Flow:**
1. **Develop locally** with `moose dev`
2. **Generate migration plan**: `moose generate migration --clickhouse-url <connection-string> --save`
3. **Create PR** with `plan.yaml`, `remote_state.json`, `local_infra_map.json`
4. **PR validation**: Run `moose plan --clickhouse-url <connection-string>` in CI to preview changes
5. **Review** migration files and plan output
6. **Merge PR**
7. **Execute migration**: Run `moose migrate --clickhouse-url <connection-string>` in CI/CD
Requires `state_config.storage = "clickhouse"` in `moose.config.toml`:
```toml filename="moose.config.toml" copy
[state_config]
storage = "clickhouse"
[features]
olap = true
data_models_v2 = true
```
Your ClickHouse instance needs the KeeperMap engine for state storage and migration locking.
✅ **ClickHouse Cloud**: Works out of the box
✅ **`moose dev` or `moose prod`**: Already configured
⚠️ **Self-hosted ClickHouse**: See [ClickHouse KeeperMap documentation](https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map) for setup requirements
### Understanding Plan Output
Moose shows exactly what will change:
```bash
+ Table: Analytics Version None - id: String, number: Int64, status: String - - deduplicate: false
+ Table: Users Version None - id: String, name: String, email: String - - deduplicate: false
```
## Migration Types
| Change Type | Infrastructure Impact | Data Impact |
|-------------|----------------------|-------------|
| **Add new object** | New table/stream/API created | No impact |
| **Remove object** | Table/stream/API dropped | All data lost |
| **Add field** | New column created | Existing rows get NULL/default |
| **Remove field** | Column dropped | Data permanently lost |
| **Change type** | Column altered | Data converted if compatible |
## Viewing Infrastructure State
### Via CLI
```bash
# Check current infrastructure objects
moose ls
# View migration logs
moose logs
```
### Via Direct Connection
Connect to your local infrastructure using details from `moose.config.toml`:
```toml file="moose.config.toml"
[features]
olap = true # ClickHouse for analytics
streaming_engine = true # Redpanda for streaming
workflows = false # Temporal for workflows
[clickhouse_config]
host = "localhost"
host_port = 18123
native_port = 9000
db_name = "local"
user = "panda"
password = "pandapass"
[redpanda_config]
broker = "localhost:19092"
message_timeout_ms = 1000
retention_ms = 30000
replication_factor = 1
```
## Best Practices
### Development
- Use `moose dev` for all local development
- Monitor plan outputs for warnings
- Test schema changes with sample data
### Production
- Always use remote planning before deployments
- Review changes carefully in production plans
- Maintain proper authentication
- Test migrations in staging first
### Managing TTL Outside Moose
If you're managing ClickHouse TTL settings through other tools or want to avoid migration failures from TTL drift, you can configure Moose to ignore TTL changes:
```toml filename="moose.config.toml" copy
[migration_config]
ignore_operations = ["ModifyTableTtl", "ModifyColumnTtl"]
```
This tells Moose to:
- Skip generating TTL change operations in migration plans
- Ignore TTL differences during drift detection
You'll still get migrations for all other schema changes (adding tables, modifying columns, etc.), but TTL changes won't block your deployments.
## Troubleshooting
### Authentication Errors
- Verify your authentication token
- Generate a new token: `moose generate hash-token`
- Check server configuration in `moose.config.toml`
### Migration Issues
- Check `moose logs` for detailed error messages
- Verify object definitions in your main file
- Ensure all required fields are properly typed
- **Stuck migration lock**: If you see "Migration already in progress" but no migration is running, wait 5 minutes for automatic expiry or manually clear it:
```sql
DELETE FROM _MOOSE_STATE WHERE key = 'migration_lock';
```
---
## LifeCycle Management
Source: moose/migrate/lifecycle.mdx
Control how Moose manages database and streaming resources when your code changes
# LifeCycle Management
## Overview
The `LifeCycle` enum controls how Moose manages the lifecycle of database/streaming resources when your code changes.
This feature gives you fine-grained control over whether Moose automatically updates your database schema or
leaves it under external/manual control.
## LifeCycle Modes
### `FULLY_MANAGED` (Default)
This is the default behavior where Moose has complete control over your database resources. When you change your data models, Moose will automatically:
- Add new columns or tables
- Remove columns or tables that no longer exist in your code
- Modify existing column types and constraints
This mode can perform destructive operations. Data may be lost if you remove fields from your data models or perform operations that require the table to be dropped and recreated, such as changing the `order_by_fields` configuration.
```py filename="FullyManagedExample.py" copy
from moose_lib import OlapTable, OlapConfig, LifeCycle
from pydantic import BaseModel

class UserData(BaseModel):
    id: str
    name: str
    email: str

# Default behavior - fully managed
user_table = OlapTable[UserData]("users")

# Explicit fully managed configuration
explicit_table = OlapTable[UserData]("users", OlapConfig(
    order_by_fields=["id"],
    life_cycle=LifeCycle.FULLY_MANAGED
))
```
### `DELETION_PROTECTED`
This mode allows Moose to automatically add new database structures but prevents it from removing existing ones.
Perfect for production environments where you want to evolve your schema safely without risking data loss.
**What Moose will do:**
- Add new columns, tables
- Modify column types (if compatible)
- Update non-destructive configurations
**What Moose won't do:**
- Drop columns or tables
- Perform destructive schema changes
```py filename="DeletionProtectedExample.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig, OlapConfig, StreamConfig, LifeCycle, ClickHouseEngines
from pydantic import BaseModel
from datetime import datetime

class ProductEvent(BaseModel):
    id: str
    product_id: str
    timestamp: datetime
    action: str

product_analytics = IngestPipeline[ProductEvent]("product_analytics", IngestPipelineConfig(
    table=OlapConfig(
        order_by_fields=["timestamp", "product_id"],
        engine=ClickHouseEngines.ReplacingMergeTree,
    ),
    stream=StreamConfig(
        parallelism=4,
    ),
    ingest_api=True,
    # automatically applied to the table and stream
    life_cycle=LifeCycle.DELETION_PROTECTED
))
```
### `EXTERNALLY_MANAGED`
This mode tells Moose to completely hands-off your resources.
You become responsible for creating and managing the database schema. This is useful when:
- You have existing database tables managed by another team
- You're integrating with another system (e.g. PeerDB)
- You have strict database change management processes
With externally managed resources, you must ensure your database schema matches your data models exactly, or you may encounter runtime errors.
```py filename="ExternallyManagedExample.py" copy
from moose_lib import Stream, OlapTable, OlapConfig, StreamConfig, LifeCycle, Key
from pydantic import BaseModel
from datetime import datetime

class ExternalUserData(BaseModel):
    user_id: Key[str]
    full_name: str
    email_address: str
    created_at: datetime

# Connect to existing database table
legacy_user_table = OlapTable[ExternalUserData]("legacy_users", OlapConfig(
    life_cycle=LifeCycle.EXTERNALLY_MANAGED
))

# Connect to existing Kafka topic
legacy_stream = Stream[ExternalUserData]("legacy_user_stream", StreamConfig(
    life_cycle=LifeCycle.EXTERNALLY_MANAGED,
    destination=legacy_user_table
))
```
---
## Migration Examples & Advanced Development
Source: moose/migrate/migration-types.mdx
Detailed migration examples and advanced development topics
# Migration Types
This guide provides detailed examples of different migration types. For the complete workflow overview, see [Migrations & Planning](/moose/migrate).
## Adding New Infrastructure Components
Keep in mind that only the modules that you have enabled in your `moose.config.toml` will be included in your migrations.
```toml file="moose.config.toml"
[features]
olap = true
streaming_engine = true
workflows = true
```
### New OLAP Table or Materialized View
```python file="app/main.py"
from pydantic import BaseModel
from datetime import datetime
from moose_lib import OlapTable

class AnalyticsSchema(BaseModel):
    id: str
    event_type: str
    timestamp: datetime
    user_id: str
    value: float

analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```
**Migration Result:** Creates ClickHouse table `Analytics` with all fields from `AnalyticsSchema`
If you have not enabled the `olap` feature flag, you will not be able to create new OLAP tables.
```toml file="moose.config.toml"
[features]
olap = true
```
Check out the [OLAP migrations guide](/moose/olap/apply-migrations) to learn more about the different migration modes.
### New Stream
```python file="app/main.py"
user_events = Stream[UserSchema]("UserEvents")
system_events = Stream[SystemEventSchema]("SystemEvents")
```
**Migration Result:** Creates Redpanda topics `UserEvents` and `SystemEvents`
If you have not enabled the `streaming_engine` feature flag, you will not be able to create new streaming topics.
```toml file="moose.config.toml"
[features]
streaming_engine = true
```
## Schema Modifications
### Adding Fields
```python file="app/main.py"
# Before
class UserSchema(BaseModel):
    id: str
    name: str
    email: str

# After
class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    age: int
    created_at: datetime
    is_active: bool
```
**Migration Result:** Adds `age`, `created_at`, and `is_active` columns to existing table
### Removing Fields
```python file="app/main.py"
# Before
class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    age: int
    deprecated_field: str  # Will be removed

# After
class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    age: int
```
**Migration Result:** Drops `deprecated_field` column (data permanently lost)
### Type Changes
```python file="app/main.py"
# Before
class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    score: float  # Will change to str

# After
class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    score: str  # Changed from float
```
**Migration Result:** Alters `score` column type (data converted if compatible)
## Removing Infrastructure
```python file="app/main.py"
# Before
users_table = OlapTable[UserSchema]("Users")
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
deprecated_table = OlapTable[DeprecatedSchema]("Deprecated")
# After (remove deprecated table)
users_table = OlapTable[UserSchema]("Users")
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```
**Migration Result:** Drops `Deprecated` table (all data lost)
## Working with Local Infrastructure
There are two main ways to inspect your local infrastructure to see how your migrations are applied:
### Using the CLI
Run `moose ls` to see the current state of your infrastructure:
```bash
# Verify object definitions
moose ls
```
### Connecting to your local infrastructure
You can also connect directly to your local infrastructure to see the state of your infrastructure.
All credentials for your local infrastructure are located in your project config file (`moose.config.toml`).
#### Connecting to ClickHouse
```bash
# Using clickhouse-client
clickhouse-client --host localhost --port 18123 --user panda --password pandapass --database local
# Using connection string
clickhouse-client "clickhouse://panda:pandapass@localhost:18123/local"
```
#### Connecting to Redpanda
```bash
# Using kafka-console-consumer
kafka-console-consumer --bootstrap-server localhost:19092 --topic UserEvents --from-beginning
# Using kafka-console-producer
kafka-console-producer --bootstrap-server localhost:19092 --topic UserEvents
```
#### Viewing Temporal Workflows
Navigate to `http://localhost:8080` to view the Temporal UI and see registered workflows.
## Gotchas
Your dev server must be running to connect to your local infrastructure.
```bash
moose dev
```
Only the modules that you have enabled in your `moose.config.toml` will be included in your migrations:
```toml file="moose.config.toml"
[features]
olap = true # Required for OLAP Tables and Materialized Views
streaming_engine = true # Required for Streams
workflows = true # Required for Workflows and Tasks
```
---
## Moose CLI Reference
Source: moose/moose-cli.mdx
Moose CLI Reference
# Moose CLI Reference
## Installation
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
## Core Commands
### Init
Initializes a new Moose project.
```bash
moose init <name> --template <template> [--location <location>] [--no-fail-already-exists]
```
- `<name>`: Name of your app or service
- `<template>`: Template to use for the project
- `--location`: Optional location for your app or service
- `--no-fail-already-exists`: Skip the directory existence check
#### List Templates
Lists available templates for project initialization.
```bash
moose template list
```
### Build
Builds your Moose project.
```bash
moose build [--docker] [--amd64] [--arm64]
```
- `--docker`: Build for Docker
- `--amd64`: Build for AMD64 architecture
- `--arm64`: Build for ARM64 architecture
### Dev
Starts a local development environment with hot reload and automatic infrastructure management.
```bash
moose dev [--mcp] [--docker]
```
- `--mcp`: Enable or disable the MCP (Model Context Protocol) server (default: true). The MCP server provides AI-assisted development tools at `http://localhost:4000/mcp`. See the [MCP Server documentation](/moose/mcp-dev-server) for details.
- `--docker`: Use Docker for infrastructure (default behavior in dev mode)
The development server includes:
- Hot reload for code changes
- Automatic Docker container management (ClickHouse, Redpanda, Temporal, Redis)
- Built-in MCP server for AI assistant integration
- Health monitoring and metrics endpoints
### Prod
Starts Moose in production mode for cloud deployments.
```bash
moose prod
```
### Check
Checks the project for non-runtime errors.
```bash
moose check [--write-infra-map]
```
### Clean
Clears temporary data and stops development infrastructure.
```bash
moose clean
```
### Seed (ClickHouse)
Seed your local ClickHouse from a remote ClickHouse instance.
```bash
moose seed clickhouse [--connection-string <connection-string>] [--table <table>] [--limit <n> | --all]
```
- `--connection-string`: Remote ClickHouse connection string. If omitted, the CLI uses `MOOSE_SEED_CLICKHOUSE_URL`.
- `--table`: Seed only the specified table (default: all Moose tables).
- `--limit`: Copy up to N rows (mutually exclusive with `--all`). Large limits are automatically batched.
- `--all`: Copy entire table(s) in batches (mutually exclusive with `--limit`).
**Connection String Format:**
The connection string must use ClickHouse native protocol:
```bash
# ClickHouse native protocol (secure connection)
clickhouse://username:password@host:9440/database
```
**Important:**
- The data transfer uses ClickHouse's native TCP protocol via `remoteSecure()` function. The remote ClickHouse server must have the native TCP port accessible (typically port 9440 for secure connections).
- **Smart table matching**: The command automatically validates tables between local and remote databases. Tables that don't exist on the remote are gracefully skipped with warnings.
- Use `--table <table>` to seed a specific table that exists in both local and remote databases.
**User Experience:**
- Progress indicator shows seeding status with spinner
- Tables that don't exist on remote are automatically skipped with clear warnings
- Final summary shows successful and skipped tables
- Clean, streamlined output focused on results
Notes:
- Seeding is batched automatically for large datasets; Ctrl+C finishes the current batch gracefully.
- Use env var fallback:
```bash
export MOOSE_SEED_CLICKHOUSE_URL='clickhouse://username:password@host:9440/database'
```
### Truncate
Truncate tables or delete the last N rows from local ClickHouse tables.
```bash
moose truncate [TABLE[,TABLE...]] [--all] [--rows <n>]
```
- `TABLE[,TABLE...]`: One or more table names (comma-separated). Omit to use `--all`.
- `--all`: Apply to all non-view tables in the current database (mutually exclusive with listing tables).
- `--rows <n>`: Delete the last N rows per table; omit to remove all rows (TRUNCATE).
Notes:
- For `--rows`, the command uses the table ORDER BY when available; otherwise it falls back to a timestamp heuristic.
## Monitoring Commands
### Logs
View Moose logs.
```bash
moose logs [--tail] [--filter <string>]
```
- `--tail`: Follow logs in real-time
- `--filter`: Filter logs by specific string
### Ps
View Moose processes.
```bash
moose ps
```
### Ls
View Moose primitives & infrastructure.
```bash
moose ls [--limit <n>] [--version <version>] [--streaming] [--type <type>] [--name <name>] [--json]
```
- `--limit`: Limit output (default: 10)
- `--version`: View specific version
- `--streaming`: View streaming topics
- `--type`: Filter by infrastructure type (tables, streams, ingestion, sql_resource, consumption)
- `--name`: Filter by name
- `--json`: Output in JSON format
### Metrics
View live metrics from your Moose application.
```bash
moose metrics
```
### Peek
View data from a table or stream.
```bash
moose peek <name> [--limit <n>] [--file <path>] [-t|--table] [-s|--stream]
```
- `<name>`: Name of the table or stream to peek
- `--limit`: Number of rows to view (default: 5)
- `--file`: Output to a file
- `-t, --table`: View data from a table (default if neither flag specified)
- `-s, --stream`: View data from a stream/topic
## Generation Commands
### Generate Hash Token
Generate authentication tokens for API access.
```bash
moose generate hash-token
```
Generates both a plain-text Bearer token and its corresponding hashed version for authentication.
### Generate Migration Plan (OLAP)
Create an ordered ClickHouse DDL plan by comparing a remote instance with your local code.
```bash
moose generate migration --url https://<remote-instance> --token <token> --save
```
- Writes `./migrations/plan.yaml` and snapshots `remote_state.json` and `local_infra_map.json`
- Omit `--save` to output to stdout without writing files
- Requires these feature flags in `moose.config.toml`:
```toml filename="moose.config.toml" copy
[features]
olap = true
ddl_plan = true
```
### DB Pull (External Tables)
Refresh `EXTERNALLY_MANAGED` table definitions from a remote ClickHouse instance.
```bash
moose db pull --connection-string <connection-string> [--file-path <path>]
```
- `--connection-string`: ClickHouse URL; native `clickhouse://` is auto-converted to HTTP(S). Include `?database=` or the CLI will query the current database.
- `--file-path`: Optional override for the generated external models file (defaults to `app/externalModels.ts` or `app/external_models.py`).
Notes:
- Only tables marked `EXTERNALLY_MANAGED` in your code are refreshed.
- The command writes a single external models file and overwrites the file on each run.
- See the full guide: [/moose/olap/db-pull](/moose/olap/db-pull)
### Kafka
#### Pull external topics and schemas
Discover topics from a Kafka/Redpanda cluster and optionally fetch JSON Schemas from Schema Registry to emit typed external models.
```bash
moose kafka pull <broker> [--path <dir>] [--include <pattern>] [--exclude <pattern>] [--schema-registry <url>]
```
- `<broker>`: Kafka bootstrap servers, e.g. `localhost:19092`
- `--path`: Output directory for generated files. Defaults to `app/external-topics` (TS) or `app/external_topics` (Python).
- `--include`: Include pattern (glob). Default: `*`
- `--exclude`: Exclude pattern (glob). Default: `{__consumer_offsets,_schemas}`
- `--schema-registry`: Base URL for Schema Registry, e.g. `http://localhost:8081`
Notes:
- JSON Schema is supported initially; Avro/Protobuf planned.
- Generated files will be overwritten on subsequent runs.
## Workflow Management
### Workflow
```bash
moose workflow <command> [options]
```
Available workflow commands:
- `init <name> [--tasks <tasks>] [--task <task>...]`: Initialize a new workflow
- `run <name> [--input <input>]`: Run a workflow
- `resume <name> --from <task>`: Resume a workflow from a specific task
- `list [--json]`: List registered workflows
- `history [--status <status>] [--limit <limit>] [--json]`: Show workflow history
- `terminate <name>`: Terminate a workflow
- `pause <name>`: Pause a workflow
- `unpause <name>`: Unpause a workflow
- `status <name> [--id <id>] [--verbose] [--json]`: Get workflow status
## Planning and Deployment
### Plan
Display infrastructure changes for the next production deployment.
**For Moose Server deployments:**
```bash
moose plan [--url <url>] [--token <token>]
```
- `--url`: Remote Moose instance URL (default: http://localhost:4000)
- `--token`: API token for authentication
**For Serverless deployments:**
```bash
moose plan --clickhouse-url <connection-string>
```
- `--clickhouse-url`: ClickHouse connection URL (e.g., `clickhouse://user:pass@host:port/database`)
### Refresh
Integrate matching tables from a remote instance into the local project.
```bash
moose refresh [--url <url>] [--token <token>]
```
- `--url`: Remote Moose instance URL (default: http://localhost:4000)
- `--token`: API token for authentication
This reference reflects the current state of the Moose CLI based on the source code in the framework-cli directory. The commands are organized by their primary functions and include all available options and flags.
---
## Moose OLAP
Source: moose/olap.mdx
Create and manage ClickHouse tables, materialized views, and data migrations
# Moose OLAP
## Overview
The Moose OLAP module provides standalone ClickHouse database management with type-safe schemas. You can use this capability independently to create tables, materialized views, and manage your table schemas without requiring other MooseStack components.
### Basic Example
```py filename="FirstTable.py" copy
from pydantic import BaseModel
from moose_lib import Key, OlapTable

class MyFirstTable(BaseModel):
    id: Key[str]
    name: str
    age: int

# Create a table named "first_table"
my_table = OlapTable[MyFirstTable]("first_table")

# No export needed - Python modules are automatically discovered
```
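Once the table is defined, you can write rows to it from Python. The sketch below is illustrative and assumes the `insert()` helper covered in the [insert data guide](/moose/olap/insert-data):
```py filename="InsertExample.py" copy
from FirstTable import my_table

# Illustrative sketch: insert a couple of rows into "first_table".
# Assumes the OlapTable insert() helper described in the insert-data guide.
my_table.insert([
    {"id": "1", "name": "Alice", "age": 30},
    {"id": "2", "name": "Bob", "age": 25},
])
```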
## Getting started
If you’re new to Moose OLAP, choose one of these paths:
### Import your schema from an existing ClickHouse database
```bash filename="Terminal" copy
moose init your-project-name --from-remote <connection-string>
```
Review the [full guide](/moose/getting-started/from-clickhouse) to learn more about how to bootstrap a new Moose OLAP project from an existing ClickHouse DB.
### Start from scratch
Create a blank project, then model your first table:
```bash filename="Terminal" copy
moose init your-project-name --language python
cd your-project-name
```
Review the [modeling guide](/moose/olap/model-table) to learn more about how to model your first table.
Working with ClickPipes/PeerDB? Read [External Tables](/moose/olap/external-tables) and keep them in sync with [DB Pull](/moose/olap/db-pull).
## Enabling Moose OLAP
In your `moose.config.toml` file, enable the OLAP Database capability:
```toml
[features]
olap = true
```
## Core Capabilities
- Model tables and views with TypeScript or Python
- Automatic type mapping and support for advanced ClickHouse column types (e.g. `JSON`, `LowCardinality`, `Nullable`)
- Create tables and views with one line of code
- In-database transformations/materialized views
- Migrations and schema evolution
- Query with templated SQL strings and type-safe table and column references
- Bulk insertion with failure handling and runtime validation
### Managing your database
### Accessing your data
These capabilities are available from other MooseStack modules or even from your own client applications:
### Connecting to your ClickHouse instance
You can connect to your ClickHouse instance with your favorite database client. Your credentials are located in your `moose.config.toml` file:
```toml filename="moose.config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```
These are the default credentials for your local ClickHouse instance running in dev mode.
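For example, a minimal sketch that connects with the `clickhouse-connect` Python client (one option among many ClickHouse clients) using the local dev credentials above:
```py filename="connect_clickhouse.py" copy
import clickhouse_connect

# Connect to the local dev ClickHouse using the credentials from moose.config.toml
client = clickhouse_connect.get_client(
    host="localhost",
    port=18123,
    username="panda",
    password="pandapass",
    database="local",
)

# List the tables Moose has created in the local database
print(client.query("SELECT name FROM system.tables WHERE database = 'local'").result_rows)
```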
### Combining with other modules
Although designed to work independently, Moose OLAP can be combined with other modules to add additional capabilities surrounding your database:
- Combine with Streaming and APIs to setup streaming ingestion into ClickHouse tables
- Combine with Workflows to build ETL pipelines and data transformations
- Combine with APIs to expose your ClickHouse queries to client applications via REST API
---
## Applying Migrations
Source: moose/olap/apply-migrations.mdx
How to apply migrations to your database
# Applying Migrations
This page covers OLAP migrations. For migrations across the MooseStack, see the Migrate docs.
## Overview
Migrations are designed for two complementary goals:
- Move fast locally by inferring changes from your code and applying them immediately to your local database.
- Be deliberate in production by executing a reviewed, versioned plan that matches your intent and protects data.
How to think about it:
- Development mode: You edit code, MooseStack infers the SQL and immediately applies it to local ClickHouse. Great for rapid iteration; not guaranteed to infer intent (e.g., renames).
- Production (planned) mode: You generate a plan from the target environment vs your code, review and commit the plan, and MooseStack executes it deterministically during deploy with drift checks.
What you need to do:
- In dev: just code. MooseStack handles local diffs automatically.
- In prod (OLAP):
- Generate and save a plan:
```bash
moose generate migration --url https://<remote-moose-instance> --token <api-token> --save
```
- Review and edit the plan (`plan.yaml`) as needed
- Commit the plan to source control
- Deploy to production. MooseStack validates snapshots (current DB vs `remote_state.json`, desired code vs `local_infra_map.json`) and executes `plan.yaml` in order. If drift is detected, the deploy aborts; regenerate the plan and retry.
## Development Workflow
### Starting the Runtime
Use `moose dev` to start the MooseStack runtime with automatic migration detection:
```bash
moose dev
⡏ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
```
### Hot-Reloaded Migrations
MooseStack continuously monitors your code changes and applies migrations automatically. All changes are applied to your **local database only**.
```python filename="app/tables/events.py" copy
class Event(BaseModel):
id: Key[str]
name: str
created_at: datetime
status: str # New field - will trigger migration
table = OlapTable[Event]("events")
```
When you save changes, you'll see live logs in the terminal showing the diffs being applied to your local database:
```bash
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
+ status: String
```
## Production Workflow
Use planned migrations to generate, review, and apply OLAP DDL plans deterministically.
### Generating Migration Plans
When using planned migrations for OLAP, you need to generate a migration plan from the remote environment. This is done by running the following command:
```bash
moose generate migration --url https://<remote-moose-instance> --token <api-token> --save
```
This generates a few files in the `migrations` directory:
- `plan.yaml`: The migration plan containing an ordered list of operations to apply to the remote database to bring it into alignment with your local code.
- `remote_state.json`: A snapshot of the remote database state at the time the plan was generated.
- `local_infra_map.json`: A snapshot of the infrastructure defined in your local code at the time the plan was generated.
The remote and local state are used to validate that the plan is still valid at the time of deployment. If there have been schema changes made to your live remote database since the plan was generated, the deployment will abort and you will need to regenerate the plan. This is to prevent you from dropping data unintentionally.
### Reviewing and Editing the Plan
You can review and edit the plan as needed. The plan is a YAML file that contains an ordered list of operations to apply to the remote database to bring it into alignment with your local code.
```yaml filename="migrations/plan.yaml" copy
```
### Applying the Plan
The plan is applied during deployment. MooseStack will validate that the remote database state matches the snapshot of the database state at the time the plan was generated, and applies `plan.yaml` in order; it aborts if snapshots don’t match current state.
## Migration Types
### Adding New Tables or Materialized Views
```python filename="main.py" copy
from app.db import newTable, newMaterializedView
```
The dev mode will automatically detect the new table or materialized view and apply the changes to your local database. You will see a log like this in the terminal:
```bash filename="Terminal" copy
$ moose dev
⠋ Processing Infrastructure changes from file watcher
+ Table: new_table Version None - id: String, a_column: String, some_other_column: Float64 - - deduplicate: false
+ Table: target_table Version None - id: String, a_column: String, some_other_column: Float64 - id - deduplicate: false
+ SQL Resource: mv_to_target
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- CreateTable:
table:
name: new_table
columns:
- name: id
data_type: String
required: true
unique: false
primary_key: true
default: null
annotations: []
- name: a_column
data_type: String
required: true
unique: false
primary_key: false
default: null
annotations: []
- name: some_other_column
data_type: Float64
required: true
unique: false
primary_key: false
default: null
annotations: []
order_by:
- id
deduplicate: false
engine: MergeTree
version: null
metadata:
description: null
life_cycle: FULLY_MANAGED
- CreateTable:
table:
name: target_table
columns:
- name: id
data_type: String
required: true
unique: false
primary_key: true
default: null
annotations: []
- name: a_column
data_type: String
required: true
unique: false
primary_key: false
default: null
annotations: []
- name: some_other_column
data_type: Float64
required: true
unique: false
primary_key: false
default: null
annotations: []
order_by:
- id
deduplicate: false
engine: MergeTree
version: null
metadata:
description: null
life_cycle: FULLY_MANAGED
- RawSQL:
sql: "CREATE MATERIALIZED VIEW mv_to_target TO target_table AS SELECT * FROM source_table"
description: Running setup SQL for resource mv_to_target
```
### Column Additions
Adding new fields to your data models:
```python filename="Before.py" copy
class AddedColumn(BaseModel):
id: Key[str]
another_column: str
some_column: str
table = OlapTable[AddedColumn]("events")
```
```python filename="After.py" copy
class AddedColumn(BaseModel):
id: Key[str]
another_column: str
some_column: str
new_column: int # New field - migration applied
table = OlapTable[AddedColumn]("events")
```
In dev mode, you will see a log like this:
```bash filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
+ new_column: Int64
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
table: "events"
column:
name: "new_column"
data_type: "Int64"
```
### Column Removals
Removing fields from your data models:
```python filename="Before.py" copy
class RemovedColumn(BaseModel):
id: Key[str]
another_column: str
some_column: str
old_column: int
table = OlapTable[RemovedColumn]("events")
```
```python filename="After.py" copy
class RemovedColumn(BaseModel):
id: Key[str]
another_column: str
some_column: str
# old_column field removed
table = OlapTable[RemovedColumn]("events")
```
In dev mode, you will see a log like this:
```bash filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
- old_column: Int64
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- DropTableColumn:
table: "events"
column_name: "old_column"
```
### Changing Column Data Types (use with caution)
```python filename="Before.py" copy
class ChangedType(BaseModel):
id: Key[str]
some_column: str
table = OlapTable[ChangedType]("events")
```
```python filename="After.py" copy
class ChangedType(BaseModel):
id: Key[str]
some_column: Annotated[str, "LowCardinality"] # Add LowCardinality for better performance
table = OlapTable[ChangedType]("events")
```
In dev mode, you will see a log like this:
```bash filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
- some_column: String -> LowCardinality(String)
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- ChangeTableColumn:
table: "events"
column_name: "some_column"
data_type: "LowCardinality(String)"
```
Some data type changes can be incompatible with existing data. Read the guide to learn more.
### Materialized View Changes
Modifying the `SELECT` statement of a materialized view:
```python filename="Before.py" copy
from datetime import date
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewConfig, OlapConfig, ClickHouseEngines

class TargetSchema(BaseModel):
    day: date
    count: int
    sum: float

mv = MaterializedView[TargetSchema](MaterializedViewConfig(
    select_statement="""
    SELECT
        toStartOfDay(a_date) as day,
        uniq(id) as count,
        sum(a_number) as sum
    FROM table
    GROUP BY day
    """,
    target_table=OlapConfig(
        name="target_table",
        engine=ClickHouseEngines.MergeTree,
        order_by_fields=["day"],
    ),
    materialized_view_name="mv_table_to_target",
))
```
```python filename="After.py" copy
class TargetSchema(BaseModel):
    day: date
    count: int
    sum: float
    avg: float

mv = MaterializedView[TargetSchema](MaterializedViewConfig(
    select_statement="""
    SELECT
        toStartOfDay(a_date) as day,
        uniq(id) as count,
        sum(a_number) as sum,
        avg(a_number) as avg
    FROM table
    GROUP BY day
    """,
    target_table=OlapConfig(
        name="target_table",
        engine=ClickHouseEngines.MergeTree,
        order_by_fields=["day"],
    ),
    materialized_view_name="mv_table_to_target",
))
```
The dev mode diff:
```bash filename="Terminal" copy
$ moose dev
⠋ Processing Infrastructure changes from file watcher
~ Table target_table:
Column changes:
+ avg: Float64
~ SQL Resource: mv_to_target
```
Notice that the materialized view generates both a target table and a SQL resource. The target table is a new table in the database that stores the results of the materialized view's `SELECT` statement, and the `SQL Resource` is the `CREATE MATERIALIZED VIEW` statement that keeps it populated.
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
created_at: 2025-08-20T05:35:31.668353Z
operations:
- RawSql:
sql:
- DROP VIEW IF EXISTS mv_table_to_target
description: Running teardown SQL for resource mv_table_to_target
- AddTableColumn:
table: target_table
column:
name: "avg"
data_type: "Float64"
- RawSql:
sql:
- "CREATE MATERIALIZED VIEW IF NOT EXISTS mv_table_to_target \n TO target_table\n AS \n SELECT \n toStartOfDay(`a_date`) as day, \n uniq(`id`) as count, \n sum(`a_number`) as sum, \n avg(`a_number`) as avg\n FROM `source_table` \n GROUP BY day"
- "INSERT INTO target_table\n \n SELECT \n toStartOfDay(`a_date`) as day, \n uniq(`id`) as count, \n sum(`a_number`) as sum, \n avg(`a_number`) as avg\n FROM `source_table` \n GROUP BY day"
description: Running setup SQL for resource mv_table_to_target
```
Changing a materialized view's SELECT statement will recreate the entire view and repopulate all data. This can be time-consuming for large datasets.
---
## Syncing External Tables
Source: moose/olap/db-pull.mdx
Refresh your external table models from an existing ClickHouse database
# Syncing External Tables
## What this is
Use `moose db pull` to refresh the definitions of tables you marked as `EXTERNALLY_MANAGED` from a live ClickHouse instance. It reads your code to find external tables, fetches their remote schemas, regenerates one external models file, and creates a small git commit if anything changed. If new external tables were added remotely (e.g., new CDC streams), they are added to the external models file as part of the same run.
## When to use it
- **External tables changed remotely**: a DBA, CDC, or ETL pipeline updated schema.
- **Keep types in sync**: update generated models without touching fully-managed tables.
- **Safe by design**: does not modify the database or your managed models.
This is a read-only sync for your code models. For concepts and modeling guidance, see [External Tables](/moose/olap/external-tables). To bootstrap a project from an existing DB, see [Initialize from ClickHouse](/moose/getting-started/from-clickhouse).
## Requirements
- Tables are defined with `life_cycle=LifeCycle.EXTERNALLY_MANAGED`
- A ClickHouse connection string (native or HTTP/S)
## Connection strings
`db pull` accepts both native and HTTP(S) URLs. Native strings are automatically converted to HTTP(S) with the appropriate ports.
Examples:
```bash filename="Terminal" copy
# Native (auto-converted to HTTPS + 8443)
moose db pull --connection-string "clickhouse://explorer@play.clickhouse.com:9440/default"
# HTTPS (explicit database via query param)
moose db pull --connection-string "https://play.clickhouse.com/?user=explorer&database=default"
# Local HTTP
moose db pull --connection-string "http://localhost:8123/?user=default&database=default"
```
## What gets written
`app/external_models.py`
`db pull` treats this file as the single source of truth for `EXTERNALLY_MANAGED` tables. It introspects the remote schema, updates existing external tables, and adds any newly detected external tables here. It does not modify models elsewhere in your codebase.
Keep all external tables in this file and import it once from your root (`app/main.py`).
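For reference, a minimal root-file import matching this convention (adjust the module path to your project layout):
```py filename="app/main.py" copy
# Load the generated external models once so Moose discovers them
from external_models import *  # noqa: F401,F403
```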
Important:
- The file is overwritten on every run (or at the path passed via `--file-path`).
- If you customize the path, ensure your root file imports it so Moose loads your external models.
## How it works
When you run `db pull` the CLI does the following:
- Loads your project’s infrastructure map and identifies tables marked as `EXTERNALLY_MANAGED`.
- Connects to the remote ClickHouse specified by `--connection-string` and introspects the live schemas for those tables.
- Regenerates a single external models file that mirrors the remote schema.
- Adds any newly detected external tables from the remote database to the generated file so your code stays in sync as sources evolve.
- Does not change any fully managed tables, your `app/main.py`, or the database itself.
- Creates a small git commit if the generated file changed, so you can review and share the update.
### Example output
```py filename="app/external_models.py"
# AUTO-GENERATED FILE. DO NOT EDIT.
# This file will be replaced when you run `moose db pull`.
# ...pydantic models matching remote EXTERNALLY_MANAGED tables...
```
## Command
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string> [--file-path <path>]
```
- **--connection-string**: Required. ClickHouse URL (native or HTTP/S)
- **--file-path**: Optional. Override the default output file. The file at this path will be regenerated (overwritten) on each run.
## Typical Use Cases
### Remote schema changed; update local types
Your DBA, CDC pipeline (e.g., ClickPipes), or ETL job updated a table’s schema. To keep your code accurate and type-safe, refresh your external models so queries, APIs, and materialized views reference the correct columns and types.
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string>
```
This updates only `EXTERNALLY_MANAGED` models and leaves managed code untouched.
### Automatically run on dev startup (keep local fresh)
In active development, schemas can drift faster than you commit updates. Running `db pull` on dev startup helps ensure your local code matches the live schema you depend on.
```bash filename="Terminal" copy
export REMOTE_CLICKHOUSE_URL="clickhouse://<user>:<password>@<host>:<port>/<database>"
```
Add to `moose.config.toml`:
```toml filename="moose.config.toml" copy
[http_server_config]
on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL"
```
This runs once when the dev server first starts. To run after code reloads, use `on_reload_complete_script`. If you run this frequently, prefer HTTP(S) URLs and cache credentials via env/secrets to avoid friction.
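For example, to re-pull after every code reload instead, the same command can be attached to the `on_reload_complete_script` hook:
```toml filename="moose.config.toml" copy
[http_server_config]
on_reload_complete_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL"
```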
### New project from an existing DB
If you’re starting with an existing ClickHouse database, bootstrap code with `init --from-remote`, then use `db pull` over time to keep external models fresh:
```bash filename="Terminal" copy
moose init my-project --from-remote $REMOTE_CLICKHOUSE_URL --language python
```
Review the full getting started guide to learn more about how to bootstrap a new Moose OLAP project from an existing ClickHouse DB.
### A new CDC/external table appeared; add it to code
Your CDC pipeline created a new table (or exposed a new stream). Pull to add the new table to your external models file automatically.
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string>
```
The regenerated external models file will now include the newly discovered external table.
## Troubleshooting
- **No changes written**: Ensure tables are actually marked as `EXTERNALLY_MANAGED` and names match remote.
- **Unsupported types**: The CLI will list tables with unsupported types; they’re skipped in the generated file.
- **Auth/TLS errors**: Verify scheme/ports (8123 or 8443) and credentials; try HTTPS if native URL fails.
- **Git commit issues**: The command attempts a lightweight commit; commit manually if your working tree is dirty.
## Related
- **External Tables**: concepts and configuration
- **Initialize from ClickHouse**: bootstrap projects from an existing DB
- **Supported Types**: mapping and constraints
---
## External Tables
Source: moose/olap/external-tables.mdx
Connect to externally managed database tables and CDC services
# External Tables
## Overview
External tables allow you to connect Moose to database tables that are managed outside of your application. This is essential when working with:
- **CDC (Change Data Capture) services** like ClickPipes, Debezium, or AWS DMS
- **Legacy database tables** managed by other teams
- **Third-party data sources** with controlled schema evolution
## When to Use External Tables
## Configuration
Set `life_cycle=LifeCycle.EXTERNALLY_MANAGED` to tell Moose not to modify the table schema:
```py filename="ExternalTableExample.py" copy
from moose_lib import OlapTable, OlapConfig, LifeCycle
from pydantic import BaseModel
from datetime import datetime
class CdcUserData(BaseModel):
id: str
name: str
email: str
updated_at: datetime
# Connect to CDC-managed table
cdc_user_table = OlapTable[CdcUserData]("cdc_users", OlapConfig(
life_cycle=LifeCycle.EXTERNALLY_MANAGED
))
```
## Getting Models for External Tables
### New project: initialize from your existing ClickHouse
If you don’t yet have a Moose project, use init-from-remote to bootstrap models from your existing ClickHouse:
```bash filename="Terminal" copy
moose init my-project --from-remote <connection-string> --language python
```
What happens:
- Moose introspects your database and generates table models in your project.
- If Moose detects ClickPipes (or other CDC-managed) tables, it marks those as `EXTERNALLY_MANAGED` and writes them into a dedicated external models file:
- TypeScript: `app/externalModels.ts`
- Python: `app/external_models.py`
- This is a best-effort detection to separate CDC-managed tables from those you may want Moose to manage in code.
How detection works (ClickPipes/PeerDB example):
- Moose looks for PeerDB-specific fields that indicate CDC ownership and versions, such as `_peerdb_synced_at`, `_peerdb_is_deleted`, `_peerdb_version`, and related metadata columns.
- When these are present, the table will be marked `EXTERNALLY_MANAGED` and emitted into the external models file automatically.
### Existing project: mark additional external tables
If there are other tables in your DB that are not CDC-managed but you want Moose to treat as external (not managed by code):
1) Mark them as external in code
```py copy
table = OlapTable[MySchema](
"my_table",
OlapConfig(
life_cycle=LifeCycle.EXTERNALLY_MANAGED
)
)
```
2) Move them into the external models file
- Move the model definitions to your external file (`app/externalModels.ts` or `app/external_models.py`).
- Ensure your root file still loads only the external models via a single import:
- Add `from external_models import *` in your `app/main.py` file.
This keeps truly external tables out of your managed code path, while still making them available locally (and in tooling) without generating production DDL.
## Important Considerations
`EXTERNALLY_MANAGED` tables reflect schemas owned by your CDC/DBA/ETL processes. Do not change their field shapes in code.
If you accidentally edited an external model, revert to the source of truth by running **DB Pull**: [/moose/olap/db-pull](/moose/olap/db-pull).
Locally, externally managed tables are created/kept in sync in your development ClickHouse so you can develop against them and **seed data**. See **Seed (ClickHouse)** in the CLI: [/moose/moose-cli#seed-clickhouse](/moose/moose-cli#seed-clickhouse).
Moose will **not** apply schema changes to `EXTERNALLY_MANAGED` tables in production. If you edit these table models in code, those edits will not produce DDL operations in the migration plan (they will not appear in `plan.yaml`).
For more on how migration plans are generated and what shows up in `plan.yaml`, see [/moose/olap/planned-migrations](/moose/olap/planned-migrations).
## Staying in sync with remote schema
For `EXTERNALLY_MANAGED` tables, keep your code in sync with the live database by running DB Pull. You can do it manually or automate it in dev.
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string>
```
Use DB Pull to regenerate your external models file from the remote schema. To run it automatically during development, see the script hooks in [the local development guide](/moose/local-dev#script-execution-hooks).
---
## Secondary Indexes
Source: moose/olap/indexes.mdx
Specifying indexes with Moose OLAP
## Indexes for ClickHouse tables
Moose lets you declare secondary/data-skipping indexes directly in your table definitions.
Moose generates the ClickHouse `INDEX` clauses on create and
plans `ALTER TABLE ADD/DROP INDEX` operations when you change them later.
### When to use indexes
- Use indexes to optimize selective predicates on large tables, especially string and high-cardinality columns.
- Common types: `minmax`, `set(max_rows)`, `ngrambf_v1(...)`, `bloom_filter`.
### TypeScript
```ts
interface Events {
id: string;
user: string;
message: string;
}
```
### Python
```python
from moose_lib.dmv2.olap_table import OlapTable, OlapConfig, MergeTreeEngine
from pydantic import BaseModel
class Events(BaseModel):
id: str
user: str
message: str
events_table = OlapTable[Events](
"Events",
OlapConfig(
engine=MergeTreeEngine(),
order_by_fields=["id"],
indexes=[
OlapConfig.TableIndex(name="idx_user", expression="user", type="minmax", granularity=1),
OlapConfig.TableIndex(name="idx_message_ngrams", expression="message", type="ngrambf_v1", arguments=["3","256","1","123"], granularity=1),
],
),
)
```
### How Moose applies changes
- On create, Moose emits `INDEX ...` entries inside `CREATE TABLE`.
- On change, if an index definition changed, Moose plans `ALTER TABLE DROP INDEX` followed by `ADD INDEX` (see the sketch below); pure adds or drops are applied as single operations.
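For illustration, changing only the `granularity` of the `idx_user` index from the Python example above would be planned as a drop of the old index followed by an add of the new definition:
```python
# Editing an existing index definition (granularity 1 -> 4) causes Moose to plan
# ALTER TABLE ... DROP INDEX idx_user, then ADD INDEX idx_user with the new definition
OlapConfig.TableIndex(name="idx_user", expression="user", type="minmax", granularity=4)
```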
---
## Inserting Data
Source: moose/olap/insert-data.mdx
Insert data into OLAP tables using various methods
# Inserting Data
Inserting data into your database is a common task. MooseStack provides a few different ways to insert data into your database.
If a table column is modeled as optional in your app type but has a ClickHouse default (for example, `Annotated[int, clickhouse_default("18")]`), Moose treats incoming records as optional at the API/stream boundary, while the ClickHouse table stores the column as required with a `DEFAULT` clause. If you omit the field in the payload, ClickHouse fills it with the default at insert time.
## From a Stream (Streaming Ingest)
When you need to stream data into your ClickHouse tables, you can set the `Stream.destination` as a reference to the `OlapTable` you want to insert into. This will automatically provision a synchronization process that batches and inserts data into the table.
```py filename="StreamInsert.py" copy
from moose_lib import Stream, StreamConfig, Key, OlapTable
from pydantic import BaseModel
from datetime import datetime
class Event(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
events_table = OlapTable[Event]("user_events")
events_pipeline = Stream[Event]("user_events", StreamConfig(
destination=events_table # Automatically syncs the stream to the table in ClickHouse-optimized batches
))
```
[ClickHouse inserts need to be batched for optimal performance](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance). Moose automatically batches your data into ClickHouse-optimized batches of up to 100,000 records, with automatic flushing every second. It also handles at-least-once delivery and retries on connection errors to ensure your data is never lost.
## From a Workflow (Batch Insert)
If you have a data source better suited to batch patterns, use a workflow and the direct `insert()` method to land data into your tables:
```py filename="WorkflowInsert.py" copy
from moose_lib import OlapTable, Key, InsertOptions
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
events_table = OlapTable[UserEvent]("user_events")
# Direct insertion for ETL workflows
result = events_table.insert([
{"id": "evt_1", "user_id": "user_123", "timestamp": datetime.now(), "event_type": "click"},
{"id": "evt_2", "user_id": "user_456", "timestamp": datetime.now(), "event_type": "view"}
])
print(f"Successfully inserted: {result.successful} records")
print(f"Failed: {result.failed} records")
```
## From a Client App
### Via REST API
In your Moose code, you can use the built-in [MooseAPI module](/moose/apis) to place a `POST` REST API endpoint in front of your streams and tables, allowing you to insert data from external applications.
```py filename="IngestApi.py" copy
from moose_lib import IngestApi, IngestConfig
ingest_api = IngestApi[Event]("user_events", IngestConfig(
destination=events_stream
))
```
Alternatively, use an `IngestPipeline` instead of standalone `IngestApi`, `Stream`, and `OlapTable` components:
```py filename="IngestPipeline.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig
ingest_pipeline = IngestPipeline[Event]("user_events", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True,
))
```
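Once the dev server is running, external clients can post events to the generated endpoint over HTTP; a sketch against the default local port (4000), assuming the `Event` model shown above:
```bash filename="Terminal" copy
curl -X POST http://localhost:4000/ingest/user_events \
  -H "Content-Type: application/json" \
  -d '{"id": "evt_1", "user_id": "user_123", "timestamp": "2024-01-01T00:00:00Z", "event_type": "click"}'
```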
With these APIs you can leverage the built-in OpenAPI client integration to generate API clients in your own language to connect to your pipelines from external applications.
### Coming Soon: MooseClient
We're working on a new client library that you can use to interact with your Moose pipelines from external applications.
Join the community slack to stay updated and let us know if you're interested in helping us build it.
## Direct Data Insertion
The `OlapTable` provides an `insert()` method that allows you to directly insert data into ClickHouse tables with validation and error handling.
### Inserting Arrays of Records
```py filename="DirectInsert.py" copy
from moose_lib import OlapTable, Key, InsertOptions
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
events_table = OlapTable[UserEvent]("user_events")
# Insert single record or array of records
result = events_table.insert([
{"id": "evt_1", "user_id": "user_123", "timestamp": datetime.now(), "event_type": "click"},
{"id": "evt_2", "user_id": "user_456", "timestamp": datetime.now(), "event_type": "view"}
])
print(f"Successfully inserted: {result.successful} records")
print(f"Failed: {result.failed} records")
```
ClickHouse strongly recommends batching inserts. Avoid inserting single records into tables, and consider using Moose Streams and Ingest Pipelines if your data source sends events as individual records.
### Handling Large Batch Inserts
For large datasets, use Python generators for memory-efficient processing:
```py filename="StreamInsert.py" copy
def user_event_generator():
"""Generate user events for memory-efficient processing."""
for i in range(10000):
yield {
"id": f"evt_{i}",
"user_id": f"user_{i % 100}",
"timestamp": datetime.now(),
"event_type": "click" if i % 2 == 0 else "view"
}
# Insert from generator (validation not available for streams)
result = events_table.insert(user_event_generator(), InsertOptions(strategy="fail-fast"))
```
### Validation Methods
Before inserting data, you can validate it using the following methods:
```py filename="ValidationMethods.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel
class UserEvent(BaseModel):
id: Key[str]
user_id: str
event_type: str
events_table = OlapTable[UserEvent]("user_events")
# Validate a single record
validated_data, error = events_table.validate_record(unknown_data)
if validated_data is not None:
print("Valid data:", validated_data)
else:
print("Validation error:", error)
# Validate multiple records with detailed error reporting
validation_result = events_table.validate_records(data_array)
print(f"Valid records: {len(validation_result.valid)}")
print(f"Invalid records: {len(validation_result.invalid)}")
for error in validation_result.invalid:
print(f"Record {error.index} failed: {error.error}")
```
### Error Handling Strategies
Choose from three error handling strategies based on your reliability requirements:
#### Fail-Fast Strategy (Default)
```py filename="FailFast.py" copy
from moose_lib import InsertOptions
# Stops immediately on any error
result = events_table.insert(data, InsertOptions(strategy="fail-fast"))
```
#### Discard Strategy
```py filename="Discard.py" copy
from moose_lib import InsertOptions
# Discards invalid records, continues with valid ones
result = events_table.insert(data, InsertOptions(
strategy="discard",
allow_errors=10, # Allow up to 10 failed records
allow_errors_ratio=0.05 # Allow up to 5% failure rate
))
```
#### Isolate Strategy
```py filename="Isolate.py" copy
from moose_lib import InsertOptions
# Retries individual records to isolate failures
result = events_table.insert(data, InsertOptions(
strategy="isolate",
allow_errors_ratio=0.1
))
# Access detailed failure information
if result.failed_records:
for failed in result.failed_records:
print(f"Record {failed.index} failed: {failed.error}")
```
### Performance Optimization
The insert API includes several performance optimizations:
- **Memoized connections**: ClickHouse clients are reused across insert calls
- **Batch processing**: Optimized batch sizes for large datasets
- **Async inserts**: Automatic async insert mode for datasets > 1000 records
- **Connection management**: Use `close_client()` when completely done
```py filename="Performance.py" copy
from moose_lib import InsertOptions
# For high-throughput scenarios
result = events_table.insert(large_dataset, InsertOptions(
validate=False, # Skip validation for performance
strategy="discard"
))
# Clean up when completely done (optional)
events_table.close_client()
```
## Best Practices
---
## Creating Materialized Views
Source: moose/olap/model-materialized-view.mdx
Create and configure materialized views for data transformations
# Modeling Materialized Views
## Overview
Materialized views are write-time transformations in ClickHouse. A static `SELECT` populates a destination table from one or more sources. You query the destination like any other table. The `MaterializedView` class wraps [ClickHouse `MATERIALIZED VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view) and keeps the `SELECT` explicit. When you edit the destination schema in code and update the `SELECT` accordingly, Moose applies the corresponding DDL, orders dependent updates, and backfills as needed, so the pipeline stays consistent as you iterate.
In local dev, Moose Migrate generates and applies DDL to your local database.
Today, destination schemas are declared in code and kept in sync manually with your `SELECT`. Moose Migrate coordinates DDL and dependencies when you make those changes. A future enhancement will infer the destination schema from the `SELECT` and update it automatically.
This dependency awareness is critical for [cascading materialized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view-with-dependencies). Moose Migrate [orders DDL across views and tables](https://www.fiveonefour.com/blog/Moose-SQL-Getting-DDL-Dependencies-in-Order) to avoid failed migrations and partial states.
### Basic Usage
```python filename="BasicUsage.py" copy
from moose_lib import MaterializedView, MaterializedViewOptions, ClickHouseEngines
from pydantic import BaseModel
from source_table import source_table

# Define the schema of the transformed rows. This is static and must match the results of your SELECT; it also represents the schema of your entire destination table.
class TargetSchema(BaseModel):
id: str
average_rating: float
num_reviews: int
mv = MaterializedView[TargetSchema](MaterializedViewOptions(
# The transformation to run on the source table
select_statement="""
SELECT
{source_table.columns.id},
avg({source_table.columns.rating}) AS average_rating,
count(*) AS num_reviews
FROM {source_table}
GROUP BY {source_table.columns.id}
""",
# Reference to the source table(s) that the SELECT reads from
select_tables=[source_table],
# Creates a new OlapTable named "target_table" where the transformed rows are written to.
table_name="target_table",
order_by_fields=["id"],
# The name of the materialized view in ClickHouse
materialized_view_name="mv_to_target_table",
))
```
The ClickHouse `MATERIALIZED VIEW` object acts like a trigger: on new inserts into the source table(s), it runs the SELECT and writes the transformed rows to the destination.
### Quick Reference
```python filename="ViewOptions.py" copy
from typing import List
from pydantic import BaseModel
from moose_lib import OlapTable, View, ClickHouseEngines
class MaterializedViewOptions(BaseModel):
select_statement: str
table_name: str
materialized_view_name: str
select_tables: List[OlapTable | View]
engine: ClickHouseEngines = ClickHouseEngines.MergeTree
order_by_fields: List[str] = []
```
## Modeling the Target Table
The destination table is where the transformed rows are written by the materialized view. You can model it in two ways:
### Option 1 — Define target table inside the MaterializedView (most cases)
- Simple, co-located lifecycle: the destination table is created/updated/dropped with the MV.
- Best for: projection/denormalization, filtered serving tables, enrichment joins, and most rollups.
```python filename="InlineTarget.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions
class TargetSchema(BaseModel):
id: str
value: int
mv = MaterializedView[TargetSchema](MaterializedViewOptions(
select_statement="""
SELECT {source_table.columns.id}, toInt32({source_table.columns.value}) AS value FROM {source_table}
""",
select_tables=[source_table],
table_name="serving_table",
order_by_fields=["id"],
materialized_view_name="mv_to_serving_table",
))
```
### Option 2 — Decoupled: reference a standalone `OlapTable`
Certain use cases may benefit from a separate lifecycle for the target table that is managed independently from the MV.
```python filename="DecoupledTarget.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, OlapConfig, ClickHouseEngines
class TargetSchema(BaseModel):
id: str
value: int
# Create the standalone table
target_table = OlapTable[TargetSchema](OlapConfig(
name="target_table",
engine=ClickHouseEngines.MergeTree,
order_by_fields=["id"],
))
mv = MaterializedView[TargetSchema](MaterializedViewOptions(
select_statement="""
SELECT {source_table.columns.id}, toInt32({source_table.columns.value}) AS value FROM {source_table}
""",
select_tables=[source_table],
materialized_view_name="mv_to_target_table",
), target_table=target_table)
```
### Basic Transformation, Cleaning, Filtering, Denormalization
Create a narrower, query-optimized table from a wide source. Apply light transforms (cast, rename, parse) at write time.
```python filename="Denormalization.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions
class Dest(BaseModel):
id: str
value: int
created_at: str
mv = MaterializedView[Dest](MaterializedViewOptions(
select_statement="""
SELECT {source_table.columns.id}, toInt32({source_table.columns.value}) AS value, {source_table.columns.created_at} AS created_at FROM {source_table} WHERE active = 1
""",
select_tables=[source_table],
table_name="proj_table",
order_by_fields=["id"],
materialized_view_name="mv_to_proj_table",
))
```
### Aggregations
#### Simple Additive Rollups
When you want to maintain running sums (counts, totals) that are additive per key, use the `SummingMergeTree` engine:
```python filename="Summing.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, ClickHouseEngines
class DailyCounts(BaseModel):
day: str
user_id: str
events: int
STMT = """
SELECT
toDate({events.columns.timestamp}) AS day,
{events.columns.user_id} AS user_id,
count(*) AS events
FROM {events}
GROUP BY day, user_id
"""
mv = MaterializedView[DailyCounts](MaterializedViewOptions(
select_statement=STMT,
select_tables=[events],
table_name="daily_counts",
engine=ClickHouseEngines.SummingMergeTree,
order_by_fields=["day", "user_id"],
materialized_view_name="mv_to_daily_counts",
))
```
#### Complex Aggregations
When you want to compute complex aggregation metrics that are not just simple additive operations (sum, count, avg, etc.), but instead use more complex analytical functions (topK, percentile, etc.), create a target table with the `AggregatingMergeTree` engine.
```python filename="AggTransform.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import MaterializedView, AggregateFunction, MaterializedViewOptions, ClickHouseEngines
class MetricsById(BaseModel):
id: str
avg_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
daily_uniques: Annotated[int, AggregateFunction(agg_func="uniqExact", param_types=[str])]
# The SELECT must output aggregate states
STMT = """
SELECT
    id,
    avgState({events.columns.rating}) AS avg_rating,
    uniqExactState({events.columns.user_id}) AS daily_uniques
FROM {events}
GROUP BY {events.columns.id}
"""
# Create the MV with an AggregatingMergeTree target table
mv = MaterializedView[MetricsById](MaterializedViewOptions(
select_statement=STMT,
table_name="metrics_by_id",
materialized_view_name="mv_metrics_by_id",
engine=ClickHouseEngines.AggregatingMergeTree,
order_by_fields=["id"],
select_tables=[events],
))
```
Jump to the [Advanced: AggregatingMergeTree transformations](#advanced-aggregatingmergetree-transformations) section for more details.
### Fan-in Patterns
When you have multiple sources that you want to merge into a single destination table, it's best to create an OlapTable and reference it in each MV that needs to write to it:
```python filename="FanIn.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, OlapConfig, ClickHouseEngines
class DailyCounts(BaseModel):
day: str
user_id: str
events: int
# Create the destination table explicitly
daily = OlapTable[DailyCounts]("daily_counts", OlapConfig(
engine=ClickHouseEngines.SummingMergeTree,
order_by_fields=["day", "user_id"],
))
# MV 1 - write to the daily_counts table
mv1 = MaterializedView[DailyCounts](MaterializedViewOptions(
select_statement="SELECT toDate(ts) AS day, user_id, 1 AS events FROM {webEvents}",
select_tables=[webEvents],
materialized_view_name="mv_web_to_daily_counts",
), target_table=daily)
# MV 2 - write to the daily_counts table
mv2 = MaterializedView[DailyCounts](MaterializedViewOptions(
select_statement="SELECT toDate(ts) AS day, user_id, 1 AS events FROM {mobileEvents}",
select_tables=[mobileEvents],
materialized_view_name="mv_mobile_to_daily_counts",
), target_table=daily)
```
### Blue/green schema migrations
Create a new table for a breaking schema change and use an MV to copy data from the old table; when complete, switch reads to the new table and drop just the MV and old table.
For more information on how to use materialized views to perform blue/green schema migrations, see the [Schema Versioning](./schema-versioning) guide.
## Defining the transformation
The `select_statement` is a static SQL query that Moose runs to transform data from your source table(s) into rows for the destination table.
Transformations are defined as ClickHouse SQL queries. We strongly recommend using the ClickHouse SQL reference and functions overview to help you develop your transformations.
You can use f-strings to interpolate tables and columns identifiers to your queries. Since these are static, you don't need to worry about SQL injection.
```python filename="Transformation.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, OlapConfig
class Dest(BaseModel):
id: str
name: str
day: str
mv = MaterializedView[Dest](MaterializedViewOptions(
select_statement="""
SELECT
{events.columns.id} AS id,
{events.columns.name} AS name,
toDate({events.columns.ts}) AS day
FROM {events}
JOIN {users} ON {events.columns.user_id} = {users.columns.id}
WHERE {events.columns.active} = 1
""",
select_tables=[events, users],
order_by_fields=["id"],
table_name="user_activity_by_day",
materialized_view_name="mv_user_activity_by_day",
))
```
The columns returned by your `SELECT` must exactly match the destination table schema.
- Use column aliases (`AS target_column_name`) to align names.
- All destination columns must be present in the `SELECT`, or the materialized view won't be created. Adjust your transformation or table schema so they match.
Go to the [Advanced: Writing SELECT statements to Aggregated tables](#writing-select-statements-to-aggregated-tables) section for more details.
## Backfill Destination Tables
When the MaterializedView is created, Moose backfills the destination once by running your `SELECT` (so you start with a fully populated table).
You can see the SQL that Moose will run to backfill the destination table when you generate the [migration plan](/moose/olap/planned-migrations).
During dev mode, as soon as you save the MaterializedView, Moose will run the backfill and you can see the results in the destination table by querying it in your local ClickHouse instance.
## Query Destination Tables
You can query the destination table like any other table.
For inline or decoupled target tables, you can reference target table columns and tables directly in your queries:
```python filename="Query.py" copy
# Query inline destination table by name
QUERY = """
SELECT {mv.target_table.columns.id}, {mv.target_table.columns.value}
FROM {mv.target_table}
ORDER BY {mv.target_table.columns.id}
LIMIT 10
"""
```
If you define your target table outside of the MaterializedView, you can also just reference the table by its variable name in your queries:
```python filename="QueryDecoupled.py" copy
# Query the standalone destination table by name
target_table = OlapTable[TargetTable](OlapConfig(
name="target_table",
engine=ClickHouseEngines.MergeTree,
order_by_fields=["id"],
))
QUERY = """
SELECT
{target_table.columns.id},
{target_table.columns.average_rating}
FROM {target_table}
WHERE {target_table.columns.id} = 'abc'
"""
```
Go to the [Querying Aggregated tables](#querying-aggregated-tables) section for more details on how to query Aggregated tables.
## Advanced: Aggregations + Materialized Views
This section dives deeper into advanced patterns and tradeoffs when building aggregated materialized views.
### Target Tables with `AggregatingMergeTree`
When using an `AggregatingMergeTree` target table, you must use the `AggregateFunction` type to model the result of the aggregation functions:
```python filename="AggTransform.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import Key, MaterializedView, AggregateFunction, MaterializedViewOptions, ClickHouseEngines

class MetricsById(BaseModel):
    id: Key[str]
    # avg_rating stores the result of avgState(events.rating)
    # - avg returns a float, so the read-time type is float
    # - the aggregated column (events.rating) is a float, so param_types is [float]
    avg_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
    # daily_uniques stores the result of uniqExactState(events.user_id)
    # - uniqExact returns an integer, so the read-time type is int
    # - the aggregated column (events.user_id) is a string, so param_types is [str]
    daily_uniques: Annotated[int, AggregateFunction(agg_func="uniqExact", param_types=[str])]

# The SELECT must output aggregate states
STMT = """
SELECT
    id,
    avgState({events.columns.rating}) AS avg_rating,
    uniqExactState({events.columns.user_id}) AS daily_uniques
FROM {events}
GROUP BY {events.columns.id}
"""

# Create the MV with an AggregatingMergeTree target table
mv = MaterializedView[MetricsById](MaterializedViewOptions(
    select_statement=STMT,
    table_name="metrics_by_id",
    materialized_view_name="mv_metrics_by_id",
    engine=ClickHouseEngines.AggregatingMergeTree,
    order_by_fields=["id"],
    select_tables=[events],
))
```
Common mistakes to avoid:
- Using `avg()`/`uniqExact()` in the SELECT instead of `avgState()`/`uniqExactState()`
- Forgetting to annotate the schema with `AggregateFunction(...)` so the target table can be created correctly
- Mismatch between `GROUP BY` keys in your `SELECT` and the `order_by_fields` of your target table
### Modeling columns with `AggregateFunction`
- Pattern: `Annotated[U, AggregateFunction(agg_func="avg", param_types=[float])]`
- `U` is the read-time type (e.g., `float`, `int`)
- `agg_func` is the aggregation name (e.g., `avg`, `uniqExact`)
- `param_types` are the argument types. These are the types of the columns that are being aggregated.
```python filename="FunctionToTypeMapping.py" copy
Annotated[float, AggregateFunction(agg_func="avg", param_types=[int])]              # avgState(col: int)
Annotated[int, AggregateFunction(agg_func="uniqExact", param_types=[str])]          # uniqExactState(col: str)
Annotated[int, AggregateFunction(agg_func="count", param_types=[])]                 # countState(col: any)
Annotated[str, AggregateFunction(agg_func="argMax", param_types=[str, datetime])]   # argMaxState(col: str, value: datetime)
Annotated[str, AggregateFunction(agg_func="argMin", param_types=[str, datetime])]   # argMinState(col: str, value: datetime)
Annotated[float, AggregateFunction(agg_func="corr", param_types=[float, float])]    # corrState(col1: float, col2: float)
Annotated[float, AggregateFunction(agg_func="quantiles", param_types=[float])]      # quantilesState(levels: float, value: float)
```
### Writing SELECT statements to Aggregated tables
When you write to an `AggregatingMergeTree` table, you must add a `State` suffix to the aggregation functions in your `SELECT` statement.
```python filename="AggTransform.py" copy
from pydantic import BaseModel
from typing import Annotated
from moose_lib import MaterializedView, ClickHouseEngines, AggregateFunction, MaterializedViewOptions
class MetricsById(BaseModel):
id: str
avg_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
total_reviews: Annotated[int, AggregateFunction(agg_func="count", param_types=[str])]
agg_stmt = '''
SELECT
{reviews.columns.id} AS id,
avgState({reviews.columns.rating}) AS avg_rating,
countState({reviews.columns.id}) AS total_reviews
FROM {reviews}
GROUP BY {reviews.columns.id}
'''
mv = MaterializedView[MetricsById](MaterializedViewOptions(
select_statement=agg_stmt,
select_tables=[reviews],
table_name="metrics_by_id",
engine=ClickHouseEngines.AggregatingMergeTree,
order_by_fields=["id"],
materialized_view_name="mv_metrics_by_id",
))
```
Why states? Finalized values (e.g., `avg()`) are not incrementally mergeable. Storing states lets ClickHouse maintain results efficiently as new data arrives. Docs: https://clickhouse.com/docs/en/sql-reference/aggregate-functions/index and https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators#-state
### Querying Aggregated Tables
When you query a table with an `AggregatingMergeTree` engine, you must use aggregate functions with the `Merge` suffix (e.g., `avgMerge`)
```python filename="QueryAgg.py" copy
# Manual finalization using ...Merge
QUERY = """
SELECT
avgMerge(avg_rating) AS avg_rating,
countMerge(total_reviews) AS total_reviews
FROM metrics_by_id
WHERE id = '123'
"""
```
## Choosing the right engine
- Use `MergeTree` for copies/filters/enrichment without aggregation semantics.
- Use `SummingMergeTree` when all measures are additive, and you want compact, eventually-consistent sums.
- Use `AggregatingMergeTree` for non-additive metrics and advanced functions; store states and finalize on read.
- Use `ReplacingMergeTree` for dedup/upserts or as an idempotent staging layer before rollups.
---
## Modeling Tables
Source: moose/olap/model-table.mdx
Model your database schema in code using native TypeScript/Python typing
# Modeling Tables
## Overview
Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.
You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.
```py filename="FirstTable.py" copy
from pydantic import BaseModel
from moose_lib import Key, OlapTable
class MyFirstTable(BaseModel):
id: Key[str]
name: str
age: int
# Create a table named "first_table"
my_table = OlapTable[MyFirstTable]("first_table")
# No export needed - Python modules are automatically discovered
```
## Basic Usage
### Standalone Tables
Create a table directly for custom data flows or when you need fine-grained control:
```py filename="StandaloneTable.py"
from datetime import date
from pydantic import BaseModel
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine

class ExampleSchema(BaseModel):
    id: Key[str]
    date_field: date
    numeric_field: float
    boolean_field: bool

# Create a standalone table named "example_table"
example_table = OlapTable[ExampleSchema]("example_table", OlapConfig(
    order_by_fields=["id", "date_field"],
    engine=ReplacingMergeTreeEngine()
))
```
### Creating Tables in Ingestion Pipelines
For end-to-end data flows, create tables as part of an ingestion pipeline:
```py filename="PipelineTable.py"
from datetime import datetime
from pydantic import BaseModel
from moose_lib import IngestPipeline, IngestPipelineConfig, Key, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine

class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

events_pipeline = IngestPipeline[UserEvent]("user_events", IngestPipelineConfig(
    ingest_api=True, # Creates a REST API endpoint at POST localhost:4000/ingest/user_events
    stream=True, # Creates a Kafka/Redpanda topic
    table=OlapConfig( # Creates and configures the table named "user_events"
        order_by_fields=["id", "timestamp"],
        engine=ReplacingMergeTreeEngine()
    )
))
# Access the table component when needed:
events_table = events_pipeline.get_table()
```
## Data Modeling
### Special ClickHouse Types (LowCardinality, Nullable, etc)
```py filename="ClickHouseTypes.py" copy
from moose_lib import Key, clickhouse_decimal, ClickHouseNamedTuple
from typing import Annotated, Literal
from pydantic import BaseModel
from datetime import datetime
class Customer(BaseModel):
name: str
address: str
class Order(BaseModel):
order_id: Key[str]
amount: clickhouse_decimal(10, 2)
status: Literal["Paid", "Shipped", "Delivered"] # translated to LowCardinality(String) in ClickHouse
created_at: datetime
customer: Annotated[Customer, "ClickHouseNamedTuple"]
```
### Default values
Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.
```py filename="Defaults.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import OlapTable, OlapConfig, Key, clickhouse_default, clickhouse_decimal
from datetime import datetime
class Event(BaseModel):
id: Key[str]
# Static defaults
status: Annotated[str, clickhouse_default("'pending'")] # DEFAULT 'pending'
retries: Annotated[int, clickhouse_default("0")] # DEFAULT 0
# Server-side timestamps
created_at: Annotated[datetime, clickhouse_default("now()")]
# Decimal with default
amount: Annotated[float, clickhouse_decimal(10, 2)] = 0
events = OlapTable[Event]("events", OlapConfig(
    order_by_fields=["id", "created_at"],
))
```
The value passed into the `clickhouse_default` function can either be a string literal or a stringified ClickHouse SQL expression.
If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.
- Optional without default → ClickHouse Nullable type.
- Optional with default (using `clickhouse_default("18")` in annotations) → non-nullable column with default `18`.
This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.
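A small sketch contrasting the two cases, using the `clickhouse_default` helper shown above (`Profile` is just an illustrative model):
```py filename="OptionalVsDefault.py" copy
from typing import Annotated, Optional
from pydantic import BaseModel
from moose_lib import clickhouse_default

class Profile(BaseModel):
    # Optional without a default -> Nullable(Int64) column
    years_active: Optional[int] = None
    # Optional with a ClickHouse default -> non-nullable Int64 with DEFAULT 18
    age: Annotated[Optional[int], clickhouse_default("18")] = None
```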
### Database Selection
By default, tables are created in the database specified in your `moose.config.toml` ClickHouse configuration. You can override this on a per-table basis using the `database` field:
```py filename="DatabaseOverride.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
class UserData(BaseModel):
id: Key[str]
name: str
email: str
# Table in default database (from moose.config.toml)
default_table = OlapTable[UserData]("users")
# Table in specific database (e.g., "analytics")
analytics_table = OlapTable[UserData]("users", OlapConfig(
database="analytics",
order_by_fields=["id"]
))
```
To use custom databases, configure them in your `moose.config.toml`:
```toml
[clickhouse_config]
db_name = "local"
additional_databases = ["analytics", "staging"]
```
The databases in `additional_databases` will be created automatically when you start your Moose application.
### Primary Keys and Sorting
You must configure table indexing using one of these approaches:
1. Define at least one `Key` in your table schema
2. Specify `order_by_fields` in the table config
3. Use both (all `Key` fields must come first in the `order_by_fields` list)
```py filename="PrimaryKeyConfig.py" copy
from moose_lib import Key, OlapTable
from pydantic import BaseModel
class Record1(BaseModel):
id: Key[str] # Primary key field
field1: str
field2: int
table1 = OlapTable[Record1]("table1") # id is the primary key
```
### Order By Fields Only
Leverage the `OlapConfig` class to configure your table:
```py filename="OrderByFieldsOnly.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
from datetime import datetime
class SchemaWithoutPrimaryKey(BaseModel):
field1: str
field2: int
field3: datetime
table2 = OlapTable[SchemaWithoutPrimaryKey]("table2", OlapConfig(
order_by_fields=["field1", "field2"] # Specify ordering without primary key
))
```
### Order By Expression
Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely.
```py filename="OrderByExpression.py" copy
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel
from datetime import datetime
class Events(BaseModel):
user_id: str
created_at: datetime
event_type: str
# Equivalent to order_by_fields=["user_id", "created_at", "event_type"]
events = OlapTable[Events]("events", OlapConfig(
order_by_expression="(user_id, created_at, event_type)",
))
# Advanced: functions inside expression
events_by_month = OlapTable[Events]("events_by_month", OlapConfig(
order_by_expression="(user_id, toYYYYMM(created_at))",
))
# No sorting
unsorted = OlapTable[Events]("events_unsorted", OlapConfig(
order_by_expression="tuple()",
))
```
### Using Both Primary Key and Order By Fields
```py filename="ComboKeyAndOrderByFields.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
class SchemaWithKey(BaseModel):
id: Key[str]
field1: str
field2: int
table3 = OlapTable[SchemaWithKey]("table3", OlapConfig(
order_by_fields=["id", "field1"] # Primary key must be first
))
```
### Using Multiple Primary Keys
```py filename="MultiKeyTable.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
class MultiKeyRecord(BaseModel):
key1: Key[str]
key2: Key[int]
field1: str
multi_key_table = OlapTable[MultiKeyRecord]("multi_key_table", OlapConfig(
order_by_fields=["key1", "key2", "field1"] # Multiple keys must come first
))
```
### Table engines
By default, Moose will create tables with the `MergeTree` engine. You can use different engines by setting the `engine` in the table configuration.
```py filename="TableEngine.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
from pydantic import BaseModel

class Record(BaseModel):
    id: Key[str]

# Default MergeTree engine
table = OlapTable[Record]("table", OlapConfig(
    order_by_fields=["id"]
))

# Explicitly specify a different engine
dedup_table = OlapTable[Record]("dedup_table", OlapConfig(
    order_by_fields=["id"],
    engine=ReplacingMergeTreeEngine()
))
```
#### Deduplication (`ReplacingMergeTree`)
Use the `ReplacingMergeTree` engine to keep only the latest record for your designated sort key:
```py filename="DeduplicatedTable.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
from pydantic import BaseModel

class Record(BaseModel):
    id: Key[str]
    updated_at: str  # Version column
    deleted: int = 0  # Soft delete marker (UInt8)

# Basic deduplication
table = OlapTable[Record]("dedup_table", OlapConfig(
    order_by_fields=["id"],
    engine=ReplacingMergeTreeEngine()
))

# With version column (keeps record with highest version)
versioned_table = OlapTable[Record]("versioned_table", OlapConfig(
    order_by_fields=["id"],
    engine=ReplacingMergeTreeEngine(ver="updated_at")
))

# With soft deletes (requires ver parameter)
soft_delete_table = OlapTable[Record]("soft_delete_table", OlapConfig(
    order_by_fields=["id"],
    engine=ReplacingMergeTreeEngine(
        ver="updated_at",
        is_deleted="deleted"  # UInt8 column: 1 marks row for deletion
    )
))
```
ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately.
**Version Column (`ver`)**: When specified, ClickHouse keeps the row with the maximum version value for each unique sort key.
**Soft Deletes (`is_deleted`)**: When specified along with `ver`, rows where this column equals 1 are deleted during merges. This column must be UInt8 type.
For more details, see the [ClickHouse documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree).
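Because merges run asynchronously, reads can still see duplicates until ClickHouse has merged the parts. A minimal sketch (assuming the `dedup_table` defined above and the `MooseClient` covered in Querying Data) of forcing deduplicated results at read time with `FINAL`:
```py filename="QueryDeduplicated.py" copy
from moose_lib import MooseClient

client = MooseClient()

# FINAL merges rows with the same sort key at query time, so each id appears once
rows = client.query.execute_raw(
    "SELECT id, updated_at FROM dedup_table FINAL WHERE id = {id:String}",
    {"id": "user_123"},
)
```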
#### Streaming from S3 (`S3Queue`)
Use the `S3Queue` engine to automatically ingest data from S3 buckets as files are added:
```py filename="S3StreamingTable.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import S3QueueEngine
from pydantic import BaseModel
from datetime import datetime

class S3Event(BaseModel):
    id: str
    timestamp: datetime
    data: dict
# Modern API using engine configuration
s3_events = OlapTable[S3Event]("s3_events", OlapConfig(
engine=S3QueueEngine(
s3_path="s3://my-bucket/data/*.json",
format="JSONEachRow",
aws_access_key_id="AKIA...",
aws_secret_access_key="secret..."
),
settings={
"mode": "unordered",
"keeper_path": "/clickhouse/s3queue/events"
}
))
```
S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas.
#### Replicated Engines
Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants:
- `ReplicatedMergeTree` - Replicated version of MergeTree
- `ReplicatedReplacingMergeTree` - Replicated with deduplication
- `ReplicatedAggregatingMergeTree` - Replicated with aggregation
- `ReplicatedSummingMergeTree` - Replicated with summation
```py filename="ReplicatedEngines.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import (
    ReplicatedMergeTreeEngine,
    ReplicatedReplacingMergeTreeEngine,
    ReplicatedAggregatingMergeTreeEngine,
    ReplicatedSummingMergeTreeEngine
)
from pydantic import BaseModel
from datetime import datetime

class Record(BaseModel):
    id: Key[str]
    updated_at: datetime
    deleted: int = 0
# Basic replicated table with explicit paths
replicated_table = OlapTable[Record]("records", OlapConfig(
order_by_fields=["id"],
engine=ReplicatedMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/records",
replica_name="{replica}"
)
))
# Replicated with deduplication
replicated_dedup = OlapTable[Record]("dedup_records", OlapConfig(
order_by_fields=["id"],
engine=ReplicatedReplacingMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/dedup_records",
replica_name="{replica}",
ver="updated_at",
is_deleted="deleted"
)
))
# For ClickHouse Cloud or Boreal (no parameters needed)
cloud_replicated = OlapTable[Record]("cloud_records", OlapConfig(
order_by_fields=["id"],
engine=ReplicatedMergeTreeEngine()
))
```
The `keeper_path` and `replica_name` parameters are **optional** for replicated engines:
- **Omit both parameters** (recommended): Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments. The default path pattern `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}` works automatically with Atomic databases (default in modern ClickHouse).
- **Provide custom paths**: You can still specify both parameters explicitly if you need custom replication paths for your self-managed cluster.
**Note**: Both parameters must be provided together, or both omitted. The `{uuid}`, `{shard}`, and `{replica}` macros are automatically substituted by ClickHouse at runtime.
For more details, see the [ClickHouse documentation on data replication](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication).
### Irregular column names and Python Aliases
If a ClickHouse column name isn't a valid Python identifier or starts with an underscore,
you can use a safe Python field name and set a Pydantic alias to the real column name.
MooseOLAP then uses the alias for ClickHouse DDL and data mapping,
so your model remains valid while preserving the true column name.
```python
from pydantic import BaseModel, Field
class CHUser(BaseModel):
# ClickHouse: "_id" → safe Python attribute with alias
UNDERSCORE_PREFIXED_id: str = Field(alias="_id")
# ClickHouse: "user name" → replace spaces, keep alias
user_name: str = Field(alias="user name")
```
## Externally Managed Tables
If you have a table that is managed by an external system (e.g., a Change Data Capture service like ClickPipes), you can still use Moose to query it. Set `life_cycle` in the table config to `LifeCycle.EXTERNALLY_MANAGED`.
```py filename="ExternallyManagedTable.py" copy
from moose_lib import Key, OlapTable, OlapConfig, LifeCycle
from pydantic import BaseModel
from datetime import datetime

class UserData(BaseModel):
    id: Key[str]
    timestamp: datetime

# Table managed by an external system
external_table = OlapTable[UserData]("external_users", OlapConfig(
    order_by_fields=["id", "timestamp"],
    life_cycle=LifeCycle.EXTERNALLY_MANAGED  # Moose won't create or modify this table in prod mode
))
```
Learn more about the different lifecycle options and how to use them in the [LifeCycle Management](/stack/olap/lifecycle) documentation.
## Invalid Configurations
```py filename="InvalidConfig.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
from typing import Optional

class BadRecord1(BaseModel):
    field1: str
    field2: int

bad_table1 = OlapTable[BadRecord1]("bad_table1")  # No primary key or order_by_fields

class BadRecord2(BaseModel):
    id: Key[str]
    field1: str

bad_table2 = OlapTable[BadRecord2]("bad_table2", OlapConfig(
    order_by_fields=["field1", "id"]  # Wrong order - primary key must be first
))

class BadRecord3(BaseModel):
    id: Key[str]
    field1: str
    field2: Optional[int]

bad_table3 = OlapTable[BadRecord3]("bad_table3", OlapConfig(
    order_by_fields=["id", "field2"]  # Can't have a nullable field in order_by_fields
))
```
## Development Workflow
### Local Development with Hot Reloading
One of the powerful features of Moose is its integration with the local development server:
1. Start your local development server with `moose dev`
2. When you define or modify an `OlapTable` in your code and save the file:
- The changes are automatically detected
- Moose processes your Pydantic model definitions
- The infrastructure is updated in real-time to match your code changes
- Your tables are immediately available for testing
For example, if you add a new field to your schema:
```py filename="HotReloading.py" copy
from moose_lib import Key
from pydantic import BaseModel
from datetime import datetime

# Before
class BasicSchema(BaseModel):
    id: Key[str]
    name: str

# After adding a field
class BasicSchema(BaseModel):
    id: Key[str]
    name: str
    created_at: datetime
```
The Moose framework will:
1. Detect the change when you save the file
2. Update the table schema in the local ClickHouse instance
3. Make the new field immediately available for use
### Verifying Your Tables
You can verify your tables were created correctly using:
```bash filename="Terminal" copy
# List all tables in your local environment
moose ls
```
#### Connecting to your local ClickHouse instance
You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your `moose.config.toml` file:
```toml filename="moose.config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```
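For a quick scripted check, a client such as `clickhouse-connect` (not part of `moose_lib`; shown here only as one option) can reuse the same credentials over the HTTP port:
```py filename="connect_clickhouse.py" copy
import clickhouse_connect  # assumption: installed separately with `pip install clickhouse-connect`

# Credentials and ports taken from moose.config.toml above
client = clickhouse_connect.get_client(
    host="localhost",
    port=18123,
    username="panda",
    password="pandapass",
    database="local",
)
print(client.query("SHOW TABLES").result_rows)
```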
---
## Modeling Views
Source: moose/olap/model-view.mdx
Define standard ClickHouse Views for read-time projections
# Modeling Views
## Overview
Views are read-time projections in ClickHouse. A static `SELECT` defines the view over one or more base tables or other views. Moose wraps [ClickHouse `VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view) with a simple `View` class in Python. You provide the view name, the `SELECT`, and the list of source tables/views so Moose can order DDL correctly during migrations.
Use `View` when you want a virtual read-time projection and don’t need write-time transformation or a separate storage table. For write-time pipelines and backfills, use a Materialized View instead.
## Basic Usage
```python filename="BasicUsage.py" copy
from moose_lib import View
from tables import users, events
active_user_events = View(
"active_user_events",
"""
SELECT
{events.columns.id} AS event_id,
{users.columns.id} AS user_id,
{users.columns.name} AS user_name,
{events.columns.ts} AS ts
FROM {events}
JOIN {users} ON {events.columns.user_id} = {users.columns.id}
WHERE {users.columns.active} = 1
""",
[events, users],
)
```
## Quick Reference
```python filename="Signature.py" copy
# View(name: str, select_statement: str, base_tables: list[OlapTable | View])
View(
"view_name",
"SELECT ... FROM {someTable}",
[someTable],
)
```
The `SELECT` should be static (no runtime parameters). Use string templates with `{table.columns.col}` so Moose can safely interpolate table and column names.
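Once defined, the view can be read like any other source. A minimal sketch, assuming the `active_user_events` view above and the `MooseClient` described in Querying Data:
```python filename="QueryView.py" copy
from moose_lib import MooseClient

client = MooseClient()
rows = client.query.execute_raw(
    "SELECT event_id, user_name, ts FROM active_user_events LIMIT {limit:UInt32}",
    {"limit": 10},
)
```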
---
## Planned Migrations (OLAP)
Source: moose/olap/planned-migrations.mdx
Generate, review, and safely execute ClickHouse DDL plans
# Planned Migrations
Migration planning is a new way to have more fine-grained control over HOW database schema changes are applied to your database when you deploy your code into production.
## Why planned migrations?
Most database migrations are designed under the assumption that your code is the sole owner of the database schema. In OLAP databases, we have to be more careful and assume that schema changes can happen at any time:
- The database schema is shared with other services (e.g. Change Data Capture services like ClickPipes)
- Other users (e.g. analysts) of the database may have credentials that let them change the schema
This is why the plan is generated from the remote environment, and validated against the live state of the database at the time of deployment. If it detects a drift, it will abort the deployment and require you to regenerate the plan, to make sure you are not dropping data unintentionally.
Planned migrations apply only to OLAP (ClickHouse) schema changes. Streaming, APIs, and processes are unaffected by this flow.
## What this does
- Generates an ordered set of ClickHouse operations and writes them to `./migrations/plan.yaml`
- Saves two validation snapshots for drift detection:
- `./migrations/remote_state.json` (state when plan was created)
- `./migrations/local_infra_map.json` (desired state from your local code)
- When enabled, validates state and executes the exact reviewed operations
## Prerequisites
```toml file="moose.config.toml"
[features]
olap = true
ddl_plan = true
```
## Generating a Plan
Once done editing your code in your feature branch, you can generate a plan that diffs your local code against your live remote database:
**For Moose server deployments:**
```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```
**For serverless deployments:**
```bash filename="Terminal" copy
moose generate migration --clickhouse-url clickhouse://user:pass@host:port/db --save
```
Outputs:
```text
./migrations/plan.yaml
./migrations/remote_state.json
./migrations/local_infra_map.json
```
What each file contains:
- `remote_state.json`: The state of the remote database when the plan was generated.
- `local_infra_map.json`: The state of the local code when the plan was generated.
- `plan.yaml`: The plan to apply to the remote database based on the diff between the two states.
You will commit the entire `migrations/` directory to version control, and Moose will automatically apply the plan when you deploy the code to production.
## Review and edit the plan
Moose makes assumptions about your schema changes; for example, when a column changes it defaults to dropping and re-adding the column rather than renaming it. You can edit the plan to override these assumptions.
Open `plan.yaml` in your PR. Operations are ordered (teardown first, then setup) to avoid dependency issues; review it like regular code.
```yaml filename="migrations/plan.yaml" copy
# Drop a deprecated column
- DropTableColumn:
table: "events"
column_name: "deprecated_field"
# Rename a column to match code
- RenameTableColumn:
table: "events"
before_column_name: "createdAt"
after_column_name: "created_at"
# Add a new nullable column after created_at
- AddTableColumn:
table: "events"
column:
name: "status"
data_type: "String"
required: false
unique: false
primary_key: false
default: null
annotations: []
comment: null
after_column: "created_at"
# Change a column type to Nullable(Float64)
- ModifyTableColumn:
table: "events"
before_column:
name: "value"
data_type: "Float64"
required: false
unique: false
primary_key: false
default: null
annotations: []
comment: null
after_column:
name: "value"
data_type:
Nullable:
nullable: "Float64"
required: false
unique: false
primary_key: false
default: null
annotations: []
comment: null
# Create a simple view via raw SQL
- RawSql:
sql:
- "CREATE VIEW IF NOT EXISTS `events_by_user` AS SELECT user_id, count() AS c FROM events GROUP BY user_id"
description: "Creating view events_by_user"
```
### When to edit the plan
There are two main reasons to edit the plan:
1. To "override" the default assumptions Moose makes when it cannot infer the intent of your schema changes, such as renaming a column instead of dropping and adding.
2. To add new operations that are not covered by the default assumptions, such as adding a backfill operation to a new column.
#### Rename a column instead of drop/add
When you rename a column, Moose will default to dropping and adding the column. However, you can override this by using the `RenameTableColumn` operation:
```yaml filename="migrations/plan.yaml" copy
- DropTableColumn:
table: source_table
column_name: created_at
- AddTableColumn:
table: source_table
column:
name: createdAt
data_type: DateTime
required: true
unique: false
primary_key: false
default: null
annotations: []
after_column: color
```
In the plan, you can override this by using the `RenameTableColumn` operation:
```yaml filename="migrations/plan.yaml" copy
- RenameTableColumn:
table: source_table
before_column_name: created_at
after_column_name: createdAt
```
#### Add a backfill operation to a new column
When you add a new column, Moose will default to backfilling the column based on the value in the `default` field.
If your field is a `DateTime`, you can edit the plan to set the default value to the current timestamp:
```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
table: "source_table"
column:
name: "created_at"
data_type: "DateTime"
required: false
unique: false
default: NOW ## Specify the default value to the current timestamp
```
You can also override the default behavior by using the `RawSql` operation to define your own custom backfill logic:
```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
table: "source_table"
column:
name: "created_at"
data_type: "DateTime"
required: false
unique: false
default: null
- RawSql:
sql:
- "UPDATE events SET created_at = toDateTime(created_at_ms / 1000) WHERE created_at IS NULL"
description: "Backfill created_at from created_at_ms"
```
## Deployment Flows
### Moose Server Deployments
For Moose server deployments (with `moose prod` running), migrations are applied automatically on startup. Generate plans using:
```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```
When you deploy, Moose validates the plan and executes it automatically.
### Serverless Deployments
For serverless deployments (no Moose server), you manage migrations manually using the ClickHouse connection directly:
```toml file="moose.config.toml"
[state_config]
storage = "clickhouse"
[features]
olap = true
data_model_v2 = true
```
**Workflow:**
1. **Generate the plan** from your ClickHouse database:
```bash filename="Terminal" copy
moose generate migration --clickhouse-url --save
```
2. **Review** the generated `./migrations/` files in your PR
3. **Execute the plan** against your ClickHouse with CI/CD or manually:
```bash filename="Terminal" copy
moose migrate --clickhouse-url
```
Before applying the plan, Moose will first validate that the snapshot of your database that was taken when you generated the plan is still the same as the current database state. If it is not, Moose will abort the deployment. If it is, Moose will execute the plan in `plan.yaml` against your production database.
Execution rules:
- If current tables in your live production database differ from `remote_state.json`, Moose aborts (remote drift since planning).
- If desired tables in your local code differ from `local_infra_map.json`, Moose aborts (code changed since planning).
- If both match, `plan.yaml` operations are executed in order against ClickHouse.
## Troubleshooting
- Can't connect to the remote database? Make sure you have [your admin API key set up correctly](./apis/auth#admin-endpoints)
- Plan rejected due to drift: Re-generate a plan against the current remote, review, and retry.
- No execution in moose server deployments: Ensure `ddl_plan = true` and `./migrations/plan.yaml` exists.
- OLAP disabled: Ensure `[features].olap = true`.
---
## Querying Data
Source: moose/olap/read-data.mdx
Query OLAP tables using SQL with type safety
# Querying Data
Moose provides type-safe SQL querying for your `OlapTable` and `MaterializedView` instances. Use cases include:
- Building APIs to expose your data to client/frontend applications
- Building transformation pipelines inside your database with materialized views
## Querying with MooseClient
Use `MooseClient` to query data from existing tables and materialized views.
### Basic Querying
You can use a formatted string with `execute`:
```py filename="BasicQuerying.py"
from moose_lib import MooseClient
from app.UserTable import UserTable
client = MooseClient()
status = "active"
limit = 10
query = """
SELECT id, name, email
FROM {table}
WHERE status = {status}
LIMIT {limit}
"""
rows = client.query.execute(query, {"table": UserTable, "status": status, "limit": limit})
```
This allows you to safely interpolate the table and column names while still using your Moose OlapTables and columns.
If you'd rather use the raw ClickHouse Python driver with server-side parameter binding, you can use `execute_raw`:
```py filename="BasicQuerying.py"
from moose_lib import MooseClient
client = MooseClient()
# Query existing table using execute_raw with explicit ClickHouse types
query = """
SELECT id, name, email
FROM users
WHERE status = {status:String}
LIMIT {limit:UInt32}
"""
rows = client.query.execute_raw(query, {
"status": "active",
"limit": 10
})
```
### Querying Materialized Views
You can use a formatted string with `execute`:
```py filename="QueryMaterializedView.py"
from moose_lib import MooseClient
client = MooseClient()
min_orders = 10
query = """
SELECT user_id, total_orders, average_order_value
FROM user_stats_view
WHERE total_orders > {min_orders}
ORDER BY average_order_value DESC
"""
rows = client.query.execute(query, {"min_orders": min_orders})
```
Use `execute_raw` with parameter binding:
```py filename="QueryMaterializedView.py"
from moose_lib import MooseClient
client = MooseClient()
min_orders = 10
# Query existing materialized view
query = """
SELECT user_id, total_orders, average_order_value
FROM user_stats_view
WHERE total_orders > {min_orders:UInt32}
ORDER BY average_order_value DESC
"""
rows = client.query.execute_raw(query, {"min_orders": min_orders})
```
## Select With Column and Table References
```py filename="TypedReferences.py"
from moose_lib import MooseClient
from app.UserTable import UserTable
client = MooseClient()
status = "active"
query = """
SELECT
{column}
FROM {table}
WHERE status = {status}
"""
rows = client.query.execute(query, {"column": UserTable.cols.id, "table": UserTable, "status": status})
```
```python copy
from moose_lib import MooseClient
from app.UserTable import UserTable

client = MooseClient()

# Use parameter binding with explicit identifiers
query = """
SELECT
    id,
    name,
    email
FROM {table:Identifier}
WHERE status = {status:String}
"""
rows = client.query.execute_raw(query, {"table": UserTable.name, "status": "active"})
```
## Filtering with WHERE Clauses
```py copy
from moose_lib import MooseClient
from app.UserTable import UserTable

client = MooseClient()
status = "active"
start_date = "2024-01-01"
search_pattern = "%example%"
min_age = 18
max_age = 65
user_ids = [1, 2, 3, 4, 5]
# Multiple WHERE conditions
filter_query = """
SELECT id, name
FROM {table}
WHERE status = {status}
AND created_at > {start_date}
AND email ILIKE {search_pattern}
"""
# Using BETWEEN
range_query = """
SELECT * FROM {table}
WHERE age BETWEEN {min_age} AND {max_age}
"""
# Using IN
in_query = """
SELECT * FROM {table}
WHERE id IN {user_ids}
"""
# Execute examples
filter_rows = client.query.execute(filter_query, {"table": UserTable, "status": status, "start_date": start_date, "search_pattern": search_pattern})
range_rows = client.query.execute(range_query, {"table": UserTable, "min_age": min_age, "max_age": max_age})
in_rows = client.query.execute(in_query, {"table": UserTable, "user_ids": user_ids})
```
```py filename="WhereClauses.py"
from moose_lib import MooseClient
client = MooseClient()
# Multiple WHERE conditions
filter_query = """
SELECT id, name
FROM users
WHERE status = {status:String}
AND created_at > {startDate:DateTime}
AND email ILIKE {searchPattern:String}
"""
# Using BETWEEN
range_query = """
SELECT * FROM users
WHERE age BETWEEN {minAge:UInt32} AND {maxAge:UInt32}
"""
# Using IN with typed arrays
in_query = """
SELECT * FROM users
WHERE id IN {userIds:Array(UInt32)}
"""
# Execute examples
filter_rows = client.query.execute_raw(filter_query, {
"status": "active",
"startDate": "2024-01-01",
"searchPattern": "%example%"
})
range_rows = client.query.execute_raw(range_query, {
"minAge": 18,
"maxAge": 65
})
in_rows = client.query.execute_raw(in_query, {
"userIds": [1, 2, 3, 4, 5]
})
```
## Dynamic Query Building
Moose provides two distinct approaches for executing queries in Python. Choose the right one for your use case:
- Option 1: Use formatted strings with `execute`
- Option 2: Use `execute_raw` with parameter binding (lowest level of abstraction)
```py filename="execute.py"
from moose_lib import MooseClient
from pydantic import BaseModel, Field
from app.UserTable import UserTable

client = MooseClient()
# Example: Static query with validated parameters
def get_active_users(status: str, limit: int):
# Static table/column names, validated parameters
query = """
SELECT id, name, email
FROM {table}
WHERE status = {status}
LIMIT {limit}
"""
return client.query.execute(query, {"table": UserTable, "status": status, "limit": limit})
# Usage with validated input
active_users = get_active_users("active", 10)
class UserQueryParams(BaseModel):
status: str = Field(..., pattern=r"^(active|inactive|pending)$")
limit: int = Field(default=10, ge=1, le=1000)
def build_validated_query(params: UserQueryParams):
# All parameters are validated by Pydantic
query = """
SELECT id, name, email
FROM {table}
WHERE status = {status}
LIMIT {limit}
"""
return client.query.execute(query, {"table": UserTable, "status": params.status, "limit": params.limit})
```
```py filename="ParameterBinding.py"
from typing import Optional
from moose_lib import MooseClient
from pydantic import BaseModel

client = MooseClient()

# Example parameter model used by the conditional query builder below
class FilterParams(BaseModel):
    min_age: Optional[int] = None
    status: Optional[str] = None
    search_text: Optional[str] = None
# Example: Dynamic table and column selection with server-side parameter binding
def query_user_data(table_name: str, status_filter: str, limit: int):
# Dynamic identifiers in query structure, bound parameters for values
query = """
SELECT id, name, email
FROM {table_name:Identifier}
WHERE status = {status:String}
AND created_at > {startDate:DateTime}
LIMIT {limit:UInt32}
"""
return client.query.execute_raw(query, {
"table_name": table_name, # Bound parameter
"status": status_filter, # Bound parameter
"startDate": "2024-01-01T00:00:00", # Bound parameter
"limit": limit # Bound parameter
})
# Usage with different tables
users_data = query_user_data("users", "active", 10)
admins_data = query_user_data("admin_users", "pending", 5)
# Conditional WHERE clauses
def build_conditional_query(client: MooseClient, params: FilterParams):
conditions: list[str] = []
parameters: dict = {}
if params.min_age is not None:
conditions.append("age >= {minAge:UInt32}")
parameters["minAge"] = params.min_age
if params.status:
conditions.append("status = {status:String}")
parameters["status"] = params.status
if params.search_text:
conditions.append("(name ILIKE {searchPattern:String} OR email ILIKE {searchPattern:String})")
parameters["searchPattern"] = f"%{params.search_text}%"
query = "SELECT * FROM users"
if conditions:
query += " WHERE " + " AND ".join(conditions)
query += " ORDER BY created_at DESC"
return client.query.execute_raw(query, parameters)
```
## Building APIs
To build REST APIs that expose your data, see the [Analytics APIs documentation](/moose/apis/analytics-api) for comprehensive examples and patterns.
## Common Pitfalls
## Performance Optimization
If your query is slower than expected, there are a few things you can check:
- If using filters, try to filter on a column that is defined in the `orderByFields` of the table
- For common queries, consider [creating a materialized view](/stack/olap/create-materialized-view) to pre-compute the result set
## Further Reading
---
## olap/schema-change
Source: moose/olap/schema-change.mdx
# Handling Failed Migrations
One of the main benefits of the Moose local development environment is that you can detect breaking schema changes before they happen in production. This is especially useful for catching incompatible data type changes, where a column's new type means the generated migration cannot cast the existing data.
This page describes how to recover from a failed migration in dev and gives a playbook for safely achieving the desired type change.
## What happened
You changed a column’s data type on a table that already has data. The dev migration tried to run an in-place ALTER and ClickHouse created a mutation that failed (incompatible cast, nullability, defaults, etc.).
Symptoms:
- Failed migration in dev
- A stuck mutation on the table
- Reverting your code type alone doesn’t help until the mutation is cleared
## Quick recovery (dev)
Follow these steps to get unblocked quickly.
### View the terminal logs to see the failing mutation
In your terminal, you should see a message like this:
```txt
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
~ value: String -> Float64
Applying: ALTER TABLE events MODIFY COLUMN value Float64
ClickHouse mutation created: mutation_id='00000001-0000-4000-8000-000000000123'
Error: Code: 368. Conversion failed: cannot parse 'abc' as Float64 (column: value)
Status: mutation failed; table may be partially transformed
```
Copy the mutation ID from the terminal logs and run the following command to kill the mutation.
### Kill the mutation
- If you have the `mutation_id`:
```sql
KILL MUTATION WHERE mutation_id = '';
```
- If you didn’t capture the ID, find it and kill by table:
```sql
SELECT mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE database = currentDatabase() AND table = ''
ORDER BY create_time DESC;
KILL MUTATION WHERE database = currentDatabase() AND table = '';
```
ClickHouse ALTERs are implemented as asynchronous mutations, not transactional. If a mutation fails mid-way, some parts may have been rewritten while others were not, leaving the table partially transformed. The failed mutation also remains queued until you kill it. Clear the mutation first, then proceed.
Soon, Moose will automatically generate a local DDL plan that kills the mutation and "rolls back" the partial transformation that was applied before the failure occurred.
### Revert your code to match the current DB schema
- Change the column type in code back to the previous (working) type
- Save your changes; let `moose dev` resync. You should be able to query the table again
If the table only has disposable dev data, you can also `TRUNCATE TABLE .` or drop/recreate the table and let `moose dev` rebuild it. Only do this in dev.
## Safely achieving the desired type change
Instead of editing the column type in place, you can add a new column with the target type and backfill the data. This is the recommended approach.
### Add a new column + backfill
First, add the new column (with the target type) to your data model in code. Then, generate a plan to add the new column and backfill the data.
```bash
moose generate migration --url --save --token
```
Open the generated `/migrations/plan.yaml` file. You'll see the `AddTableColumn` operation to add the new column. Right after it, you can add a `RawSql` operation to backfill the data. Here you can write an `ALTER TABLE` statement to update the new column with the data from the old column:
```yaml filename="migrations/plan.yaml"
- AddTableColumn:
table: "events"
column:
name: "status_v2"
data_type:
Nullable:
nullable: "StatusEnum"
default: null
- RawSql:
sql:
- "ALTER TABLE events UPDATE status_v2 = toStatusEnumOrNull(status) WHERE status_v2 IS NULL"
description: "Backfill status_v2 from status"
```
Then, when writing to the table, double write to both columns.
This allows for all surrounding processes and applications that rely on the old column to continue working, and you can later deprecate the old column and rename the new column when you are ready.
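A minimal sketch of what that double write can look like at the model level (field names follow the example above; the exact write path depends on how you insert into the table):
```python filename="app/tables/events_double_write.py" copy
from typing import Optional
from moose_lib import Key, OlapTable
from pydantic import BaseModel

class Event(BaseModel):
    id: Key[str]
    name: str
    created_at: str
    status: str                      # old column, still written
    status_v2: Optional[str] = None  # new column, written alongside the old one

table = OlapTable[Event]("events")

# When inserting, populate both status and status_v2 so existing readers keep
# working while new readers can start using status_v2.
```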
### Later, deprecate the old column and rename the new column
Once the column backfill is complete and you are ready to deprecate the old column, you can rename the new column to the old column name and apply this in a new, subsequent PR.
In your code, you can rename the column and deprecate the old column:
```python filename="app/tables/events.py" copy
from datetime import datetime
from enum import Enum
from moose_lib import Key, OlapTable
from pydantic import BaseModel
class StatusEnum(str, Enum):  # the enum backing the new column type
    ACTIVE = "active"
    INACTIVE = "inactive"

class Event(BaseModel):
    id: Key[str]
    name: str
    created_at: datetime
    status_old: str     # old column, renamed
    status: StatusEnum  # new column takes over the original name

table = OlapTable[Event]("events")
```
Initially you'll see two `DropTableColumn` operations, followed by two `AddTableColumn` operations.
**Important**: Replace all four generated `DropTableColumn` and `AddTableColumn` operations with the following:
```yaml filename="migrations/plan.yaml"
- RenameTableColumn:
table: "events"
before_column_name: "status"
after_column_name: "status_old"
- RenameTableColumn:
table: "events"
before_column_name: "status_v2"
after_column_name: "status"
```
Once the old column is no longer needed, you can drop it in a third PR.
```yaml filename="migrations/plan.yaml"
- DropTableColumn:
table: "events"
column_name: "status_old"
```
## Common breaking cases
- String -> Int/Float: can fail on non-numeric rows; prefer `toInt64OrNull(...)`/`toFloat64OrNull(...)` + backfill
- Nullable(T) -> T (NOT NULL): fails if any NULLs exist and no default is provided; backfill then drop nullability
- Narrowing types (e.g., Int64 -> Int32): fails if values overflow; validate and transform first
Read about migration planning and how to use it to safely manage schema changes in production.
---
## olap/schema-optimization
Source: moose/olap/schema-optimization.mdx
# Schema Optimization
Choosing the right data types and column ordering for your tables is crucial for ClickHouse performance and storage efficiency. Poor schema design can lead to 10-100x slower queries and 2-5x larger storage requirements.
## Data Types
Keep the following best practices in mind when defining your column types:
### Avoid Nullable Columns
Nullable columns in ClickHouse have significant performance overhead.
Instead of using `| None` or `Optional[type]`, give the field a default value (or a server-side default via `Annotated[type, clickhouse_default("...")]`) so the column stays non-nullable.
```py filename="AvoidNullable.py"
from datetime import datetime
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel, Field

# ❌ Bad: Using nullable columns
class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    description: str | None = None  # Nullable
    metadata: dict | None = None    # Nullable
    created_at: datetime

# ✅ Good: Use default values instead
class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    description: str = ""  # Default empty string
    metadata: dict = Field(default_factory=dict)  # Default empty dict
    created_at: datetime

user_events_table = OlapTable[UserEvent]("user_events", OlapConfig(
    order_by_fields=["id", "created_at"]
))
```
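If you want the default to live in ClickHouse itself (a `DEFAULT` clause) rather than in application code, the `clickhouse_default` annotation mentioned above can be combined with an optional field. A minimal sketch, assuming `clickhouse_default` is imported from `moose_lib`:
```py filename="ClickHouseDefaults.py"
from datetime import datetime
from typing import Annotated, Optional
from moose_lib import OlapTable, OlapConfig, clickhouse_default
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    # Optional in the app model, but non-nullable in ClickHouse thanks to the server-side default
    description: Annotated[Optional[str], clickhouse_default("''")] = None
    created_at: datetime

user_events_table = OlapTable[UserEvent]("user_events", OlapConfig(
    order_by_fields=["id", "created_at"]
))
```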
### Use `LowCardinality` where possible
`LowCardinality` is ClickHouse's most efficient string type for columns with limited unique values.
```py filename="LowCardinality.py"
from datetime import datetime
from typing import Annotated, Literal
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: Annotated[str, "LowCardinality"]      # ✅ Good for limited values
    status: Literal["active", "inactive", "pending"]  # ✅ Literals become LowCardinality automatically
    country: Annotated[str, "LowCardinality"]         # ✅ Good for country codes
    user_agent: str                                   # ❌ Keep as String for high cardinality
    created_at: datetime

user_events_table = OlapTable[UserEvent]("user_events", OlapConfig(
    order_by_fields=["id", "created_at"]
))
```
### Pick the right Integer types
Choose the smallest integer type that fits your data range to save storage and improve performance.
```py filename="IntegerTypes.py"
from datetime import datetime
from typing import Annotated
from moose_lib import OlapTable
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    age: Annotated[int, "uint8"]          # ✅ 0 to 255 (1 byte)
    score: Annotated[int, "int16"]        # ✅ -32,768 to 32,767 (2 bytes)
    view_count: Annotated[int, "uint32"]  # ✅ 0 to ~4 billion (4 bytes)
    timestamp: Annotated[int, "int64"]    # ✅ Unix timestamp (8 bytes)
    event_type: str
    created_at: datetime

# Integer type ranges:
# UInt8:  0 to 255
# UInt16: 0 to 65,535
# UInt32: 0 to 4,294,967,295
# UInt64: 0 to 18,446,744,073,709,551,615
# Int8:   -128 to 127
# Int16:  -32,768 to 32,767
# Int32:  -2,147,483,648 to 2,147,483,647
# Int64:  -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
```
### Use the right precision for `DateTime`
Choose appropriate DateTime precision based on your use case to balance storage and precision.
```py filename="DateTimePrecision.py"
from datetime import datetime
from moose_lib import OlapTable, OlapConfig, clickhouse_datetime64
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    created_at: datetime                    # ✅ Second precision (default)
    updated_at: clickhouse_datetime64(3)    # ✅ Millisecond precision
    processed_at: clickhouse_datetime64(6)  # ✅ Microsecond precision
    logged_at: clickhouse_datetime64(9)     # ✅ Nanosecond precision

user_events_table = OlapTable[UserEvent]("user_events", OlapConfig(
    order_by_fields=["id", "created_at"]
))
```
### Use Decimal over Float
Use `Decimal` for financial and precise calculations to avoid floating-point precision issues.
```py filename="DecimalOverFloat.py"
from datetime import datetime
from moose_lib import OlapTable, OlapConfig, clickhouse_decimal
from pydantic import BaseModel

class Order(BaseModel):
    id: str
    user_id: str
    amount: clickhouse_decimal(10, 2)   # ✅ 10 total digits, 2 decimal places
    tax: clickhouse_decimal(8, 2)       # ✅ 8 total digits, 2 decimal places
    discount: clickhouse_decimal(5, 2)  # ✅ 5 total digits, 2 decimal places
    total: clickhouse_decimal(12, 2)    # ✅ 12 total digits, 2 decimal places
    created_at: datetime

# ❌ Bad: Using float for financial data
class BadOrder(BaseModel):
    id: str
    amount: float  # Float - can cause precision issues
    tax: float     # Float - can cause precision issues

orders_table = OlapTable[Order]("orders", OlapConfig(
    order_by_fields=["id", "created_at"]
))
```
### Use `NamedTuple` over `Nested`
`NamedTuple` is more efficient than `Nested` for structured data in ClickHouse.
```py filename="NamedTupleOverNested.py"
from datetime import datetime
from typing import Annotated
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel

class Location(BaseModel):
    latitude: float
    longitude: float
    city: str
    country: str

class Metadata(BaseModel):
    version: str
    source: str
    priority: int

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    location: Annotated[Location, "ClickHouseNamedTuple"]  # lat, lon, city, country
    metadata: Annotated[Metadata, "ClickHouseNamedTuple"]  # version, source, priority
    created_at: datetime

# ❌ Bad: Using nested objects (less efficient)
class BadUserEvent(BaseModel):
    id: str
    location: Location  # Nested - less efficient
    metadata: Metadata  # Nested - less efficient

user_events_table = OlapTable[UserEvent]("user_events", OlapConfig(
    order_by_fields=["id", "created_at"]
))
```
## Ordering
### Choose columns that you will use in WHERE and GROUP BY clauses
Optimize your `orderByFields` (or `orderByExpression`) for your most common query patterns.
```py filename="OrderByOptimization.py"
from datetime import datetime
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    status: str
    created_at: datetime
    country: str

# ✅ Good: Optimized for common query patterns
user_events_table = OlapTable[UserEvent]("user_events", OlapConfig(
    order_by_fields=["user_id", "created_at", "event_type"]  # Most common filters first
))

# Common queries this optimizes for:
# - WHERE user_id = ? AND created_at > ?
# - WHERE user_id = ? AND event_type = ?
# - GROUP BY user_id, event_type
```
### `ORDER BY` should prioritize LowCardinality columns first
Place `LowCardinality` columns first in your `order_by_fields` (or reflect this priority in your `order_by_expression`) for better compression and query performance.
```py filename="LowCardinalityOrdering.py"
from datetime import datetime
from typing import Annotated
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: Annotated[str, "LowCardinality"]  # ✅ Low cardinality
    status: Annotated[str, "LowCardinality"]      # ✅ Low cardinality
    country: Annotated[str, "LowCardinality"]     # ✅ Low cardinality
    created_at: datetime                          # High cardinality
    session_id: str                               # High cardinality

# ✅ Good: LowCardinality columns first
user_events_table = OlapTable[UserEvent]("user_events", OlapConfig(
    order_by_fields=["event_type", "status", "country", "created_at", "session_id"]
))

# ❌ Bad: High cardinality columns first
bad_user_events_table = OlapTable[UserEvent]("user_events_bad", OlapConfig(
    order_by_fields=["created_at", "session_id", "event_type", "status"]  # Less efficient
))
```
---
## Schema Versioning with Materialized Views
Source: moose/olap/schema-versioning.mdx
Use table versions and materialized views to migrate breaking schema changes safely
# Table Versioning & Blue/Green Migrations
## Overview
Changing a table's storage layout (engine or sorting key) in ClickHouse requires a full table rewrite. Doing it in-place can block or slow concurrent reads and writes due to heavy merges and metadata changes, creating real risk for production workloads. Blue/Green avoids this by creating a new versioned table and migrating data live via a materialized view, so traffic continues uninterrupted.
**When to use it**:
- Change the **table engine** (e.g., MergeTree → ReplacingMergeTree)
- Update **ORDER BY fields** (sorting keys) to better match query patterns
- Reshape **primary keys** or perform type changes that require a rewrite
**How Moose does it**:
1. Define a new table with the same logical name and a bumped `version`, setting the new `order_by_fields` and/or `engine` ([Table modeling](/moose/olap/model-table)).
2. Create a [Materialized view](/moose/olap/model-materialized-view) that selects from the old table and writes to the new one; Moose backfills once and keeps the view live for new inserts.
3. Later on, cut over readers/writers to the new export and clean up old resources ([Applying migrations](/moose/olap/apply-migrations)).
Setting `version` in an `OlapTable` config changes only the underlying physical table name (the version is appended with dots replaced by underscores, e.g. `events_0_1`). Your code still refers to the logical table you defined.
## High-level workflow
## Example: change sorting key (ORDER BY)
Assume the original `events` table orders by `id` only. We want to update the
sorting key to optimize reads by ordering on `id, created_at`.
### Original table (version 0.0)
```python filename="app/tables/events.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import OlapTable, Key, OlapConfig
class EventV0(BaseModel):
id: str
name: str
created_at: str # datetime in your format
events = OlapTable[EventV0]("events", config=OlapConfig(version="0.0", order_by_fields=["id"]))
```
### New table (bump to version 0.1)
Create a new table with the same logical name, but set `version: "0.1"` and update the ordering to `id, created_at`. Moose will create `events_0_1` in ClickHouse.
```python filename="app/tables/events_v01.py" copy
from pydantic import BaseModel
from moose_lib import OlapTable, Key, OlapConfig

class EventV1(BaseModel):
    id: Key[str]
    name: str
    created_at: str

events_v1 = OlapTable[EventV1]("events", config=OlapConfig(version="0.1", order_by_fields=["id", "created_at"]))
```
### Create the materialized view to migrate data
Create a materialized view that:
- SELECTs from the old table (`events`)
- copies fields 1:1 to the new table
- writes into the versioned target table (`events_v1`)
Pass the versioned `OlapTable` instance as `target_table`. If you only pass a table name, Moose will create an unversioned target.
```python filename="app/views/migrate_events_to_v01.py" copy
from moose_lib import MaterializedView, MaterializedViewOptions
from app.tables.events import events
from app.tables.events_v01 import events_v1, EventV1
migrate_events_to_v01 = MaterializedView[EventV1](
MaterializedViewOptions(
materialized_view_name="mv_events_to_0_1",
select_statement=(
f"SELECT * FROM {events.name}"
),
select_tables=[events],
),
target_table=events_v1,
)
```
What happens when you export this view:
- Moose creates the versioned table if needed
- Moose creates the MATERIALIZED VIEW and immediately runs a one-time backfill (`INSERT INTO ... SELECT ...`)
- ClickHouse keeps the view active: any new inserts into `events` automatically flow into `events_0_1`
## Cutover and cleanup
- Update readers to query the new table (`events_v1`); see the sketch after this list.
- Update writers/streams to produce to the new table if applicable.
- After verifying parity and retention windows, drop the old table and the migration view.
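A minimal sketch of a reader cut over to the new table, using the `execute` interpolation described in the Querying Data section:
```python filename="app/read_events.py" copy
from moose_lib import MooseClient
from app.tables.events_v01 import events_v1

client = MooseClient()

# {table} interpolates the events_v1 table reference, so the physical name isn't hard-coded
rows = client.query.execute(
    "SELECT * FROM {table} LIMIT 10",
    {"table": events_v1},
)
```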
## Notes and tips
- Use semantic versions like `0.1`, `1.0`, `1.1`. Moose will render `events_1_1` as the physical name.
- Keep the migration view simple and deterministic. If you need complex transforms, prefer explicit SQL in the `select_statement`.
- Very large backfills can take time. Consider deploying during low-traffic windows.
---
## Supported Column Types
Source: moose/olap/supported-types.mdx
Complete guide to defining columns for ClickHouse tables in Moose
# Supported Column Types
Moose supports a comprehensive set of ClickHouse column types across both TypeScript and Python libraries. This guide covers all supported types, their syntax, and best practices for defining table schemas.
## Basic Types
### String Types
```python
from typing import Literal
from uuid import UUID
class User(BaseModel):
string: str # String
low_cardinality: Annotated[str, "LowCardinality"] # LowCardinality(String)
uuid: UUID # UUID
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `String` | `str` | Variable-length string |
| `LowCardinality(String)` | `Annotated[str, "LowCardinality"]` or `Literal` values | Optimized for repeated values |
| `UUID` | `UUID` | UUID format strings |
### Numeric Types
### Integer Types
```python
from typing import Annotated
class Metrics(BaseModel):
user_id: Annotated[int, "int32"] # Int32
count: Annotated[int, "int64"] # Int64
small_value: Annotated[int, "uint8"] # UInt8
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Int8` | `Annotated[int, "int8"]` | -128 to 127 |
| `Int16` | `Annotated[int, "int16"]` | -32,768 to 32,767 |
| `Int32` | `Annotated[int, "int32"]` | -2,147,483,648 to 2,147,483,647 |
| `Int64` | `Annotated[int, "int64"]` | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| `UInt8` | `Annotated[int, "uint8"]` | 0 to 255 |
| `UInt16` | `Annotated[int, "uint16"]` | 0 to 65,535 |
| `UInt32` | `Annotated[int, "uint32"]` | 0 to 4,294,967,295 |
| `UInt64` | `Annotated[int, "uint64"]` | 0 to 18,446,744,073,709,551,615 |
### Floating Point Types
```python
from moose_lib import ClickhouseSize
class SensorData(BaseModel):
temperature: float # Float64
humidity: Annotated[float, ClickhouseSize(4)] # Float32
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Float64` | `float` | 64-bit floating point number |
| `Float32` | `Annotated[float, ClickhouseSize(4)]` | 32-bit floating point number |
### Decimal Types
```python
from moose_lib import clickhouse_decimal
class FinancialData(BaseModel):
amount: clickhouse_decimal(10, 2) # Decimal(10,2)
rate: clickhouse_decimal(5, 4) # Decimal(5,4)
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Decimal(P,S)` | `clickhouse_decimal(P,S)` | Fixed-point decimal |
### Boolean Type
```python
class User(BaseModel):
is_active: bool
verified: bool
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Boolean` | `bool` | True/False values |
### Date and Time Types
```python
from datetime import date, datetime
from moose_lib import ClickhouseSize, clickhouse_datetime64
class Event(BaseModel):
created_at: datetime # DateTime
updated_at: clickhouse_datetime64(3) # DateTime(3)
birth_date: date # Date
compact_date: Annotated[date, ClickhouseSize(2)] # Date16
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Date` | `date` | Date only |
| `Date16` | `Annotated[date, ClickhouseSize(2)]` | Compact date format |
| `DateTime` | `datetime` | Date and time (second precision) |
| `DateTime64(P)` | `clickhouse_datetime64(P)` | Date and time with sub-second precision |
### Network Types
```python
from ipaddress import IPv4Address, IPv6Address
class NetworkEvent(BaseModel):
source_ip: IPv4Address
dest_ip: IPv6Address
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `IPv4` | `ipaddress.IPv4Address` | IPv4 addresses |
| `IPv6` | `ipaddress.IPv6Address` | IPv6 addresses |
## Complex Types
### Geometry Types
Moose supports ClickHouse geometry types. Use the helpers in each language to get type-safe models and correct ClickHouse mappings.
```python
from moose_lib import Point, Ring, LineString, MultiLineString, Polygon, MultiPolygon
class GeoTypes(BaseModel):
point: Point # tuple[float, float]
ring: Ring # list[tuple[float, float]]
line_string: LineString # list[tuple[float, float]]
multi_line_string: MultiLineString # list[list[tuple[float, float]]]
polygon: Polygon # list[list[tuple[float, float]]]
multi_polygon: MultiPolygon # list[list[list[tuple[float, float]]]]
```
| ClickHouse Type | Python |
|------|------------|
| `Point` | `Point` (tuple[float, float]) |
| `Ring` | `Ring` (list[tuple[float, float]]) |
| `LineString` | `LineString` (list[tuple[float, float]]) |
| `MultiLineString` | `MultiLineString` (list[list[tuple[float, float]]]) |
| `Polygon` | `Polygon` (list[list[tuple[float, float]]]) |
| `MultiPolygon` | `MultiPolygon` (list[list[list[tuple[float, float]]]]) |
Geometry coordinates are represented as numeric pairs `[x, y]` (TypeScript) or `tuple[float, float]` (Python).
### Array Types
Arrays are supported for all basic types and some complex types.
```python
from typing import List, Dict, Any, Tuple
class User(BaseModel):
tags: List[str] # Array(String)
scores: List[float] # Array(Float64)
metadata: List[Dict[str, Any]] # Array(Json)
tuple: List[Tuple[str, int]] # Array(Tuple(String, Int32))
```
### Map Types
Maps store key-value pairs with specified key and value types.
```python
from typing import Dict
class User(BaseModel):
preferences: Dict[str, str] # Map(String, String)
metrics: Dict[str, float] # Map(String, Float64)
```
### Nested Types
Nested types allow embedding complex objects within tables.
```python
class Address(BaseModel):
street: str
city: str
zip: str
class User(BaseModel):
name: str
address: Address # Nested type
```
### Named Tuple Types
Named tuples provide structured data with named fields.
```python
from typing import Annotated
class Point(BaseModel):
x: float
y: float
class Shape(BaseModel):
center: Annotated[Point, "ClickHouseNamedTuple"] # Named tuple
radius: float
```
### Enum Types
Enums map to ClickHouse enums with string or integer values.
```python
from enum import Enum
class UserRole(str, Enum):
ADMIN = "admin"
USER = "user"
GUEST = "guest"
class User(BaseModel):
role: UserRole # Enum with string values
```
## Special Types
### JSON Type
The `Json` type stores arbitrary JSON data with optional schema configuration for performance and type safety.
#### Basic JSON (Unstructured)
For completely dynamic JSON data without any schema:
```python
from typing import Any, Dict
class Event(BaseModel):
metadata: Dict[str, Any] # Basic JSON - accepts any structure
config: Any # Basic JSON - fully dynamic
```
#### Rich JSON with Type Configuration
For better performance and validation, you can define typed fields within your JSON using `ClickHouseJson`. This creates a ClickHouse `JSON` column with explicit type hints for specific paths.
```python
from typing import Annotated, Optional
from datetime import datetime
from pydantic import BaseModel, ConfigDict
from moose_lib.data_models import ClickHouseJson
# Define the structure for your JSON payload
class PayloadStructure(BaseModel):
model_config = ConfigDict(extra='allow') # Required for JSON types
name: str
count: int
timestamp: Optional[datetime] = None
class Event(BaseModel):
id: str
# JSON with typed paths - better performance, allows extra fields
payload: Annotated[PayloadStructure, ClickHouseJson()]
# JSON with performance tuning options
metadata: Annotated[PayloadStructure, ClickHouseJson(
max_dynamic_paths=256, # Limit tracked paths
max_dynamic_types=16, # Limit type variations
skip_paths=("skip.me",), # Exclude specific paths
skip_regexes=(r"^tmp\.",)  # Exclude paths matching regex
)]
```
#### Configuration Options
| Option | Type | Description |
|--------|------|-------------|
| `max_dynamic_paths` | `int` | Maximum number of unique JSON paths to track. Helps control memory usage for highly variable JSON structures. |
| `max_dynamic_types` | `int` | Maximum number of type variations allowed per path. Useful when paths may contain different types. |
| `skip_paths` | `tuple[str, ...]` | Exact JSON paths to ignore during ingestion (e.g., `("temp", "debug.info")`). |
| `skip_regexes` | `tuple[str, ...]` | Regex patterns for paths to exclude (e.g., `(r"^tmp\.", r".*_internal$")`). |
#### Benefits of Typed JSON
1. **Better Performance**: ClickHouse can optimize storage and queries for known paths
2. **Type Safety**: Validates that specified paths match expected types
3. **Flexible Schema**: Allows additional fields beyond typed paths
4. **Memory Control**: Configure limits to prevent unbounded resource usage
- **Basic JSON** (`any`, `Dict[str, Any]`): Use when JSON structure is completely unknown or rarely queried
- **Rich JSON** (`ClickHouseJson`): Use when you have known fields that need indexing/querying, but want to allow additional dynamic fields
#### Example: Product Event Tracking
```python
from typing import Annotated, Optional
from pydantic import BaseModel, ConfigDict
from moose_lib import Key, ClickHouseJson
from datetime import datetime
class ProductProperties(BaseModel):
model_config = ConfigDict(extra='allow')
category: str
price: float
in_stock: bool
class ProductEvent(BaseModel):
event_id: Key[str]
timestamp: datetime
# Typed paths for common fields, but allows custom properties
properties: Annotated[ProductProperties, ClickHouseJson(
max_dynamic_paths=128, # Track up to 128 unique paths
max_dynamic_types=8, # Allow up to 8 type variations per path
skip_paths=("_internal",), # Ignore internal fields
skip_regexes=(r"^debug_",) # Ignore debug fields
)]
```
With this schema, you can send events like:
```python
{
"event_id": "evt_123",
"timestamp": "2025-10-22T12:00:00Z",
"properties": {
"category": "electronics", # Typed field ✓
"price": 99.99, # Typed field ✓
"in_stock": True, # Typed field ✓
"custom_tag": "holiday-sale", # Extra field - accepted ✓
"brand_id": 42, # Extra field - accepted ✓
"_internal": "ignored" # Skipped by skip_paths ✓
}
}
```
### Nullable Types
All types support nullable variants using optional types.
```python
from typing import Optional
class User(BaseModel):
name: str # Required
email: Optional[str] = None # Nullable
age: Optional[int] = None # Nullable
```
If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.
- Optional without default → ClickHouse Nullable type.
- Optional with default (using `clickhouse_default("18")` in annotations) → non-nullable column with default `18`.
This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.
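A short sketch of how the two cases map (assuming `clickhouse_default` is imported from `moose_lib`):
```python
from typing import Annotated, Optional
from moose_lib import clickhouse_default
from pydantic import BaseModel

class User(BaseModel):
    email: Optional[str] = None  # Nullable(String)
    age: Annotated[Optional[int], clickhouse_default("18")] = None  # non-nullable column with DEFAULT 18
```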
### SimpleAggregateFunction
`SimpleAggregateFunction` is designed for use with `AggregatingMergeTree` tables. It stores pre-aggregated values that are automatically merged when ClickHouse combines rows with the same primary key.
```python
from moose_lib import simple_aggregated, Key, OlapTable, OlapConfig, AggregatingMergeTreeEngine
from pydantic import BaseModel
from datetime import datetime
class DailyStats(BaseModel):
date: datetime
user_id: str
total_views: simple_aggregated('sum', int)
max_score: simple_aggregated('max', float)
last_seen: simple_aggregated('anyLast', datetime)
stats_table = OlapTable[DailyStats](
"daily_stats",
OlapConfig(
engine=AggregatingMergeTreeEngine(),
order_by_fields=["date", "user_id"]
)
)
```
See [ClickHouse docs](https://clickhouse.com/docs/en/sql-reference/data-types/simpleaggregatefunction) for the complete list of functions.
## Table Engines
Moose supports all common ClickHouse table engines:
| Engine | Python | Description |
|--------|------------|-------------|
| `MergeTree` | `MergeTreeEngine` | Default engine |
| `ReplacingMergeTree` | `ReplacingMergeTreeEngine` | Deduplication |
| `SummingMergeTree` | `SummingMergeTreeEngine` | Aggregates numeric columns |
| `AggregatingMergeTree` | `AggregatingMergeTreeEngine` | Advanced aggregation |
| `ReplicatedMergeTree` | `ReplicatedMergeTreeEngine` | Replicated version of MergeTree |
| `ReplicatedReplacingMergeTree` | `ReplicatedReplacingMergeTreeEngine` | Replicated with deduplication |
| `ReplicatedSummingMergeTree` | `ReplicatedSummingMergeTreeEngine` | Replicated with summation |
| `ReplicatedAggregatingMergeTree` | `ReplicatedAggregatingMergeTreeEngine` | Replicated with advanced aggregation |
```python
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
from pydantic import BaseModel

class User(BaseModel):
    id: Key[str]
    updated_at: str

user_table = OlapTable[User]("users", OlapConfig(
    engine=ReplacingMergeTreeEngine(),
    order_by_fields=["id", "updated_at"]
))
```
## Best Practices
### Type Selection
- **Use specific integer types** when you know the value ranges to save storage
- **Prefer `Float64`** for most floating-point calculations unless storage is critical
- **Use `LowCardinality`** for string columns with repeated values
- **Choose appropriate DateTime precision** based on your accuracy needs
### Performance Considerations
- **Order columns by cardinality** (low to high) for better compression
- **Use `ReplacingMergeTree`** for tables with frequent updates
- **Specify `orderByFields` or `orderByExpression`** for optimal query performance
- **Consider `LowCardinality`** for string columns with < 10,000 unique values
---
## olap/ttl
Source: moose/olap/ttl.mdx
# TTL (Time-to-Live) for ClickHouse Tables
Moose lets you declare ClickHouse TTL directly in your data model:
- Table-level TTL via the `ttl` option on `OlapTable` config
- Column-level TTL via `ClickHouseTTL` on individual fields
### When to use TTL
- Automatically expire old rows to control storage cost
- Mask or drop sensitive columns earlier than the full row expiry
### TypeScript
```ts
interface Event {
  id: Key<string>;
  timestamp: DateTime;
  email: string & ClickHouseTTL<"timestamp + INTERVAL 30 DAY">; // column TTL
}

const events = new OlapTable<Event>("Events", {
  orderByFields: ["id", "timestamp"],
  ttl: "timestamp + INTERVAL 90 DAY DELETE", // table-level TTL
});
```
### Python
```python
from typing import Annotated
from moose_lib import OlapTable, OlapConfig, Key, ClickHouseTTL
from pydantic import BaseModel
from datetime import datetime
class Event(BaseModel):
id: Key[str]
timestamp: datetime
email: Annotated[str, ClickHouseTTL("timestamp + INTERVAL 30 DAY")]
events = OlapTable[Event](
"Events",
OlapConfig(
order_by_fields=["id", "timestamp"],
ttl="timestamp + INTERVAL 90 DAY DELETE",
),
)
```
### Notes
- Expressions must be valid ClickHouse TTL expressions, but do not include the leading `TTL` keyword.
- Column TTLs are independent from the table TTL and can be used together.
- Moose will apply TTL changes via migrations using `ALTER TABLE ... MODIFY TTL` and `MODIFY COLUMN ... TTL`.
### Related
- See `Modeling Tables` for defining your schema
- See `Applying Migrations` to roll out TTL changes
---
## Python Moose Lib Reference
Source: moose/reference/py-moose-lib.mdx
Python Moose Lib Reference
# API Reference
This is a comprehensive reference for the Python `moose_lib`, detailing all exported components, types, and utilities.
## Core Types
### `Key[T]`
A type annotation for marking fields as primary keys in data models. Used with Pydantic.
```python
from moose_lib import Key
from pydantic import BaseModel
class MyModel(BaseModel):
id: Key[str] # Marks 'id' as a primary key of type string
```
### `BaseModel`
Pydantic base model used for data modeling in Moose.
```python
from pydantic import BaseModel
class MyDataModel(BaseModel):
id: str
name: str
count: int
```
### `MooseClient`
Client for interacting with ClickHouse and Temporal.
```python
class MooseClient:
query: QueryClient # For database queries
workflow: Optional[WorkflowClient] # For workflow operations
```
### `ApiResult`
Class representing the result of an analytics API call.
```python
@dataclass
class ApiResult:
status: int # HTTP status code
body: Any # Response body
```
## Configuration Types
### `OlapConfig`
Configuration for OLAP tables.
```python
from typing import Union, Optional
from moose_lib.blocks import EngineConfig
class OlapConfig(BaseModel):
database: Optional[str] = None # Optional database name (defaults to moose.config.toml clickhouse_config.db_name)
order_by_fields: list[str] = [] # Fields to order by
engine: Optional[EngineConfig] = None # Table engine configuration
```
### `EngineConfig` Classes
Base class and implementations for table engine configurations.
```python
# Base class
class EngineConfig:
pass
# Available engine implementations
class MergeTreeEngine(EngineConfig):
pass
class ReplacingMergeTreeEngine(EngineConfig):
ver: Optional[str] = None # Version column for keeping latest
is_deleted: Optional[str] = None # Soft delete marker (requires ver)
class AggregatingMergeTreeEngine(EngineConfig):
pass
class SummingMergeTreeEngine(EngineConfig):
columns: Optional[List[str]] = None # Columns to sum
# Replicated engines
class ReplicatedMergeTreeEngine(EngineConfig):
keeper_path: Optional[str] = None # ZooKeeper/Keeper path (optional for Cloud)
replica_name: Optional[str] = None # Replica name (optional for Cloud)
class ReplicatedReplacingMergeTreeEngine(EngineConfig):
keeper_path: Optional[str] = None # ZooKeeper/Keeper path (optional for Cloud)
replica_name: Optional[str] = None # Replica name (optional for Cloud)
ver: Optional[str] = None # Version column for keeping latest
is_deleted: Optional[str] = None # Soft delete marker (requires ver)
class ReplicatedAggregatingMergeTreeEngine(EngineConfig):
keeper_path: Optional[str] = None # ZooKeeper/Keeper path (optional for Cloud)
replica_name: Optional[str] = None # Replica name (optional for Cloud)
class ReplicatedSummingMergeTreeEngine(EngineConfig):
keeper_path: Optional[str] = None # ZooKeeper/Keeper path (optional for Cloud)
replica_name: Optional[str] = None # Replica name (optional for Cloud)
columns: Optional[List[str]] = None # Columns to sum
```
### `StreamConfig`
Configuration for data streams.
```python
class StreamConfig(BaseModel):
parallelism: int = 1
retention_period: int = 60 * 60 * 24 * 7 # 7 days
destination: Optional[OlapTable[Any]] = None
```
### `IngestConfig`
Configuration for data ingestion.
```python
class IngestConfig(BaseModel):
destination: Optional[OlapTable[Any]] = None
```
### `IngestPipelineConfig`
Configuration for creating a complete data pipeline.
```python
class IngestPipelineConfig(BaseModel):
table: bool | OlapConfig = True
stream: bool | StreamConfig = True
ingest_api: bool | IngestConfig = True
```
## Infrastructure Components
### `OlapTable[T]`
Creates a ClickHouse table with the schema of type T.
```python
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
# Basic usage
my_table = OlapTable[UserProfile]("user_profiles")
# With configuration (fields)
my_table = OlapTable[UserProfile]("user_profiles", OlapConfig(
order_by_fields=["id", "timestamp"],
engine=ReplacingMergeTreeEngine(),
))
# With configuration (expression)
my_table_expr = OlapTable[UserProfile]("user_profiles_expr", OlapConfig(
order_by_expression="(id, timestamp)",
engine=ReplacingMergeTreeEngine(),
))
# With custom database override
analytics_table = OlapTable[UserProfile]("user_profiles", OlapConfig(
database="analytics", # Override default database
order_by_fields=["id", "timestamp"]
))
# Disable sorting entirely
my_table_unsorted = OlapTable[UserProfile]("user_profiles_unsorted", OlapConfig(
order_by_expression="tuple()",
))
```
### `Stream[T]`
Creates a Redpanda topic with the schema of type T.
```python
# Basic usage
my_stream = Stream[UserEvent]("user_events")
# With configuration
my_stream = Stream[UserEvent]("user_events", StreamConfig(
parallelism=3,
retention_period=86400 # 1 day in seconds
))
# Adding transformations
def transform_user_event(event: UserEvent) -> ProfileUpdate:
return ProfileUpdate(user_id=event.user_id, update_type="event")
my_stream.add_transform(profile_stream, transform_user_event)
```
### `IngestApi[T]`
Creates an HTTP endpoint for ingesting data of type T.
```python
# Basic usage with destination stream
my_ingest_api = IngestApi[UserEvent]("user_events", IngestConfigWithDestination(
destination=my_user_event_stream
))
```
### `Api[T, U]`
Creates an HTTP endpoint for querying data with request type T and response type U.
```python
# Basic usage
def get_user_profiles(params: UserQuery) -> list[UserProfile]:
# Query implementation
return [UserProfile(...), UserProfile(...)]
my_api = Api[UserQuery, list[UserProfile]](
"get_user_profiles",
get_user_profiles
)
```
### `IngestPipeline[T]`
Combines ingest API, stream, and table creation in a single component.
```python
from moose_lib import IngestPipeline, IngestPipelineConfig, StreamConfig, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
# Basic usage
pipeline = IngestPipeline[UserEvent]("user_pipeline", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True
))
# With advanced configuration
pipeline = IngestPipeline[UserEvent]("user_pipeline", IngestPipelineConfig(
ingest_api=True,
stream=StreamConfig(parallelism=3),
table=OlapConfig(
order_by_fields=["id", "timestamp"],
engine=ReplacingMergeTreeEngine(),
)
))
```
### `MaterializedView[T]`
Creates a materialized view in ClickHouse.
```python
# Basic usage
view = MaterializedView[UserStatistics](MaterializedViewOptions(
select_statement="SELECT user_id, COUNT(*) as event_count FROM user_events GROUP BY user_id",
table_name="user_events",
materialized_view_name="user_statistics",
order_by_fields=["user_id"]
))
```
## ClickHouse Utilities
### Engine Configuration Classes
Type-safe configuration classes for table engines:
```python
from moose_lib.blocks import (
MergeTreeEngine,
ReplacingMergeTreeEngine,
AggregatingMergeTreeEngine,
SummingMergeTreeEngine,
ReplicatedMergeTreeEngine,
ReplicatedReplacingMergeTreeEngine,
ReplicatedAggregatingMergeTreeEngine,
ReplicatedSummingMergeTreeEngine,
S3QueueEngine
)
# ReplacingMergeTree with version control and soft deletes
dedup_engine = ReplacingMergeTreeEngine(
ver="updated_at", # Optional: version column for keeping latest
is_deleted="deleted" # Optional: soft delete marker (requires ver)
)
# ReplicatedMergeTree with explicit keeper paths (self-managed ClickHouse)
replicated_engine = ReplicatedMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/my_table",
replica_name="{replica}"
)
# ReplicatedReplacingMergeTree with deduplication
replicated_dedup_engine = ReplicatedReplacingMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/my_dedup_table",
replica_name="{replica}",
ver="updated_at",
is_deleted="deleted"
)
# For ClickHouse Cloud or Boreal - omit keeper parameters
cloud_replicated = ReplicatedMergeTreeEngine() # No parameters needed
# S3Queue configuration for streaming from S3
s3_engine = S3QueueEngine(
s3_path="s3://bucket/data/*.json",
format="JSONEachRow",
aws_access_key_id="AKIA...", # Optional
aws_secret_access_key="secret...", # Optional
compression="gzip", # Optional
headers={"X-Custom": "value"} # Optional
)
# Use with OlapTable
s3_table = OlapTable[MyData]("s3_events", OlapConfig(
engine=s3_engine,
settings={
"mode": "unordered",
"keeper_path": "/clickhouse/s3queue/events",
"loading_retries": "3"
}
))
```
## Task Management
### `Task[T, U]`
A class that represents a single task within a workflow system, with typed input and output.
```python
from moose_lib import Task, TaskConfig, TaskContext
from pydantic import BaseModel
# Define input and output models
class InputData(BaseModel):
user_id: str
class OutputData(BaseModel):
result: str
status: bool
# Task with input and output
def process_user(ctx: TaskContext[InputData]) -> OutputData:
# Process the user data
return OutputData(result=f"Processed {ctx.input.user_id}", status=True)
user_task = Task[InputData, OutputData](
name="process_user",
config=TaskConfig(
run=process_user,
retries=3,
timeout="30s"
)
)
# Task with no input, but with output
def fetch_data(ctx: TaskContext[None]) -> OutputData:
return OutputData(result="Fetched data", status=True)
fetch_task = Task[None, OutputData](
name="fetch_data",
config=TaskConfig(run=fetch_data)
)
# Task with input but no output
def log_event(ctx: TaskContext[InputData]) -> None:
print(f"Event logged for: {ctx.input.user_id}")
log_task = Task[InputData, None](
name="log_event",
config=TaskConfig(run=log_event)
)
# Task with neither input nor output
def cleanup(ctx: TaskContext[None]) -> None:
print("Cleanup complete")
cleanup_task = Task[None, None](
name="cleanup",
config=TaskConfig(run=cleanup)
)
```
### `TaskConfig[T, U]`
Configuration for a Task.
```python
@dataclasses.dataclass
class TaskConfig(Generic[T, U]):
# The handler function that executes the task logic
# Can be any of: () -> None, () -> U, (T) -> None, or (T) -> U depending on input/output types
run: TaskRunFunc[T, U]
# Optional list of tasks to run after this task completes
on_complete: Optional[list[Task[U, Any]]] = None
# Optional function that is called when the task is cancelled
on_cancel: Optional[Callable[[TaskContext[T_none]], Union[None, Awaitable[None]]]] = None
# Optional timeout string (e.g. "5m", "1h", "never")
timeout: Optional[str] = None
# Optional number of retry attempts
retries: Optional[int] = None
```
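A minimal sketch of chaining tasks with `on_complete` and attaching an `on_cancel` handler (the models and task bodies below are illustrative, not part of `moose_lib`):
```python
from moose_lib import Task, TaskConfig, TaskContext
from pydantic import BaseModel

class ReportInput(BaseModel):
    rows: int

def build_report(ctx: TaskContext[ReportInput]) -> None:
    print(f"Building report over {ctx.input.rows} rows")

report_task = Task[ReportInput, None](
    name="build_report",
    config=TaskConfig(run=build_report),
)

def extract(ctx: TaskContext[None]) -> ReportInput:
    return ReportInput(rows=42)  # illustrative extraction step

def on_extract_cancel(ctx: TaskContext[None]) -> None:
    print("extract cancelled, cleaning up")

extract_task = Task[None, ReportInput](
    name="extract",
    config=TaskConfig(
        run=extract,
        on_complete=[report_task],   # chained after extract succeeds, receiving its output
        on_cancel=on_extract_cancel, # called if the task is cancelled
        retries=2,
        timeout="5m",
    ),
)
```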
### `Workflow`
Represents a workflow composed of one or more tasks.
```python
from moose_lib import Workflow, WorkflowConfig
# Create a workflow that starts with the fetch_task
data_workflow = Workflow(
name="data_processing",
config=WorkflowConfig(
starting_task=fetch_task,
schedule="@every 1h", # Run every hour
timeout="10m", # Timeout after 10 minutes
retries=2 # Retry up to 2 times if it fails
)
)
```
### `WorkflowConfig`
Configuration for a workflow.
```python
@dataclasses.dataclass
class WorkflowConfig:
# The first task to execute in the workflow
starting_task: Task[Any, Any]
# Optional number of retry attempts for the entire workflow
retries: Optional[int] = None
# Optional timeout string for the entire workflow
timeout: Optional[str] = None
# Optional cron-like schedule string for recurring execution
schedule: Optional[str] = None
```
---
## TypeScript Moose Lib Reference
Source: moose/reference/ts-moose-lib.mdx
TypeScript Moose Lib Reference
# API Reference
This is a comprehensive reference for the TypeScript `moose-lib`, detailing all exported components, types, and utilities.
## Core Types
### `Key`
A type for marking fields as primary keys in data models.
```ts
// Example
interface MyModel {
  id: Key<string>; // Marks 'id' as a primary key of type string
}
```
### `JWT`
A type for working with JSON Web Tokens.
```ts
// Example
type UserJWT = JWT<{ userId: string, role: string }>;
```
### `ApiUtil`
Interface providing utilities for analytics APIs.
```ts
interface ApiUtil {
client: MooseClient; // Client for interacting with the database
sql: typeof sql; // SQL template tag function
jwt: JWTPayload | undefined; // Current JWT if available
}
```
## Infrastructure Components
### `OlapTable`
Creates a ClickHouse table with the schema of type T.
```ts
// Basic usage with MergeTree (default)
const myTable = new OlapTable<UserProfile>("user_profiles");

// With sorting configuration (expression)
const myTableExpr = new OlapTable<UserProfile>("user_profiles_expr", {
  orderByExpression: "(id, timestamp)",
});

// Disable sorting entirely
const myTableUnsorted = new OlapTable<UserProfile>("user_profiles_unsorted", {
  orderByExpression: "tuple()",
});

// For deduplication, explicitly set the ReplacingMergeTree engine
const dedupTable = new OlapTable<UserProfile>("user_profiles_dedup", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id", "updated_at"],
});
```
### `BaseOlapConfig`
Base configuration interface for `OlapTable` with common table configuration options.
```ts
interface BaseOlapConfig<T> {
// Optional database name (defaults to moose.config.toml clickhouse_config.db_name)
database?: string;
// Optional array of field names to order by
orderByFields?: (keyof T & string)[];
// Optional SQL expression for ORDER BY clause (alternative to orderByFields)
orderByExpression?: string;
// Optional table engine (defaults to MergeTree)
engine?: ClickHouseEngines;
// Optional settings for table configuration
settings?: { [key: string]: string };
// Optional lifecycle mode (defaults to MOOSE_MANAGED)
lifeCycle?: LifeCycle;
// Additional engine-specific fields (ver, isDeleted, keeperPath, etc.)
// depend on the engine type
}
```
Example with database override:
```ts
// Table in custom database
const analyticsTable = new OlapTable<UserProfile>("user_profiles", {
  database: "analytics", // Override default database
  orderByFields: ["id", "timestamp"],
});

// Default database (from moose.config.toml)
const defaultTable = new OlapTable<UserProfile>("user_profiles");
```
### `Stream`
Creates a Redpanda topic with the schema of type T.
```ts
// Basic usage
const myStream = new Stream<UserEvent>("user_events");

// Adding transformations
myConfiguredStream.addTransform(
  destinationStream,
  (record) => transformFunction(record)
);
```
### `IngestApi`
Creates an HTTP endpoint for ingesting data of type T.
```ts
// Basic usage with destination stream
const myIngestApi = new IngestApi<UserEvent>("user_events", {
  destination: myUserEventStream,
});
```
### `Api`
Creates an HTTP endpoint for querying data with request type T and response type R.
```ts
// Basic usage
const myApi = new Api<UserQuery, UserProfile[]>(
  "get_user_profiles",
  async (params, { client, sql }) => {
    const result = await client.query.execute(
      sql`SELECT * FROM user_profiles WHERE age > ${params.minAge} LIMIT 10`
    );
    return result;
  }
);
```
### `IngestPipeline`
Combines ingest API, stream, and table creation in a single component.
```ts
// Basic usage
const pipeline = new IngestPipeline<UserEvent>("user_pipeline", {
  ingest: true,
  stream: true,
  table: true,
});

// With advanced configuration
const advancedPipeline = new IngestPipeline<UserEvent>("user_pipeline", {
  ingest: true,
  stream: { parallelism: 3 },
  table: {
    orderByFields: ["id", "timestamp"],
    engine: ClickHouseEngines.ReplacingMergeTree,
  },
});
```
### `MaterializedView`
Creates a materialized view in ClickHouse.
```ts
// Basic usage
const view = new MaterializedView<UserStatistics>({
  selectStatement: "SELECT user_id, COUNT(*) as event_count FROM user_events GROUP BY user_id",
  tableName: "user_events",
  materializedViewName: "user_statistics",
  orderByFields: ["user_id"],
});
```
## SQL Utilities
### `sql` Template Tag
Template tag for creating type-safe SQL queries with parameters.
```ts
// Basic usage
const query = sql`SELECT * FROM users WHERE id = ${userId}`;
// With multiple parameters
const query = sql`
SELECT * FROM users
WHERE age > ${minAge}
AND country = ${country}
LIMIT ${limit}
`;
```
### `MooseClient`
Client for interacting with ClickHouse and Temporal.
```ts
class MooseClient {
query: QueryClient; // For database queries
workflow: WorkflowClient; // For workflow operations
}
```
## ClickHouse Utilities
### Table Engine Configurations
#### `ClickHouseEngines` Enum
Available table engines:
```ts
enum ClickHouseEngines {
MergeTree = "MergeTree",
ReplacingMergeTree = "ReplacingMergeTree",
AggregatingMergeTree = "AggregatingMergeTree",
SummingMergeTree = "SummingMergeTree",
ReplicatedMergeTree = "ReplicatedMergeTree",
ReplicatedReplacingMergeTree = "ReplicatedReplacingMergeTree",
ReplicatedAggregatingMergeTree = "ReplicatedAggregatingMergeTree",
ReplicatedSummingMergeTree = "ReplicatedSummingMergeTree",
S3Queue = "S3Queue"
}
```
#### `ReplacingMergeTreeConfig`
Configuration for ReplacingMergeTree tables:
```ts
type ReplacingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplacingMergeTree;
orderByFields?: (keyof T & string)[];
ver?: keyof T & string; // Optional: version column for keeping latest
isDeleted?: keyof T & string; // Optional: soft delete marker (requires ver)
settings?: { [key: string]: string };
}
```
#### Replicated Engine Configurations
Configuration for replicated table engines:
```ts
// ReplicatedMergeTree
type ReplicatedMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
// ReplicatedReplacingMergeTree
type ReplicatedReplacingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedReplacingMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
ver?: keyof T & string; // Optional: version column
isDeleted?: keyof T & string; // Optional: soft delete marker
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
// ReplicatedAggregatingMergeTree
type ReplicatedAggregatingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedAggregatingMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
// ReplicatedSummingMergeTree
type ReplicatedSummingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedSummingMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
columns?: string[]; // Optional: columns to sum
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
```
**Note**: The `keeperPath` and `replicaName` parameters are optional. When omitted, Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments (default path: `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}`). You can still provide both parameters explicitly if you need custom replication paths.
### `S3QueueTableSettings`
Type-safe interface for S3Queue-specific table settings (ClickHouse 24.7+).
```ts
interface S3QueueTableSettings {
mode?: "ordered" | "unordered"; // Processing mode
after_processing?: "keep" | "delete"; // File handling after processing
keeper_path?: string; // ZooKeeper path for coordination
loading_retries?: string; // Number of retry attempts
processing_threads_num?: string; // Parallel processing threads
// ... and many more settings
}
```
### S3Queue Configuration
Configure S3Queue tables for streaming data from S3 buckets (ORDER BY is not supported):
```ts
const s3Table = new OlapTable<MyData>("s3_events", {
  engine: ClickHouseEngines.S3Queue,
  s3Path: "s3://bucket/data/*.json",
  format: "JSONEachRow",
  settings: {
    mode: "unordered",
    keeper_path: "/clickhouse/s3queue/events",
    loading_retries: "3",
  },
});
```
## Task Management
### `Task`
A class that represents a single task within a workflow system.
```ts
// No input, no output
const cleanupTask = new Task<null, void>("cleanup", {
  run: async () => {
    console.log("Cleanup complete");
  },
});

// With input and output
const processTask = new Task<InputData, OutputData>("process_user", {
  run: async (ctx) => {
    return { result: `Processed ${ctx.input.userId}`, status: true };
  },
  retries: 3,
  timeout: "30s",
});
```
### `TaskContext`
A context object that includes input & state passed between the task's run/cancel functions.
```ts
export type TaskContext<T> = T extends null ? { state: any; input?: null } : { state: any; input: T };
```
### `TaskConfig`
Configuration options for tasks.
```ts
interface TaskConfig<T, R> {
  // The main function that executes the task logic
  run: (context: TaskContext<T>) => Promise<R>;
  // Optional array of tasks to execute after this task completes
  onComplete?: Task<R, any>[];
  // Optional function that is called when the task is cancelled
  onCancel?: (context: TaskContext<T>) => Promise<void>;
  // Optional timeout duration (e.g., "30s", "5m", "never")
  timeout?: string;
  // Optional number of retry attempts
  retries?: number;
}
```
### `Workflow`
A class that represents a complete workflow composed of interconnected tasks.
```ts
const myWorkflow = new Workflow("getData", {
startingTask: callAPI,
schedule: "@every 5s", // Run every 5 seconds
timeout: "1h",
retries: 3
});
```
### `WorkflowConfig`
Configuration options for defining a workflow.
```ts
interface WorkflowConfig {
// The initial task that begins the workflow execution
startingTask: Task<any, any>;
// Optional number of retry attempts
retries?: number;
// Optional timeout duration (e.g., "10m", "1h", "never")
timeout?: string;
// Optional cron-style schedule string
schedule?: string;
}
```
---
**Important:** The following components must be exported from your `app/index.ts` file for Moose to detect them:
- `OlapTable` instances
- `Stream` instances
- `IngestApi` instances
- `Api` instances
- `IngestPipeline` instances
- `MaterializedView` instances
- `Task` instances
- `Workflow` instances
**Configuration objects and utilities** (like `DeadLetterQueue`, `Key`, `sql`) do not need to be exported as they are used as dependencies of the main components.
---
## Moose Streaming
Source: moose/streaming.mdx
Build real-time data pipelines with Redpanda/Kafka streams, transformations, and event processing
# Moose Streaming
## Overview
The Streaming module provides standalone real-time data processing with Kafka/Redpanda topics. You can use this capability independently to build event-driven architectures, data transformations, and real-time pipelines without requiring other MooseStack components.
## Basic Usage
```py filename="Stream.py" copy
from moose_lib import Stream
from pydantic import BaseModel
from datetime import datetime
class ExampleEvent(BaseModel):
id: str
user_id: str
timestamp: datetime
event_type: str
# Create a standalone stream for user events
example_stream = Stream[ExampleEvent]("streaming-topic-name")
# Add consumers for real-time processing
def process_event(event: ExampleEvent):
print(f"Processing event: {event}")
# Custom processing logic here
example_stream.add_consumer(process_event)
# No export needed - Python modules are automatically discovered
```
### Enabling Streaming
To enable streaming, you need to ensure that the `streaming_engine` feature flag is set to `true` in your `moose.config.toml` file:
```toml
[features]
streaming_engine = true
```
## Core Capabilities
## Integration with Other Capabilities
The Streaming capability can be used independently, or in conjunction with other MooseStack modules:
---
## streaming/connect-cdc
Source: moose/streaming/connect-cdc.mdx
# Connect to CDC Services
Coming Soon!
---
## Streaming Consumer Functions
Source: moose/streaming/consumer-functions.mdx
Read and process data from streams with consumers and processors
# Streaming Consumer Functions
## Overview
Consuming data from streams allows you to read and process data from Kafka/Redpanda topics. This is essential for building real-time applications, analytics, and event-driven architectures.
## Basic Usage
Consumers are just functions that are called when new data is available in a stream. You add them to a stream like this:
```py filename="StreamConsumer.py" copy
from moose_lib import Stream
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
id: str
user_id: str
timestamp: datetime
event_type: str
user_events_stream = Stream[UserEvent]("user-events")
# Add a consumer to process events
def process_event(event: UserEvent):
print(f"Processing event: {event.id}")
print(f"User: {event.user_id}, Type: {event.event_type}")
# Your processing logic here
# e.g., update analytics, send notifications, etc.
user_events_stream.add_consumer(process_event)
# Add multiple consumers for different purposes
def analytics_consumer(event: UserEvent):
# Analytics processing
if event.event_type == 'purchase':
update_purchase_analytics(event)
def notification_consumer(event: UserEvent):
# Notification processing
if event.event_type == 'signup':
send_welcome_email(event.user_id)
user_events_stream.add_consumer(analytics_consumer)
user_events_stream.add_consumer(notification_consumer)
```
## Processing Patterns
### Stateful Processing with MooseCache
Maintain state across event processing using MooseCache for distributed state management:
```py filename="StatefulProcessing.py" copy
from datetime import datetime
from typing import Dict, Any
from moose_lib import MooseCache, Stream
from pydantic import BaseModel
# State container for accumulating data
class AccumulatorState(BaseModel):
id: str
counter: int
sum: float
last_modified: datetime
attributes: Dict[str, Any]
# Input message structure
class InputMessage(BaseModel):
id: str
group_id: str
numeric_value: float
message_type: str
timestamp: datetime
payload: Dict[str, Any]
message_stream = Stream[InputMessage]("input-stream")
# Initialize distributed cache
cache = MooseCache()
def process_message(message: InputMessage):
cache_key = f"state:{message.group_id}"
# Load existing state or create new one
state = cache.get(cache_key, AccumulatorState)
if not state:
# Initialize new state
state = AccumulatorState(
id=message.group_id,
counter=0,
sum=0.0,
last_modified=datetime.now(),
attributes={}
)
# Apply message to state
state.counter += 1
state.sum += message.numeric_value
state.last_modified = message.timestamp
state.attributes.update(message.payload)
# Determine cache lifetime based on message type
ttl_seconds = 60 if message.message_type == 'complete' else 3600
if message.message_type == 'complete' or should_finalize(state):
# Finalize and remove state
finalize_state(state)
cache.delete(cache_key)
else:
# Persist updated state
cache.set(cache_key, state, ttl_seconds=ttl_seconds)
def should_finalize(state: AccumulatorState) -> bool:
"""Condition for automatic state finalization"""
threshold = 100
time_limit_seconds = 30 * 60 # 30 minutes
elapsed = (datetime.now() - state.last_modified).total_seconds()
return state.counter >= threshold or elapsed > time_limit_seconds
def finalize_state(state: AccumulatorState):
print(f"Finalizing state {state.id}: counter={state.counter}, sum={state.sum}")
message_stream.add_consumer(process_message)
```
## Propagating Events to External Systems
You can use consumer functions to trigger actions across external systems - send notifications, sync databases, update caches, or integrate with any other service when events occur:
### HTTP API Calls
Send processed data to external APIs:
```py filename="HttpIntegration.py" copy
import os
import httpx
from datetime import datetime
from typing import Dict, Any
from moose_lib import Stream
from pydantic import BaseModel
class WebhookPayload(BaseModel):
id: str
data: Dict[str, Any]
timestamp: datetime
webhook_stream = Stream[WebhookPayload]("webhook-events")
async def send_to_external_api(payload: WebhookPayload):
try:
async with httpx.AsyncClient() as client:
response = await client.post(
'https://external-api.com/webhook',
headers={
'Content-Type': 'application/json',
'Authorization': f'Bearer {os.getenv("API_TOKEN")}'
},
json={
'event_id': payload.id,
'event_data': payload.data,
'processed_at': datetime.now().isoformat()
}
)
if response.status_code != 200:
raise Exception(f"HTTP {response.status_code}: {response.text}")
print(f"Successfully sent event {payload.id} to external API")
except Exception as error:
print(f"Failed to send event {payload.id}: {error}")
# Could implement retry logic or dead letter queue here
webhook_stream.add_consumer(send_to_external_api)
```
#### Database Operations
Write processed data to external databases:
```py filename="DatabaseIntegration.py" copy
import os
import json
import asyncpg
from datetime import datetime
from typing import Dict, Any
from moose_lib import Stream
from pydantic import BaseModel
class DatabaseRecord(BaseModel):
id: str
category: str
value: float
metadata: Dict[str, Any]
timestamp: datetime
db_stream = Stream[DatabaseRecord]("database-events")
async def insert_to_database(record: DatabaseRecord):
try:
# Connect to PostgreSQL database
conn = await asyncpg.connect(
host=os.getenv('DB_HOST'),
user=os.getenv('DB_USER'),
password=os.getenv('DB_PASSWORD'),
database=os.getenv('DB_NAME')
)
# Insert record into external database
await conn.execute(
'''
INSERT INTO processed_events (id, category, value, metadata, created_at)
VALUES ($1, $2, $3, $4, $5)
''',
record.id,
record.category,
record.value,
json.dumps(record.metadata),
record.timestamp
)
print(f"Inserted record {record.id} into database")
except Exception as error:
print(f"Database insert failed for record {record.id}: {error}")
finally:
if 'conn' in locals():
await conn.close()
db_stream.add_consumer(insert_to_database)
```
#### File System Operations
Write processed data to files or cloud storage:
```py filename="FileSystemIntegration.py" copy
import os
import json
import aiofiles
from datetime import datetime
from typing import Literal
from moose_lib import Stream
from pydantic import BaseModel
class FileOutput(BaseModel):
id: str
filename: str
content: str
directory: str
format: Literal['json', 'csv', 'txt']
file_stream = Stream[FileOutput]("file-events")
async def write_to_file(output: FileOutput):
try:
# Ensure directory exists
os.makedirs(output.directory, exist_ok=True)
# Format content based on type
if output.format == 'json':
formatted_content = json.dumps(json.loads(output.content), indent=2)
else:
formatted_content = output.content
# Write file with timestamp
timestamp = datetime.now().isoformat().replace(':', '-').replace('.', '-')
filename = f"{output.filename}_{timestamp}.{output.format}"
filepath = os.path.join(output.directory, filename)
async with aiofiles.open(filepath, 'w', encoding='utf-8') as f:
await f.write(formatted_content)
print(f"Written file: {filepath}")
except Exception as error:
print(f"Failed to write file for output {output.id}: {error}")
file_stream.add_consumer(write_to_file)
```
#### Email and Notifications
Send alerts and notifications based on processed events:
```py filename="NotificationIntegration.py" copy
import os
import smtplib
import httpx
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from typing import Dict, Any, Literal
from moose_lib import Stream
from pydantic import BaseModel
class NotificationEvent(BaseModel):
id: str
type: Literal['email', 'slack', 'webhook']
recipient: str
subject: str
message: str
priority: Literal['low', 'medium', 'high']
metadata: Dict[str, Any]
notification_stream = Stream[NotificationEvent]("notifications")
async def send_notification(notification: NotificationEvent):
try:
if notification.type == 'email':
# Send email
msg = MIMEMultipart()
msg['From'] = os.getenv('SMTP_FROM')
msg['To'] = notification.recipient
msg['Subject'] = notification.subject
body = f"""
{notification.message}
Priority: {notification.priority}
"""
msg.attach(MIMEText(body, 'plain'))
server = smtplib.SMTP(os.getenv('SMTP_HOST'), int(os.getenv('SMTP_PORT', '587')))
server.starttls()
server.login(os.getenv('SMTP_USER'), os.getenv('SMTP_PASS'))
server.send_message(msg)
server.quit()
elif notification.type == 'slack':
# Send to Slack
async with httpx.AsyncClient() as client:
await client.post(
f"https://hooks.slack.com/services/{os.getenv('SLACK_WEBHOOK')}",
json={
'text': notification.message,
'channel': notification.recipient,
'username': 'Moose Alert',
'icon_emoji': ':warning:' if notification.priority == 'high' else ':information_source:'
}
)
elif notification.type == 'webhook':
# Send to webhook
async with httpx.AsyncClient() as client:
await client.post(
notification.recipient,
json={
'id': notification.id,
'subject': notification.subject,
'message': notification.message,
'priority': notification.priority,
'metadata': notification.metadata
}
)
print(f"Sent {notification.type} notification {notification.id}")
except Exception as error:
print(f"Failed to send notification {notification.id}: {error}")
notification_stream.add_consumer(send_notification)
```
---
## Create Streams
Source: moose/streaming/create-stream.mdx
Define and create Kafka/Redpanda topics with type-safe schemas
# Creating Streams
## Overview
Streams serve as the transport layer between your data sources and database tables. Built on Kafka/Redpanda topics, they provide a way to implement real-time pipelines for ingesting and processing incoming data.
## Creating Streams
You can create streams in two ways:
- High-level: Using the `IngestPipeline` class (recommended)
- Low-level: Manually configuring the `Stream` component
### Streams for Ingestion
The `IngestPipeline` class provides a convenient way to set up streams with ingestion APIs and tables. This is the recommended way to create streams for ingestion:
```py filename="IngestionStream.py" copy {10}
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
class RawData(BaseModel):
id: Key[str]
value: int
raw_ingestion_stream = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
ingest_api = True, # Creates an ingestion API endpoint at `POST /ingest/raw_data`
stream = True, # Buffers data between the ingestion API and the database table
table = True, # Creates an OLAP table named `raw_data`
))
```
While the `IngestPipeline` provides a convenient way to set up streams with ingestion APIs and tables, you can also configure these components individually for more granular control:
```py filename="StreamObject.py" copy {8-12}
from moose_lib import Stream, StreamConfig, Key, IngestApi, IngestConfig, OlapTable
from pydantic import BaseModel
class RawData(BaseModel):
id: Key[str]
value: int
raw_table = OlapTable[RawData]("raw_data")
raw_stream = Stream[RawData]("raw_data", StreamConfig(
    destination=raw_table  # Optional: specify a destination table; sets up a process to sync data from the stream to the table
))
raw_ingest_api = IngestApi[RawData]("raw_data", IngestConfig(
    destination=raw_stream  # Configure Moose to write all validated data to the stream
))
```
### Streams for Transformations
If the raw data needs to be transformed before landing in the database, you can define a transform destination stream and a transform function to process the data:
#### Single Stream Transformation
```py filename="TransformDestinationStream.py" copy
# Import required libraries
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
from datetime import datetime
# Define schema for raw incoming data
class RawData(BaseModel):
id: Key[str] # Primary key
value: int # Value to be transformed
# Define schema for transformed data
class TransformedData(BaseModel):
id: Key[str] # Primary key (preserved from raw data)
transformedValue: int # Transformed value
transformedAt: datetime # Timestamp of transformation
# Create pipeline for raw data - only for ingestion and streaming
raw_data = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
ingest_api = True, # Enable API endpoint
stream = True, # Create stream for buffering
table = False # No table needed for raw data
))
# Create pipeline for transformed data - for storage only
transformed_data = IngestPipeline[TransformedData]("transformed_data", IngestPipelineConfig(
ingest_api = False, # No direct API endpoint
stream = True, # Create stream to receive transformed data
table = True # Store transformed data in table
))
# Define a named transformation function
def transform_function(record: RawData) -> TransformedData:
return TransformedData(
id=record.id,
transformedValue=record.value * 2,
transformedAt=datetime.now()
)
# Connect the streams with the transformation function
raw_data.get_stream().add_transform(
destination=transformed_data.get_stream(), # Use the get_stream() method to get the stream object from the IngestPipeline
transformation=transform_function # Can also define a lambda function
)
```
Use the `get_stream()` method to get the stream object from an `IngestPipeline`; referencing the pipeline's stream any other way can lead to errors.
You can use lambda functions to define transformations:
```py filename="TransformDestinationStream.py" copy
from moose_lib import Key, IngestApi, IngestConfig, OlapTable, Stream, StreamConfig
from pydantic import BaseModel
from datetime import datetime
class RawData(BaseModel):
id: Key[str]
value: int
class TransformedData(BaseModel):
id: Key[str]
transformedValue: int
transformedAt: datetime
# Create pipeline components for raw data - only for ingestion and streaming
raw_stream = Stream[RawData]("raw_data") ## No destination table since we're not storing the raw data
raw_api = IngestApi[RawData]("raw_data", IngestConfig(
destination=raw_stream ## Connect the ingestion API to the raw data stream
))
# Create pipeline components for transformed data - no ingestion API since we're not ingesting the transformed data
transformed_table = OlapTable[TransformedData]("transformed_data") ## Store the transformed data in a table
transformed_stream = Stream[TransformedData]("transformed_data", StreamConfig(destination=transformed_table)) ## Connect the transformed data stream to the destination table
## Example transformation using a lambda function
raw_stream.add_transform(
destination=transformed_stream,
transformation=lambda record: TransformedData(
id=record.id,
transformedValue=record.value * 2,
transformedAt=datetime.now()
)
)
```
#### Chaining Transformations
For more complex transformations, you can chain multiple transformations together. This is a use case where using a standalone Stream for intermediate stages of your pipeline may be useful:
```py filename="ChainedTransformations.py" copy
from moose_lib import IngestPipeline, Key, Stream, IngestPipelineConfig
from pydantic import BaseModel
from datetime import datetime
# Define the schema for raw input data
class RawData(BaseModel):
id: Key[str]
value: int
# Define the schema for intermediate transformed data
class IntermediateData(BaseModel):
id: Key[str]
transformedValue: int
transformedAt: datetime
# Define the schema for final transformed data
class FinalData(BaseModel):
id: Key[str]
transformedValue: int
anotherTransformedValue: int
transformedAt: datetime
# Create the first pipeline for raw data ingestion
# Only create an API and a stream (no table) since we're ingesting the raw data
raw_data = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=False
))
# Create an intermediate stream to hold data between transformations (no api or table needed)
intermediate_stream = Stream[IntermediateData]("intermediate_stream")
# First transformation: double the value and add timestamp
raw_data.get_stream().add_transform(destination=intermediate_stream, transformation=lambda record: IntermediateData(
id=record.id,
transformedValue=record.value * 2,
transformedAt=datetime.now()
))
# Create the final pipeline that will store the fully transformed data
final_data = IngestPipeline[FinalData]("final_stream", IngestPipelineConfig(
ingest_api=False,
stream=True,
table=True
))
# Second transformation: further transform the intermediate data
intermediate_stream.add_transform(destination=final_data.get_stream(), transformation=lambda record: FinalData(
id=record.id,
transformedValue=record.transformedValue * 2,
anotherTransformedValue=record.transformedValue * 3,
transformedAt=datetime.now()
))
```
## Stream Configurations
### Parallelism and Retention
```py filename="StreamConfig.py" copy
from moose_lib import Stream, StreamConfig
high_throughput_stream = Stream[Data]("high_throughput", StreamConfig(
parallelism=4, # Process 4 records simultaneously
retention_period=86400, # Keep data for 1 day
))
```
### LifeCycle Management
Control how Moose manages your stream resources when your code changes. See the [LifeCycle Management guide](./lifecycle) for detailed information.
```py filename="LifeCycleStreamConfig.py" copy
from moose_lib import Stream, StreamConfig, LifeCycle
# Production stream with external management
prod_stream = Stream[Data]("prod_stream", StreamConfig(
life_cycle=LifeCycle.EXTERNALLY_MANAGED
))
# Development stream with full management
dev_stream = Stream[Data]("dev_stream", StreamConfig(
life_cycle=LifeCycle.FULLY_MANAGED
))
```
See the [API Reference](/moose/reference/ts-moose-lib#stream) for complete configuration options.
---
## Dead Letter Queues
Source: moose/streaming/dead-letter-queues.mdx
Handle failed stream processing with dead letter queues
# Dead Letter Queues
## Overview
Dead Letter Queues (DLQs) provide a robust error handling mechanism for stream processing in Moose. When streaming functions fail during transformation or consumption, failed messages are automatically routed to a configured dead letter queue for later analysis and recovery.
## Dead Letter Record Structure
When a message fails processing, Moose creates a dead letter record with the following structure:
```py
class DeadLetterModel(BaseModel, Generic[T]):
original_record: Any # The original message that failed
error_message: str # Error description
error_type: str # Error class/type name
failed_at: datetime.datetime # Timestamp when failure occurred
source: Literal["api", "transform", "table"] # Where the failure happened
def as_typed(self) -> T: # Type-safe access to original record
return self._t.model_validate(self.original_record)
```
## Creating Dead Letter Queues
### Basic Setup
```py filename="dead-letter-setup.py" copy
from moose_lib import DeadLetterQueue
from pydantic import BaseModel
# Define your data model
class UserEvent(BaseModel):
user_id: str
action: str
timestamp: float
# Create a dead letter queue for UserEvent failures
user_event_dlq = DeadLetterQueue[UserEvent](name="UserEventDLQ")
```
### Configuring Transformations with Dead Letter Queues
Add a dead letter queue to your Transformation Function configuration, and any errors thrown in the transformation will trigger the event to be routed to the dead letter queue.
```py filename="transform-with-dlq.py" copy
from moose_lib import DeadLetterQueue, TransformConfig
from datetime import datetime
# Create dead letter queue
event_dlq = DeadLetterQueue[RawEvent](name="EventDLQ")
# Define transformation function, including errors to trigger DLQ
def process_event(event: RawEvent) -> ProcessedEvent:
# This transform might fail for invalid data
if not event.user_id or len(event.user_id) == 0:
raise ValueError("Invalid user_id: cannot be empty")
if event.timestamp < 0:
raise ValueError("Invalid timestamp: cannot be negative")
return ProcessedEvent(
user_id=event.user_id,
action=event.action,
processed_at=datetime.now(),
is_valid=True
)
# Add transform with DLQ configuration
raw_events.get_stream().add_transform(
destination=processed_events.get_stream(),
transformation=process_event,
config=TransformConfig(
dead_letter_queue=event_dlq # Configure DLQ for this transform
)
)
```
### Configuring Consumers with Dead Letter Queues
Add a dead letter queue to your Consumer Function configuration, and any errors thrown in the function will trigger the event to be routed to the dead letter queue.
```py filename="consumer-with-dlq.py" copy
from moose_lib import ConsumerConfig
# Define consumer function with errors to trigger DLQ
def process_event_consumer(event: RawEvent) -> None:
# This consumer might fail for certain events
if event.action == "forbidden_action":
raise ValueError("Forbidden action detected")
# Process the event (e.g., send to external API)
print(f"Processing event for user {event.user_id}")
# Add consumer with DLQ configuration
raw_events.get_stream().add_consumer(
consumer=process_event_consumer,
config=ConsumerConfig(
dead_letter_queue=event_dlq # Configure DLQ for this consumer
)
)
```
### Configuring Ingest APIs with Dead Letter Queues
Add a dead letter queue to your Ingest API configuration, and any runtime data validation failures at the API will trigger the event to be routed to the dead letter queue.
```python filename="ValidationExample.py" copy
from moose_lib import IngestApi, IngestConfig, Stream, DeadLetterQueue
from pydantic import BaseModel
from typing import Optional
from datetime import datetime
class Properties(BaseModel):
device: Optional[str]
version: Optional[int]
class ExampleModel(BaseModel):
id: str
userId: str
timestamp: datetime
properties: Properties
api = IngestApi[ExampleModel]("your-api-route", IngestConfig(
destination=Stream[ExampleModel]("your-stream-name"),
dead_letter_queue=DeadLetterQueue[ExampleModel]("your-dlq-name")
))
```
## Processing Dead Letter Messages
### Monitoring Dead Letter Queues
```py filename="dlq-monitoring.py" copy
from moose_lib import DeadLetterModel

def monitor_dead_letters(dead_letter: DeadLetterModel[RawEvent]) -> None:
print("Dead letter received:")
print(f"Error: {dead_letter.error_message}")
print(f"Error Type: {dead_letter.error_type}")
print(f"Failed At: {dead_letter.failed_at}")
print(f"Source: {dead_letter.source}")
# Access the original typed data
original_event: RawEvent = dead_letter.as_typed()
print(f"Original User ID: {original_event.user_id}")
# Add consumer to monitor dead letter messages
event_dlq.add_consumer(monitor_dead_letters)
```
### Recovery and Retry Logic
```py filename="dlq-recovery.py" copy
from moose_lib import DeadLetterModel, Stream
from datetime import datetime
from typing import Optional
# Create a recovery stream for fixed messages
recovered_events = Stream[ProcessedEvent]("recovered_events", {
"destination": processed_events.get_table() # Send recovered data to main table
})
def recover_event(dead_letter: DeadLetterModel[RawEvent]) -> Optional[ProcessedEvent]:
try:
original_event = dead_letter.as_typed()
# Apply fixes based on error type
if "Invalid user_id" in dead_letter.error_message:
# Skip events with invalid user IDs
return None
if "Invalid timestamp" in dead_letter.error_message:
# Fix negative timestamps
fixed_timestamp = abs(original_event.timestamp)
return ProcessedEvent(
user_id=original_event.user_id,
action=original_event.action,
processed_at=datetime.now(),
is_valid=True
)
return None # Skip other errors
except Exception as error:
print(f"Recovery failed: {error}")
return None
# Add recovery logic to the DLQ
event_dlq.add_transform(
destination=recovered_events,
transformation=recover_event
)
```
## Best Practices
## Common Patterns
### Circuit Breaker Pattern
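Moose does not ship a circuit breaker primitive; the sketch below is one way to hand-roll the pattern inside a consumer, with the threshold, cool-down, and `call_external_api` helper all illustrative:
```py filename="circuit-breaker.py" copy
import time

FAILURE_THRESHOLD = 5      # consecutive failures before opening the circuit
COOL_DOWN_SECONDS = 60     # how long to skip external calls once open

failures = 0
open_until = 0.0

def call_external_api(event) -> None:
    ...  # hypothetical side effect that can fail

def guarded_consumer(event) -> None:
    global failures, open_until
    if time.time() < open_until:
        # Circuit is open: skip the external call (or route the event to a DLQ)
        print("Circuit open, skipping external call for this event")
        return
    try:
        call_external_api(event)
        failures = 0  # reset on success
    except Exception as error:
        failures += 1
        if failures >= FAILURE_THRESHOLD:
            open_until = time.time() + COOL_DOWN_SECONDS  # open the circuit
        print(f"External call failed: {error}")

# Attach like any other consumer, e.g.:
# raw_events.get_stream().add_consumer(guarded_consumer)
```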
### Retry with Exponential Backoff
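Similarly, a minimal sketch of retrying dead-lettered records with exponential backoff before giving up, using the `DeadLetterModel` consumer pattern shown above (the `reprocess` helper and delays are illustrative):
```py filename="dlq-retry-backoff.py" copy
import time
from moose_lib import DeadLetterModel

def retry_with_backoff(dead_letter: DeadLetterModel[RawEvent]) -> None:
    original = dead_letter.as_typed()
    max_attempts = 3
    for attempt in range(max_attempts):
        try:
            reprocess(original)   # illustrative reprocessing helper
            return                # success, stop retrying
        except Exception as error:
            delay = 2 ** attempt  # 1s, 2s, 4s
            print(f"Attempt {attempt + 1} failed ({error}), retrying in {delay}s")
            time.sleep(delay)
    print(f"Giving up on {dead_letter.error_type} after {max_attempts} attempts")

# Attach to the DLQ like any other consumer:
# event_dlq.add_consumer(retry_with_backoff)
```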
Dead letter queues add overhead to stream processing. Use them judiciously and monitor their impact on throughput. Consider implementing sampling for high-volume streams where occasional message loss is acceptable.
Dead letter queue events can be integrated with monitoring systems like Prometheus, DataDog, or CloudWatch for alerting and dashboards. Consider tracking metrics like DLQ message rate, error types, and recovery success rates.
## Using Dead Letter Queues in Ingestion Pipelines
Dead Letter Queues (DLQs) can be directly integrated with your ingestion pipelines to capture records that fail validation or processing at the API entry point. This ensures that no data is lost, even if it cannot be immediately processed.
```python filename="IngestPipelineWithDLQ.py" copy
from moose_lib import IngestPipeline, DeadLetterQueue, IngestPipelineConfig
from pydantic import BaseModel
from datetime import datetime
class ExampleSchema(BaseModel):
id: str
name: str
value: int
timestamp: datetime
example_dlq = DeadLetterQueue[ExampleSchema](name="exampleDLQ")
pipeline = IngestPipeline[ExampleSchema](
name="example",
config=IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True,
dead_letter_queue=True # Route failed ingestions to DLQ
)
)
```
See the [Ingestion API documentation](/moose/apis/ingest-api#validation) for more details and best practices on configuring DLQs for ingestion.
---
## Publish Data
Source: moose/streaming/from-your-code.mdx
Write data to streams from applications, APIs, or external sources
# Publishing Data to Streams
## Overview
Publishing data to streams allows you to write data from various sources into your Kafka/Redpanda topics. This is the first step in building real-time data pipelines.
## Publishing Methods
### Using REST APIs
The most common way to publish data is through Moose's built-in ingestion APIs. These are configured to automatically sit in front of your streams and publish data to them whenever a request is made to the endpoint:
```py filename="PublishViaAPI.py" copy
import requests
from moose_lib import IngestPipeline, IngestPipelineConfig
# When you create an IngestPipeline with ingest_api: True, Moose automatically creates an API endpoint
raw_data = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
ingest_api=True, # Creates POST /ingest/raw_data endpoint
stream=True,
table=True
))
# You can then publish data via HTTP POST requests
response = requests.post('/ingest/raw_data', json={
'id': '123',
'value': 42
})
```
See the [OpenAPI documentation](/stack/open-api) to learn more about how to generate type-safe client SDKs in your language of choice for all of your Moose APIs.
### Direct Stream Publishing
You can publish directly to a stream from your Moose code using the stream's `send` method.
This is useful when emitting events from workflows or other backend logic.
`send` accepts a single record or an array of records.
If your `Stream` is configured with a `schema_config` of kind `"JSON"`,
Moose automatically produces messages using the Confluent envelope (0x00 + schema id + JSON).
No code changes are needed beyond setting `schema_config`. See the [Schema Registry guide](/moose/streaming/schema-registry).
```py filename="DirectPublish.py" copy
from moose_lib import Stream, StreamConfig, Key
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
# Create a stream (optionally pass StreamConfig with destination/table settings)
events = Stream[UserEvent]("user_events", StreamConfig())
# Publish a single record
events.send(UserEvent(
id="evt_1",
user_id="user_123",
timestamp=datetime.now(),
event_type="click"
))
# Publish multiple records
events.send([
UserEvent(id="evt_2", user_id="user_456", timestamp=datetime.now(), event_type="view"),
UserEvent(id="evt_3", user_id="user_789", timestamp=datetime.now(), event_type="signup"),
])
```
Moose builds the Kafka topic name from your stream name,
optional namespace, and optional version (dots become underscores).
For example, a stream named `events` with version `1.2.0` becomes `events_1_2_0`
(or `my_ns.events_1_2_0` when the namespace is `"my_ns"`).
### Using the Kafka/Redpanda Client from External Applications
You can also publish to streams from external applications using Kafka/Redpanda clients:
```py filename="ExternalPublish.py" copy
import json
from kafka import KafkaProducer
from datetime import datetime
producer = KafkaProducer(
bootstrap_servers=['localhost:19092'],
value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
# Publish to the stream topic
producer.send('user-events', { # Stream name becomes the topic name
'id': 'event-123',
'user_id': 'user-456',
'timestamp': datetime.now().isoformat(),
'event_type': 'page_view'
})
```
#### Locating Redpanda Connection Details
When running your Moose backend within your local dev environment, you can find the connection details for your Redpanda cluster in the `moose.config.toml` file in the root of your project:
```toml filename="moose.config.toml" copy
[redpanda_config]
broker = "localhost:19092"
message_timeout_ms = 1000
retention_ms = 30000
replication_factor = 1
```
---
## Schema Registry
Source: moose/streaming/schema-registry.mdx
Use Confluent Schema Registry with Moose streams (JSON Schema first)
# Schema Registry Integration
## Overview
Moose can publish and consume Kafka/Redpanda messages using Confluent Schema Registry. The first supported encoding is JSON Schema; Avro and Protobuf are planned.
## Configure Schema Registry URL
Set the Schema Registry URL in `moose.config.toml` under `redpanda_config` (aliased as `kafka_config`). You can also override with environment variables.
```toml filename="moose.config.toml" copy
[redpanda_config]
broker = "localhost:19092"
schema_registry_url = "http://localhost:8081"
```
Environment overrides (either key works):
```bash filename="Terminal" copy
export MOOSE_REDPANDA_CONFIG__SCHEMA_REGISTRY_URL=http://localhost:8081
# or
export MOOSE_KAFKA_CONFIG__SCHEMA_REGISTRY_URL=http://localhost:8081
```
## Referencing Schemas
You can attach a Schema Registry reference to any `Stream` via `schema_config` (a `KafkaSchemaConfig`). Use one of:
- Subject latest: `{ subjectLatest: string }`
- Subject and version: `{ subject: string, version: number }`
- Schema id: `{ id: number }`
```py filename="sr_stream.py" copy {13-21}
from moose_lib import Stream, StreamConfig
from moose_lib.dmv2.stream import KafkaSchemaConfig, SubjectLatest
from pydantic import BaseModel
class Event(BaseModel):
id: str
value: int
schema_config = KafkaSchemaConfig(
kind="JSON",
reference=SubjectLatest(name="event-value"),
)
events = Stream[Event](
"events",
StreamConfig(schema_config=schema_config),
)
events.send(Event(id="e1", value=42))
```
## Consuming SR JSON in Runners
Moose streaming runners automatically detect the Confluent JSON envelope
when consuming and strip the header before parsing the JSON.
Your transformation code continues to work unchanged.
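For example, adding a transformation to the SR-backed `events` stream from the snippet above looks exactly like any other stream; the enriched destination stream here is illustrative:
```py filename="sr_transform.py" copy
from moose_lib import Stream
from pydantic import BaseModel

class EnrichedEvent(BaseModel):
    id: str
    value: int
    doubled: int

# Destination stream for the transformed records (illustrative)
enriched = Stream[EnrichedEvent]("events_enriched")

# The runner strips the Confluent envelope before parsing, so the
# transformation receives plain `Event` instances as usual.
events.add_transform(
    destination=enriched,
    transformation=lambda e: EnrichedEvent(id=e.id, value=e.value, doubled=e.value * 2),
)
```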
## Ingestion APIs and SR
When an Ingest API routes to a topic that has a `schema_config` of kind JSON,
Moose resolves the schema id and publishes requests using the Schema Registry envelope.
You can also set the reference to a fixed `id` to skip lookups.
## Discover existing topics and schemas
Use the CLI to pull external topics and optionally fetch JSON Schemas from Schema Registry to emit typed models.
```bash filename="Terminal" copy
moose kafka pull \
--schema-registry http://localhost:8081 \
--path app/external-topics \
--include "*" \
--exclude "{__consumer_offsets,_schemas}"
```
This writes external topic declarations under the provided path based on language (default path is inferred).
## Current limitations
- JSON Schema only (Avro/Protobuf planned)
- Ingest API schema declared in code may not match the actual schema in registry.
---
## Sync to Table
Source: moose/streaming/sync-to-table.mdx
Automatically sync stream data to OLAP tables with intelligent batching
# Sync to Table
## Overview
Moose automatically handles batch writes between streams and OLAP tables through a **destination configuration**. When you specify a `destination` OLAP table for a stream, Moose provisions a background synchronization process that batches and writes data from the stream to the table.
### Basic Usage
```py filename="SyncToTable.py" copy {12}
from moose_lib import Stream, StreamConfig, OlapTable, Key
from pydantic import BaseModel
from datetime import datetime
class Event(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
events_table = OlapTable[Event]("events")
events_stream = Stream[Event]("events", StreamConfig(
destination=events_table # This configures automatic batching
))
```
## Setting Up Automatic Sync
### Using IngestPipeline (Easiest)
The simplest way to set up automatic syncing is with an `IngestPipeline`, which creates all components and wires them together:
```py filename="AutoSync.py" copy {14-15}
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
from datetime import datetime
class Event(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
# Creates stream, table, API, and automatic sync
events_pipeline = IngestPipeline[Event]("events", IngestPipelineConfig(
ingest_api=True,
stream=True, # Creates stream
table=True # Creates destination table + auto-sync process
))
```
### Standalone Components
For more granular control, you can configure components individually:
```py filename="ManualSync.py" copy
from moose_lib import Stream, OlapTable, IngestApi, StreamConfig, Key
from pydantic import BaseModel
from datetime import datetime
class Event(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
# Create table first
events_table = OlapTable[Event]("events")
# Create stream with destination table (enables auto-sync)
events_stream = Stream[Event]("events", StreamConfig(
destination=events_table # This configures automatic batching
))
# Create API that writes to the stream
events_api = IngestApi[Event]("events", {
"destination": events_stream
})
```
## How Automatic Syncing Works
When you configure a stream with a `destination` table, Moose automatically handles the synchronization by managing a Rust process in the background.
Moose creates a **Rust background process** that:
1. **Consumes** messages from the stream (Kafka/Redpanda topic)
2. **Batches** records up to 100,000 or flushes every second (whichever comes first)
3. **Executes** optimized ClickHouse `INSERT` statements
4. **Commits** stream offsets after successful writes
5. **Retries** failed batches with exponential backoff
Default batching parameters:
| Parameter | Value | Description |
|-----------|-------|-------------|
| `MAX_BATCH_SIZE` | 100,000 records | Maximum records per batch insert |
| `FLUSH_INTERVAL` | 1 second | Automatic flush regardless of batch size |
Currently, the batching parameters are not configurable, but we're interested in adding this capability. If you need it, let us know on Slack!
[ClickHouse inserts need to be batched for optimal performance](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance). Moose automatically handles this optimization internally, ensuring your data is efficiently written to ClickHouse without any configuration required.
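To make the "whichever comes first" rule concrete, here is a small illustrative sketch of the flush decision. This is not Moose code; the real sync process is the Rust service described above, and the constants simply mirror the defaults in the table:

```py filename="FlushPolicySketch.py" copy
import time

MAX_BATCH_SIZE = 100_000    # records per batch
FLUSH_INTERVAL_SECS = 1.0   # seconds between automatic flushes

def should_flush(batch_len: int, last_flush: float, now: float | None = None) -> bool:
    # Flush when the batch is full OR the interval has elapsed, whichever comes first
    now = time.monotonic() if now is None else now
    return batch_len >= MAX_BATCH_SIZE or (now - last_flush) >= FLUSH_INTERVAL_SECS

# Example: a small batch still flushes once a second has passed
print(should_flush(batch_len=250, last_flush=0.0, now=1.2))  # True
```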
## Data Flow Example
Here's how data flows through the automatic sync process:
```py filename="DataFlow.py" copy
import requests
# 1. Data sent to ingestion API
requests.post('http://localhost:4000/ingest/events', json={
"id": "evt_123",
"user_id": "user_456",
"timestamp": "2024-01-15T10:30:00Z",
"event_type": "click"
})
# 2. API validates and writes to stream
# 3. Background sync process batches stream data
# 4. Batch automatically written to ClickHouse table when:
# - Batch reaches 100,000 records, OR
# - 1 second has elapsed since last flush
# 5. Data available for queries in events table
# SELECT * FROM events WHERE user_id = 'user_456';
```
## Monitoring and Observability
The sync process provides built-in observability within the Moose runtime:
- **Batch Insert Logs**: Records successful batch insertions with sizes and offsets
- **Error Handling**: Logs transient failures with retry information
- **Metrics**: Tracks throughput, batch sizes, and error rates
- **Offset Tracking**: Maintains Kafka consumer group offsets for reliability
---
## Transformation Functions
Source: moose/streaming/transform-functions.mdx
Process and transform data in-flight between streams
# Transformation Functions
## Overview
Transformations allow you to process and reshape data as it flows between streams. You can filter, enrich, reshape, and combine data in-flight before it reaches its destination.
## Implementing Transformations
### Reshape and Enrich Data
Transform data shape or enrich records:
```py filename="DataTransform.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from datetime import datetime
class EventProperties(BaseModel):
user_id: str
platform: str
app_version: str
ip_address: str
class RawEvent(BaseModel):
id: Key[str]
timestamp: str
data: EventProperties
class EnrichedEventProperties(BaseModel):
platform: str
version: str
country: str
class EnrichedEventMetadata(BaseModel):
originalTimestamp: str
processedAt: datetime
class EnrichedEvent(BaseModel):
eventId: Key[str]
timestamp: datetime
userId: Key[str]
properties: EnrichedEventProperties
metadata: EnrichedEventMetadata
raw_stream = Stream[RawEvent]("raw_events")
enriched_stream = Stream[EnrichedEvent]("enriched_events")
def lookupCountry(ip_address: str) -> str:
    # Placeholder geo-IP lookup; substitute your own enrichment source
    return "US"
raw_stream.add_transform(destination=enriched_stream, transformation=lambda record: EnrichedEvent(
eventId=record.id,
timestamp=datetime.fromisoformat(record.timestamp),
userId=record.data.user_id,
properties=EnrichedEventProperties(
platform=record.data.platform,
version=record.data.app_version,
country=lookupCountry(record.data.ip_address)
),
metadata=EnrichedEventMetadata(
originalTimestamp=record.timestamp,
processedAt=datetime.now()
)
))
```
### Filtering
Remove or filter records based on conditions:
```py filename="FilterStream.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from datetime import datetime
class MetricRecord(BaseModel):
    id: Key[str]
    name: str
    value: float
    timestamp: datetime
class ValidMetrics(BaseModel):
    id: Key[str]
    name: str
    value: float
    timestamp: datetime
input_stream = Stream[MetricRecord]("input_metrics")
valid_metrics = Stream[ValidMetrics]("valid_metrics")
def get_start_of_day() -> datetime:
    # Midnight of the current day, used to drop records from previous days
    return datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
def filter_function(record: MetricRecord) -> ValidMetrics | None:
    if record.value > 0 and record.timestamp > get_start_of_day() and not record.name.startswith('debug_'):
        return ValidMetrics(
            id=record.id,
            name=record.name,
            value=record.value,
            timestamp=record.timestamp
        )
    return None
input_stream.add_transform(destination=valid_metrics, transformation=filter_function)
```
### Fan Out (1:N)
Send data to multiple downstream processors:
```py filename="FanOut.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from typing import List
from datetime import datetime
from pydantic import BaseModel
# Define data models
class Order(BaseModel):
orderId: Key[str]
userId: Key[str]
amount: float
items: List[str]
class HighPriorityOrder(Order):
priority: str = 'high'
class ArchivedOrder(Order):
archivedAt: datetime
# Create source and destination streams
order_stream = Stream[Order]("orders")
analytics_stream = Stream[Order]("order_analytics")
notification_stream = Stream[HighPriorityOrder]("order_notifications")
archive_stream = Stream[ArchivedOrder]("order_archive")
# Send all orders to analytics
def analytics_transform(order: Order) -> Order:
return order
order_stream.add_transform(destination=analytics_stream, transformation=analytics_transform)
# Send large orders to notifications
def high_priority_transform(order: Order) -> HighPriorityOrder | None:
if order.amount > 1000:
return HighPriorityOrder(
orderId=order.orderId,
userId=order.userId,
amount=order.amount,
items=order.items,
priority='high'
)
return None # Skip small orders
order_stream.add_transform(destination=notification_stream, transformation=high_priority_transform)
# Archive all orders with timestamp
def archive_transform(order: Order) -> ArchivedOrder | None:
return ArchivedOrder(
orderId=order.orderId,
userId=order.userId,
amount=order.amount,
items=order.items,
archivedAt=datetime.now()
)
order_stream.add_transform(destination=archive_stream, transformation=archive_transform)
```
### Fan In (N:1)
Combine data from multiple sources:
```py filename="FanIn.py" copy
from moose_lib import Stream, OlapTable, Key, StreamConfig
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
userId: Key[str]
eventType: str
timestamp: datetime
source: str
# Create source and destination streams
web_events = Stream[UserEvent]("web_events")
mobile_events = Stream[UserEvent]("mobile_events")
api_events = Stream[UserEvent]("api_events")
# Create a stream and table for the combined events
events_table = OlapTable[UserEvent]("all_events")
all_events = Stream[UserEvent]("all_events", StreamConfig(
destination=events_table
))
# Fan in from web
def web_transform(event: UserEvent) -> UserEvent:
return UserEvent(
userId=event.userId,
eventType=event.eventType,
timestamp=event.timestamp,
source='web'
)
web_events.add_transform(destination=all_events, transformation=web_transform)
# Fan in from mobile
def mobile_transform(event: UserEvent) -> UserEvent:
return UserEvent(
userId=event.userId,
eventType=event.eventType,
timestamp=event.timestamp,
source='mobile'
)
mobile_events.add_transform(destination=all_events, transformation=mobile_transform)
# Fan in from API
def api_transform(event: UserEvent) -> UserEvent:
return UserEvent(
userId=event.userId,
eventType=event.eventType,
timestamp=event.timestamp,
source='api'
)
api_events.add_transform(destination=all_events, transformation=api_transform)
```
### Unnesting
Flatten nested records:
```py filename="Unnest.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from typing import List
class NestedValue(BaseModel):
    value: int
class NestedRecord(BaseModel):
    id: Key[str]
    nested: List[NestedValue]
class FlattenedRecord(BaseModel):
id: Key[str]
value: int
nested_stream = Stream[NestedRecord]("nested_records")
flattened_stream = Stream[FlattenedRecord]("flattened_records")
def unnest_transform(record: NestedRecord) -> List[FlattenedRecord]:
result = []
for nested in record.nested:
result.append(FlattenedRecord(
id=record.id,
value=nested.value
))
return result
nested_stream.add_transform(flattened_stream, unnest_transform)
```
You cannot have multiple transforms between the same source and destination stream. If you need multiple transformation routes, you must either:
- Use conditional logic inside a single streaming function to handle different cases (see the sketch below), or
- Implement a fan-out/fan-in pattern, where you route records to different intermediate streams and then merge them back into the destination stream.
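For the conditional-logic option, a single transform between one source and one destination can branch internally. The models and stream names below are illustrative; this is a minimal sketch of the pattern:

```py filename="ConditionalTransform.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel

class Order(BaseModel):
    orderId: Key[str]
    amount: float
    status: str

class ProcessedOrder(BaseModel):
    orderId: Key[str]
    amount: float
    category: str

orders = Stream[Order]("orders_in")
processed = Stream[ProcessedOrder]("orders_processed")

def route_order(order: Order) -> ProcessedOrder | None:
    # One transform per source/destination pair: branch on the record
    # instead of registering multiple transforms for the same route.
    if order.status == "cancelled":
        return None  # drop cancelled orders entirely
    category = "large" if order.amount > 1000 else "standard"
    return ProcessedOrder(orderId=order.orderId, amount=order.amount, category=category)

orders.add_transform(destination=processed, transformation=route_order)
```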
## Error Handling with Dead Letter Queues
When stream processing fails, you can configure dead letter queues to capture failed messages for later analysis and recovery. This prevents single message failures from stopping your entire pipeline.
```py filename="DeadLetterQueue.py" copy
from moose_lib import DeadLetterQueue, IngestPipeline, IngestPipelineConfig, TransformConfig, DeadLetterModel
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
user_id: str
action: str
timestamp: float
class ProcessedEvent(BaseModel):
user_id: str
action: str
processed_at: datetime
is_valid: bool
# Create pipelines
raw_events = IngestPipeline[UserEvent]("raw_events", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=False
))
processed_events = IngestPipeline[ProcessedEvent]("processed_events", IngestPipelineConfig(
ingest_api=False,
stream=True,
table=True
))
# Create dead letter queue for failed transformations
event_dlq = DeadLetterQueue[UserEvent](name="EventDLQ")
def process_event(event: UserEvent) -> ProcessedEvent:
# This might fail for invalid data
if not event.user_id or len(event.user_id) == 0:
raise ValueError("Invalid user_id: cannot be empty")
return ProcessedEvent(
user_id=event.user_id,
action=event.action,
processed_at=datetime.now(),
is_valid=True
)
# Add transform with error handling
raw_events.get_stream().add_transform(
destination=processed_events.get_stream(),
transformation=process_event,
config=TransformConfig(
dead_letter_queue=event_dlq # Failed messages go here
)
)
def monitor_dead_letters(dead_letter: DeadLetterModel[UserEvent]) -> None:
print(f"Error: {dead_letter.error_message}")
print(f"Failed at: {dead_letter.failed_at}")
# Access original typed data
original_event: UserEvent = dead_letter.as_typed()
print(f"Original User ID: {original_event.user_id}")
# Monitor dead letter messages
event_dlq.add_consumer(monitor_dead_letters)
```
For comprehensive dead letter queue patterns, recovery strategies, and best practices, see the [Dead Letter Queues guide](./dead-letter-queues).
---
## Moose Workflows
Source: moose/workflows.mdx
Build ETL pipelines, scheduled jobs, and long-running tasks with orchestration
# Moose Workflows
## Overview
The Workflows module provides standalone task orchestration and automation. You can use this capability independently to build ETL pipelines, run scheduled jobs, trigger background tasks, and manage long-running tasks without requiring other MooseStack components like databases or streams.
### Basic Usage
```python filename="DataFlow.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
from pydantic import BaseModel
class Foo(BaseModel):
name: str
class Bar(BaseModel):
name: str
greeting: str
counter: int
def run_task1(ctx: TaskContext[Foo]) -> Bar:
greeting = f"hello, {ctx.input.name}!"
return Bar(
name=ctx.input.name,
greeting=greeting,
counter=1
)
def run_task2(ctx: TaskContext[Bar]) -> None:
print(f"{ctx.input.greeting} (count: {ctx.input.counter})")
task2 = Task[Bar, None](
    name="task2",
    config=TaskConfig(run=run_task2)
)
task1 = Task[Foo, Bar](
    name="task1",
    config=TaskConfig(run=run_task1, on_complete=[task2])
)
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1)
)
# No export needed - Python modules are automatically discovered
```
### Enabling Workflows
To enable workflows, you need to add the `workflows` feature to your `moose.config.toml` file:
```toml filename="moose.config.toml" copy
[features]
workflows = true
```
## Core Capabilities
## Integration with Other Capabilities
While the Workflows capability works independently, it is designed to be used in conjunction with other MooseStack capabilities:
---
## workflows/cancel-workflow
Source: moose/workflows/cancel-workflow.mdx
# Cancel a Running Workflow
To stop a workflow before it has finished running, use the `workflow cancel` command.
```bash filename="Terminal" copy
moose workflow cancel
```
### Implementing Cancelation Callbacks
For running workflows that have cleanup operations to perform, you can implement a termination callback.
This is especially useful for long-running tasks that hold open connections or subscriptions to other services that need to be closed.
You may also use the `state` within the run/cancel context to supplement your business logic.
```python filename="workflows/workflows.py" copy
def run_task1(ctx: TaskContext[Foo]) -> None:
connection.open()
def on_cancel(ctx: TaskContext[Foo]) -> None:
# Clean up any resources
connection.close()
task1 = Task[Foo, None](
name="task1",
config=TaskConfig(run=run_task1, on_cancel=on_cancel)
)
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1, retries=3)
)
```
---
## Define Workflows
Source: moose/workflows/define-workflow.mdx
Create workflow definitions with task sequences and data flow
# Define Workflows
## Overview
Workflows automate task sequences with built-in reliability and monitoring. Tasks execute in order, passing data between steps.
Built on Temporal for reliability, retries, and monitoring via GUI dashboard.
## Writing Workflow Tasks
```python filename="app/main.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
from pydantic import BaseModel
class Foo(BaseModel):
name: str
def run_task1(ctx: TaskContext[Foo]) -> None:
name = ctx.input.name or "world"
greeting = f"hello, {name}!"
task1 = Task[Foo, None](
name="task1",
config=TaskConfig(run=run_task1)
)
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1)
)
```
Define the `Task` and `Workflow` objects at module level (no export is needed; Python modules are automatically discovered), and specify `starting_task` in the `WorkflowConfig`.
## Data Flow Between Tasks
Tasks communicate through their return values. Each task can return an object that is automatically passed as input to the next task in the workflow.
- Only values inside the object are passed to the next task.
- The object must be JSON-serializable.
```python filename="app/main.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig, Logger
from pydantic import BaseModel
class Foo(BaseModel):
name: str
class Bar(BaseModel):
name: str
greeting: str
counter: int
def run_task2(ctx: TaskContext[Bar]) -> None:
logger = Logger(action="run_task2")
logger.info(f"task2 input: {ctx.input.model_dump_json()}")
task2 = Task[Bar, None](
name="task2",
config=TaskConfig(run=run_task2)
)
def run_task1(ctx: TaskContext[Foo]) -> Bar:
name = ctx.input.name or "world"
greeting = f"hello, {name}!"
return Bar(
name=name,
greeting=greeting,
counter=1
)
task1 = Task[Foo, Bar](
name="task1",
config=TaskConfig(
run=run_task1,
on_complete=[task2]
)
)
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1)
)
```
## Debugging Workflows
While the Temporal dashboard is a helpful tool for debugging, you can also leverage the Moose CLI to monitor and debug workflows. This is useful if you want to monitor a workflow without having to leave your terminal.
Use the `moose workflow status` command to monitor a workflow:
```bash filename="Terminal" copy
moose workflow status example
```
This will print high level information about the workflow run:
```txt filename="Terminal"
Workflow Workflow Status: example
Run ID: 446eab6e-663d-4913-93fe-f79d6109391f
Status: WORKFLOW_EXECUTION_STATUS_COMPLETED ✅
Execution Time: 66s
```
If you want more detailed information about the workflow's status, including task level logs and inputs/outputs, you can use the `--verbose` flag:
```bash filename="Terminal" copy
moose workflow status example --verbose
```
```txt filename="Terminal"
Workflow Workflow Status: example
Run ID: 446eab6e-663d-4913-93fe-f79d6109391f
Status: WORKFLOW_EXECUTION_STATUS_COMPLETED ✅
Execution Time: 66s
Request: GetWorkflowExecutionHistoryRequest { namespace: "default", execution: Some(WorkflowExecution { workflow_id: "example", run_id: "446eab6e-663d-4913-93fe-f79d6109391f" }), maximum_page_size: 0, next_page_token: [], wait_new_event: false, history_event_filter_type: Unspecified, skip_archival: false }
Found 17 events
Event History:
• [2025-02-21T14:16:56.234808764+00:00] EVENT_TYPE_WORKFLOW_EXECUTION_STARTED
• [2025-02-21T14:16:56.235132389+00:00] EVENT_TYPE_WORKFLOW_TASK_SCHEDULED
• [2025-02-21T14:16:56.259341847+00:00] EVENT_TYPE_WORKFLOW_TASK_STARTED
• [2025-02-21T14:16:56.329856180+00:00] EVENT_TYPE_WORKFLOW_TASK_COMPLETED
• [2025-02-21T14:16:56.329951889+00:00] EVENT_TYPE_ACTIVITY_TASK_SCHEDULED
Activity: example/task1
• [2025-02-21T14:16:56.333761680+00:00] EVENT_TYPE_ACTIVITY_TASK_STARTED
• [2025-02-21T14:16:56.497156055+00:00] EVENT_TYPE_ACTIVITY_TASK_COMPLETED
Result:
{
"counter": 1,
"greeting": "hello, no name!",
"name": "no name",
}
```
With this more detailed output, you can see the exact sequence of events and the inputs and outputs of each task. This is useful for debugging and understanding the workflow's behavior.
The result of each task is included in the output, allowing you to inspect the data passed between tasks for debugging purposes.
If your workflow fails due to some runtime error, you can use the event history timeline to identify the task that failed.
---
## workflows/retries-and-timeouts
Source: moose/workflows/retries-and-timeouts.mdx
# Error Detection and Handling
Moose provides multiple layers of error protection, both at the workflow and task level:
### Workflow-Level Retries and Timeouts
Moose automatically catches any runtime errors during workflow execution. Errors are logged for debugging, and the orchestrator will retry failed tasks according to the `retries` option.
In your `Workflow`, you can configure the following options to control workflow behavior, including timeouts and retries:
```python filename="app/main.py" {5} copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1, retries=1, timeout="10m")
)
```
### Task-Level Errors and Retries
For more granular control over task-level errors and retries, you can configure your individual tasks to have their own retry behavior.
For workflows and tasks that have no predefined timeout, you can set `never` as the timeout (see the sketch after the following example).
```python filename="app/main.py" {8} copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
def run_task1(ctx: TaskContext[Foo]) -> None:
pass
task1 = Task[Foo, None](
name="task1",
config=TaskConfig(run=run_task1, retries=1, timeout="5m")
)
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1, retries=2, timeout="10m")
)
```
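For long-running work with no sensible upper bound, you can use the `never` timeout mentioned above. A minimal sketch; the `BackfillParams` model and the task body are hypothetical:

```python filename="app/long_running.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
from pydantic import BaseModel

class BackfillParams(BaseModel):
    source: str

def run_backfill(ctx: TaskContext[BackfillParams]) -> None:
    # Illustrative long-running work, e.g. draining a large backfill from ctx.input.source
    pass

backfill_task = Task[BackfillParams, None](
    name="backfill",
    config=TaskConfig(run=run_backfill, retries=1, timeout="never")
)
backfill_workflow = Workflow(
    name="backfill",
    config=WorkflowConfig(starting_task=backfill_task, timeout="never")
)
```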
### Example: Workflow and Task Retry Interplay
When configuring retries, it's important to understand how workflow-level and task-level retries interact. Consider the following scenario:
```python filename="app/main.py" {8,13} copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
def run_task1(ctx: TaskContext[Foo]) -> None:
pass
task1 = Task[Foo, None](
name="task1",
config=TaskConfig(run=run_task1, retries=2)
)
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1, retries=3)
)
```
If the execution of the workflow encounters an error, the retry sequence would proceed as follows:
1. **Workflow Attempt 1**
- **Task Attempt 1**: Task fails
- **Task Attempt 2**: Task fails
- **Task Attempt 3**: Task fails
- Workflow attempt fails after exhausting task retries
2. **Workflow Attempt 2**
- **Task Attempt 1**: Task fails
- **Task Attempt 2**: Task fails
- **Task Attempt 3**: Task fails
- Workflow attempt fails after exhausting task retries
In this example, each workflow attempt runs the task up to 3 times (one initial attempt plus the 2 configured task-level retries). Only after the task retries are exhausted does the workflow attempt fail, and the workflow is then retried according to its own `retries` setting.
---
## Schedule Workflows
Source: moose/workflows/schedule-workflow.mdx
Set up recurring and scheduled workflow execution
# Schedule Workflows
## Overview
Moose workflows can be configured to run automatically on a schedule using cron expressions or interval-based scheduling. This enables you to automate recurring tasks, data processing jobs, and maintenance operations.
## Scheduling Workflows
Workflows can be configured to run on a schedule using the `schedule` field in `Workflow`. This field is optional and blank by default.
### Cron Expressions
```python filename="app/scheduled_workflow.py" copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1, schedule="0 12 * * *") # Runs at 12:00 PM every day
)
```
#### Cron Expression Format
```text
|------------------------------- Minute (0-59)
| |------------------------- Hour (0-23)
| | |------------------- Day of the month (1-31)
| | | |------------- Month (1-12; or JAN to DEC)
| | | | |------- Day of the week (0-6; or SUN to SAT; or 7 for Sunday)
| | | | |
| | | | |
* * * * *
```
#### Common Cron Examples
| Cron Expression | Description |
|-----------------|-------------|
| 0 12 * * * | Runs at 12:00 PM every day |
| 0 0 * * 0 | Runs at 12:00 AM every Sunday |
| 0 8 * * 1-5 | Runs at 8:00 AM on weekdays (Monday to Friday) |
| * * * * * | Runs every minute |
| 0 */6 * * * | Runs every 6 hours |
| 0 9 1 * * | Runs at 9:00 AM on the first day of every month |
| 0 0 1 1 * | Runs at midnight on January 1st every year |
Use an online cron expression visualizer like [crontab.guru](https://crontab.guru/) to help you understand how the cron expression will schedule your workflow.
### Interval Schedules
Interval schedules can be specified as a string of the form `"@every <interval>"`. The interval follows standard duration format:
```python filename="app/interval_workflow.py" copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig
myworkflow = Workflow(
name="myworkflow",
config=WorkflowConfig(starting_task=task1, schedule="@every 1h") # Runs every hour
)
```
#### Interval Examples
| Interval | Description |
|----------|-------------|
| `@every 30s` | Every 30 seconds |
| `@every 5m` | Every 5 minutes |
| `@every 1h` | Every hour |
| `@every 12h` | Every 12 hours |
| `@every 24h` | Every 24 hours |
| `@every 7d` | Every 7 days |
## Practical Scheduling Examples
### Daily Data Processing
```python filename="app/daily_etl.py" copy
from moose_lib import Workflow, WorkflowConfig
daily_data_processing = Workflow(
name="daily-data-processing",
config=WorkflowConfig(
starting_task=extract_data_task,
schedule="0 2 * * *", # Run at 2 AM every day
retries=2,
timeout="2h"
)
)
```
### Weekly Reports
```python filename="app/weekly_reports.py" copy
weekly_reports = Workflow(
name="weekly-reports",
config=WorkflowConfig(
starting_task=generate_report_task,
schedule="0 9 * * 1", # Run at 9 AM every Monday
retries=1,
timeout="1h"
)
)
```
### High-Frequency Monitoring
```python filename="app/monitoring.py" copy
system_monitoring = Workflow(
name="system-monitoring",
config=WorkflowConfig(
starting_task=check_system_health_task,
schedule="@every 5m", # Check every 5 minutes
retries=0, # Don't retry monitoring checks
timeout="30s"
)
)
```
## Monitoring Scheduled Workflows
### Development Environment
If your dev server is running, you should see logs in the terminal when your scheduled workflow is executed:
```bash filename="Terminal" copy
moose dev
```
```txt filename="Terminal"
[2024-01-15 12:00:00] Scheduled workflow 'daily-data-processing' started
[2024-01-15 12:00:01] Task 'extract' completed successfully
[2024-01-15 12:00:15] Task 'transform' completed successfully
[2024-01-15 12:00:30] Task 'load' completed successfully
[2024-01-15 12:00:30] Workflow 'daily-data-processing' completed successfully
```
### Checking Workflow Status
You can check the status of scheduled workflows using the CLI:
```bash filename="Terminal" copy
# List all workflows defined in your project
moose workflow list
# Alternative command to list all workflows
moose ls --type workflows
# View workflow execution history
moose workflow history
# Check specific workflow status
moose workflow status daily-data-processing
# Get detailed execution history
moose workflow status daily-data-processing --verbose
```
### Temporal Dashboard
Access the Temporal dashboard to view scheduled workflow executions:
```bash filename="Terminal" copy
# Open Temporal dashboard (typically at http://localhost:8080)
open http://localhost:8080
```
The dashboard shows:
- Scheduled workflow definitions
- Execution history and timing
- Success/failure rates
- Retry attempts and errors
## Best Practices for Scheduled Workflows
### Timeout and Retry Configuration
Configure appropriate timeouts and retries for scheduled workflows:
```python filename="app/robust_scheduled_workflow.py" copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig
def run_main_task() -> None:
# Long-running task logic
pass
main_task = Task[None, None](
name="main",
config=TaskConfig(
run=run_main_task,
retries=3, # Retry individual tasks
timeout="1h" # Task-level timeout
)
)
robust_scheduled_workflow = Workflow(
name="robust-scheduled",
config=WorkflowConfig(
starting_task=main_task,
schedule="0 3 * * *", # Run at 3 AM daily
retries=2, # Retry failed workflows
timeout="4h" # Allow sufficient time
)
)
```
## Troubleshooting Scheduled Workflows
### Common Issues
- **Timezone considerations**: Cron schedules use UTC by default
- **Resource conflicts**: Ensure scheduled workflows don't compete for resources
- **Long-running tasks**: Set appropriate timeouts for lengthy operations
- **Error handling**: Implement proper error handling and logging (see the sketch below)
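For the error-handling point above, one pattern is to log context with the built-in `Logger` and then re-raise, so the configured retries and timeouts decide what happens next. A minimal sketch; the health-check helper is a placeholder:

```python filename="app/scheduled_health_check.py" copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig, Logger

def check_external_service() -> None:
    # Placeholder: replace with your real health check (HTTP ping, DB query, etc.)
    ...

def run_health_check() -> None:
    logger = Logger(action="run_health_check")
    try:
        check_external_service()
    except Exception as e:
        # Log enough context to debug the scheduled run, then re-raise so the
        # configured retry/timeout settings decide what happens next.
        logger.info(f"health check failed: {e}")
        raise

health_check_task = Task[None, None](
    name="health-check",
    config=TaskConfig(run=run_health_check, retries=0, timeout="30s")
)
scheduled_health_check = Workflow(
    name="scheduled-health-check",
    config=WorkflowConfig(starting_task=health_check_task, schedule="@every 5m", timeout="10m")
)
```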
---
## Trigger Workflows
Source: moose/workflows/trigger-workflow.mdx
Start workflows from events, APIs, or external triggers
# Trigger Workflows
## Overview
Moose workflows can be triggered programmatically from various sources including APIs, events, external systems, or manual execution. This enables you to build reactive data processing pipelines and on-demand task execution.
## Manual Workflow Execution
The simplest way to trigger a workflow is using the Moose CLI:
```bash filename="Terminal" copy
# Run a workflow manually
moose workflow run example
# Run with input parameters
moose workflow run example --input '{"name": "John", "email": "john@example.com"}'
```
### Passing Input to Workflows
When triggering workflows, you can provide input data that is passed to the starting task:
```bash filename="Terminal" copy
moose workflow run data-processing --input '{
"sourceUrl": "https://api.example.com/data",
"apiKey": "your-api-key",
"batchSize": 100
}'
```
The input is parsed as JSON and passed to the workflow's starting task.
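Concretely, the JSON keys in `--input` map onto the fields of the starting task's input model. Below is a minimal sketch of what the `data-processing` workflow's starting task might look like; the `DataProcessingParams` model and task body are hypothetical:

```python filename="app/data_processing.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
from pydantic import BaseModel

class DataProcessingParams(BaseModel):
    sourceUrl: str
    apiKey: str
    batchSize: int

def run_extract(ctx: TaskContext[DataProcessingParams]) -> None:
    # ctx.input is parsed from the --input JSON payload
    print(f"Fetching up to {ctx.input.batchSize} records from {ctx.input.sourceUrl}")

extract_task = Task[DataProcessingParams, None](
    name="extract",
    config=TaskConfig(run=run_extract)
)
data_processing = Workflow(
    name="data-processing",
    config=WorkflowConfig(starting_task=extract_task)
)
```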
## API-Triggered Workflows
Trigger workflows directly via an HTTP POST endpoint exposed by the webserver.
- Endpoint: `/workflows/{workflowName}/trigger`
### Request
- Body: optional JSON payload passed to the workflow's starting task.
Example:
```bash filename="Terminal" copy
curl -X POST 'http://localhost:4000/workflows/data-processing/trigger' \
-H 'Content-Type: application/json' \
-d '{
"inputValue": "process-user-data",
"priority": "high"
}'
```
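You can call the same endpoint from Python, for example to trigger workflows from scripts or other services. A minimal sketch for local development, where no auth header is required:

```python filename="trigger_workflow.py" copy
import requests

# POST /workflows/{workflowName}/trigger with an optional JSON payload
response = requests.post(
    "http://localhost:4000/workflows/data-processing/trigger",
    json={"inputValue": "process-user-data", "priority": "high"},
)
response.raise_for_status()
print(response.json())  # includes workflowId and runId (plus dashboardUrl in dev)
```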
### Authentication
- Local development: no auth required.
- Production: protect the endpoint using an API key. Follow these steps:
1. Generate a token and hashed key (see the Token Generation section in the API Auth docs):
```bash filename="Terminal" copy
moose generate hash-token
# Outputs:
# - ENV API Key (hashed) → for environment/config
# - Bearer Token (plain) → for Authorization header
```
2. Configure the server with the hashed key:
```bash copy
MOOSE_CONSUMPTION_API_KEY=""
```
3. Call the endpoint using the plain Bearer token from step 1:
```bash filename="Terminal" copy
curl -X POST 'https://your-host/workflows/data-processing/trigger' \
-H 'Authorization: Bearer ' \
-H 'Content-Type: application/json' \
-d '{"inputValue":"process-user-data"}'
```
For details, see the API Auth page under “Token Generation” and “API Endpoints”.
### Response
```json filename="Response"
{
"workflowId": "data-processing-",
"runId": "",
}
```
In local development, the response also includes a `dashboardUrl` to Temporal UI:
```json filename="Response (dev)"
{
"workflowId": "data-processing-",
"runId": "",
"dashboardUrl": "http://localhost:8080/namespaces//workflows/data-processing-//history"
}
```
## Terminate a Running Workflow
After triggering a workflow, you can terminate it via an HTTP endpoint.
- Endpoint: `POST /workflows/{workflowId}/terminate`
### Request
- Local development (no auth):
```bash filename="Terminal" copy
curl -X POST 'http://localhost:4000/workflows/data-processing-/terminate'
```
- Production (Bearer token required):
```bash filename="Terminal" copy
curl -X POST 'https://your-host/workflows/data-processing-/terminate' \
-H 'Authorization: Bearer '
```