# Moose Documentation – Python
## Included Files
1. index.mdx
2. moose/apis.mdx
3. moose/apis/admin-api.mdx
4. moose/apis/analytics-api.mdx
5. moose/apis/auth.mdx
6. moose/apis/ingest-api.mdx
7. moose/apis/openapi-sdk.mdx
8. moose/apis/trigger-api.mdx
9. moose/app-api-frameworks.mdx
10. moose/app-api-frameworks/express.mdx
11. moose/app-api-frameworks/fastapi.mdx
12. moose/app-api-frameworks/typescript-fastify.mdx
13. moose/app-api-frameworks/typescript-koa.mdx
14. moose/app-api-frameworks/typescript-raw-nodejs.mdx
15. moose/changelog.mdx
16. moose/configuration.mdx
17. moose/data-modeling.mdx
18. moose/deploying.mdx
19. moose/deploying/configuring-moose-for-cloud.mdx
20. moose/deploying/deploying-on-an-offline-server.mdx
21. moose/deploying/deploying-on-ecs.mdx
22. moose/deploying/deploying-on-kubernetes.mdx
23. moose/deploying/deploying-with-docker-compose.mdx
24. moose/deploying/monitoring.mdx
25. moose/deploying/packaging-moose-for-deployment.mdx
26. moose/deploying/preparing-clickhouse-redpanda.mdx
27. moose/getting-started/from-clickhouse.mdx
28. moose/getting-started/quickstart.mdx
29. moose/help/minimum-requirements.mdx
30. moose/help/troubleshooting.mdx
31. moose/in-your-stack.mdx
32. moose/index.mdx
33. moose/llm-docs.mdx
34. moose/local-dev.mdx
35. moose/mcp-dev-server.mdx
36. moose/metrics.mdx
37. moose/migrate.mdx
38. moose/migrate/lifecycle.mdx
39. moose/migrate/migration-types.mdx
40. moose/moose-cli.mdx
41. moose/olap.mdx
42. moose/olap/apply-migrations.mdx
43. moose/olap/db-pull.mdx
44. moose/olap/external-tables.mdx
45. moose/olap/indexes.mdx
46. moose/olap/insert-data.mdx
47. moose/olap/model-materialized-view.mdx
48. moose/olap/model-table.mdx
49. moose/olap/model-view.mdx
50. moose/olap/planned-migrations.mdx
51. moose/olap/read-data.mdx
52. moose/olap/schema-change.mdx
53. moose/olap/schema-optimization.mdx
54. moose/olap/schema-versioning.mdx
55. moose/olap/supported-types.mdx
56. moose/olap/ttl.mdx
57. moose/reference/py-moose-lib.mdx
58. moose/reference/ts-moose-lib.mdx
59. moose/streaming.mdx
60. moose/streaming/connect-cdc.mdx
61. moose/streaming/consumer-functions.mdx
62. moose/streaming/create-stream.mdx
63. moose/streaming/dead-letter-queues.mdx
64. moose/streaming/from-your-code.mdx
65. moose/streaming/schema-registry.mdx
66. moose/streaming/sync-to-table.mdx
67. moose/streaming/transform-functions.mdx
68. moose/templates-examples.mdx
69. moose/workflows.mdx
70. moose/workflows/cancel-workflow.mdx
71. moose/workflows/define-workflow.mdx
72. moose/workflows/retries-and-timeouts.mdx
73. moose/workflows/schedule-workflow.mdx
74. moose/workflows/trigger-workflow.mdx
75. release-notes/2025-10-24.mdx
76. release-notes/2025-11-01.mdx
77. release-notes/index.mdx
78. release-notes/upcoming.mdx
79. sloan/data-collection-policy.mdx
80. sloan/demos.mdx
81. sloan/demos/context.mdx
82. sloan/demos/egress.mdx
83. sloan/demos/ingest.mdx
84. sloan/demos/mvs.mdx
85. sloan/getting-started.mdx
86. sloan/getting-started/claude.mdx
87. sloan/getting-started/cursor.mdx
88. sloan/getting-started/other-clients.mdx
89. sloan/getting-started/vs-code.mdx
90. sloan/getting-started/windsurf.mdx
91. sloan/guides.mdx
92. sloan/guides/clickhouse-chat.mdx
93. sloan/guides/clickhouse-proj.mdx
94. sloan/guides/from-template.mdx
95. sloan/index.mdx
96. sloan/reference/cli-reference.mdx
97. sloan/reference/mcp-json-reference.mdx
98. sloan/reference/tool-reference.mdx
99. templates.mdx
100. templates/adsb.mdx
101. templates/brainmoose.mdx
102. templates/github.mdx
103. templates/heartrate.mdx
104. usage-data.mdx
## 514 Labs Documentation
Source: index.mdx
Documentation hub for Moose and Sloan, open source tools for building analytical backends and automated data engineering
# Docs
## Overview & Our Products
This is the home of the documentation for the 514 Labs products:
```bash filename="Installation" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose,sloan
```
### Quick Start Options
## Get Involved in the Community
---
## Moose APIs
Source: moose/apis.mdx
Create type-safe ingestion and analytics APIs for data access and integration
# Moose APIs
## Overview
The APIs module provides standalone HTTP endpoints for data ingestion and analytics. Unlike other MooseStack modules, APIs are not typically useful on their own; they are meant to be paired with other MooseStack components such as OLAP tables and streams.
If you'd prefer to use your own API framework of choice, instead of Moose APIs, see the [Bring Your Own API Framework documentation](/moose/app-api-frameworks) for comprehensive examples and patterns using frameworks such as Express, Koa, Fastify, or FastAPI.
## Core Capabilities
## Basic Examples
### Ingestion API
```py filename="IngestApi.py" copy
from moose_lib import IngestApi, IngestConfig, Stream
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
id: str
user_id: str
timestamp: datetime
event_type: str
# Create a destination stream and a standalone ingestion API
event_stream = Stream[UserEvent]("user-events")
user_events_api = IngestApi[UserEvent]("user-events", IngestConfig(destination=event_stream))
# No export needed - Python modules are automatically discovered
```
### Analytics API
```py filename="AnalyticsApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel
class Params(BaseModel):
user_id: str
limit: int
class ResultData(BaseModel):
id: str
name: str
email: str
def query_function(client: MooseClient, params: Params) -> list[ResultData]:
# Query external service or custom logic using parameter binding
query = "SELECT * FROM user_data WHERE user_id = {user_id} LIMIT {limit}"
return client.query.execute(query, {"user_id": params.user_id, "limit": params.limit})
user_data_api = Api[Params, ResultData]("get-data", query_function)
# No export needed - Python modules are automatically discovered
```
---
## moose/apis/admin-api
Source: moose/apis/admin-api.mdx
# Coming Soon
---
## APIs
Source: moose/apis/analytics-api.mdx
APIs for Moose
# APIs
## Overview
APIs are functions that run on your server and are automatically exposed as HTTP `GET` endpoints.
They are designed to read data from your OLAP database. Out of the box, these APIs provide:
- Automatic type validation and conversion for your query parameters (sent in the URL) and your response body
- Managed database client connection
- Automatic OpenAPI documentation generation
Common use cases include:
- Powering user-facing analytics, dashboards and other front-end components
- Enabling AI tools to interact with your data
- Building custom APIs for your internal tools
### Enabling APIs
Analytics APIs are enabled by default. To explicitly control this feature in your `moose.config.toml`:
```toml filename="moose.config.toml" copy
[features]
apis = true
```
### Basic Usage
`execute` is the recommended way to execute queries. It provides a thin wrapper around the ClickHouse Python client so that you can safely pass `OlapTable` and `Column` objects to your query without needing to worry about ClickHouse identifiers:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel
## Import the source pipeline
from app.path.to.SourcePipeline import SourcePipeline
# Define the query parameters
class QueryParams(BaseModel):
filter_field: str
max_results: int
# Define the response body
class ResponseBody(BaseModel):
id: int
name: str
value: float
SourceTable = SourcePipeline.get_table()
# Define the route handler function (parameterized)
def run(client: MooseClient, params: QueryParams) -> list[ResponseBody]:
query = """
SELECT
id,
name,
value
FROM {table}
WHERE category = {category}
LIMIT {limit}
"""
return client.query.execute(query, {"table": SourceTable, "category": params.filter_field, "limit": params.max_results})
# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
Use `execute_raw` with parameter binding for safe, typed queries:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel
## Import the source pipeline
from app.path.to.SourcePipeline import SourcePipeline
# Define the query parameters
class QueryParams(BaseModel):
filterField: str
maxResults: int
# Define the response body
class ResponseBody(BaseModel):
id: int
name: str
value: float
SourceTable = SourcePipeline.get_table()
# Define the route handler function (using execute_raw with typed parameters)
def run(client: MooseClient, params: QueryParams) -> list[ResponseBody]:
query = """
SELECT
id,
name,
value
FROM Source
WHERE category = {category:String}
LIMIT {limit:UInt32}
"""
return client.query.execute_raw(query, {"category": params.filterField, "limit": params.maxResults})
# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
```python filename="SourcePipeline.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
class SourceSchema(BaseModel):
id: Key[int]
name: str
value: float
SourcePipeline = IngestPipeline[SourceSchema]("Source", IngestPipelineConfig(
ingest_api=False,
stream=True,
table=True,
))
```
The `Api` class takes:
- Route name: The URL path to access your API (e.g., `"example_endpoint"`)
- Handler function: Processes requests with typed parameters and returns the result
The generic type parameters specify:
- `QueryParams`: The structure of accepted URL parameters
- `ResponseBody`: The exact shape of your API's response data
You can name these types anything you want. The first type generates validation for query parameters, while the second defines the response structure for OpenAPI documentation.
## Type Validation
You can model the query parameters and response body as Pydantic models, which Moose uses to provide automatic type validation and conversion for your query parameters (sent in the URL) and your response body.
### Modeling Query Parameters
Define your API's parameters as a Pydantic model:
```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel, Field
from typing import Optional
class QueryParams(BaseModel):
filterField: str = Field(..., description="The field to filter by")
maxResults: int = Field(..., description="The maximum number of results to return")
optionalParam: Optional[str] = Field(None, description="An optional parameter")
```
Moose automatically handles:
- Runtime validation
- Clear error messages for invalid parameters
- OpenAPI documentation generation
Complex nested objects and arrays are not supported. Analytics APIs are `GET` endpoints designed to be simple and lightweight.
### Adding Advanced Type Validation
Moose uses Pydantic for runtime validation. Use Pydantic's `Field` class for more complex validation:
```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel, Field
class QueryParams(BaseModel):
filterField: str = Field(pattern=r"^(id|name|email)$", description="The field to filter by") ## Only allow valid column names from the UserTable
maxResults: int = Field(gt=0, description="The maximum number of results to return") ## Positive integer
```
### Common Validation Options
```python filename="ValidationExamples.py" copy
from pydantic import BaseModel, Field
from typing import Optional
class QueryParams(BaseModel):
# Numeric validations
id: int = Field(..., gt=0)
age: int = Field(..., gt=0, lt=120)
price: float = Field(..., gt=0, lt=1000)
discount: float = Field(..., gt=0, multiple_of=0.5)
# String validations
username: str = Field(..., min_length=3, max_length=20)
email: str = Field(..., format="email")
zipCode: str = Field(..., pattern=r"^[0-9]{5}$")
uuid: str = Field(..., format="uuid")
ipAddress: str = Field(..., format="ipv4")
# Date validations
startDate: str = Field(..., format="date")
# Enum validation
status: str = Field(..., enum=["active", "pending", "inactive"])
    # Optional parameters
    limit: Optional[int] = Field(None, gt=0, lt=100)
```
For a full list of validation options, see the [Pydantic documentation](https://docs.pydantic.dev/latest/concepts/types/#customizing-validation-with-fields).
### Setting Default Values
You can set default values for parameters by assigning a default to each field in your Pydantic model:
```python filename="ExampleQueryParams.py" copy {9}
from pydantic import BaseModel
class QueryParams(BaseModel):
filterField: str = "example"
maxResults: int = 10
optionalParam: str | None = "default"
```
## Implementing Route Handler
API route handlers are regular functions, so you can implement whatever arbitrary logic you want inside them. Most of the time you will use APIs to expose your data to your front-end applications or other tools.
### Connecting to the Database
Moose provides a managed `MooseClient` to your function execution context. This client provides access to the database and other Moose resources, and handles connection pooling/lifecycle management for you:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from app.UserTable import UserTable
def run(client: MooseClient, params: QueryParams):
# You can use a formatted string for simple static query
query = """
SELECT COUNT(*) FROM {table}
"""
## You can optionally pass the table object to the query
return client.query.execute(query, {"table": UserTable})
## Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
Use `execute_raw` with parameter binding:
```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from app.UserTable import UserTable
def run(params: QueryParams, client: MooseClient):
# Using execute_raw for safe queries
query = """
SELECT COUNT(*) FROM {table: Identifier}
"""
## Must be the name of the table, not the table object
return client.query.execute_raw(query, {"table": UserTable.name})
## Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```
### Constructing Safe SQL Queries
```python filename="SafeQueries.py" copy
from moose_lib import MooseClient
from pydantic import BaseModel, Field
class QueryParams(BaseModel):
min_age: int = Field(ge=0, le=150)
status: str = Field(pattern=r"^(active|inactive)$")
limit: int = Field(default=10, ge=1, le=1000)
search_text: str = Field(pattern=r'^[a-zA-Z0-9\s]*$')
def run(client: MooseClient, params: QueryParams):
query = """
SELECT *
FROM users
WHERE age >= {min_age}
AND status = {status}
AND name ILIKE {search_text}
LIMIT {limit}
"""
return client.query.execute(query, {"min_age": params.min_age, "status": params.status, "search_text": params.search_text, "limit": params.limit})
```
```python filename="SafeQueries.py" copy
from moose_lib import MooseClient
from pydantic import BaseModel, Field
class QueryParams(BaseModel):
min_age: int = Field(ge=0, le=150)
status: str = Field(pattern=r"^(active|inactive)$")
limit: int = Field(default=10, ge=1, le=1000)
search_text: str = Field(pattern=r'^[a-zA-Z0-9\s]*$')
def run(client: MooseClient, params: QueryParams):
query = """
SELECT *
FROM users
WHERE age >= {minAge:UInt32}
AND status = {status:String}
AND name ILIKE {searchPattern:String}
LIMIT {limit:UInt32}
"""
return client.query.execute_raw(query, {
"minAge": params.min_age,
"status": params.status,
"searchPattern": f"%{params.search_text}%",
"limit": params.limit
})
```
#### Basic Query Parameter Interpolation
#### Table and Column References
```python filename="ValidatedQueries.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field
from app.UserTable import UserTable
class QueryParams(BaseModel):
    # These values select columns and build search patterns, so validate them strictly
column: str = Field(pattern=r"^(id|name|email)$", description="Uses a regex pattern to only allow valid column names")
search_term: str = Field(
pattern=r'^[\w\s\'-]{1,50}$', # Allows letters, numbers, spaces, hyphens, apostrophes; Does not allow special characters that could be used in SQL injection
strip_whitespace=True,
min_length=1,
max_length=50
)
limit: int = Field(
default=10,
ge=1,
le=100,
description="Number of results to return"
)
def run(client: MooseClient, params: QueryParams):
query = """
SELECT {column}
FROM {table}
WHERE name ILIKE '%{search_term}%'
LIMIT {limit}
"""
return client.query.execute(query, {"column": UserTable.cols[params.column], "table": UserTable, "search_term": params.search_term, "limit": params.limit})
```
```python filename="UserTable.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel
class UserSchema(BaseModel):
id: Key[int]
name: str
email: str
UserTable = OlapTable[UserSchema]("users")
```
### Advanced Query Patterns
#### Dynamic Column & Table Selection
```python filename="DynamicColumns.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field
from typing import Optional
from app.UserTable import UserTable
class QueryParams(BaseModel):
colName: str = Field(pattern=r"^(id|name|email)$", description="Uses a regex pattern to only allow valid column names from the UserTable")
class QueryResult(BaseModel):
id: Optional[int]
name: Optional[str]
email: Optional[str]
def run(client: MooseClient, params: QueryParams):
# Put column and table in the dict for variables
query = "SELECT {column} FROM {table}"
return client.query.execute(query, {"column": UserTable.cols[params.colName], "table": UserTable})
## Create the API
bar = Api[QueryParams, QueryResult](name="bar", query_function=run)
## Call the API
## HTTP Request: GET http://localhost:4000/api/bar?colName=id
## EXECUTED QUERY: SELECT id FROM users
```
```python filename="UserTable.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel
class UserSchema(BaseModel):
id: Key[int]
name: str
email: str
UserTable = OlapTable[UserSchema]("users")
```
#### Conditional `WHERE` Clauses
Build `WHERE` clauses based on provided parameters:
```python filename="ConditionalColumns.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field
from typing import Optional
class FilterParams(BaseModel):
min_age: Optional[int]
status: Optional[str] = Field(pattern=r"^(active|inactive)$")
search_text: Optional[str] = Field(pattern=r"^[a-zA-Z0-9\s]+$", description="Alphanumeric search text without special characters to prevent SQL injection")
class QueryResult(BaseModel):
id: int
name: str
email: str
def build_query(client: MooseClient, params: FilterParams) -> list[QueryResult]:
    # Build WHERE conditions and bind parameters safely
    conditions = []
    parameters = {}
if params.min_age:
conditions.append("age >= {min_age}")
parameters["min_age"] = params.min_age
if params.status:
conditions.append("status = {status}")
parameters["status"] = params.status
if params.search_text:
conditions.append("(name ILIKE {search_text} OR email ILIKE {search_text})")
parameters["search_text"] = params.search_text
where_clause = f" WHERE {' AND '.join(conditions)}" if conditions else ""
query = f"""SELECT * FROM users {where_clause} ORDER BY created_at DESC"""
return client.query.execute(query, parameters)
## Create the API
bar = Api[FilterParams, QueryResult](name="bar", query_function=build_query)
## Call the API
## HTTP Request: GET http://localhost:4000/api/bar?min_age=20&status=active&search_text=John
## EXECUTED QUERY: SELECT * FROM users WHERE age >= 20 AND status = 'active' AND (name ILIKE '%John%' OR email ILIKE '%John%') ORDER BY created_at DESC
```
### Adding Authentication
Moose supports authentication via JSON web tokens (JWTs). When your client makes a request to your Analytics API, Moose will automatically parse the JWT and pass the **authenticated** payload to your handler function as the `jwt` object:
```python filename="Authentication.py" copy
def run(client: MooseClient, params: QueryParams, jwt: dict):
# Use parameter binding with JWT data
query = """SELECT * FROM userReports WHERE user_id = {user_id} LIMIT 5"""
return client.query.execute(query, {"user_id": jwt["userId"]})
```
Moose validates the JWT signature and ensures the JWT is properly formatted. If JWT authentication fails, Moose returns a `401 Unauthorized` error.
## Understanding Response Codes
Moose automatically provides standard HTTP responses:
| Status Code | Meaning | Response Body |
|-------------|-------------------------|---------------------------------|
| 200 | Success | Your API's result data |
| 400 | Validation error | `{ "error": "Detailed message"}`|
| 401 | Unauthorized | `{ "error": "Unauthorized"}` |
| 500 | Internal server error | `{ "error": "Internal server error"}` |
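For example, a request that fails query parameter validation returns a `400` with an error message in the body. Below is a minimal sketch using Python `requests` against the `example_endpoint` API defined earlier (the exact error text will vary):
```python filename="ResponseCodeCheck.py" copy
import requests

# maxResults must be an integer, so this request fails validation
resp = requests.get(
    "http://localhost:4000/api/example_endpoint",
    params={"filterField": "name", "maxResults": "not-a-number"},
)
print(resp.status_code)  # 400
print(resp.json())       # {"error": "..."} describing the invalid parameter
```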
## Post-Processing Query Results
After executing your database query, you can transform the data before returning it to the client. This allows you to:
- Rename or reshape fields
- Add derived or computed fields
- Format values and dates for presentation
```python filename="PostProcessingExample.py" copy
from datetime import datetime
from moose_lib import Api
from pydantic import BaseModel
class QueryParams(BaseModel):
category: str
max_results: int = 10
class ResponseItem(BaseModel):
itemId: int
displayName: str
formattedValue: str
isHighValue: bool
date: str
def run(client: MooseClient, params: QueryParams):
# 1. Fetch raw data using parameter binding
query = """
SELECT id, name, value, timestamp
FROM data_table
WHERE category = {category}
LIMIT {limit}
"""
raw_results = client.query.execute(query, {"category": params.category, "limit": params.max_results})
# 2. Post-process the results
processed_results = []
for row in raw_results:
processed_results.append(ResponseItem(
# Transform field names
itemId=row['id'],
displayName=row['name'].upper(),
# Add derived fields
formattedValue=f"${row['value']:.2f}",
isHighValue=row['value'] > 1000,
# Format dates
date=datetime.fromisoformat(row['timestamp']).date().isoformat()
))
return processed_results
# Create the API
process_data_api = Api[QueryParams, ResponseItem](name="process_data_endpoint", query_function=run)
```
### Best Practices
While post-processing gives you flexibility, remember that database operations are typically more efficient for heavy data manipulation. Reserve post-processing for transformations that are difficult to express in SQL or that involve application-specific logic.
## Client Integration
By default, all API endpoints are automatically integrated with OpenAPI/Swagger documentation. You can integrate your OpenAPI SDK generator of choice to generate client libraries for your APIs.
Please refer to the [OpenAPI SDK](/moose/apis/openapi-sdk) page for more information on how to integrate your APIs with OpenAPI.
---
## API Authentication & Security
Source: moose/apis/auth.mdx
Secure your Moose API endpoints with JWT tokens or API keys
# API Authentication & Security
Moose supports two authentication mechanisms for securing your API endpoints:
- **[API Keys](#api-key-authentication)** - Simple, static authentication for internal applications and getting started
- **[JWT (JSON Web Tokens)](#jwt-authentication)** - Token-based authentication for integration with existing identity providers
Choose the method that fits your use case, or use both together with custom configuration.
## Do you want to use API Keys?
API keys are the simplest way to secure your Moose endpoints. They're ideal for:
- Internal applications and microservices
- Getting started quickly with authentication
- Scenarios where you control both client and server
### How API Keys Work
API keys use PBKDF2 HMAC SHA256 hashing for secure storage. You generate a token pair (plain-text and hashed) using the Moose CLI, store the hashed version in environment variables, and send the plain-text version in your request headers.
### Step 1: Generate API Keys
Generate tokens and hashed keys using the Moose CLI:
```bash
moose generate hash-token
```
**Output:**
- **ENV API Keys**: Hashed key for environment variables (use this in your server configuration)
- **Bearer Token**: Plain-text token for client applications (use this in `Authorization` headers)
Use the **hashed key** for environment variables and `moose.config.toml`. Use the **plain-text token** in your `Authorization: Bearer token` headers.
### Step 2: Configure API Keys with Environment Variables
Set environment variables with the **hashed** API keys you generated:
```bash
# For ingest endpoints
export MOOSE_INGEST_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
# For analytics endpoints
export MOOSE_CONSUMPTION_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
# For admin endpoints (takes the plain-text token, not the hash)
export MOOSE_ADMIN_TOKEN='your_plain_text_token'
```
Or set the admin API key in `moose.config.toml`:
```toml filename="moose.config.toml"
[authentication]
admin_api_key = "your_pbkdf2_hmac_sha256_hashed_key"
```
Storing the `admin_api_key` (which is a PBKDF2 HMAC SHA256 hash) in your `moose.config.toml` file is an acceptable practice, even if the file is version-controlled. This is because the actual plain-text Bearer token (the secret) is not stored. The hash is computationally expensive to reverse, ensuring that your secret is not exposed in the codebase.
### Step 3: Make Authenticated Requests
All authenticated requests require the `Authorization` header with the **plain-text token**:
```bash
# Using curl
curl -H "Authorization: Bearer your_plain_text_token_here" \
https://your-moose-instance.com/ingest/YourDataModel
# Using JavaScript
fetch('https://your-moose-instance.com/api/endpoint', {
headers: {
'Authorization': 'Bearer your_plain_text_token_here'
}
})
```
## Do you want to use JWTs?
JWT authentication integrates with existing identity providers and follows standard token-based authentication patterns. Use JWTs when:
- You have an existing identity provider (Auth0, Okta, etc.)
- You need user-specific authentication and authorization
- You want standard OAuth 2.0 / OpenID Connect flows
### How JWT Works
Moose validates JWT tokens using RS256 algorithm with your identity provider's public key. You configure the expected issuer and audience, and Moose handles token verification automatically.
### Step 1: Configure JWT Settings
#### Option A: Configure in `moose.config.toml`
```toml filename="moose.config.toml"
[jwt]
# Your JWT public key (PEM-formatted RSA public key)
secret = """
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAy...
-----END PUBLIC KEY-----
"""
# Expected JWT issuer
issuer = "https://my-auth-server.com/"
# Expected JWT audience
audience = "my-moose-app"
```
The `secret` field should contain your JWT **public key** used to verify signatures using RS256 algorithm.
#### Option B: Configure with Environment Variables
You can also set these values as environment variables:
```bash filename=".env" copy
MOOSE_JWT_PUBLIC_KEY=your_jwt_public_key # PEM-formatted RSA public key (overrides `secret` in `moose.config.toml`)
MOOSE_JWT_ISSUER=your_jwt_issuer # Expected JWT issuer (overrides `issuer` in `moose.config.toml`)
MOOSE_JWT_AUDIENCE=your_jwt_audience # Expected JWT audience (overrides `audience` in `moose.config.toml`)
```
### Step 2: Make Authenticated Requests
Send requests with the JWT token in the `Authorization` header:
```bash
# Using curl
curl -H "Authorization: Bearer your_jwt_token_here" \
https://your-moose-instance.com/ingest/YourDataModel
# Using JavaScript
fetch('https://your-moose-instance.com/api/endpoint', {
headers: {
'Authorization': 'Bearer your_jwt_token_here'
}
})
```
## Want to use both? Here are the caveats
You can configure both JWT and API Key authentication simultaneously. When both are configured, Moose's authentication behavior depends on the `enforce_on_all_*` flags.
### Understanding Authentication Priority
#### Default Behavior (No Enforcement)
By default, when both JWT and API Keys are configured, Moose tries JWT validation first, then falls back to API Key validation:
```toml filename="moose.config.toml"
[jwt]
# JWT configuration
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# enforce flags default to false
```
```bash filename=".env"
# API Key configuration
MOOSE_INGEST_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
MOOSE_CONSUMPTION_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
```
**For Ingest Endpoints (`/ingest/*`)**:
- Attempts JWT validation first (RS256 signature check)
- Falls back to API Key validation (PBKDF2 HMAC SHA256) if JWT fails
**For Analytics Endpoints (`/api/*`)**:
- Same fallback behavior as ingest endpoints
This allows you to use either authentication method for your clients.
#### Enforcing JWT Only
If you want to **only** accept JWT tokens (no API key fallback), set the enforcement flags:
```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# Only accept JWT, no API key fallback
enforce_on_all_ingest_apis = true
enforce_on_all_consumptions_apis = true
```
**Result**: When enforcement is enabled, API Key authentication is disabled even if the environment variables are set. Only valid JWT tokens will be accepted.
### Common Use Cases
#### Use Case 1: Different Auth for Different Endpoints
Configure JWT for user-facing analytics endpoints, API keys for internal ingestion:
```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
enforce_on_all_consumptions_apis = true # JWT only for /api/*
enforce_on_all_ingest_apis = false # Allow fallback for /ingest/*
```
```bash filename=".env"
MOOSE_INGEST_API_KEY='hashed_key_for_internal_services'
```
#### Use Case 2: Migration from API Keys to JWT
Start with both configured, no enforcement. Gradually migrate clients to JWT. Once complete, enable enforcement:
```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# Start with both allowed during migration
enforce_on_all_ingest_apis = false
enforce_on_all_consumptions_apis = false
# Later, enable to complete migration
# enforce_on_all_ingest_apis = true
# enforce_on_all_consumptions_apis = true
```
### Admin Endpoints
Admin endpoints use API key authentication exclusively (configured separately from ingest/analytics endpoints).
**Configuration precedence** (highest to lowest):
1. `--token` CLI parameter (plain-text token)
2. `MOOSE_ADMIN_TOKEN` environment variable (plain-text token)
3. `admin_api_key` in `moose.config.toml` (hashed token)
**Example:**
```bash
# Option 1: CLI parameter
moose remote plan --token your_plain_text_token
# Option 2: Environment variable
export MOOSE_ADMIN_TOKEN='your_plain_text_token'
moose remote plan
# Option 3: Config file
# In moose.config.toml:
# [authentication]
# admin_api_key = "your_pbkdf2_hmac_sha256_hashed_key"
```
## Security Best Practices
- **Never commit plain-text tokens to version control** - Always use hashed keys in configuration files
- **Use environment variables for production** - Keep secrets out of your codebase
- **Generate unique tokens for different environments** - Separate development, staging, and production credentials
- **Rotate tokens regularly** - Especially for long-running production deployments
- **Choose the right method for your use case**:
- Use **API Keys** for internal services and getting started
- Use **JWT** when integrating with identity providers or need user-level auth
- **Store hashed keys safely** - The PBKDF2 HMAC SHA256 hash in `moose.config.toml` is safe to version control, but the plain-text token should only exist in secure environment variables or secret management systems
Never commit plain-text tokens to version control. Use hashed keys in configuration files and environment variables for production.
---
## Ingestion APIs
Source: moose/apis/ingest-api.mdx
Ingestion APIs for Moose
# Ingestion APIs
## Overview
Moose Ingestion APIs are the entry point for getting data into your Moose application. They provide a fast, reliable, and type-safe way to move data from your sources into streams and tables for analytics and processing.
## When to Use Ingestion APIs
Ingestion APIs are most useful when you want to implement a push-based pattern for getting data from your data sources into your streams and tables. Common use cases include:
- Instrumenting external client applications
- Receiving webhooks from third-party services
- Integrating with ETL or data pipeline tools that push data
## Why Use Moose's APIs Over Your Own?
Moose's ingestion APIs are purpose-built for high-throughput data pipelines, offering key advantages over other more general-purpose frameworks:
- **Built-in schema validation:** Ensures only valid data enters your pipeline.
- **Direct connection to streams/tables:** Instantly link HTTP endpoints to Moose data infrastructure to route incoming data to your streams and tables without any glue code.
- **Dead Letter Queue (DLQ) support:** Invalid records are automatically captured for review and recovery.
- **OpenAPI auto-generation:** Instantly generate client SDKs and docs for all endpoints, including example data.
- **Rust-powered performance:** Far higher throughput and lower latency than typical Node.js or Python APIs.
## Validation
Moose validates all incoming data against your Pydantic model. If a record fails validation, Moose can automatically route it to a Dead Letter Queue (DLQ) for later inspection and recovery.
```python filename="ValidationExample.py" copy
from moose_lib import IngestApi, IngestConfig, Stream, DeadLetterQueue
from pydantic import BaseModel
from typing import Optional
from datetime import datetime
class Properties(BaseModel):
device: Optional[str]
version: Optional[int]
class ExampleModel(BaseModel):
id: str
userId: str
timestamp: datetime
properties: Properties
api = IngestApi[ExampleModel]("your-api-route", IngestConfig(
destination=Stream[ExampleModel]("your-stream-name"),
dead_letter_queue=DeadLetterQueue[ExampleModel]("your-dlq-name")
))
```
If your IngestPipeline’s schema marks a field as optional but annotates a ClickHouse default, Moose treats:
- API request and Stream message: field is optional (you may omit it)
- ClickHouse table storage: field is required with a DEFAULT clause
Behavior: When the API/stream inserts into ClickHouse and the field is missing, ClickHouse sets it to the configured default value. This keeps request payloads simple while avoiding Nullable columns in storage.
Example:
`Annotated[int, clickhouse_default("18")]` (or equivalent annotation)
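A minimal model sketch using this pattern (the `clickhouse_default` import path and the exact `Optional`/`Annotated` combination shown here are assumptions):
```python filename="DefaultFieldExample.py" copy
from typing import Annotated, Optional
from pydantic import BaseModel
from moose_lib import Key, clickhouse_default

class SignupEvent(BaseModel):
    id: Key[str]
    # Optional in the API request and stream message; stored in ClickHouse
    # as a non-Nullable column with DEFAULT 18 when the field is omitted
    age: Optional[Annotated[int, clickhouse_default("18")]] = None
```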
Send a valid event - routed to the destination stream
```python filename="ValidEvent.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
"id": "event1",
"userId": "user1",
"timestamp": "2023-05-10T15:30:00Z"
})
# ✅ Accepted and routed to the destination stream
# API returns 200 and { success: true }
```
Send an invalid event (missing required field) - routed to the DLQ
```python filename="InvalidEventMissingField.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
"id": "event1",
})
# ❌ Routed to DLQ, because it's missing a required field
# API returns 400 response
```
Send an invalid event (bad date format) - routed to the DLQ
```python filename="InvalidEventBadDate.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
"id": "event1",
"userId": "user1",
"timestamp": "not-a-date"
})
# ❌ Routed to DLQ, because the timestamp is not a valid date
# API returns 400 response
```
## Creating Ingestion APIs
You can create ingestion APIs in two ways:
- **High-level:** Using the `IngestPipeline` class (recommended for most use cases)
- **Low-level:** Manually configuring the `IngestApi` component for more granular control
### High-level: IngestPipeline (Recommended)
The `IngestPipeline` class provides a convenient way to set up ingestion endpoints, streams, and tables with a single declaration:
```python filename="IngestPipeline.py" copy
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel
from datetime import datetime
class ExampleSchema(BaseModel):
id: Key[str]
name: str
value: int
timestamp: datetime
example_pipeline = IngestPipeline[ExampleSchema](
name="example-name",
config=IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True
)
)
```
### Low-level: Standalone IngestApi
For more granular control, you can manually configure the `IngestApi` component:
The types of the destination `Stream` and `Table` must match the type of the `IngestApi`.
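A rough sketch of the low-level wiring, following the same pattern as the validation example above (the `StreamConfig(destination=...)` link from stream to table is an assumption here; see the streaming docs for the exact sync-to-table configuration):
```python filename="StandaloneIngestApi.py" copy
from datetime import datetime
from moose_lib import IngestApi, IngestConfig, Stream, StreamConfig, OlapTable, Key
from pydantic import BaseModel

class ExampleSchema(BaseModel):
    id: Key[str]
    name: str
    value: int
    timestamp: datetime

# The table, stream, and ingestion API all share the same schema
example_table = OlapTable[ExampleSchema]("example-records")
example_stream = Stream[ExampleSchema]("example-records", StreamConfig(destination=example_table))
example_api = IngestApi[ExampleSchema]("example-records", IngestConfig(destination=example_stream))
```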
## Configuration Reference
Configuration options for both high-level and low-level ingestion APIs are provided below.
```python filename="IngestPipelineConfig.py" copy
class IngestPipelineConfig(BaseModel):
table: bool | OlapConfig = True
stream: bool | StreamConfig = True
ingest_api: bool | IngestConfig = True
dead_letter_queue: bool | StreamConfig = True
version: Optional[str] = None
metadata: Optional[dict] = None
life_cycle: Optional[LifeCycle] = None
```
```python filename="IngestConfig.py" copy
@dataclass
class IngestConfigWithDestination[T: BaseModel]:
destination: Stream[T]
dead_letter_queue: Optional[DeadLetterQueue[T]] = None
version: Optional[str] = None
metadata: Optional[dict] = None
```
---
## OpenAPI SDK Generation
Source: moose/apis/openapi-sdk.mdx
Generate type-safe client SDKs from your Moose APIs
# OpenAPI SDK Generation
Moose automatically generates OpenAPI specifications for all your APIs, enabling you to create type-safe client SDKs in any language. This allows you to integrate your Moose APIs into any application with full type safety and IntelliSense support.
## Overview
While `moose dev` is running, Moose emits an OpenAPI spec at `.moose/openapi.yaml` covering:
- **Ingestion endpoints** with request/response schemas
- **Analytics APIs** with query parameters and response types
Every time you make a change to your Moose APIs, the OpenAPI spec is updated automatically.
## Generating Typed SDKs from OpenAPI
You can use your preferred generator to create a client from that spec. Below are minimal, tool-agnostic examples you can copy into your project scripts.
### Setup
The following example uses `openapi-python-client` to generate the SDK. Follow the setup instructions here: [openapi-python-client on PyPI](https://pypi.org/project/openapi-python-client/).
Add a generation script in your repository:
```bash filename="scripts/generate_python_sdk.sh" copy
#!/usr/bin/env bash
set -euo pipefail
openapi-python-client generate --path .moose/openapi.yaml --output ./generated/python --overwrite
```
Then configure Moose to run it after each dev reload:
```toml filename="moose.config.toml" copy
[http_server_config]
on_reload_complete_script = "bash scripts/generate_python_sdk.sh"
```
This will regenerate the Python client from the live spec on every reload.
### Hooks for automatic SDK generation
The `on_reload_complete_script` hook is available in your `moose.config.toml` file. It runs after each dev server reload when code/infra changes have been fully applied. This allows you to keep your SDKs continuously up to date as you make changes to your Moose APIs.
Notes:
- The script runs in your project root using your `$SHELL` (falls back to `/bin/sh`).
- Paths like `.moose/openapi.yaml` and `./generated/...` are relative to the project root.
- You can combine multiple generators with `&&` or split them into a shell script if preferred.
These hooks only affect local development (`moose dev`). The reload hook runs after Moose finishes applying your changes, ensuring `.moose/openapi.yaml` is fresh before regeneration.
## Integration
Import from the output path your generator writes to (see your chosen example repo). The Moose side is unchanged: the spec lives at `.moose/openapi.yaml` during `moose dev`.
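For example, with `openapi-python-client` writing to `./generated/python`, usage might look like the sketch below. The generated package name is derived from your spec's title, so treat the import as a placeholder:
```python filename="use_generated_client.py" copy
# Placeholder package name; openapi-python-client derives it from the spec title
from moose_api_client import Client

client = Client(base_url="http://localhost:4000")
# Generated endpoint modules live under <package>.api.<tag>.<operation>;
# check the generated README for the exact module and function names
```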
## Generators
Use any OpenAPI-compatible generator:
### TypeScript projects
- [OpenAPI Generator (typescript-fetch)](https://github.com/OpenAPITools/openapi-generator) — mature, broad options; generates Fetch-based client
- [Kubb](https://github.com/kubb-project/kubb) — generates types + fetch client with simple config
- [Orval](https://orval.dev/) — flexible output (client + schemas), good DX
- [openapi-typescript](https://github.com/openapi-ts/openapi-typescript) — generates types only (pair with your own client)
- [swagger-typescript-api](https://github.com/acacode/swagger-typescript-api) — codegen for TS clients from OpenAPI
- [openapi-typescript-codegen](https://github.com/ferdikoomen/openapi-typescript-codegen) — TS client + models
- [oazapfts](https://github.com/oazapfts/oazapfts) — minimal TS client based on fetch
- [openapi-zod-client](https://github.com/astahmer/openapi-zod-client) — Zod schema-first client generation
### Python projects
- [openapi-python-client](https://pypi.org/project/openapi-python-client/) — modern typed client for OpenAPI 3.0/3.1
- [OpenAPI Generator (python)](https://github.com/OpenAPITools/openapi-generator) — multiple Python generators (python, python-nextgen)
---
## Trigger APIs
Source: moose/apis/trigger-api.mdx
Create APIs that trigger workflows and other processes
# Trigger APIs
## Overview
You can create APIs to initiate workflows, data processing jobs, or other automated processes.
## Basic Usage
```python filename="app/apis/trigger_workflow.py" copy
from moose_lib import MooseClient, Api
from pydantic import BaseModel, Field
from datetime import datetime
class WorkflowParams(BaseModel):
input_value: str
priority: str = Field(default="normal")
class WorkflowResponse(BaseModel):
workflow_id: str
status: str
def run(params: WorkflowParams, client: MooseClient) -> WorkflowResponse:
# Trigger the workflow with input parameters
workflow_execution = client.workflow.execute(
workflow="data-processing",
params={
"input_value": params.input_value,
"priority": params.priority,
"triggered_at": datetime.now().isoformat()
}
)
return WorkflowResponse(
workflow_id=workflow_execution.id,
status="started"
)
api = Api[WorkflowParams, WorkflowResponse]("trigger-workflow", run)
```
## Using the Trigger API
Once deployed, you can trigger workflows via HTTP requests:
```bash filename="Terminal" copy
curl "http://localhost:4000/api/trigger-workflow?inputValue=process-user-data&priority=high"
```
Response:
```json
{
"workflowId": "workflow-12345",
"status": "started"
}
```
---
## Bring Your Own API Framework
Source: moose/app-api-frameworks.mdx
Use Express, Koa, Fastify, or FastAPI with MooseStack
# Bring Your Own API Framework
MooseStack provides flexible approaches for building HTTP APIs to expose your data. You can use Moose's built-in [`Api` class](/moose/apis/analytics-api) for simple GET endpoints, or bring popular web frameworks like [Express](/moose/app-api-frameworks/express), [Koa](/moose/app-api-frameworks/typescript-koa), [Fastify](/moose/app-api-frameworks/typescript-fastify), or [FastAPI](/moose/app-api-frameworks/fastapi) for advanced features and custom middleware.
These framework integrations give you two complementary integration paths:
- **Embed supported frameworks inside your MooseStack project.** Mount Express, Koa, Fastify, raw Node.js, or FastAPI with `WebApp` so MooseStack deploys everything together and injects auth, `client`, and `sql` helpers on every request.
- **Call MooseStack from any framework using the client.** Keep an existing app (Next.js, Rails, etc.) in its current runtime and use the MooseStack TypeScript or Python client to query ClickHouse or trigger workflows.
## Overview
### Native APIs
Moose's built-in [`Api` class](/moose/apis/analytics-api) provides simple GET endpoints with automatic type validation and OpenAPI documentation. This is ideal for straightforward data queries without complex routing or middleware requirements.
**Use Native `Api` class when you need:**
- Simple GET endpoints with query parameters
- Automatic type validation and conversion
- Automatic OpenAPI documentation generation
- Minimal setup and boilerplate
### Embed a supported framework inside a MooseStack project
Use the `WebApp` class to mount your framework in your MooseStack project alongside tables, streams, and workflows.
**Why embed the framework:**
- One deployment pipeline managed by the MooseStack CLI (dev hot reload + production config)
- Access MooseStack utilities (`client`, `sql`, `jwt`) in every handler through `getMooseUtils`
- Optionally share auth, logging, and observability defaults with the rest of your MooseStack modules
- Compose MooseStack APIs and framework routes under the same hostname
### Use the MooseStack client from any framework
If you already have an application server, call MooseStack directly with the client SDKs—no MooseStack deployment required. Works great for Next.js, Rails, Python microservices, or anything else that can make HTTP requests.
Here's the client pattern:
```py filename="lib/moose.py" copy
from moose_lib import MooseClient
from app.tables.user_table import UserTable
client = MooseClient()
def list_users(limit: int = 25):
query = f"""
SELECT
{UserTable.columns.id},
{UserTable.columns.name},
{UserTable.columns.email}
FROM {UserTable}
WHERE {UserTable.columns.status} = 'active'
LIMIT {{limit:UInt32}}
"""
return client.query.execute_raw(query, {"limit": limit})
```
For more examples, see the [Querying Data guide](/moose/olap/read-data).
## Supported Frameworks
For basic GET endpoints that query your OLAP tables, MooseStack provides built-in [Analytics APIs](/moose/apis/analytics-api) using the `Api` class.
`WebApp` currently ships adapters for [Express](/moose/app-api-frameworks/express), [Koa](/moose/app-api-frameworks/typescript-koa), [Fastify](/moose/app-api-frameworks/typescript-fastify), [raw Node.js](/moose/app-api-frameworks/typescript-raw-nodejs), and [FastAPI](/moose/app-api-frameworks/fastapi). Frameworks like Next.js can call MooseStack today through the client pattern above.
### TypeScript
Express
Most popular Node.js web framework
Fastify
Fast and low overhead web framework
Koa
Expressive middleware framework by Express team
Raw Node.js
Use raw HTTP handlers without any framework
### Python
FastAPI
Modern, fast Python web framework
## Key Concepts
### WebApp Class
The `WebApp` class is the bridge between your web framework and MooseStack. It handles:
- Mounting your framework at a custom URL path
- Injecting MooseStack utilities (database clients, SQL helpers, JWT)
- Validating JWT tokens when authentication is configured
- Managing the request/response lifecycle
### Accessing Data
All frameworks can access your OLAP database through MooseStack's injected utilities:
```python
from moose_lib.dmv2.web_app_helpers import get_moose_utils
moose = get_moose_utils(request)
client = moose.client
jwt = moose.jwt
```
### Mount Paths
Each WebApp must specify a unique mount path where it will be accessible. Mount paths have specific rules:
- Cannot be `/` (root path)
- Cannot end with a trailing slash
- Cannot start with reserved paths: `/admin`, `/api`, `/consumption`, `/health`, `/ingest`, `/mcp`, `/moose`, `/ready`, `/workflows`
**Valid:** `/myapi`, `/v1/analytics`, `/custom/endpoint`
**Invalid:** `/`, `/myapi/`, `/api/myendpoint`
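For example, mounting a FastAPI app at a valid path, a minimal sketch using the `WebApp`/`WebAppConfig` pattern from the FastAPI guide:
```python filename="app/apis/mount_path_example.py" copy
from fastapi import FastAPI
from moose_lib.dmv2 import WebApp, WebAppConfig

app = FastAPI()

# "/v1/analytics" is valid: not "/", no trailing slash, and it does not
# start with a reserved prefix such as "/api" or "/ingest"
analytics_app = WebApp("analyticsApp", app, WebAppConfig(mount_path="/v1/analytics"))
```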
MooseStack's native `Api` class provides built-in validation and OpenAPI documentation. See the [Analytics API documentation](/moose/apis/analytics-api) for a guided walkthrough.
---
## Express with MooseStack
Source: moose/app-api-frameworks/express.mdx
Use Express framework with MooseStack
# Express with MooseStack
Mount Express applications within your MooseStack project using the `WebApp` class. Express is the most popular Node.js web framework with a rich ecosystem of middleware.
- Already run Express elsewhere? Keep it outside your MooseStack project and query data with the MooseStack client. The [Querying Data guide](/moose/olap/read-data) shows how to use the SDK.
- Want to mount Express in your MooseStack project? Follow the steps below with `WebApp` for unified deployment and access to MooseStack utilities.
## Basic Example
```ts filename="app/apis/expressApp.ts" copy
const app = express();
app.use(express.json());
app.use(expressMiddleware()); // Required for Express
app.get("/health", (req, res) => {
res.json({ status: "ok" });
});
app.get("/data", async (req, res) => {
const moose = getMooseUtils(req);
if (!moose) {
return res.status(500).json({ error: "Moose utilities not available" });
}
const { client, sql } = moose;
const limit = parseInt(req.query.limit as string) || 10;
try {
const query = sql`
SELECT
${MyTable.columns.id},
${MyTable.columns.name},
${MyTable.columns.createdAt}
FROM ${MyTable}
ORDER BY ${MyTable.columns.createdAt} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(query);
const data = await result.json();
res.json(data);
} catch (error) {
res.status(500).json({ error: String(error) });
}
});
// Register as WebApp so MooseStack serves it at /express
export const expressApp = new WebApp("expressApp", app, { mountPath: "/express" });
```
**Access your API:**
- `GET http://localhost:4000/express/health`
- `GET http://localhost:4000/express/data?limit=20`
Express applications must use `expressMiddleware()` to access Moose utilities:
```ts
app.use(expressMiddleware());
```
This middleware injects MooseStack utilities into the request object.
## Complete Example with Features
```ts filename="app/apis/advancedExpressApp.ts" copy
const app = express();
// Middleware setup
app.use(express.json());
app.use(expressMiddleware()); // Required!
// Custom logging middleware
app.use((req: Request, res: Response, next: NextFunction) => {
console.log(`${req.method} ${req.path}`);
next();
});
// Error handling middleware
const asyncHandler = (fn: Function) => (req: Request, res: Response, next: NextFunction) => {
Promise.resolve(fn(req, res, next)).catch(next);
};
// Health check endpoint
app.get("/health", (req, res) => {
res.json({
status: "ok",
timestamp: new Date().toISOString()
});
});
// GET endpoint with query parameters
app.get("/users/:userId/events", asyncHandler(async (req: Request, res: Response) => {
const moose = getMooseUtils(req);
if (!moose) {
return res.status(500).json({ error: "Moose utilities not available" });
}
const { client, sql } = moose;
const { userId } = req.params;
const limit = parseInt(req.query.limit as string) || 10;
const eventType = req.query.eventType as string;
const cols = UserEvents.columns;
const query = sql`
SELECT
${cols.id},
${cols.event_type},
${cols.timestamp}
FROM ${UserEvents}
WHERE ${cols.user_id} = ${userId}
${eventType ? sql`AND ${cols.event_type} = ${eventType}` : sql``}
ORDER BY ${cols.timestamp} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(query);
const events = await result.json();
res.json({
userId,
count: events.length,
events
});
}));
// POST endpoint
app.post("/users/:userId/profile", asyncHandler(async (req: Request, res: Response) => {
const moose = getMooseUtils(req);
if (!moose) {
return res.status(500).json({ error: "Moose utilities not available" });
}
const { userId } = req.params;
const { name, email } = req.body;
// Validation
if (!name || !email) {
return res.status(400).json({ error: "Name and email are required" });
}
// Handle POST logic here
res.json({
success: true,
userId,
profile: { name, email }
});
}));
// Protected endpoint with JWT
app.get("/protected", asyncHandler(async (req: Request, res: Response) => {
const moose = getMooseUtils(req);
if (!moose?.jwt) {
return res.status(401).json({ error: "Unauthorized" });
}
const userId = moose.jwt.sub;
res.json({
message: "Authenticated",
userId,
claims: moose.jwt
});
}));
// Error handling
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
console.error(err);
res.status(500).json({
error: "Internal Server Error",
message: err.message
});
});
// Register as WebApp (the name and mount path here are illustrative)
export const advancedExpressApp = new WebApp("advancedExpressApp", app, { mountPath: "/advanced" });
```
## WebApp Configuration
```ts
new WebApp(name, app, config)
```
**Parameters:**
- `name` (string): Unique identifier for your WebApp
- `app`: Your Express application instance
- `config` (WebAppConfig): Configuration object
**WebAppConfig:**
```ts
interface WebAppConfig {
mountPath: string; // Required: URL path (e.g., "/api/v1")
metadata?: { description?: string }; // Optional: Documentation metadata
injectMooseUtils?: boolean; // Optional: Inject utilities (default: true)
}
```
## Accessing Moose Utilities
```ts
app.get("/data", async (req, res) => {
const moose = getMooseUtils(req);
if (!moose) {
return res.status(500).json({ error: "Utilities not available" });
}
const { client, sql, jwt } = moose;
// Use client and sql for database queries
});
```
**Available utilities:**
- `client`: MooseClient for database queries
- `sql`: Template tag for safe SQL queries
- `jwt`: Parsed JWT payload (when authentication is configured)
## Middleware Integration
Express middleware works seamlessly with MooseStack:
```ts
const app = express();
// Security
app.use(helmet());
// Compression
app.use(compression());
// Rate limiting
const limiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100 // limit each IP to 100 requests per windowMs
});
app.use(limiter);
// Body parsing
app.use(express.json());
// Moose utilities (must be after body parsing)
app.use(expressMiddleware());
```
## Router Pattern
Organize routes using Express Router:
```ts filename="app/apis/routers/usersRouter.ts"
import { Router } from "express";

export const usersRouter = Router();
usersRouter.get("/", (req, res) => res.json({ users: [] }));
```
```ts filename="app/apis/mainApp.ts"
const app = express();
app.use(express.json());
app.use(expressMiddleware());
// Mount routers
app.use("/users", usersRouter);
// Register as WebApp (the name and mount path here are illustrative)
export const mainApp = new WebApp("mainApp", app, { mountPath: "/main" });
```
## Authentication with JWT
```ts
app.get("/protected", async (req, res) => {
const moose = getMooseUtils(req);
if (!moose?.jwt) {
return res.status(401).json({ error: "Unauthorized" });
}
const userId = moose.jwt.sub;
const userRole = moose.jwt.role;
// Check permissions
if (userRole !== "admin") {
return res.status(403).json({ error: "Forbidden" });
}
res.json({ message: "Authenticated", userId });
});
```
See [Authentication documentation](/moose/apis/auth) for JWT configuration.
## Best Practices
1. **Always use expressMiddleware()**: Required for accessing Moose utilities
2. **Check for moose utilities**: Always verify `getMooseUtils(req)` returns a value
3. **Use async error handling**: Wrap async routes with error handler
4. **Organize with routers**: Split large applications into multiple routers
5. **Apply middleware in order**: Body parsing before expressMiddleware
6. **Use TypeScript types**: Import Request, Response types from Express
7. **Handle errors globally**: Use Express error handling middleware
## Troubleshooting
### "Moose utilities not available"
**Solution:** Ensure `expressMiddleware()` is added after body parsing:
```ts
app.use(express.json());
app.use(expressMiddleware()); // Must come after body parsers
```
### TypeScript errors with getMooseUtils
**Solution:** The utilities may be undefined, always check:
```ts
const moose = getMooseUtils(req);
if (!moose) {
return res.status(500).json({ error: "Utilities not available" });
}
// Now moose.client, moose.sql are safely accessible
```
### Mount path conflicts
**Solution:** Ensure your mount path doesn't conflict with reserved paths:
- Avoid: `/api`, `/admin`, `/consumption`, `/health`, `/ingest`, `/mcp`
- Use: `/myapi`, `/v1`, `/custom`
---
## FastAPI with MooseStack
Source: moose/app-api-frameworks/fastapi.mdx
Use FastAPI framework with MooseStack
# FastAPI with MooseStack
Mount FastAPI applications within your MooseStack project using the `WebApp` class. FastAPI is a modern, fast Python framework with automatic API documentation and async support.
- Already operating FastAPI outside MooseStack? Keep it separate and call MooseStack data with the client SDK. The [Querying Data guide](/moose/olap/read-data) includes Python examples.
- Want to mount FastAPI in your MooseStack project? Use the `WebApp` flow below for unified deployment and access to MooseStack utilities.
## Basic Example
```python filename="app/apis/fastapi_app.py" copy
from fastapi import FastAPI, Request, HTTPException
from moose_lib.dmv2 import WebApp, WebAppConfig, WebAppMetadata
from moose_lib.dmv2.web_app_helpers import get_moose_utils
from app.tables.my_table import MyTable
app = FastAPI()
@app.get("/health")
async def health():
return {"status": "ok"}
@app.get("/data")
async def get_data(request: Request, limit: int = 10):
moose = get_moose_utils(request)
if not moose:
raise HTTPException(
status_code=500,
detail="Moose utilities not available"
)
try:
query = f"""
SELECT
{MyTable.columns.id},
{MyTable.columns.name},
{MyTable.columns.created_at}
FROM {MyTable}
ORDER BY {MyTable.columns.created_at} DESC
LIMIT {{limit:UInt32}}
"""
result = moose.client.query.execute_raw(query, {
"limit": limit
})
return {"success": True, "data": result}
except Exception as error:
raise HTTPException(status_code=500, detail=str(error))
# Register as WebApp
fastapi_app = WebApp(
"fastApiApp",
app,
WebAppConfig(
mount_path="/fastapi",
metadata=WebAppMetadata(description="FastAPI application")
)
)
```
**Access your API:**
- `GET http://localhost:4000/fastapi/health`
- `GET http://localhost:4000/fastapi/data?limit=20`
## Complete Example with Features
```python filename="app/apis/advanced_fastapi_app.py" copy
from fastapi import FastAPI, Request, HTTPException, Depends, BackgroundTasks
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from moose_lib.dmv2 import WebApp, WebAppConfig, WebAppMetadata
from moose_lib.dmv2.web_app_helpers import get_moose_utils, ApiUtil
from app.tables.user_events import UserEvents
from app.tables.user_profile import UserProfile
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional
app = FastAPI(
title="Advanced API",
description="Advanced FastAPI application with MooseStack",
version="1.0.0"
)
# CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Custom middleware
@app.middleware("http")
async def log_requests(request: Request, call_next):
start_time = datetime.now()
response = await call_next(request)
duration = (datetime.now() - start_time).total_seconds()
print(f"{request.method} {request.url.path} - {duration:.3f}s")
return response
# Request/Response models
class EventQuery(BaseModel):
limit: int = Field(10, ge=1, le=100)
event_type: Optional[str] = None
class EventData(BaseModel):
event_type: str = Field(..., min_length=1)
data: dict
class EventResponse(BaseModel):
id: str
event_type: str
timestamp: datetime
# Health check
@app.get("/health")
async def health():
return {
"status": "ok",
"timestamp": datetime.now().isoformat()
}
# GET with path and query parameters
@app.get("/users/{user_id}/events", response_model=dict)
async def get_user_events(
request: Request,
user_id: str,
limit: int = 10,
event_type: Optional[str] = None
):
moose = get_moose_utils(request)
if not moose:
raise HTTPException(status_code=500, detail="Moose utilities not available")
query = """
SELECT
id,
event_type,
timestamp
FROM {table}
WHERE user_id = {user_id}
{event_filter}
ORDER BY timestamp DESC
LIMIT {limit}
"""
event_filter = "AND event_type = {event_type}" if event_type else ""
params = {
"table": UserEvents,
"user_id": user_id,
"limit": limit
}
if event_type:
params["event_type"] = event_type
try:
result = moose.client.query.execute(
# str.format() would also try to fill the other {placeholders}; substitute only the filter fragment
query.replace("{event_filter}", event_filter),
params
)
return {
"user_id": user_id,
"count": len(result),
"events": result
}
except Exception as error:
raise HTTPException(status_code=500, detail=str(error))
# POST with validated body
@app.post("/users/{user_id}/events", status_code=201)
async def create_event(
request: Request,
user_id: str,
body: EventData,
background_tasks: BackgroundTasks
):
moose = get_moose_utils(request)
if not moose:
raise HTTPException(status_code=500, detail="Moose utilities not available")
# Background task
def log_event_creation(user_id: str, event_type: str):
print(f"Event created: {user_id} - {event_type}")
background_tasks.add_task(log_event_creation, user_id, body.event_type)
return {
"success": True,
"user_id": user_id,
"event_type": body.event_type,
"data": body.data
}
# Protected endpoint with dependency injection
async def require_auth(request: Request) -> ApiUtil:
moose = get_moose_utils(request)
if not moose or not moose.jwt:
raise HTTPException(status_code=401, detail="Unauthorized")
return moose
@app.get("/protected")
async def protected(moose: ApiUtil = Depends(require_auth)):
return {
"message": "Authenticated",
"user": moose.jwt.get("sub"),
"claims": moose.jwt
}
# Admin endpoint with role check
async def require_admin(moose: ApiUtil = Depends(require_auth)) -> ApiUtil:
role = moose.jwt.get("role")
if role != "admin":
raise HTTPException(status_code=403, detail="Forbidden")
return moose
@app.get("/admin/stats")
async def admin_stats(moose: ApiUtil = Depends(require_admin)):
return {"message": "Admin access granted"}
# WebSocket support
from fastapi import WebSocket
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
await websocket.accept()
try:
while True:
data = await websocket.receive_text()
await websocket.send_text(f"Echo: {data}")
except Exception:
await websocket.close()
# Global exception handler
@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
return JSONResponse(
status_code=500,
content={"error": "Internal Server Error", "message": str(exc)}
)
# Register as WebApp
fastapi_app = WebApp(
"advancedFastApi",
app,
WebAppConfig(
mount_path="/api/v1",
metadata=WebAppMetadata(
description="Advanced FastAPI application with middleware and dependencies"
)
)
)
```
## WebApp Configuration
```python
WebApp(name, app, config)
```
**Parameters:**
- `name` (str): Unique identifier for your WebApp
- `app` (FastAPI): Your FastAPI application instance
- `config` (WebAppConfig): Configuration object
**WebAppConfig:**
```python
@dataclass
class WebAppConfig:
mount_path: str # Required: URL path
metadata: Optional[WebAppMetadata] = None # Optional: Documentation
inject_moose_utils: bool = True # Optional: Inject utilities
@dataclass
class WebAppMetadata:
description: Optional[str] = None
```
## Accessing Moose Utilities
### Direct Access
```python
from moose_lib.dmv2.web_app_helpers import get_moose_utils
@app.get("/data")
async def get_data(request: Request):
moose = get_moose_utils(request)
if not moose:
raise HTTPException(status_code=500, detail="Utilities not available")
client = moose.client
jwt = moose.jwt
```
### Dependency Injection
Use FastAPI's dependency injection for cleaner code:
```python
from moose_lib.dmv2.web_app_helpers import get_moose_dependency, ApiUtil
from fastapi import Depends
@app.get("/data")
async def get_data(moose: ApiUtil = Depends(get_moose_dependency())):
# moose is automatically injected and guaranteed to exist
result = moose.client.query.execute(...)
return result
```
## Middleware
FastAPI middleware works seamlessly:
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.gzip import GZipMiddleware
from starlette.middleware.sessions import SessionMiddleware
app = FastAPI()
# CORS
app.add_middleware(
CORSMiddleware,
allow_origins=["https://example.com"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Compression
app.add_middleware(GZipMiddleware, minimum_size=1000)
# Sessions
app.add_middleware(SessionMiddleware, secret_key="your-secret-key")
# Custom middleware
@app.middleware("http")
async def add_custom_header(request: Request, call_next):
response = await call_next(request)
response.headers["X-Custom-Header"] = "Value"
return response
```
## Dependency Patterns
### Reusable Dependencies
```python
from fastapi import Depends, HTTPException
from typing import Optional
# Auth dependency
async def get_current_user(moose: ApiUtil = Depends(get_moose_dependency())) -> dict:
if not moose.jwt:
raise HTTPException(status_code=401, detail="Not authenticated")
user_id = moose.jwt.get("sub")
# Fetch user from database
return {"id": user_id}
# Admin dependency
async def require_admin_user(user: dict = Depends(get_current_user)) -> dict:
if user.get("role") != "admin":
raise HTTPException(status_code=403, detail="Not authorized")
return user
# Use dependencies
@app.get("/user/profile")
async def get_profile(user: dict = Depends(get_current_user)):
return user
@app.get("/admin/dashboard")
async def admin_dashboard(user: dict = Depends(require_admin_user)):
return {"message": f"Welcome admin {user['id']}"}
```
### Dependency Classes
```python
from fastapi import Depends
from app.tables.user_table import UserTable
class Pagination:
def __init__(self, page: int = 1, size: int = 10):
self.page = page
self.size = size
self.skip = (page - 1) * size
@app.get("/users")
async def list_users(
pagination: Pagination = Depends(),
moose: ApiUtil = Depends(get_moose_dependency())
):
# Using parameterized query for user input values
query = f"""
SELECT
{UserTable.columns.id},
{UserTable.columns.name},
{UserTable.columns.email}
FROM {UserTable}
WHERE {UserTable.columns.status} = 'active'
LIMIT {{size:UInt32}}
OFFSET {{skip:UInt32}}
"""
result = moose.client.query.execute_raw(query, {
"size": pagination.size,
"skip": pagination.skip
})
return result
```
## Request Validation
FastAPI uses Pydantic for powerful validation:
```python
from pydantic import BaseModel, Field, validator
from datetime import datetime
from typing import Optional, Literal
class UserEventCreate(BaseModel):
event_type: Literal["click", "view", "purchase"]
timestamp: datetime = Field(default_factory=datetime.now)
properties: dict = Field(default_factory=dict)
value: Optional[float] = Field(None, ge=0, le=1000000)
@validator('properties')
def validate_properties(cls, v):
if len(v) > 50:
raise ValueError('Too many properties')
return v
@app.post("/events")
async def create_event(
event: UserEventCreate,
moose: ApiUtil = Depends(get_moose_dependency())
):
# event is fully validated
return {"success": True, "event": event}
```
## Background Tasks
```python
from fastapi import BackgroundTasks
def send_notification(user_id: str, message: str):
# Expensive operation
print(f"Sending notification to {user_id}: {message}")
@app.post("/notify/{user_id}")
async def notify_user(
user_id: str,
message: str,
background_tasks: BackgroundTasks
):
# Add task to run after response is sent
background_tasks.add_task(send_notification, user_id, message)
return {"message": "Notification queued"}
```
## Authentication with JWT
```python
# Manual check
@app.get("/protected")
async def protected(request: Request):
moose = get_moose_utils(request)
if not moose or not moose.jwt:
raise HTTPException(status_code=401, detail="Unauthorized")
user_id = moose.jwt.get("sub")
return {"message": "Authenticated", "user_id": user_id}
# Dependency pattern (recommended)
async def require_auth(request: Request) -> ApiUtil:
moose = get_moose_utils(request)
if not moose or not moose.jwt:
raise HTTPException(status_code=401, detail="Unauthorized")
return moose
@app.get("/protected")
async def protected(moose: ApiUtil = Depends(require_auth)):
return {"message": "Authenticated", "user": moose.jwt.get("sub")}
```
See [Authentication documentation](/moose/apis/auth) for JWT configuration.
## Best Practices
1. **Use dependency injection**: Leverage `Depends()` for cleaner code
2. **Validate with Pydantic**: Use `BaseModel` and `Field()` for validation
3. **Use response models**: Specify `response_model` for automatic validation (see the sketch after this list)
4. **Handle async properly**: Use `async def` for I/O operations
5. **Add type hints**: FastAPI uses types for validation and documentation
6. **Use background tasks**: For operations that don't need to complete before response
7. **Document with docstrings**: FastAPI includes docstrings in OpenAPI docs
8. **Organize with routers**: Split large applications into APIRouter instances
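A minimal sketch combining response models, docstrings, and dependency injection from the list above; the table name and SQL are placeholders, so adapt them to your own schema:
```python
from fastapi import Depends, FastAPI
from pydantic import BaseModel
from moose_lib.dmv2.web_app_helpers import ApiUtil, get_moose_dependency

app = FastAPI()

class UserSummary(BaseModel):
    id: str
    name: str

@app.get("/users/summary", response_model=list[UserSummary])
async def list_user_summaries(moose: ApiUtil = Depends(get_moose_dependency())):
    """Return a summarized list of users.

    FastAPI includes this docstring in the generated OpenAPI docs and
    validates the returned rows against `response_model`.
    """
    # Placeholder query; substitute your own table and columns
    return moose.client.query.execute_raw(
        "SELECT id, name FROM users LIMIT {limit:UInt32}",
        {"limit": 10},
    )
```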
## Router Organization
```python filename="app/apis/routers/users.py"
from fastapi import APIRouter, Depends
from moose_lib.dmv2.web_app_helpers import get_moose_dependency, ApiUtil
router = APIRouter(prefix="/users", tags=["users"])
@router.get("/{user_id}")
async def get_user(
user_id: str,
moose: ApiUtil = Depends(get_moose_dependency())
):
return {"user_id": user_id}
```
```python filename="app/apis/main_app.py"
from fastapi import FastAPI
from moose_lib.dmv2 import WebApp, WebAppConfig
from .routers import users
app = FastAPI()
# Include routers
app.include_router(users.router)
webapp = WebApp("mainApp", app, WebAppConfig(mount_path="/api"))
```
## Troubleshooting
### "Moose utilities not available"
**Solution:** Verify `inject_moose_utils` is enabled (default):
```python
WebAppConfig(mount_path="/myapi", inject_moose_utils=True)
```
### Routes not responding
**Solution:** Pass the FastAPI app instance, not a router:
```python
# Correct
app = FastAPI()
webapp = WebApp("name", app, config)
# Incorrect
router = APIRouter()
webapp = WebApp("name", router, config) # Won't work
```
### Dependency injection errors
**Solution:** Ensure dependencies return correct types:
```python
# Correct
async def get_moose(request: Request) -> ApiUtil:
moose = get_moose_utils(request)
if not moose:
raise HTTPException(500, "Not available")
return moose # Return ApiUtil
# Usage
@app.get("/")
async def handler(moose: ApiUtil = Depends(get_moose)):
# moose is ApiUtil type
pass
```
---
## Fastify with MooseStack
Source: moose/app-api-frameworks/typescript-fastify.mdx
Use Fastify framework with MooseStack
# Fastify with MooseStack
Mount Fastify applications within your MooseStack project using the `WebApp` class. Fastify is a fast and low overhead web framework with powerful schema-based validation.
- Already running Fastify elsewhere? Keep it outside your MooseStack project and query data with the MooseStack client. The [Querying Data guide](/moose/olap/read-data) walks through the SDK.
- Want to mount Fastify in your MooseStack project? Follow the setup below with `WebApp` for unified deployment and access to MooseStack utilities.
## Basic Example
```ts filename="app/apis/fastifyApp.ts" copy
const app = Fastify({ logger: true });
app.get("/health", async (request, reply) => {
return { status: "ok" };
});
app.get("/data", async (request, reply) => {
const moose = getMooseUtils(request.raw);
if (!moose) {
reply.code(500);
return { error: "Moose utilities not available" };
}
const { client, sql } = moose;
const limit = parseInt((request.query as any).limit || "10");
try {
const query = sql`
SELECT
${MyTable.columns.id},
${MyTable.columns.name},
${MyTable.columns.createdAt}
FROM ${MyTable}
ORDER BY ${MyTable.columns.createdAt} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(query);
return await result.json();
} catch (error) {
reply.code(500);
return { error: String(error) };
}
});
// Must call ready() before passing to WebApp
await app.ready();
// Register as WebApp
export const fastifyApp = new WebApp("fastifyApp", app, { mountPath: "/fastify" });
```
**Access your API:**
- `GET http://localhost:4000/fastify/health`
- `GET http://localhost:4000/fastify/data?limit=20`
Fastify apps must call `.ready()` before passing to WebApp.
## Complete Example with Schema Validation
```ts filename="app/apis/advancedFastifyApp.ts" copy
const app = Fastify({
logger: true,
ajv: {
customOptions: {
removeAdditional: "all",
coerceTypes: true
}
}
});
// Schema definitions
const getUserEventsSchema = {
querystring: {
type: "object",
properties: {
limit: { type: "integer", minimum: 1, maximum: 100, default: 10 },
eventType: { type: "string" }
}
},
params: {
type: "object",
required: ["userId"],
properties: {
userId: { type: "string", pattern: "^[a-zA-Z0-9-]+$" }
}
},
response: {
200: {
type: "object",
properties: {
userId: { type: "string" },
count: { type: "integer" },
events: { type: "array" }
}
}
}
};
// GET with schema validation
app.get<{
Params: { userId: string };
Querystring: { limit?: number; eventType?: string };
}>("/users/:userId/events", {
schema: getUserEventsSchema
}, async (request, reply) => {
const moose = getMooseUtils(request.raw);
if (!moose) {
reply.code(500);
return { error: "Moose utilities not available" };
}
const { client, sql } = moose;
const { userId } = request.params;
const { limit = 10, eventType } = request.query;
const cols = UserEvents.columns;
const query = sql`
SELECT
${cols.id},
${cols.event_type},
${cols.timestamp}
FROM ${UserEvents}
WHERE ${cols.user_id} = ${userId}
${eventType ? sql`AND ${cols.event_type} = ${eventType}` : sql``}
ORDER BY ${cols.timestamp} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(query);
const events = await result.json();
return {
userId,
count: events.length,
events
};
});
// POST with schema validation
const createEventSchema = {
body: {
type: "object",
required: ["eventType", "data"],
properties: {
eventType: { type: "string", minLength: 1 },
data: { type: "object" }
}
}
};
app.post<{
Params: { userId: string };
Body: { eventType: string; data: object };
}>("/users/:userId/events", {
schema: createEventSchema
}, async (request, reply) => {
const { userId } = request.params;
const { eventType, data } = request.body;
// Handle POST logic
return {
success: true,
userId,
eventType,
data
};
});
// Protected route with JWT
app.get("/protected", async (request, reply) => {
const moose = getMooseUtils(request.raw);
if (!moose?.jwt) {
reply.code(401);
return { error: "Unauthorized" };
}
return {
message: "Authenticated",
userId: moose.jwt.sub
};
});
// Error handler
app.setErrorHandler((error, request, reply) => {
request.log.error(error);
reply.code(500).send({
error: "Internal Server Error",
message: error.message
});
});
await app.ready();
```
## Accessing Moose Utilities
Use `request.raw` to access the underlying Node.js request:
```ts
const moose = getMooseUtils(request.raw);
if (!moose) {
reply.code(500);
return { error: "Utilities not available" };
}
const { client, sql, jwt } = moose;
```
## Plugins and Decorators
Fastify plugins work seamlessly:
```ts
const app = Fastify({ logger: true });
// CORS
await app.register(cors, {
origin: true
});
// Security headers
await app.register(helmet);
// Rate limiting
await app.register(rateLimit, {
max: 100,
timeWindow: "15 minutes"
});
// Custom decorator
app.decorate("utility", {
formatResponse: (data: any) => ({
success: true,
timestamp: new Date().toISOString(),
data
})
});
app.get("/data", async (request, reply) => {
const moose = getMooseUtils(request.raw);
if (!moose) {
reply.code(500);
return { error: "Utilities not available" };
}
const { client, sql } = moose;
const result = await client.query.execute(sql`
SELECT
${MyTable.columns.id},
${MyTable.columns.name},
${MyTable.columns.status}
FROM ${MyTable}
WHERE ${MyTable.columns.status} = 'active'
LIMIT 10
`);
const data = await result.json();
return app.utility.formatResponse(data);
});
await app.ready();
```
## Type-Safe Routes
Leverage TypeScript for type-safe routes:
```ts
interface UserQueryParams {
limit?: number;
offset?: number;
status?: "active" | "inactive";
}
interface UserResponse {
id: string;
name: string;
email: string;
}
app.get<{
Querystring: UserQueryParams;
Reply: UserResponse[]
}>("/users", async (request, reply) => {
const { limit = 10, offset = 0, status } = request.query;
// TypeScript knows the shape of query params
const moose = getMooseUtils(request.raw);
// ... query logic
// Return type is checked
return [
{ id: "1", name: "John", email: "john@example.com" }
];
});
```
## WebApp Configuration
```ts
new WebApp(name, app, config)
```
**WebAppConfig:**
```ts
interface WebAppConfig {
mountPath: string;
metadata?: { description?: string };
injectMooseUtils?: boolean; // default: true
}
```
## Best Practices
1. **Call .ready() before WebApp**: Always await `app.ready()` before creating WebApp
2. **Use request.raw for utilities**: Access Moose utilities via `getMooseUtils(request.raw)`
3. **Define schemas**: Use Fastify's JSON Schema validation for request/response
4. **Type your routes**: Use TypeScript generics for type-safe route handlers
5. **Leverage plugins**: Use Fastify's rich plugin ecosystem
6. **Handle errors**: Use `setErrorHandler` for global error handling
7. **Enable logging**: Use Fastify's built-in logger for debugging
## Troubleshooting
### "Moose utilities not available"
**Solution:** Use `request.raw` to access the underlying request:
```ts
const moose = getMooseUtils(request.raw); // Not request!
```
### App not responding after mounting
**Solution:** Ensure you called `.ready()`:
```ts
await app.ready(); // Must call before WebApp
```
---
## Koa with MooseStack
Source: moose/app-api-frameworks/typescript-koa.mdx
Use Koa framework with MooseStack
# Koa with MooseStack
Mount Koa applications within your MooseStack project using the `WebApp` class. Koa is an expressive, minimalist framework by the Express team, designed for modern async/await patterns.
- Already running Koa outside MooseStack? Keep it separate and call MooseStack data with the client SDK. The [Querying Data guide](/moose/olap/read-data) has TypeScript examples.
- Want to mount Koa in your MooseStack project? Continue below with `WebApp` for unified deployment and access to MooseStack utilities.
## Basic Example
```ts filename="app/apis/koaApp.ts" copy
const app = new Koa();
const router = new Router();
app.use(bodyParser());
router.get("/health", (ctx) => {
ctx.body = { status: "ok" };
});
router.get("/data", async (ctx) => {
const moose = getMooseUtils(ctx.req);
if (!moose) {
ctx.status = 500;
ctx.body = { error: "Moose utilities not available" };
return;
}
const { client, sql } = moose;
const limit = parseInt((ctx.query.limit as string) || "10");
try {
const query = sql`
SELECT
${MyTable.columns.id},
${MyTable.columns.name},
${MyTable.columns.createdAt}
FROM ${MyTable}
ORDER BY ${MyTable.columns.createdAt} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(query);
ctx.body = await result.json();
} catch (error) {
ctx.status = 500;
ctx.body = { error: String(error) };
}
});
app.use(router.routes());
app.use(router.allowedMethods());
// Register as WebApp
export const koaApp = new WebApp("koaApp", app, { mountPath: "/koa" });
```
**Access your API:**
- `GET http://localhost:4000/koa/health`
- `GET http://localhost:4000/koa/data?limit=20`
## Complete Example with Middleware
```ts filename="app/apis/advancedKoaApp.ts" copy
const app = new Koa();
const router = new Router();
// Middleware
app.use(logger());
app.use(bodyParser());
// Custom error handling middleware
app.use(async (ctx: Context, next: Next) => {
try {
await next();
} catch (error) {
ctx.status = error.status || 500;
ctx.body = {
error: error.message || "Internal Server Error"
};
ctx.app.emit("error", error, ctx);
}
});
// Custom logging middleware
app.use(async (ctx: Context, next: Next) => {
const start = Date.now();
await next();
const ms = Date.now() - start;
console.log(`${ctx.method} ${ctx.url} - ${ms}ms`);
});
// Health check
router.get("/health", (ctx) => {
ctx.body = {
status: "ok",
timestamp: new Date().toISOString()
};
});
// GET with params and query
router.get("/users/:userId/events", async (ctx) => {
const moose = getMooseUtils(ctx.req);
if (!moose) {
ctx.throw(500, "Moose utilities not available");
}
const { client, sql } = moose;
const { userId } = ctx.params;
const limit = parseInt((ctx.query.limit as string) || "10");
const eventType = ctx.query.eventType as string;
const cols = UserEvents.columns;
const query = sql`
SELECT
${cols.id},
${cols.event_type},
${cols.timestamp}
FROM ${UserEvents}
WHERE ${cols.user_id} = ${userId}
${eventType ? sql`AND ${cols.event_type} = ${eventType}` : sql``}
ORDER BY ${cols.timestamp} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(query);
const events = await result.json();
ctx.body = {
userId,
count: events.length,
events
};
});
// POST endpoint
router.post("/users/:userId/events", async (ctx) => {
const moose = getMooseUtils(ctx.req);
if (!moose) {
ctx.throw(500, "Moose utilities not available");
}
const { userId } = ctx.params;
const { eventType, data } = ctx.request.body as any;
// Validation
if (!eventType || !data) {
ctx.throw(400, "eventType and data are required");
}
// Handle POST logic
ctx.body = {
success: true,
userId,
eventType,
data
};
ctx.status = 201;
});
// Protected route with JWT
router.get("/protected", async (ctx) => {
const moose = getMooseUtils(ctx.req);
if (!moose?.jwt) {
ctx.throw(401, "Unauthorized");
}
const userId = moose.jwt.sub;
const userRole = moose.jwt.role;
ctx.body = {
message: "Authenticated",
userId,
role: userRole
};
});
// Multiple route handlers (middleware chain)
const checkAuth = async (ctx: Context, next: Next) => {
const moose = getMooseUtils(ctx.req);
if (!moose?.jwt) {
ctx.throw(401, "Unauthorized");
}
await next();
};
router.get("/admin/stats", checkAuth, async (ctx) => {
const moose = getMooseUtils(ctx.req);
// moose.jwt is guaranteed to exist here
ctx.body = { stats: "admin stats" };
});
app.use(router.routes());
app.use(router.allowedMethods());
// Error listener
app.on("error", (err, ctx) => {
console.error("Server error:", err);
});
```
## Accessing Moose Utilities
Use `ctx.req` to access the underlying Node.js request:
```ts
const moose = getMooseUtils(ctx.req);
if (!moose) {
ctx.throw(500, "Utilities not available");
}
const { client, sql, jwt } = moose;
```
## Middleware Patterns
### Composition
```ts
const authMiddleware = async (ctx: Context, next: Next) => {
const moose = getMooseUtils(ctx.req);
if (!moose?.jwt) {
ctx.throw(401, "Unauthorized");
}
await next();
};
const adminMiddleware = async (ctx: Context, next: Next) => {
const moose = getMooseUtils(ctx.req);
if (!moose?.jwt || moose.jwt.role !== "admin") {
ctx.throw(403, "Forbidden");
}
await next();
};
// Compose middleware
const requireAdmin = compose([authMiddleware, adminMiddleware]);
router.get("/admin", requireAdmin, async (ctx) => {
ctx.body = { message: "Admin access granted" };
});
```
### Custom Context Extensions
```ts
// Extend context type
interface CustomContext extends Context {
formatResponse: (data: any) => { success: boolean; data: any };
}
const app = new Koa();
// Add custom method to context
app.context.formatResponse = function(data: any) {
return {
success: true,
timestamp: new Date().toISOString(),
data
};
};
router.get("/data", async (ctx: CustomContext) => {
const moose = getMooseUtils(ctx.req);
if (!moose) {
ctx.throw(500, "Utilities not available");
}
const { client, sql } = moose;
const result = await client.query.execute(sql`
SELECT
${MyTable.columns.id},
${MyTable.columns.name},
${MyTable.columns.status}
FROM ${MyTable}
WHERE ${MyTable.columns.status} = 'active'
LIMIT 10
`);
const data = await result.json();
ctx.body = ctx.formatResponse(data);
});
```
## Error Handling
Koa uses try-catch for error handling:
```ts
// Error middleware
app.use(async (ctx, next) => {
try {
await next();
} catch (err) {
// Custom error handling
ctx.status = err.statusCode || err.status || 500;
ctx.body = {
error: {
message: err.message,
status: ctx.status
}
};
// Emit error event
ctx.app.emit("error", err, ctx);
}
});
// Error listener
app.on("error", (err, ctx) => {
console.error("Error:", err);
});
```
## Router Nesting
Organize routes with nested routers:
```ts filename="app/apis/routers/usersRouter.ts"
const usersRouter = new Router();
usersRouter.get("/:userId", async (ctx) => {
const moose = getMooseUtils(ctx.req);
if (!moose) {
ctx.throw(500, "Utilities not available");
}
const { userId } = ctx.params;
// Query logic
ctx.body = { userId };
});
```
```ts filename="app/apis/mainApp.ts"
const app = new Koa();
const mainRouter = new Router();
// Nest routers
mainRouter.use("/api", usersRouter.routes(), usersRouter.allowedMethods());
app.use(mainRouter.routes());
app.use(mainRouter.allowedMethods());
```
## WebApp Configuration
```ts
new WebApp(name, app, config)
```
**WebAppConfig:**
```ts
interface WebAppConfig {
mountPath: string;
metadata?: { description?: string };
injectMooseUtils?: boolean; // default: true
}
```
## Best Practices
1. **Use ctx.req for utilities**: Access Moose utilities via `getMooseUtils(ctx.req)`
2. **Use ctx.throw()**: Koa's built-in error throwing for cleaner code
3. **Leverage async/await**: Koa is designed for modern async patterns
4. **Compose middleware**: Use `koa-compose` for reusable middleware chains
5. **Handle errors globally**: Use error middleware at the top of middleware stack
6. **Type your context**: Extend Context type for custom properties
7. **Organize with routers**: Split large applications into nested routers
## Troubleshooting
### "Moose utilities not available"
**Solution:** Use `ctx.req` not `ctx.request`:
```ts
const moose = getMooseUtils(ctx.req); // Correct
const moose = getMooseUtils(ctx.request); // Wrong
```
### Middleware order issues
**Solution:** Apply middleware in correct order:
```ts
app.use(logger()); // 1. Logging
app.use(bodyParser()); // 2. Body parsing
app.use(errorHandler); // 3. Error handling
app.use(router.routes()); // 4. Routes
```
### TypeScript errors with Context
**Solution:** Import and use correct types:
```ts
router.get("/path", async (ctx: Context, next: Next) => {
// ctx is properly typed
});
```
---
## Raw Node.js with MooseStack
Source: moose/app-api-frameworks/typescript-raw-nodejs.mdx
Use raw Node.js HTTP handlers with MooseStack
# Raw Node.js with MooseStack
Use raw Node.js HTTP handlers without any framework. This gives you maximum control and minimal dependencies, ideal for performance-critical applications or when you want to avoid framework overhead.
- Running standalone Node.js elsewhere? Keep it separate and query Moose with the client SDK—see the [Querying Data guide](/moose/olap/read-data) for examples.
- Want to mount your raw handlers in your MooseStack project? Follow the `WebApp` approach below so your endpoints deploy alongside the rest of your Moose project.
## Basic Example
```ts filename="app/apis/rawApp.ts" copy
const handler = async (req: IncomingMessage, res: ServerResponse) => {
const url = parseUrl(req.url || "", true);
const pathname = url.pathname || "/";
if (pathname === "/health" && req.method === "GET") {
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify({ status: "ok" }));
return;
}
if (pathname === "/data" && req.method === "GET") {
const moose = getMooseUtils(req);
if (!moose) {
res.writeHead(500, { "Content-Type": "application/json" });
res.end(JSON.stringify({ error: "Moose utilities not available" }));
return;
}
const { client, sql } = moose;
const limit = parseInt((url.query.limit as string) || "10");
try {
const query = sql`
SELECT
${MyTable.columns.id},
${MyTable.columns.name},
${MyTable.columns.createdAt}
FROM ${MyTable}
ORDER BY ${MyTable.columns.createdAt} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(query);
const data = await result.json();
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify(data));
} catch (error) {
res.writeHead(500, { "Content-Type": "application/json" });
res.end(JSON.stringify({ error: String(error) }));
}
return;
}
res.writeHead(404, { "Content-Type": "application/json" });
res.end(JSON.stringify({ error: "Not found" }));
};
// Register as WebApp
export const rawApp = new WebApp("rawApp", handler, { mountPath: "/raw" });
```
**Access your API:**
- `GET http://localhost:4000/raw/health`
- `GET http://localhost:4000/raw/data?limit=20`
## Complete Example with Advanced Features
```ts filename="app/apis/advancedRawApp.ts" copy
// Helper to parse request body
const parseBody = (req: IncomingMessage): Promise<any> => {
return new Promise((resolve, reject) => {
let body = "";
req.on("data", chunk => body += chunk.toString());
req.on("end", () => {
try {
resolve(body ? JSON.parse(body) : {});
} catch (error) {
reject(new Error("Invalid JSON"));
}
});
req.on("error", reject);
});
};
// Helper to send JSON response
const sendJSON = (res: ServerResponse, status: number, data: any) => {
res.writeHead(status, {
"Content-Type": "application/json",
"X-Powered-By": "MooseStack"
});
res.end(JSON.stringify(data));
};
// Helper to send error
const sendError = (res: ServerResponse, status: number, message: string) => {
sendJSON(res, status, { error: message });
};
// Route handlers
const handleHealth = (req: IncomingMessage, res: ServerResponse) => {
sendJSON(res, 200, {
status: "ok",
timestamp: new Date().toISOString()
});
};
const handleGetUserEvents = async (
req: IncomingMessage,
res: ServerResponse,
userId: string,
query: any
) => {
const moose = getMooseUtils(req);
if (!moose) {
return sendError(res, 500, "Moose utilities not available");
}
const { client, sql } = moose;
const limit = parseInt(query.limit || "10");
const eventType = query.eventType;
try {
const cols = UserEvents.columns;
const querySQL = sql`
SELECT
${cols.id},
${cols.event_type},
${cols.timestamp}
FROM ${UserEvents}
WHERE ${cols.user_id} = ${userId}
${eventType ? sql`AND ${cols.event_type} = ${eventType}` : sql``}
ORDER BY ${cols.timestamp} DESC
LIMIT ${limit}
`;
const result = await client.query.execute(querySQL);
const events = await result.json();
sendJSON(res, 200, {
userId,
count: events.length,
events
});
} catch (error) {
sendError(res, 500, String(error));
}
};
const handleCreateEvent = async (
req: IncomingMessage,
res: ServerResponse,
userId: string
) => {
try {
const body = await parseBody(req);
const { eventType, data } = body;
if (!eventType || !data) {
return sendError(res, 400, "eventType and data are required");
}
// Handle POST logic
sendJSON(res, 201, {
success: true,
userId,
eventType,
data
});
} catch (error) {
sendError(res, 400, "Invalid request body");
}
};
const handleProtected = (req: IncomingMessage, res: ServerResponse) => {
const moose = getMooseUtils(req);
if (!moose?.jwt) {
return sendError(res, 401, "Unauthorized");
}
sendJSON(res, 200, {
message: "Authenticated",
userId: moose.jwt.sub,
claims: moose.jwt
});
};
// Main handler with routing
const handler = async (req: IncomingMessage, res: ServerResponse) => {
const url = parseUrl(req.url || "", true);
const pathname = url.pathname || "/";
const method = req.method || "GET";
// CORS headers
res.setHeader("Access-Control-Allow-Origin", "*");
res.setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
res.setHeader("Access-Control-Allow-Headers", "Content-Type, Authorization");
// Handle preflight
if (method === "OPTIONS") {
res.writeHead(204);
res.end();
return;
}
// Route matching
if (pathname === "/health" && method === "GET") {
return handleHealth(req, res);
}
// Match /users/:userId/events
const userEventsMatch = pathname.match(/^\/users\/([^\/]+)\/events$/);
if (userEventsMatch) {
const userId = userEventsMatch[1];
if (method === "GET") {
return handleGetUserEvents(req, res, userId, url.query);
}
if (method === "POST") {
return handleCreateEvent(req, res, userId);
}
return sendError(res, 405, "Method not allowed");
}
if (pathname === "/protected" && method === "GET") {
return handleProtected(req, res);
}
// 404
sendError(res, 404, "Not found");
};
```
## Pattern Matching for Routes
```ts
// Simple pattern matching
const matchRoute = (pathname: string, pattern: string): { [key: string]: string } | null => {
const patternParts = pattern.split("/");
const pathParts = pathname.split("/");
if (patternParts.length !== pathParts.length) {
return null;
}
const params: { [key: string]: string } = {};
for (let i = 0; i < patternParts.length; i++) {
if (patternParts[i].startsWith(":")) {
const paramName = patternParts[i].slice(1);
params[paramName] = pathParts[i];
} else if (patternParts[i] !== pathParts[i]) {
return null;
}
}
return params;
};
// Usage
const handler = async (req: IncomingMessage, res: ServerResponse) => {
const url = parseUrl(req.url || "", true);
const pathname = url.pathname || "/";
const userParams = matchRoute(pathname, "/users/:userId");
if (userParams) {
const { userId } = userParams;
// Handle user route
return;
}
const eventParams = matchRoute(pathname, "/users/:userId/events/:eventId");
if (eventParams) {
const { userId, eventId } = eventParams;
// Handle event route
return;
}
};
```
## Streaming Responses
```ts
const handleStreamData = async (req: IncomingMessage, res: ServerResponse) => {
const moose = getMooseUtils(req);
if (!moose) {
return sendError(res, 500, "Utilities not available");
}
const { client, sql } = moose;
res.writeHead(200, {
"Content-Type": "application/x-ndjson",
"Transfer-Encoding": "chunked"
});
const query = sql`
SELECT
${MyTable.columns.id},
${MyTable.columns.name},
${MyTable.columns.data}
FROM ${MyTable}
ORDER BY ${MyTable.columns.createdAt} DESC
LIMIT 1000
`;
const result = await client.query.execute(query);
const data = await result.json();
// Stream data in chunks
for (const row of data) {
res.write(JSON.stringify(row) + "\n");
await new Promise(resolve => setTimeout(resolve, 10));
}
res.end();
};
```
## WebApp Configuration
```ts
new WebApp(name, handler, config)
```
**WebAppConfig:**
```ts
interface WebAppConfig {
mountPath: string;
metadata?: { description?: string };
injectMooseUtils?: boolean; // default: true
}
```
## Accessing Moose Utilities
```ts
const moose = getMooseUtils(req);
if (!moose) {
res.writeHead(500, { "Content-Type": "application/json" });
res.end(JSON.stringify({ error: "Utilities not available" }));
return;
}
const { client, sql, jwt } = moose;
```
## Best Practices
1. **Parse URL with url module**: Use `url.parse()` for query parameters and pathname
2. **Set Content-Type headers**: Always set appropriate response headers
3. **Handle errors gracefully**: Wrap async operations in try-catch
4. **Use helper functions**: Extract common patterns (sendJSON, parseBody)
5. **Implement routing logic**: Use pattern matching for dynamic routes
6. **Handle CORS**: Set CORS headers if needed for browser clients
7. **Stream large responses**: Use chunked encoding for large datasets
## Middleware Pattern
Create your own middleware pattern:
```ts
type Middleware = (
req: IncomingMessage,
res: ServerResponse,
next: () => Promise<void>
) => Promise<void>;
const createMiddlewareChain = (...middlewares: Middleware[]) => {
return async (req: IncomingMessage, res: ServerResponse) => {
let index = 0;
const next = async (): Promise<void> => {
if (index < middlewares.length) {
const middleware = middlewares[index++];
await middleware(req, res, next);
}
};
await next();
};
};
// Example middleware
const loggerMiddleware: Middleware = async (req, res, next) => {
console.log(`${req.method} ${req.url}`);
await next();
};
const authMiddleware: Middleware = async (req, res, next) => {
const moose = getMooseUtils(req);
if (!moose?.jwt) {
sendError(res, 401, "Unauthorized");
return;
}
await next();
};
const routeHandler: Middleware = async (req, res, next) => {
sendJSON(res, 200, { message: "Success" });
};
// Create handler with middleware chain
const handler = createMiddlewareChain(
loggerMiddleware,
authMiddleware,
routeHandler
);
```
## When to Use Raw Node.js
**Ideal for:**
- Maximum control over request/response
- Performance-critical applications
- Minimal dependencies
- Custom protocols or streaming
- Learning HTTP fundamentals
**Not ideal for:**
- Rapid development (frameworks are faster)
- Complex routing (use Express/Koa instead)
- Large teams (frameworks provide structure)
- Standard REST APIs (frameworks have better DX)
## Troubleshooting
### Response not closing
**Solution:** Always call `res.end()`:
```ts
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify(data)); // Don't forget this!
```
### Query parameters not parsing
**Solution:** Use `url.parse()` with `true` for query parsing:
```ts
const url = parseUrl(req.url || "", true); // true enables query parsing
const limit = url.query.limit;
```
### POST body not available
**Solution:** Manually parse the request stream:
```ts
const parseBody = (req: IncomingMessage): Promise<any> => {
return new Promise((resolve, reject) => {
let body = "";
req.on("data", chunk => body += chunk.toString());
req.on("end", () => {
try {
resolve(body ? JSON.parse(body) : {});
} catch (error) {
reject(new Error("Invalid JSON"));
}
});
req.on("error", reject);
});
};
```
---
## Moose Changelog
Source: moose/changelog.mdx
# Moose Changelog
## What is this page?
This changelog tracks all meaningful changes to Moose. Each entry includes the PR link and contributor credit, organized by date (newest first). Use this page to stay informed about new features, fixes, and breaking changes that might affect your projects.
## How to read this changelog:
- **Release Highlights** — Key features, enhancements, or fixes for each release.
- **Added** — New features and capabilities.
- **Changed** — Updates to existing functionality or improvements.
- **Deprecated** — Features that are no longer recommended for use and may be removed in the future.
- **Fixed** — Bug fixes and reliability improvements.
- **Breaking Changes** — Changes that require user action or may break existing usage.
---
## 2025-08-21
- **Analytics API Standardization** — Standardized naming of classes and utilities for analytics APIs
- **Complete S3Queue Engine Support** — Full implementation of ClickHouse S3Queue engine with comprehensive parameter support, modular architecture, and generic settings framework.
- **S3Queue Engine Configuration** — Added complete support for all ClickHouse S3Queue engine parameters including authentication (`aws_access_key_id`, `aws_secret_access_key`), compression, custom headers, and NOSIGN for public buckets. [PR #2674](https://github.com/514-labs/moosestack/pull/2674)
- **Generic Settings Framework** — Introduced a flexible settings system that allows any engine to use configuration settings, laying groundwork for future engine implementations.
- **Enhanced Documentation** — Added comprehensive documentation for OlapTable S3Queue configuration in both TypeScript and Python SDKs.
- **Improved Architecture** — Moved ClickHouse-specific types from core infrastructure to ClickHouse module for better separation of concerns.
- **Settings Location** — Engine-specific settings are now properly encapsulated within their respective engine configurations (e.g., `s3QueueEngineConfig.settings` for S3Queue).
- **API Consistency** — Unified configuration APIs across TypeScript and Python SDKs for S3Queue engine.
- **Compilation Issues** — Fixed struct patterns to handle new S3Queue parameter structure correctly.
- **Diff Strategy** — Enhanced diff strategy to properly handle S3Queue parameter changes.
- `ConsumptionApi` renamed to `Api`
- `EgressConfig` renamed to `ApiConfig`
- `ConsumptionUtil` renamed to `ApiUtil`
- `ConsumptionHelpers` renamed to `ApiHelpers`
*[#2676](https://github.com/514-labs/moosestack/pull/2676) by [camelCasedAditya](https://github.com/camelCasedAditya)*
---
## 2025-08-20
- **Improved IngestPipeline API Clarity** — The confusing `ingest` parameter has been renamed to `ingestApi` (TypeScript) and `ingest_api` (Python) for better clarity. The old parameter names are still supported with deprecation warnings.
- **IngestPipeline Parameter Renamed** — The `ingest` parameter in IngestPipeline configurations has been renamed for clarity:
- **TypeScript**: `ingest: true` → `ingestApi: true`
- **Python**: `ingest=True` → `ingest_api=True`
The old parameter names continue to work with deprecation warnings to ensure backwards compatibility. *[Current PR]*
- **IngestPipeline `ingest` parameter** — The `ingest` parameter in IngestPipeline configurations is deprecated:
- **TypeScript**: Use `ingestApi` instead of `ingest`
- **Python**: Use `ingest_api` instead of `ingest`
The old parameter will be removed in a future major version. Please update your code to use the new parameter names. *[Current PR]*
None - Full backwards compatibility maintained
---
## 2025-06-12
- **Enhanced TypeScript Workflow Types** — Improved type safety for Tasks with optional input/output parameters, supporting `null` types for better flexibility.
- TypeScript workflow Task types now properly support optional input/output with `null` types, enabling more flexible task definitions (for example, tasks with no input or no output).
*[#2442](https://github.com/514-labs/moose/pull/2442) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*
None
---
## 2025-06-10
- **OlapTable Direct Insert API** — New comprehensive insert API with advanced error handling, typia validation, and multiple failure strategies. Enables direct data insertion into ClickHouse tables with production-ready reliability features.
- **Python Workflows V2** — Replaces static file-based routing with explicit `Task` and `Workflow` classes, enabling dynamic task composition and programmatic workflow orchestration. No more reliance on `@task` decorators or file naming conventions.
- OlapTable direct insert API with `insert()` method supporting arrays and Node.js streams. Features comprehensive typia-based validation, three error handling strategies (`fail-fast`, `discard`, `isolate`), configurable error thresholds, memoized ClickHouse connections, and detailed insertion results with failed record tracking.
*[#2437](https://github.com/514-labs/moose/pull/2437) by [callicles](https://github.com/callicles)*
- Enhanced typia validation integration for OlapTable and IngestPipeline with `validateRecord()`, `isValidRecord()`, and `assertValidRecord()` methods providing compile-time type safety and runtime validation.
*[#2437](https://github.com/514-labs/moose/pull/2437) by [callicles](https://github.com/callicles)*
- Python Workflows V2 with `Task[InputType, OutputType]` and `Workflow` classes for dynamic workflow orchestration. Replaces the legacy `@task` decorator approach with explicit task definitions, enabling flexible task composition, type-safe chaining via `on_complete`, retries, timeouts, and scheduling with cron expressions.
*[#2439](https://github.com/514-labs/moose/pull/2439) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*
None
---
## 2025-06-06
- **TypeScript Workflows V2** — Replaces static file-based routing with explicit `Task` and `Workflow` classes, enabling dynamic task composition and programmatic workflow orchestration. No more reliance on file naming conventions for task execution order.
- TypeScript Workflows V2 with `Task` and `Workflow` classes for dynamic workflow orchestration. Replaces the legacy file-based routing approach with explicit task definitions, enabling flexible task composition, type-safe chaining via `onComplete`, configurable retries and timeouts, and flexible scheduling with cron expressions.
*[#2421](https://github.com/514-labs/moose/pull/2421) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*
None
---
## 2025-05-23
- **TypeScript `DeadLetterQueue` support** — Handle failed streaming function messages with type-safe dead letter queues in TypeScript.
- **Improved Python `DeadLetterModel` API** — Renamed `as_t` to `as_typed` for better clarity.
- TypeScript `DeadLetterQueue` class with type guards and transform methods for handling failed streaming function messages with full type safety.
*[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
- Renamed `DeadLetterModel.as_t()` to `DeadLetterModel.as_typed()` in Python for better API clarity and consistency.
*[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
- `DeadLetterModel.as_t()` method renamed to `as_typed()` in Python. Update your code to use the new method name.
*[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
---
## 2025-05-22
- **Refactored CLI 'peek' command** — Now supports peeking into both tables and streams with unified parameters.
- **Simplified CLI experience** — Removed unused commands and routines for a cleaner interface.
- Updated CLI 'peek' command to use a unified 'name' parameter and new flags (`--table`, `--stream`) to specify resource type. Default is table. Documentation updated to match.
*[#2361](https://github.com/514-labs/moose/pull/2361) by [callicles](https://github.com/callicles)*
- Removed unused CLI commands and routines including `Function`, `Block`, `Consumption`, `DataModel`, and `Import`. CLI is now simpler and easier to maintain.
*[#2360](https://github.com/514-labs/moose/pull/2360) by [callicles](https://github.com/callicles)*
None
---
## 2025-05-21
- **Infrastructure state sync** — Auto-syncs DB state before changes, handling manual modifications and failed DDL runs.
- **Fixed nested data type support** — Use objects and arrays in your Moose models.
- State reconciliation for infrastructure planning — Moose now checks and updates its in-memory infra map to match the real database state before planning changes. Makes infra planning robust to manual DB changes and failed runs.
*[#2341](https://github.com/514-labs/moose/pull/2341) by [callicles](https://github.com/callicles)*
- Handling of nested data structures in Moose models for correct support of complex objects and arrays.
*[#2357](https://github.com/514-labs/moose/pull/2357) by [georgevanderson](https://github.com/georgevanderson)*
None
---
## 2025-05-27
- **IPv4 and IPv6 Type Support** — Added native support for IP address types in ClickHouse data models, enabling efficient storage and querying of network data.
- IPv4 and IPv6 data types for ClickHouse integration, supporting native IP address storage and operations.
*[#2373](https://github.com/514-labs/moose/pull/2373) by [phiSgr](https://github.com/phiSgr)*
- Enhanced type parser to handle IP address types across the Moose ecosystem.
*[#2374](https://github.com/514-labs/moose/pull/2374) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-20
- **ClickHouse `Date` type support** — Store and query native date values in your schemas.
- ClickHouse `Date` column support for native date types in Moose schemas and ingestion.
*[#2352](https://github.com/514-labs/moose/pull/2352), [#2351](https://github.com/514-labs/moose/pull/2351) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-19
- **Metadata map propagation** — Metadata is now tracked and available in the infra map for both Python and TypeScript. Improves LLM accuracy and reliability when working with Moose objects.
- Metadata map propagation to infra map for consistent tracking and availability in both Python and TypeScript.
*[#2326](https://github.com/514-labs/moose/pull/2326) by [georgevanderson](https://github.com/georgevanderson)*
None
---
## 2025-05-16
- **New `list[str]` support to Python `AggregateFunction`** — Enables more flexible aggregation logic in Materialized Views.
- **Python `DeadLetterQueue[T]` alpha release** — Automatically route exceptions to a dead letter queue in streaming functions.
- AggregateFunction in Python now accepts `list[str]` for more expressive and type-safe aggregations.
*[#2321](https://github.com/514-labs/moose/pull/2321) by [phiSgr](https://github.com/phiSgr)*
- Python dead letter queues for handling and retrying failed messages in Python streaming functions.
*[#2324](https://github.com/514-labs/moose/pull/2324) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-15
- **Hotfix** — casing fix for `JSON` columns in TypeScript.
- TypeScript JSON columns to have consistent casing, avoiding confusion and errors in your code.
*[#2320](https://github.com/514-labs/moose/pull/2320) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-14
- **Introduced TypeScript JSON columns** — Use `Record` for type-safe JSON fields.
- **Ingestion config simplified** — Less config needed for ingestion setup.
- **Python `enum` support improved** — More robust data models.
- TypeScript ClickHouse JSON columns to use `Record` for type-safe JSON fields.
*[#2317](https://github.com/514-labs/moose/pull/2317) by [phiSgr](https://github.com/phiSgr)*
- Pydantic mixin for parsing integer enums by name for more robust Python data models.
*[#2316](https://github.com/514-labs/moose/pull/2316) by [phiSgr](https://github.com/phiSgr)*
- Better Python enum handling in data models for easier enum usage.
*[#2315](https://github.com/514-labs/moose/pull/2315) by [phiSgr](https://github.com/phiSgr)*
- `IngestionFormat` from `IngestApi` config for simpler ingestion setup.
*[#2306](https://github.com/514-labs/moose/pull/2306) by [georgevanderson](https://github.com/georgevanderson)*
None
---
## 2025-05-13
- **New `refresh` CLI command** — Quickly reload data and schema changes from changes applied directly to your database outside of Moose.
- **Python: `LowCardinality` type support** — Better performance for categorical data.
- `refresh` command to reload data and schema with a single command.
*[#2309](https://github.com/514-labs/moose/pull/2309) by [phiSgr](https://github.com/phiSgr)*
- Python support for `LowCardinality(T)` to improve performance for categorical columns.
*[#2313](https://github.com/514-labs/moose/pull/2313) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-10
- **Dependency-based execution order for Materialized Views** — Reduces migration errors and improves reliability.
- Order changes for materialized views based on dependency to ensure correct execution order for dependent changes.
*[#2294](https://github.com/514-labs/moose/pull/2294) by [callicles](https://github.com/callicles)*
None
---
## 2025-05-07
- **Python `datetime64` support** - Enables more precise datetime handling in Python data models.
- **Type mapping in Python `QueryClient`** - Automatically maps ClickHouse query result rows to the correct Pydantic model types.
- Row parsing in QueryClient with type mapping for Python.
*[#2299](https://github.com/514-labs/moose/pull/2299) by [phiSgr](https://github.com/phiSgr)*
- `datetime64` parsing and row parsing in QueryClient for more reliable data handling in Python.
*[#2299](https://github.com/514-labs/moose/pull/2299) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-06
- **`uint` type support in TypeScript** — Enables type safety for unsigned integer fields in TypeScript data models.
- uint type support in TypeScript for unsigned integers in Moose models.
*[#2295](https://github.com/514-labs/moose/pull/2295) by [phiSgr](https://github.com/phiSgr)*
None
---
## 2025-05-01
- **Explicit dependency tracking for materialized views** — Improves data lineage, migration reliability, and documentation.
- Explicit dependency tracking for materialized views to make migrations and data lineage more robust and easier to understand.
*[#2282](https://github.com/514-labs/moose/pull/2282) by [callicles](https://github.com/callicles)*
- Required `selectTables` field in `MaterializedView` config that must specify an array of `OlapTable` objects for the source tables.
*[#2282](https://github.com/514-labs/moose/pull/2282) by [callicles](https://github.com/callicles)*
---
## 2025-04-30
- **More flexible `JSON_ARRAY` configuration for `IngestApi`** — Now accepts both arrays and single elements. Default config is now `JSON_ARRAY`.
- **Python rich ClickHouse type support** — Added support for advanced types in Python models:
- `Decimal`: `clickhouse_decimal(precision, scale)`
- `datetime` with precision: `clickhouse_datetime64(precision)`
- `date`: `date`
- `int` with size annotations: `Annotated[int, 'int8']`, `Annotated[int, 'int32']`, etc.
- `UUID`: `UUID`
- `JSON_ARRAY` to allow both array and single element ingestion for more flexible data handling.
*[#2285](https://github.com/514-labs/moose/pull/2285) by [phiSgr](https://github.com/phiSgr)*
- Python rich ClickHouse type support with:
- `Decimal`: `clickhouse_decimal(precision, scale)`
- `datetime` with precision: `clickhouse_datetime64(precision)`
- `date`: `date`
- `int` with size annotations: `Annotated[int, 'int8']`, `Annotated[int, 'int32']`, etc.
- `UUID`: `UUID`
for more expressive data modeling.
*[#2284](https://github.com/514-labs/moose/pull/2284) by [phiSgr](https://github.com/phiSgr)*
None
---
## Configuration
Source: moose/configuration.mdx
Configuration for Moose
# Project Configuration
Moose provides flexible configuration through multiple sources, allowing you to customize your application for different environments while keeping sensitive data secure.
## Configuration Precedence
Moose loads configuration from multiple sources in the following order (highest to lowest priority):
1. **System environment variables** (`MOOSE_*`) - Highest priority, never overwritten
2. **`.env.local`** - Local development secrets (gitignored, only loaded in dev mode)
3. **`.env.{environment}`** - Environment-specific files (`.env.dev`, `.env.prod`)
4. **`.env`** - Base environment defaults (committed to git)
5. **`moose.config.toml`** - Structured configuration file
6. **Default values** - Built-in defaults from Moose
This precedence allows you to:
- Store safe defaults in `moose.config.toml`
- Override with environment-specific `.env` files
- Keep local secrets in `.env.local` (development only)
- Use system env vars for production secrets
See the configuration loading implementation in the source code:
- dotenv.rs - See `load_dotenv_files` function
- project.rs - See `load` function in the `Project` impl block
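As a concrete illustration of this precedence, suppose `moose.config.toml` sets the HTTP port and a `.env` file overrides it; a system environment variable still takes priority (the port values below are arbitrary):
```bash
# moose.config.toml:  http_server_config.port = 4000
# .env:               MOOSE_HTTP_SERVER_CONFIG__PORT=4100
# A system environment variable wins over both:
MOOSE_HTTP_SERVER_CONFIG__PORT=5000 moose dev
# The dev server now listens on port 5000
```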
## Environment Variables & .env Files
### Overview
Moose automatically loads `.env` files based on the command you run:
- `moose dev` → Loads `.env` → `.env.dev` → `.env.local`
- `moose prod` → Loads `.env` → `.env.prod` (`.env.local` is NOT loaded)
- `moose build` → Loads `.env` → `.env.prod` (production mode)
`.env.local` is only loaded in development mode. In production, use system environment variables or `.env.prod` for configuration. This prevents accidentally exposing local development credentials in production.
### File Naming Convention
| File | Purpose | Committed? | When Loaded |
|:-----|:--------|:-----------|:------------|
| `.env` | Base defaults for all environments | ✅ Yes | Always |
| `.env.dev` | Development-specific config | ✅ Yes | `moose dev` |
| `.env.prod` | Production-specific config | ✅ Yes | `moose prod`, `moose build` |
| `.env.local` | Local secrets and overrides | ❌ No (gitignored) | `moose dev` only |
### Setting Up .env Files
**1. Create a `.env` file** with safe, non-secret defaults:
```bash
# .env - Committed to git
MOOSE_HTTP_SERVER_CONFIG__PORT=4000
MOOSE_CLICKHOUSE_CONFIG__DB_NAME=local
```
**2. Create environment-specific files:**
```bash
# .env.dev - Development settings
MOOSE_LOGGER__LEVEL=debug
MOOSE_FEATURES__WORKFLOWS=false
```
```bash
# .env.prod - Production settings
MOOSE_LOGGER__LEVEL=info
MOOSE_LOGGER__FORMAT=Json
MOOSE_FEATURES__WORKFLOWS=true
```
**3. Create `.env.local`** for your local secrets (gitignored):
```bash
# .env.local - NOT committed to git
MOOSE_CLICKHOUSE_CONFIG__PASSWORD=my-local-password
MOOSE_REDIS_CONFIG__URL=redis://localhost:6380
```
**4. Update `.gitignore`:**
```gitignore
# Environment files with secrets
.env.local
```
### Environment Variable Naming
All Moose environment variables use the `MOOSE_` prefix with double underscores (`__`) to separate nested configuration sections:
```bash
MOOSE_<SECTION>__<FIELD>=value
```
**Examples:**
| Config in moose.config.toml | Environment Variable |
|:----------------------------|:---------------------|
| `clickhouse_config.host` | `MOOSE_CLICKHOUSE_CONFIG__HOST` |
| `clickhouse_config.port` | `MOOSE_CLICKHOUSE_CONFIG__PORT` |
| `http_server_config.port` | `MOOSE_HTTP_SERVER_CONFIG__PORT` |
| `features.workflows` | `MOOSE_FEATURES__WORKFLOWS` |
| `redis_config.url` | `MOOSE_REDIS_CONFIG__URL` |
### Complete Example
**File structure:**
```
my-moose-project/
├── .env # Base config
├── .env.dev # Dev overrides
├── .env.prod # Prod overrides
├── .env.local # Local secrets (gitignored)
└── moose.config.toml # Structured config
```
**.env** (committed):
```bash
# Base configuration for all environments
MOOSE_HTTP_SERVER_CONFIG__PORT=4000
MOOSE_CLICKHOUSE_CONFIG__DB_NAME=moose_db
```
**.env.dev** (committed):
```bash
# Development environment
MOOSE_LOGGER__LEVEL=debug
MOOSE_CLICKHOUSE_CONFIG__HOST=localhost
MOOSE_REDIS_CONFIG__URL=redis://localhost:6379
```
**.env.prod** (committed):
```bash
# Production environment
MOOSE_LOGGER__LEVEL=info
MOOSE_LOGGER__FORMAT=Json
MOOSE_CLICKHOUSE_CONFIG__USE_SSL=true
```
**.env.local** (gitignored):
```bash
# Local development secrets
MOOSE_CLICKHOUSE_CONFIG__PASSWORD=my-dev-password
MOOSE_TEMPORAL_CONFIG__API_KEY=my-temporal-key
```
**Usage:**
```bash
# Development - loads .env → .env.dev → .env.local
moose dev
# Production - loads .env → .env.prod (NOT .env.local)
moose prod
```
## moose.config.toml Reference
The `moose.config.toml` file is the primary way to configure all Moose infrastructure including ClickHouse, Redpanda, Redis, Temporal, and HTTP servers.
Do not use docker-compose overrides to modify Moose-managed services. See [Development Mode](/moose/local-dev#extending-docker-infrastructure) for guidelines on when to use docker-compose extensions.
```toml
# Programming language used in the project (`Typescript` or `Python`)
language = "Typescript"
# Map of supported old versions and their locations (Default: {})
# supported_old_versions = { "0.1.0" = "path/to/old/version" }
#Telemetry configuration for usage tracking and metrics
[telemetry]
# Whether telemetry collection is enabled
enabled = true
# Whether to export metrics to external systems
export_metrics = true
# Flag indicating if the user is a Moose developer
is_moose_developer = false
# Redpanda streaming configuration (also aliased as `kafka_config`)
[redpanda_config]
# Broker connection string (e.g., "host:port") (Default: "localhost:19092")
broker = "localhost:19092"
# Confluent Schema Registry URL (optional)
# schema_registry_url = "http://localhost:8081"
# Message timeout in milliseconds (Default: 1000)
message_timeout_ms = 1000
# Default retention period in milliseconds (Default: 30000)
retention_ms = 30000
# Replication factor for topics (Default: 1)
replication_factor = 1
# SASL username for authentication (Default: None)
# sasl_username = "user"
# SASL password for authentication (Default: None)
# sasl_password = "password"
# SASL mechanism (e.g., "PLAIN", "SCRAM-SHA-256") (Default: None)
# sasl_mechanism = "PLAIN"
# Security protocol (e.g., "SASL_SSL", "PLAINTEXT") (Default: None)
# security_protocol = "SASL_SSL"
# Namespace for topic isolation (Default: None)
# namespace = "my_namespace"
# ClickHouse database configuration
[clickhouse_config]
# Database name (Default: "local")
db_name = "local"
# ClickHouse user (Default: "panda")
user = "panda"
# ClickHouse password (Default: "pandapass")
password = "pandapass"
# Whether to use SSL for connection (Default: false)
use_ssl = false
# ClickHouse host (Default: "localhost")
host = "localhost"
# ClickHouse HTTP port (Default: 18123)
host_port = 18123
# ClickHouse native protocol port (Default: 9000)
native_port = 9000
# Optional host path to mount as the ClickHouse data volume (uses Docker volume if None) (Default: None)
# host_data_path = "/path/on/host/clickhouse_data"
# Optional list of additional databases to create on startup (Default: [])
# additional_databases = ["analytics", "staging"]
# HTTP server configuration for local development
[http_server_config]
# Host to bind the webserver to (Default: "localhost")
host = "localhost"
# Port for the main API server (Default: 4000)
port = 4000
# Port for the management server (Default: 5001)
management_port = 5001
# Optional path prefix for all routes (Default: None)
# path_prefix = "api"
# Number of worker processes for consumption API cluster (TypeScript only) (Default: Auto-calculated - 70% of CPU cores)
# Python projects always use 1 worker regardless of this setting
# api_workers = 2
# Redis configuration
[redis_config]
# Redis connection URL (Default: "redis://127.0.0.1:6379")
url = "redis://127.0.0.1:6379"
# Namespace prefix for all Redis keys (Default: "MS")
key_prefix = "MS"
# State storage configuration for migrations
[state_config]
# Storage backend: "clickhouse" (default) or "redis"
# - "clickhouse": Store state in ClickHouse _MOOSE_STATE table (requires KeeperMap)
# - "redis": Store state in Redis (best for existing Redis infra or multi-tenant setups)
storage = "clickhouse"
# Git configuration
[git_config]
# Name of the main branch (Default: "main")
main_branch_name = "main"
# Temporal workflow configuration
[temporal_config]
# Temporal database user (Default: "temporal")
db_user = "temporal"
# Temporal database password (Default: "temporal")
db_password = "temporal"
# Temporal database port (Default: 5432)
db_port = 5432
# Temporal server host (Default: "localhost")
temporal_host = "localhost"
# Temporal server port (Default: 7233)
temporal_port = 7233
# Temporal server scheme - "http" or "https" (Default: auto-detect based on host)
# temporal_scheme = "https"
# Temporal server version (Default: "1.22.3")
temporal_version = "1.22.3"
# Temporal admin tools version (Default: "1.22.3")
admin_tools_version = "1.22.3"
# Temporal UI version (Default: "2.21.3")
ui_version = "2.21.3"
# Temporal UI port (Default: 8080)
ui_port = 8080
# Temporal UI CORS origins (Default: "http://localhost:3000")
ui_cors_origins = "http://localhost:3000"
# Temporal dynamic config path (Default: "config/dynamicconfig/development-sql.yaml")
config_path = "config/dynamicconfig/development-sql.yaml"
# PostgreSQL version for Temporal database (Default: "13")
postgresql_version = "13"
# Path to Temporal client certificate (mTLS) (Default: "")
client_cert = ""
# Path to Temporal client key (mTLS) (Default: "")
client_key = ""
# Path to Temporal CA certificate (mTLS) (Default: "")
ca_cert = ""
# API key for Temporal Cloud connection (Default: "")
api_key = ""
# JWT (JSON Web Token) authentication configuration (Optional)
[jwt]
# Enforce JWT on all consumption APIs (Default: false)
enforce_on_all_consumptions_apis = false
# Enforce JWT on all ingestion APIs (Default: false)
enforce_on_all_ingest_apis = false
# Secret key for JWT signing (Required if jwt section is present)
# secret = "your-jwt-secret"
# JWT issuer (Required if jwt section is present)
# issuer = "your-issuer-name"
# JWT audience (Required if jwt section is present)
# audience = "your-audience-name"
# General authentication configuration
[authentication]
# Optional hashed admin API key for auth (Default: None)
# admin_api_key = "hashed_api_key"
# Migration configuration
[migration_config]
# Operations to ignore during migration plan generation and drift detection
# Useful for managing TTL changes outside of Moose or when you don't want
# migration failures due to TTL drift
# ignore_operations = ["ModifyTableTtl", "ModifyColumnTtl"]
# Feature flags
[features]
# Enable the streaming engine (Default: true)
streaming_engine = true
# Enable Temporal workflows (Default: false)
workflows = false
# Enable OLAP database (Default: true)
olap = true
# Enable Analytics APIs server (Default: true)
apis = true
```
## Common Environment Variables
Here are the most commonly used environment variables for overriding `moose.config.toml` settings:
### HTTP Server
```bash
MOOSE_HTTP_SERVER_CONFIG__HOST=0.0.0.0
MOOSE_HTTP_SERVER_CONFIG__PORT=4000
MOOSE_HTTP_SERVER_CONFIG__MANAGEMENT_PORT=5001
```
### ClickHouse
```bash
MOOSE_CLICKHOUSE_CONFIG__HOST=localhost
MOOSE_CLICKHOUSE_CONFIG__PORT=9000
MOOSE_CLICKHOUSE_CONFIG__DB_NAME=local
MOOSE_CLICKHOUSE_CONFIG__USER=panda
MOOSE_CLICKHOUSE_CONFIG__PASSWORD=pandapass
MOOSE_CLICKHOUSE_CONFIG__USE_SSL=false
```
### Redis
```bash
MOOSE_REDIS_CONFIG__URL=redis://localhost:6379
MOOSE_REDIS_CONFIG__KEY_PREFIX=MS
```
### Redpanda/Kafka
```bash
MOOSE_REDPANDA_CONFIG__BROKER=localhost:19092
MOOSE_REDPANDA_CONFIG__NAMESPACE=my_namespace
MOOSE_REDPANDA_CONFIG__SASL_USERNAME=user
MOOSE_REDPANDA_CONFIG__SASL_PASSWORD=password
```
### Feature Flags
```bash
MOOSE_FEATURES__STREAMING_ENGINE=true
MOOSE_FEATURES__WORKFLOWS=false
MOOSE_FEATURES__OLAP=true
MOOSE_FEATURES__APIS=true
```
### Logging
```bash
MOOSE_LOGGER__LEVEL=info
MOOSE_LOGGER__STDOUT=true
MOOSE_LOGGER__FORMAT=Json
```
For a complete list of all available configuration options, see the [moose.config.toml Reference](#mooseconfigtoml-reference) above.
---
## Data Modeling
Source: moose/data-modeling.mdx
Data Modeling for Moose
# Data Modeling
## Overview
In Moose, data models are just Pydantic models that become the authoritative source for your infrastructure schemas.
Data Models are used to define:
- [OLAP Tables and Materialized Views](/moose/olap) (automatically generated DDL)
- [Redpanda/Kafka Streams](/moose/streaming) (schema registry and topic validation)
- [API Contracts](/moose/apis) (request/response validation and OpenAPI specs)
- [Workflow Task Input and Output Types](/moose/workflows) (typed function inputs/outputs)
## Philosophy
### Problem: Analytical Backends are Prone to Schema Drift
Analytical backends are unique in that they typically have to coordinate schemas across multiple systems that each have their own type systems and constraints.
Consider a typical pipeline for ingesting events into a ClickHouse table.
```python
# What you're building:
# API endpoint → Kafka topic → ClickHouse table → Analytics API
# Traditional approach: Define schema 4 times
# 1. API validation with Pydantic
class APIEvent(BaseModel):
user_id: str
event_type: Literal["click", "view", "purchase"]
timestamp: datetime
# 2. Kafka schema registration
kafka_schema = {
"type": "record",
"fields": [
{"name": "user_id", "type": "string"},
{"name": "event_type", "type": "string"},
{"name": "timestamp", "type": "string"}
]
}
# 3. ClickHouse DDL
# CREATE TABLE events (
# user_id String,
# event_type LowCardinality(String),
# timestamp DateTime
# ) ENGINE = MergeTree()
# 4. Analytics API response
class EventsResponse(BaseModel):
user_id: str
event_type: str
timestamp: datetime
```
**The Problem:** When you add a field or change a type, you must update it in multiple places. Miss one, and you get:
- Silent data loss (Kafka → ClickHouse sync fails)
- Runtime errors
- Data quality issues (validation gaps)
### Solution: Model In Code, Reuse Everywhere
With Moose you define your schemas in native language types with optional metadata. This lets you reuse your schemas across multiple systems:
```python filename="app/main.py"
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel
from typing import Annotated, Any
from datetime import datetime

# 1. Define your schema (WHAT your data looks like)
class MyFirstDataModel(BaseModel):
id: Key[str]
some_string: Annotated[str, "LowCardinality"]
some_number: int
some_date: datetime
some_json: Any
# This single model can be reused across multiple systems:
my_first_pipeline = IngestPipeline[MyFirstDataModel]("my_first_pipeline", IngestPipelineConfig(
ingest_api=True, # POST API endpoint
stream=True, # Kafka topic
table=True # ClickHouse table
))
```
## How It Works
The key idea is leveraging Annotated types to extend base Python types with "metadata" that represents specific optimizations and details on how to map that type in ClickHouse:
```python
from moose_lib import Key, OlapTable, clickhouse_decimal
from typing import Annotated
from pydantic import BaseModel
from datetime import datetime
from decimal import Decimal

class Event(BaseModel):
# Base type: str
# ClickHouse: String with primary key
id: Key[str]
# Base type: Decimal
# ClickHouse: Decimal(10,2) for precise money
amount: clickhouse_decimal(10, 2)
# Base type: str
# ClickHouse: LowCardinality(String) for enums
status: Annotated[str, "LowCardinality"]
# Base type: datetime
# ClickHouse: DateTime
created_at: datetime
events = OlapTable[Event]("events")
# In your application code:
tx = Event(
id="id_123",
amount=Decimal("99.99"), # Regular Decimal in Python
status="completed", # Regular string in Python
created_at=datetime.now()
)
# In ClickHouse:
# CREATE TABLE events (
# id String,
# amount Decimal(10,2),
# status LowCardinality(String),
# created_at DateTime
# ) ENGINE = MergeTree()
# ORDER BY id
```
**The metadata annotations are compile-time only** - they don't affect your runtime code. Your application works with regular strings and numbers, while Moose uses the metadata to generate optimized infrastructure.
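As a quick illustration (plain Python, not Moose-specific), the `Annotated` metadata travels with the type hints for tooling to inspect, while the runtime value is just the base type:
```python
from typing import Annotated, get_type_hints

# Hypothetical alias mirroring the annotation used above
LowCardinalityStr = Annotated[str, "LowCardinality"]

def label(status: LowCardinalityStr) -> str:
    # At runtime, `status` behaves like a plain str
    return status.upper()

print(label("completed"))  # COMPLETED
# Tools that read type hints can still see the metadata:
print(get_type_hints(label, include_extras=True)["status"])  # Annotated[str, 'LowCardinality']
```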
## Building Data Models: From Simple to Complex
Let's walk through how to model data for different infrastructure components and see how types behave across them.
### Simple Data Model Shared Across Infrastructure
A basic data model that works identically across all infrastructure components:
```python filename="app/datamodels/simple_shared.py"
from moose_lib import IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel
from datetime import datetime
class SimpleShared(BaseModel):
id: str
name: str
value: float
timestamp: datetime
# This SAME model creates all infrastructure
pipeline = IngestPipeline[SimpleShared]("simple_shared", IngestPipelineConfig(
ingest_api=True, # Creates: POST /ingest/simple_shared
stream=True, # Creates: Kafka topic
table=True # Creates: ClickHouse table
))
# The exact same types work everywhere:
# - API validates: { "id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z" }
# - Kafka stores: { "id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z" }
# - ClickHouse table: id String, name String, value Float64, timestamp DateTime
```
**Key Point**: One model definition creates consistent schemas across all systems.
### Composite Types Shared Across Infrastructure
Complex types including nested objects, arrays, and enums work seamlessly across all components:
```python filename="app/datamodels/composite_shared.py"
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel
from typing import List, Dict, Any, Optional, Literal
from datetime import datetime
class Metadata(BaseModel):
category: str
priority: float
tags: List[str]
class CompositeShared(BaseModel):
id: Key[str] # Primary key
status: Literal["active", "pending", "completed"] # Enum
# Nested object
metadata: Metadata
# Arrays and maps
values: List[float]
attributes: Dict[str, Any]
# Optional field
description: Optional[str] = None
created_at: datetime
# Using in IngestPipeline - all types preserved
pipeline = IngestPipeline[CompositeShared]("composite_shared", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True
))
# How the types map:
# - API validates nested structure and enum values
# - Kafka preserves the exact JSON structure
# - ClickHouse creates:
# - id String (with PRIMARY KEY)
# - status Enum8('active', 'pending', 'completed')
# - metadata.category String, metadata.priority Float64, metadata.tags Array(String)
# - values Array(Float64)
# - attributes String (JSON)
# - description Nullable(String)
# - created_at DateTime
```
**Key Point**: Complex types including nested objects and arrays work consistently across all infrastructure.
### ClickHouse-Specific Types (Standalone vs IngestPipeline)
ClickHouse type annotations optimize database performance but are **transparent to other infrastructure**:
```python filename="app/datamodels/clickhouse_optimized.py"
from moose_lib import Key, clickhouse_decimal, OlapTable, OlapConfig, IngestPipeline, IngestPipelineConfig
from typing import Annotated
from pydantic import BaseModel
from datetime import datetime
class Details(BaseModel):
name: str
value: float
class ClickHouseOptimized(BaseModel):
id: Key[str]
# ClickHouse-specific type annotations
amount: clickhouse_decimal(10, 2) # Decimal(10,2) in ClickHouse
category: Annotated[str, "LowCardinality"] # LowCardinality(String) in ClickHouse
# Optimized nested type
details: Annotated[Details, "ClickHouseNamedTuple"] # NamedTuple in ClickHouse
timestamp: datetime
# SCENARIO 1: Standalone OlapTable - gets all optimizations
table = OlapTable[ClickHouseOptimized]("optimized_table", OlapConfig(
    order_by_fields=["id", "timestamp"]
))
# Creates ClickHouse table with:
# - amount Decimal(10,2)
# - category LowCardinality(String)
# - details Tuple(name String, value Float64)
# SCENARIO 2: IngestPipeline - optimizations ONLY in ClickHouse
pipeline = IngestPipeline[ClickHouseOptimized]("optimized_pipeline", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True
))
# What happens at each layer:
# 1. API receives/validates: { "amount": "123.45", "category": "electronics", ... }
# - Sees amount as str, category as str (annotations ignored)
# 2. Kafka stores: { "amount": "123.45", "category": "electronics", ... }
# - Plain JSON, no ClickHouse types
# 3. ClickHouse table gets optimizations:
# - amount stored as Decimal(10,2)
# - category stored as LowCardinality(String)
# - details stored as NamedTuple
```
**Key Point**: ClickHouse annotations are metadata that ONLY affect the database schema. Your application code and other infrastructure components see regular Python types.
### API Contracts with Runtime Validators
APIs use runtime validation to ensure query parameters meet your requirements:
```python filename="app/apis/consumption_with_validation.py"
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional, List
# Query parameters with runtime validation
class SearchParams(BaseModel):
# Date range validation
start_date: str = Field(..., regex="^\\d{4}-\\d{2}-\\d{2}$") # Must be YYYY-MM-DD
end_date: str = Field(..., regex="^\\d{4}-\\d{2}-\\d{2}$")
# Numeric constraints
min_value: Optional[float] = Field(None, ge=0) # Optional, but if provided >= 0
max_value: Optional[float] = Field(None, le=1000) # Optional, but if provided <= 1000
# String validation
category: Optional[str] = Field(None, min_length=2, max_length=50)
# Pagination
page: Optional[int] = Field(None, ge=1)
limit: Optional[int] = Field(None, ge=1, le=100)
# Response data model
class SearchResult(BaseModel):
id: str
name: str
value: float
category: str
timestamp: datetime
# Create validated API endpoint
async def search_handler(params: SearchParams, client: MooseClient) -> List[SearchResult]:
# Params are already validated when this runs
# Build a parameterized query safely
clauses = [
"timestamp >= {startDate}",
"timestamp <= {endDate}"
]
params_dict = {
"startDate": params.start_date,
"endDate": params.end_date,
"limit": params.limit or 10,
"offset": ((params.page or 1) - 1) * (params.limit or 10)
}
if params.min_value is not None:
clauses.append("value >= {minValue}")
params_dict["minValue"] = params.min_value
if params.max_value is not None:
clauses.append("value <= {maxValue}")
params_dict["maxValue"] = params.max_value
if params.category is not None:
clauses.append("category = {category}")
params_dict["category"] = params.category
where_clause = " AND ".join(clauses)
query = f"""
SELECT * FROM data_table
WHERE {where_clause}
    LIMIT {{limit}}
    OFFSET {{offset}}
"""
results = await client.query.execute(query, params=params_dict)
return [SearchResult(**row) for row in results]
search_api = Api[SearchParams, List[SearchResult]](
"search",
handler=search_handler
)
# API Usage Examples:
# ✅ Valid: GET /api/search?start_date=2024-01-01&end_date=2024-01-31
# ✅ Valid: GET /api/search?start_date=2024-01-01&end_date=2024-01-31&min_value=100&limit=50
# ❌ Invalid: GET /api/search?start_date=Jan-1-2024 (wrong date format)
# ❌ Invalid: GET /api/search?start_date=2024-01-01&end_date=2024-01-31&limit=200 (exceeds max)
```
**Key Point**: Runtime validators ensure API consumers provide valid data, returning clear error messages for invalid requests before any database queries run.
## Additional Data Modeling Patterns
### Modeling for Stream Processing
When you need to process data in real-time before it hits the database:
```python filename="app/datamodels/stream_example.py"
from moose_lib import Key, Stream, StreamConfig, OlapTable, OlapConfig
from pydantic import BaseModel
from typing import Dict, Any, Annotated
from datetime import datetime
import json
# Raw data from external source
class RawData(BaseModel):
id: Key[str]
timestamp: datetime
raw_payload: str
source_type: Annotated[str, "LowCardinality"]
# Processed data after transformation
class ProcessedData(BaseModel):
id: Key[str]
timestamp: datetime
field1: str
field2: Annotated[str, "LowCardinality"]
numeric_value: float
attributes: Dict[str, Any]
# Create streams
raw_stream = Stream[RawData]("raw-stream")
processed_table = OlapTable[ProcessedData]("processed_data", OlapConfig(
order_by_fields = ["id", "timestamp"]
))
processed_stream = Stream[ProcessedData]("processed-stream", StreamConfig(
destination=processed_table
))
# Transform raw data
async def process_data(raw: RawData):
parsed = json.loads(raw.raw_payload)
processed = ProcessedData(
id=raw.id,
timestamp=raw.timestamp,
field1=parsed["field_1"],
field2=parsed["field_2"],
numeric_value=float(parsed.get("value", 0)),
attributes=parsed.get("attributes", {})
    )
    return processed
raw_stream.add_transform(processed_stream, process_data)
```
### Modeling for Workflow Tasks
Define strongly-typed inputs and outputs for async jobs:
```python filename="app/workflows/task_example.py"
from moose_lib import Task, TaskContext
from pydantic import BaseModel, Field
from typing import Optional, List, Literal, Dict, Any
from datetime import datetime
# Input validation with constraints
class TaskOptions(BaseModel):
include_metadata: bool
max_items: Optional[int] = Field(None, ge=1, le=100)
class TaskInput(BaseModel):
id: str = Field(..., regex="^[0-9a-f-]{36}$")
items: List[str]
task_type: Literal["typeA", "typeB", "typeC"]
options: Optional[TaskOptions] = None
# Structured output
class ResultA(BaseModel):
category: str
score: float
details: Dict[str, Any]
class ResultB(BaseModel):
values: List[str]
metrics: List[float]
class ResultC(BaseModel):
field1: str
field2: str
field3: float
class TaskOutput(BaseModel):
id: str
processed_at: datetime
result_a: Optional[ResultA] = None
result_b: Optional[ResultB] = None
result_c: Optional[ResultC] = None
# Create workflow task
async def run_task(ctx: TaskContext[TaskInput]) -> TaskOutput:
# Process data based on task type
output = TaskOutput(
id=ctx.input.id,
processed_at=datetime.now()
)
if ctx.input.task_type == "typeA":
output.result_a = await process_type_a(ctx.input)
return output
example_task = Task[TaskInput, TaskOutput](
"example-task",
run_function=run_task,
retries=3,
timeout=30 # seconds
)
```
---
## Summary
Source: moose/deploying.mdx
Summary of deploying Moose into production
# Moose Deploy
## Overview
Once you've finished developing your Moose application locally, the next step is to deploy your Moose app into production. You have two options:
- Self-host your Moose application on your own servers
- Use the [Boreal Cloud hosting platform](https://www.fiveonefour.com/boreal) (from the makers of the Moose Stack)
Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).
## Getting Started With Self-Hosting
Moose makes it easy to package and deploy your applications, whether you're deploying to a server with or without internet access. The deployment process is designed to be flexible and can accommodate both containerized and non-containerized environments.
### Deployment Options
1. **Kubernetes Deployment**: Deploy your application to Kubernetes clusters (GKE, EKS, AKS, or on-premises)
2. **Standard Server Deployment**: Deploy your application to a server with internet access
3. **Containerized Cloud Deployment**: Deploy to cloud services like AWS ECS or Google Cloud Run
4. **Offline Server Deployment**: Deploy to an environment without internet access
### Key Deployment Steps
There are three main aspects to deploying a Moose application:
1. Setting up your build environment with Python and the Moose CLI
2. Building your application using `moose build`
3. Setting up your deployment environment with the necessary runtime dependencies (Python, Docker) and configuration
## Configuring Your Deployment
Based on our production experience, we recommend the following best practices for deploying Moose applications:
### Health Monitoring
Configure comprehensive health checks to ensure your application remains available:
- Startup probes to handle initialization
- Readiness probes for traffic management
- Liveness probes to detect and recover from deadlocks
### Zero-Downtime Deployments
Implement graceful termination and rolling updates:
- Pre-stop hooks to handle in-flight requests
- Appropriate termination grace periods
- Rolling update strategies that maintain service availability
### Resource Allocation
Properly size your deployments based on workload:
- CPU and memory requests tailored to your application
- Replicas scaled according to traffic patterns
- Horizontal scaling for high availability
### Environment Configuration
For any deployment type, you'll need to configure:
1. Runtime environment variables for logging, telemetry, and application settings
2. External service connections (ClickHouse, Redpanda, Redis)
3. Network settings and security configurations
4. Application-specific configurations
## Detailed Guides
The following pages provide detailed guides for each deployment scenario, including step-by-step instructions for both Python and TypeScript applications and production-ready configuration templates.
---
## Configuring Moose for cloud environments
Source: moose/deploying/configuring-moose-for-cloud.mdx
Configuring Moose for cloud environments
# Configuring Moose for cloud environments
In the [Packaging Moose for deployment](packaging-moose-for-deployment.mdx) page, we looked at how to package your Moose application into Docker containers (using the `moose build --docker` command) and push them to your container repository.
Next, we'll configure your container image to connect to remotely hosted ClickHouse and Redis services. You can also optionally use Redpanda for event streaming and Temporal for workflow orchestration.
The methods used to accomplish this are generally similar, but the specific details depend on your target cloud infrastructure.
So, we'll look at the overarching concepts and provide some common examples.
## Specifying your repository container
Earlier, we created two local containers and pushed them to a docker repository.
```txt filename="Terminal" copy
>docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
moose-df-deployment-aarch64-unknown-linux-gnu 0.3.175 c50674c7a68a About a minute ago 155MB
moose-df-deployment-x86_64-unknown-linux-gnu 0.3.175 e5b449d3dea3 About a minute ago 163MB
```
We pushed the containers to the `514labs` Docker Hub account. So, we have these two containers available for use:
```
514labs/moose-df-deployment-aarch64-unknown-linux-gnu:0.3.175
514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175
```
In later examples, we'll use an AMD64 (x86_64) based machine, so we'll stick to using the following container image: `514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175`
We'll also examine how the container image name can be used in various cloud providers and scenarios.
## General overview
The general approach is to use a cloud provider that supports specifying a container image to launch your application. Examples include the Google Kubernetes Engine (GKE), Amazon's Elastic Kubernetes Service (EKS), and Elastic Container Service (ECS). Each provider also offers a way of configuring container environment variables that your container application will have access to.
## Essential Environment Variables
Based on our production deployments, here are the essential environment variables you'll need to configure for your Moose application in cloud environments:
### Logging and Telemetry
```
# Logger configuration
MOOSE_LOGGER__LEVEL=Info
MOOSE_LOGGER__STDOUT=true
MOOSE_LOGGER__FORMAT=Json
# Telemetry configuration
MOOSE_TELEMETRY__ENABLED=false
MOOSE_TELEMETRY__EXPORT_METRICS=true
# For debugging
RUST_BACKTRACE=1
```
### HTTP Server Configuration
```
# HTTP server settings
MOOSE_HTTP_SERVER_CONFIG__HOST=0.0.0.0
MOOSE_HTTP_SERVER_CONFIG__PORT=4000
```
### External Service Connections
For detailed configuration of the external services, refer to the [Preparing ClickHouse and Redpanda](preparing-clickhouse-redpanda.mdx) page.
#### ClickHouse
```
MOOSE_CLICKHOUSE_CONFIG__DB_NAME=
MOOSE_CLICKHOUSE_CONFIG__USER=
MOOSE_CLICKHOUSE_CONFIG__PASSWORD=
MOOSE_CLICKHOUSE_CONFIG__HOST=
MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443
MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1
MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT=9440
```
#### Redis
Moose requires Redis for caching and message passing:
```
MOOSE_REDIS_CONFIG__URL=
MOOSE_REDIS_CONFIG__KEY_PREFIX=
```
#### Redpanda (Optional)
If you choose to use Redpanda for event streaming:
```
MOOSE_REDPANDA_CONFIG__BROKER=
MOOSE_REDPANDA_CONFIG__NAMESPACE=
MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS=10043
MOOSE_REDPANDA_CONFIG__SASL_USERNAME=
MOOSE_REDPANDA_CONFIG__SASL_PASSWORD=
MOOSE_REDPANDA_CONFIG__SASL_MECHANISM=SCRAM-SHA-256
MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL=SASL_SSL
MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR=3
```
#### Temporal (Optional)
If you choose to use Temporal for workflow orchestration:
```
MOOSE_TEMPORAL_CONFIG__CA_CERT=/etc/ssl/certs/ca-certificates.crt
MOOSE_TEMPORAL_CONFIG__API_KEY=
MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST=<your-namespace>.tmprl.cloud
```
## Securing Sensitive Information
When deploying to cloud environments, it's important to handle sensitive information like passwords and API keys securely. Each cloud provider offers mechanisms for this:
- **Kubernetes**: Use Secrets to store sensitive data. See our [Kubernetes deployment guide](deploying-on-kubernetes.mdx) for examples.
- **Amazon ECS**: Use AWS Secrets Manager or Parameter Store to securely inject environment variables.
- **Other platforms**: Use the platform's recommended secrets management approach.
Never hardcode sensitive values directly in your deployment configuration files.
Please share your feedback about Moose monitoring capabilities through [our GitHub repository](https://github.com/514-labs/moose/issues/new?title=Feedback%20for%20%E2%80%9CMonitoring%E2%80%9D&labels=feedback).
---
## Deploying on an offline server
Source: moose/deploying/deploying-on-an-offline-server.mdx
Deploying on an offline server
# Building and Deploying Moose Applications
This guide will walk you through the process of building a Moose application and deploying it to a server that does not have internet access.
We'll cover both the build environment setup and the deployment environment requirements.
## Build Environment Setup
### Prerequisites
Before you can build a Moose application, you need to set up your build environment with the following dependencies:
OS:
- Debian 10+
- Ubuntu 18.10+
- Fedora 29+
- CentOS/RHEL 8+
- Amazon Linux 2023+
- Mac OS 13+
Common CLI utilities:
- zip
- curl (optional, for installing the Moose CLI)
Python build environment requirements:
1. Python 3.12 or later (we recommend using pyenv for Python version management)
2. pip
### Setting up the Build Environment
First, install the required system dependencies:
```bash
sudo apt update
sudo apt install build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev curl git libncursesw5-dev xz-utils \
tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
```
Install pyenv and configure your shell:
```bash
curl -fsSL https://pyenv.run | bash
```
Add the following to your `~/.bashrc` or `~/.zshrc`:
```bash
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
```
Install and set Python 3.12:
```bash
pyenv install 3.12
pyenv global 3.12
```
Verify the installation:
```bash
python --version
```
### Installing Moose CLI (Optional)
You can install the Moose CLI using the official installer:
```bash
curl -SfsL https://fiveonefour.com/install.sh | bash -s -- moose
source ~/.bashrc # Or restart your terminal
```
or
```bash
pip install moose-cli
```
## Building Your Application
### 1. Initialize a New Project (Optional)
This step is optional if you already have a Moose project.
Create a new Moose project:
```bash
moose init your-project-name py
cd your-project-name
```
### 2. Build the Application
Make sure you have the `zip` utility installed (`sudo apt install zip`) before building your application.
If you installed the Moose CLI globally, you can build the application with:
```bash
moose build
```
If you installed it locally with `pip` (for example, inside a virtual environment), run the same command from within that environment.
This will create a zip file in your project directory with a timestamp, for example: `your-project-name-YYYY-MM-DD.zip`
## Deployment Environment Setup
### Prerequisites
The deployment server requires:
1. Python 3.12 or later
2. Unzip utility
### Setting up the Deployment Environment
1. Install the runtime environment:
Follow the Python installation steps from the build environment setup section.
2. Install the unzip utility:
```bash
sudo apt install unzip
```
## Deploying Your Application
1. Copy your built application package to the deployment server
2. Extract the application:
```bash
unzip your-project-name-YYYY-MM-DD.zip -d ./app
cd ./app/packager
```
3. Start your application:
```bash
moose prod
```
Ensure all required environment variables and configurations are properly set before starting your application.
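For example, a minimal set of environment variables (illustrative values; use the `MOOSE_*` variables documented in the configuration reference) might look like:
```bash
# Point the application at your production services before starting it
export MOOSE_CLICKHOUSE_CONFIG__HOST=clickhouse.internal.example.com
export MOOSE_CLICKHOUSE_CONFIG__USER=moose_user
export MOOSE_CLICKHOUSE_CONFIG__PASSWORD=your-clickhouse-password
export MOOSE_REDIS_CONFIG__URL=redis://redis.internal.example.com:6379
moose prod
```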
## Troubleshooting
- Verify that Python is properly installed using `python --version`
- Check that your application's dependencies are properly listed in `requirements.txt`
- If you encounter Python import errors, ensure your `PYTHONPATH` is properly set
---
## Deploying on Amazon ECS
Source: moose/deploying/deploying-on-ecs.mdx
Deploying on Amazon ECS
# Deploying on Amazon ECS
Moose can be deployed to Amazon's Elastic Container Service (ECS). ECS offers a managed container orchestrator at a fraction of the complexity of managing a Kubernetes cluster.
If you're relatively new to ECS we recommend the following resources:
- [Amazon Elastic Container Service (ECS) with a Load Balancer | AWS Tutorial with New ECS Experience](https://www.youtube.com/watch?v=rUgZNXKbsrY)
- [Tutorial: Deploy NGINX Containers On ECS Fargate with Load Balancer](https://bhaveshmuleva.hashnode.dev/tutorial-deploy-nginx-containers-on-ecs-fargate-with-load-balancer)
- [How to configure target groups ports with listeners and tasks](https://stackoverflow.com/questions/66275574/how-to-configure-target-groups-ports-with-listeners-and-tasks)
The first step is deciding whether you'll host your Moose container on Docker Hub or Amazon's Elastic Container Registry (ECR).
Amazon ECR is straightforward and is designed to work out of the box with ECS. Using Docker Hub works if your moose container is publicly available; however,
if your container is private, you'll need to do a bit more work to provide ECS with your Docker credentials.
> See: [Authenticating with Docker Hub for AWS Container Services](https://aws.amazon.com/blogs/containers/authenticating-with-docker-hub-for-aws-container-services/)
Here is an overview of the steps required:
1. You'll first need to create or use an existing ECS cluster.
2. Then, you'll need to create an ECS `Task definition`. This is where you'll specify whether you want to use AWS Fargate or AWS EC2 instances.
You'll also have options for selecting your OS and Architecture. Specify `Linux/X86-64` or `Linux/ARM-64`. This is important as you'll also need to
specify a matching moose container image, such as `moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175` or `moose-df-deployment-aarch64-unknown-linux-gnu:0.3.175`
3. As with all AWS services, if you're using secrets to store credentials, you will need to specify an IAM role with an `AmazonECSTaskExecutionRolePolicy` and `SecretsManagerReadWrite`
policy.
4. Under the Container section, specify the name of your moose deployment and provide the container image name you're using.
5. Next, specify the Container Port as 4000.
## Configuring container environment variables
While still in the Amazon ECS Task definition section, you'll need to provide the environment variables on which your Moose application depends.
Scroll down to the Environment variables section and fill in each of the following variables.
ClickHouse and Redis are required components for Moose. Redpanda and Temporal are optional - configure them only if you're using these components in your application.
> Note: if you prefer, you can provide the environment variables below via an env file hosted on S3 or using AWS Secrets Manager for sensitive values.
### Core Configuration
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_LOGGER__LEVEL | Log level | Info |
| MOOSE_LOGGER__STDOUT | Enable stdout logging | true |
| MOOSE_LOGGER__FORMAT | Log format | Json |
| RUST_BACKTRACE | Enable backtraces for debugging | 1 |
### HTTP Server Configuration
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_HTTP_SERVER_CONFIG__HOST | Your moose network binding address | 0.0.0.0 |
| MOOSE_HTTP_SERVER_CONFIG__PORT | The network port your moose server is using | 4000 |
### ClickHouse Configuration (Required)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_CLICKHOUSE_CONFIG__DB_NAME | The name of your ClickHouse database | moose_production |
| MOOSE_CLICKHOUSE_CONFIG__USER | The database user name | clickhouse_user |
| MOOSE_CLICKHOUSE_CONFIG__PASSWORD | The password to your ClickHouse database | (use AWS Secrets Manager) |
| MOOSE_CLICKHOUSE_CONFIG__HOST | The hostname for your ClickHouse database | your-clickhouse.cloud.example.com |
| MOOSE_CLICKHOUSE_CONFIG__HOST_PORT | The HTTPS port for your ClickHouse database | 8443 |
| MOOSE_CLICKHOUSE_CONFIG__USE_SSL | Whether your database connection requires SSL | 1 |
| MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT | The native port for your ClickHouse database | 9440 |
### Redis Configuration (Required)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_REDIS_CONFIG__URL | Redis connection URL | redis://user:password@redis.example.com:6379 |
| MOOSE_REDIS_CONFIG__KEY_PREFIX | Prefix for Redis keys to isolate namespaces | moose_production |
### Redpanda Configuration (Optional)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_REDPANDA_CONFIG__BROKER | The hostname for your Redpanda instance | seed-5fbcae97.example.redpanda.com:9092 |
| MOOSE_REDPANDA_CONFIG__NAMESPACE | Namespace for isolation | moose_production |
| MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS | The message timeout delay in milliseconds | 10043 |
| MOOSE_REDPANDA_CONFIG__SASL_USERNAME | Your Redpanda user name | redpanda_user |
| MOOSE_REDPANDA_CONFIG__SASL_PASSWORD | Your Redpanda password | (use AWS Secrets Manager) |
| MOOSE_REDPANDA_CONFIG__SASL_MECHANISM | SASL mechanism | SCRAM-SHA-256 |
| MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL | The Redpanda security protocol | SASL_SSL |
| MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR | Topic replication factor | 3 |
### Temporal Configuration (Optional)
| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_TEMPORAL_CONFIG__CA_CERT | Path to CA certificate | /etc/ssl/certs/ca-certificates.crt |
| MOOSE_TEMPORAL_CONFIG__API_KEY | Temporal Cloud API key | (use AWS Secrets Manager) |
| MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST | Temporal Cloud namespace host | your-namespace.tmprl.cloud |
Consider using a value greater than 1000ms (1 second) for the Redpanda message timeout if you're using a hosted Redpanda cloud service.
Review other options on the Task Creation page and press the `Create` button when ready.
## Using AWS Secrets Manager
For sensitive information like passwords and API keys, we recommend using AWS Secrets Manager. To configure a secret:
1. Go to AWS Secrets Manager and create a new secret
2. Choose "Other type of secret" and add key-value pairs for your secrets
3. Name your secret appropriately (e.g., `moose/production/credentials`)
4. In your ECS task definition, reference the secret:
- For environment variables, select "ValueFrom" and enter the ARN of your secret with the key name
- Example: `arn:aws:secretsmanager:region:account:secret:moose/production/credentials:MOOSE_CLICKHOUSE_CONFIG__PASSWORD::`
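If you edit the task definition as JSON, the same reference goes in the container's `secrets` array instead of `environment` (a minimal sketch reusing the example ARN above):
```json
"secrets": [
  {
    "name": "MOOSE_CLICKHOUSE_CONFIG__PASSWORD",
    "valueFrom": "arn:aws:secretsmanager:region:account:secret:moose/production/credentials:MOOSE_CLICKHOUSE_CONFIG__PASSWORD::"
  }
]
```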
## Building an ECS Service
Once you've completed creating an ECS Task, you're ready to create an ECS Service. An ECS Service is a definition that allows you to specify how your cluster will be managed.
Navigate to your cluster's Service page and press the `Create` button to create your new Moose service.
The section we're interested in is the `Deployment configuration` section. There, you'll specify the Task Definition you created earlier. You can also specify the name
of your service—perhaps something creative like `moose-service`—and the number of tasks to launch.
Note: at this time, we recommend launching only a single instance of Moose in your cluster. Multi-instance concurrent usage is still under development.
The remaining sections on the create service page allow you to specify networking considerations and whether you'll use a load balancer.
You can press the `Create` button to launch an instance of your new ECS Moose service.
## Setting up health checks
Your generated Moose containers include a health check endpoint at `/health` that should be configured in your ECS service. We recommend configuring the following health check settings:
### Container-level Health Check
In your task definition's container configuration:
```
healthCheck:
command: ["CMD-SHELL", "curl -f http://localhost:4000/health || exit 1"]
interval: 30
timeout: 5
retries: 3
startPeriod: 60
```
### Load Balancer Health Check
If you're using an Application Load Balancer:
1. Create a target group for your service
2. Set the health check path to `/health`
3. Configure appropriate health check settings:
- Health check protocol: HTTP
- Health check port: 4000
- Health check path: /health
- Healthy threshold: 2
- Unhealthy threshold: 2
- Timeout: 5 seconds
- Interval: 15 seconds
- Success codes: 200
These health check configurations ensure that your Moose service is properly monitored and that traffic is only routed to healthy containers.
---
## Deploying on Kubernetes
Source: moose/deploying/deploying-on-kubernetes.mdx
Deploying on Kubernetes
# Deploying on Kubernetes
Moose applications can be deployed to Kubernetes clusters, whether it's your own on-prem
cluster or through a cloud service like Google's Kubernetes Engine (GKE) or Amazon's
Elastic Kubernetes Service (EKS).
Note: at this time, we recommend launching only a single instance of Moose per cluster. Multi-instance concurrent usage is still under development.
Essentially, you'll need to create a moose-deployment YAML file. Here is an example:
```yaml filename="moose-deployment.yaml-fragment" copy
apiVersion: apps/v1
kind: Deployment
metadata:
name: moosedeployment
spec:
replicas: 1
selector:
matchLabels:
app: moosedeploy
template:
metadata:
labels:
app: moosedeploy
spec:
containers:
- name: moosedeploy
image: 514labs/moose-df-deployment-x86_64-unknown-linux-gnu:latest
ports:
- containerPort: 4000
```
> Make sure to update the image key above with the location of your repository and image tag.
You may also need to configure a load balancer to route external traffic to your moose ingest points.
```yaml filename="moose-lb-service.yaml" copy
apiVersion: v1
kind: Service
metadata:
name: moose-service
spec:
selector:
app: moosedeploy
ports:
- protocol: TCP
port: 4000
targetPort: 4000
type: LoadBalancer
```
Another approach would be to use a service type of `ClusterIP`:
```yaml filename="moose-service.yaml" copy
apiVersion: v1
kind: Service
metadata:
name: moose-service
spec:
selector:
app: moosedeploy
type: ClusterIP
ports:
- protocol: TCP
port: 4000
targetPort: 4000
```
The approach you decide on will depend on your specific Kubernetes networking requirements.
## Setting up health checks and probes
Your generated Moose docker containers feature a health check endpoint at `/health` that can be used by Kubernetes to monitor the health of your application. Based on our production deployment, we recommend configuring the following probes:
```yaml
# Startup probe - gives Moose time to initialize before accepting traffic
startupProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 60
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 30
successThreshold: 3
# Readiness probe - determines when the pod is ready to receive traffic
readinessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 3
failureThreshold: 2
successThreshold: 5
# Liveness probe - restarts the pod if it becomes unresponsive
livenessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 5
successThreshold: 1
```
## Zero-downtime deployments with lifecycle hooks
For production deployments, we recommend configuring a preStop lifecycle hook to ensure graceful pod termination during updates:
```yaml
lifecycle:
preStop:
exec:
command: ["/bin/sleep", "60"]
```
This gives the pod time to finish processing in-flight requests before termination. You should also set an appropriate
`terminationGracePeriodSeconds` value (we recommend 70 seconds) to work with this hook.
## Resource requirements
Based on our production deployments, we recommend the following resource allocation for a standard Moose deployment:
```yaml
resources:
requests:
cpu: "1000m"
memory: "8Gi"
```
You can adjust these values based on your application's specific needs and workload.
## Configuring container environment variables
Inside your `moose-deployment.yaml` file, you will need to add an `env` section for environment variables.
The example below includes actual sample values for clarity. In production deployments, you should use Kubernetes secrets for sensitive information as shown in the second example.
Note that both Redpanda and Temporal are optional. If you're not using these components, you can omit their respective configuration sections.
### Example with hardcoded values (for development/testing only):
```yaml filename="moose-deployment-dev.yaml" copy
apiVersion: apps/v1
kind: Deployment
metadata:
name: moosedeployment
spec:
# For zero-downtime deployments
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
replicas: 1
selector:
matchLabels:
app: moosedeploy
template:
metadata:
labels:
app: moosedeploy
spec:
# For graceful shutdowns
terminationGracePeriodSeconds: 70
containers:
- name: moosedeploy
image: 514labs/moose-df-deployment-x86_64-unknown-linux-gnu:latest
ports:
- containerPort: 4000
# Lifecycle hook to delay pod shutdown
lifecycle:
preStop:
exec:
command: ["/bin/sleep", "60"]
# Startup probe
startupProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 60
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 30
successThreshold: 3
# Readiness probe
readinessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 3
failureThreshold: 2
successThreshold: 5
# Liveness probe
livenessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 5
failureThreshold: 5
successThreshold: 1
# Resource requirements
resources:
requests:
cpu: "1000m"
memory: "8Gi"
env:
# Logger configuration
- name: MOOSE_LOGGER__LEVEL
value: "Info"
- name: MOOSE_LOGGER__STDOUT
value: "true"
- name: MOOSE_LOGGER__FORMAT
value: "Json"
# Telemetry configuration
- name: MOOSE_TELEMETRY__ENABLED
value: "true"
- name: MOOSE_TELEMETRY__EXPORT_METRICS
value: "true"
# Debugging
- name: RUST_BACKTRACE
value: "1"
# HTTP server configuration
- name: MOOSE_HTTP_SERVER_CONFIG__HOST
value: "0.0.0.0"
- name: MOOSE_HTTP_SERVER_CONFIG__PORT
value: "4000"
# ClickHouse configuration
- name: MOOSE_CLICKHOUSE_CONFIG__DB_NAME
value: "moose_production"
- name: MOOSE_CLICKHOUSE_CONFIG__USER
value: "clickhouse_user"
- name: MOOSE_CLICKHOUSE_CONFIG__PASSWORD
value: "clickhouse_password_example"
- name: MOOSE_CLICKHOUSE_CONFIG__HOST
value: "your-clickhouse.cloud.example.com"
- name: MOOSE_CLICKHOUSE_CONFIG__HOST_PORT
value: "8443"
- name: MOOSE_CLICKHOUSE_CONFIG__USE_SSL
value: "1"
- name: MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT
value: "9440"
# Redis configuration
- name: MOOSE_REDIS_CONFIG__URL
value: "redis://redis_user:redis_password_example@redis.example.com:6379"
- name: MOOSE_REDIS_CONFIG__KEY_PREFIX
value: "moose_production"
# Redpanda configuration (Optional)
- name: MOOSE_REDPANDA_CONFIG__BROKER
value: "seed-5fbcae97.example.redpanda.com:9092"
- name: MOOSE_REDPANDA_CONFIG__NAMESPACE
value: "moose_production"
- name: MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS
value: "10043"
- name: MOOSE_REDPANDA_CONFIG__SASL_USERNAME
value: "redpanda_user"
- name: MOOSE_REDPANDA_CONFIG__SASL_PASSWORD
value: "redpanda_password_example"
- name: MOOSE_REDPANDA_CONFIG__SASL_MECHANISM
value: "SCRAM-SHA-256"
- name: MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL
value: "SASL_SSL"
- name: MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR
value: "3"
# Temporal configuration (Optional)
- name: MOOSE_TEMPORAL_CONFIG__CA_CERT
value: "/etc/ssl/certs/ca-certificates.crt"
- name: MOOSE_TEMPORAL_CONFIG__API_KEY
value: "temporal_api_key_example"
- name: MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST
value: "your-namespace.tmprl.cloud"
imagePullSecrets:
- name: moose-docker-repo-credentials
```
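### Example using Kubernetes secrets (recommended for production)
The fragment below is a minimal sketch of the secrets-based variant: it assumes you have created a Secret (here named `moose-secrets`, a name you choose) holding the sensitive values, and it shows only the `env` entries that change relative to the example above. Non-sensitive variables remain plain `value` entries.
```yaml filename="moose-deployment-secrets.yaml-fragment" copy
env:
  # Sensitive values pulled from a Kubernetes Secret instead of hardcoded literals
  - name: MOOSE_CLICKHOUSE_CONFIG__PASSWORD
    valueFrom:
      secretKeyRef:
        name: moose-secrets        # your Secret's name
        key: clickhouse-password   # key inside the Secret
  - name: MOOSE_REDIS_CONFIG__URL
    valueFrom:
      secretKeyRef:
        name: moose-secrets
        key: redis-url
  - name: MOOSE_TEMPORAL_CONFIG__API_KEY
    valueFrom:
      secretKeyRef:
        name: moose-secrets
        key: temporal-api-key
```
You can create the Secret with, for example, `kubectl create secret generic moose-secrets --from-literal=clickhouse-password='<your-password>' --from-literal=redis-url='<your-redis-url>' --from-literal=temporal-api-key='<your-api-key>'`.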
---
## Deploying with Docker Compose
Source: moose/deploying/deploying-with-docker-compose.mdx
# Deploying with Docker Compose
Deploying a Moose application with all its dependencies can be challenging and time-consuming. You need to properly configure multiple services,
ensure they communicate with each other, and manage their lifecycle.
Docker Compose solves this problem by allowing you to deploy your entire stack with a single command.
This guide shows you how to set up a production-ready Moose environment on a single server using Docker Compose, with proper security,
monitoring, and maintenance practices.
This guide describes a single-server deployment. For high availability (HA) deployments, you'll need to:
- Deploy services across multiple servers
- Configure service replication and redundancy
- Set up load balancing
- Implement proper failover mechanisms
We are also offering an HA managed deployment option for Moose called [Boreal](https://fiveonefour.com/boreal).
## Prerequisites
Before you begin, you'll need:
- Ubuntu 24 or above (for this guide)
- Docker and Docker Compose (minimum version 2.23.1)
- Access to a server with at least 8GB RAM and 4 CPU cores
The Moose stack consists of:
- Your Moose Application
- [ClickHouse](https://clickhouse.com) (required)
- [Redis](https://redis.io) (required)
- [Redpanda](https://redpanda.com) (optional for event streaming)
- [Temporal](https://temporal.io) (optional for workflow orchestration)
## Setting Up a Production Server
### Installing Required Software
First, install Docker on your Ubuntu server:
```bash
# Update the apt package index
sudo apt-get update
# Install packages to allow apt to use a repository over HTTPS
sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Set up the stable repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Update apt package index again
sudo apt-get update
# Install Docker Engine
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
```
Next, install Node.js or Python depending on your Moose application:
```bash
# For Node.js applications
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# OR for Python applications
sudo apt-get install -y python3.12 python3-pip
```
### Configure Docker Log Size Limits
To prevent Docker logs from filling up your disk space, configure log rotation:
```bash
sudo mkdir -p /etc/docker
sudo vim /etc/docker/daemon.json
```
Add the following configuration:
```json
{
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
}
}
```
Restart Docker to apply the changes:
```bash
sudo systemctl restart docker
```
### Enable Docker Non-Root Access
To run Docker commands without sudo:
```bash
# Add your user to the docker group
sudo usermod -aG docker $USER
# Apply the changes (log out and back in, or run this)
newgrp docker
```
### Setting Up GitHub Actions Runner (Optional)
If you want to set up CI/CD automation, you can install a GitHub Actions runner:
1. Navigate to your GitHub repository
2. Go to Settings > Actions > Runners
3. Click "New self-hosted runner"
4. Select Linux and follow the instructions shown
To configure the runner as a service (to run automatically):
```bash
cd actions-runner
sudo ./svc.sh install
sudo ./svc.sh start
```
## Setting up a Sample Moose Application (Optional)
If you already have a Moose application, you can skip this section.
Copy your Moose project to the server, then build the application with the `--docker` flag and make the built image available on the server.
### Install Moose CLI
```bash
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
### Create a new Moose Application
Please follow the initialization instructions for your language.
```bash
moose init test-ts typescript
cd test-ts
npm install
```
or
```bash
moose init test-py python
cd test-py
pip install -r requirements.txt
```
### Build the application on AMD64
```bash
moose build --docker --amd64
```
### Build the application on ARM64
```bash
moose build --docker --arm64
```
### Confirm the image was built
```bash
docker images
```
For more information on packaging Moose for deployment, see the full packaging guide.
## Preparing for Deployment
### Create Environment Configuration
First, create a file called `.env` in your project directory to specify component versions:
```bash
# Create and open the .env file
vim .env
```
Add the following content to the `.env` file:
```
# Version configuration for components
POSTGRESQL_VERSION=14.0
TEMPORAL_VERSION=1.22.0
TEMPORAL_UI_VERSION=2.20.0
REDIS_VERSION=7
CLICKHOUSE_VERSION=25.4
REDPANDA_VERSION=v24.3.13
REDPANDA_CONSOLE_VERSION=v3.1.0
```
Additionally, create a `.env.prod` file for your Moose application-specific secrets and configuration:
```bash
# Create and open the .env.prod file
vim .env.prod
```
Add your application-specific environment variables:
```
# Application-specific environment variables
APP_SECRET=your_app_secret
# Add other application variables here
```
## Deploying with Docker Compose
Create a file called `docker-compose.yml` in the same directory:
```bash
# Create and open the docker-compose.yml file
vim docker-compose.yml
```
Add the following content to the file:
```yaml file=./docker-compose.yml
name: moose-stack
volumes:
# Required volumes
clickhouse-0-data: null
clickhouse-0-logs: null
redis-0: null
# Optional volumes
redpanda-0: null
postgresql-data: null
configs:
temporal-config:
# Using the "content" property to inline the config
content: |
limit.maxIDLength:
- value: 255
constraints: {}
system.forceSearchAttributesCacheRefreshOnRead:
- value: true # Dev setup only. Please don't turn this on in production.
constraints: {}
services:
# REQUIRED SERVICES
# ClickHouse - Required analytics database
clickhouse-0:
container_name: clickhouse-0
restart: always
image: clickhouse/clickhouse-server:${CLICKHOUSE_VERSION}
volumes:
- clickhouse-0-data:/var/lib/clickhouse/
- clickhouse-0-logs:/var/log/clickhouse-server/
environment:
      # Allow use of introspection functions
CLICKHOUSE_ALLOW_INTROSPECTION_FUNCTIONS: 1
# Default admin credentials
CLICKHOUSE_USER: admin
CLICKHOUSE_PASSWORD: adminpassword
      # Enable SQL-driven access control for this user (needed to create users later)
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
# Database setup
CLICKHOUSE_DB: moose
# Uncomment this if you want to access clickhouse from outside the docker network
# ports:
# - 8123:8123
# - 9000:9000
healthcheck:
test: wget --no-verbose --tries=1 --spider http://localhost:8123/ping || exit 1
interval: 30s
timeout: 5s
retries: 3
start_period: 30s
ulimits:
nofile:
soft: 262144
hard: 262144
networks:
- moose-network
# Redis - Required for caching and pub/sub
redis-0:
restart: always
image: redis:${REDIS_VERSION}
volumes:
- redis-0:/data
command: redis-server --save 20 1 --loglevel warning
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
networks:
- moose-network
# OPTIONAL SERVICES
# --- BEGIN REDPANDA SERVICES (OPTIONAL) ---
# Remove this section if you don't need event streaming
redpanda-0:
restart: always
command:
- redpanda
- start
- --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
# Address the broker advertises to clients that connect to the Kafka API.
# Use the internal addresses to connect to the Redpanda brokers'
# from inside the same Docker network.
# Use the external addresses to connect to the Redpanda brokers'
# from outside the Docker network.
- --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092
- --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
# Address the broker advertises to clients that connect to the HTTP Proxy.
- --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082
- --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
# Redpanda brokers use the RPC API to communicate with each other internally.
- --rpc-addr redpanda-0:33145
- --advertise-rpc-addr redpanda-0:33145
# Mode dev-container uses well-known configuration properties for development in containers.
- --mode dev-container
# Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system.
- --smp 1
- --default-log-level=info
image: docker.redpanda.com/redpandadata/redpanda:${REDPANDA_VERSION}
container_name: redpanda-0
volumes:
- redpanda-0:/var/lib/redpanda/data
networks:
- moose-network
healthcheck:
test: ["CMD-SHELL", "rpk cluster health | grep -q 'Healthy:.*true'"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
# Optional Redpanda Console for visualizing the cluster
redpanda-console:
restart: always
container_name: redpanda-console
image: docker.redpanda.com/redpandadata/console:${REDPANDA_CONSOLE_VERSION}
entrypoint: /bin/sh
command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
environment:
CONFIG_FILEPATH: /tmp/config.yml
CONSOLE_CONFIG_FILE: |
kafka:
brokers: ["redpanda-0:9092"]
# Schema registry config moved outside of kafka section
schemaRegistry:
enabled: true
urls: ["http://redpanda-0:8081"]
redpanda:
adminApi:
enabled: true
urls: ["http://redpanda-0:9644"]
ports:
- 8080:8080
depends_on:
- redpanda-0
healthcheck:
test: ["CMD", "wget", "--spider", "--quiet", "http://localhost:8080/admin/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
networks:
- moose-network
# --- END REDPANDA SERVICES ---
# --- BEGIN TEMPORAL SERVICES (OPTIONAL) ---
# Remove this section if you don't need workflow orchestration
# Temporal PostgreSQL database
postgresql:
container_name: temporal-postgresql
environment:
POSTGRES_PASSWORD: temporal
POSTGRES_USER: temporal
image: postgres:${POSTGRESQL_VERSION}
restart: always
volumes:
- postgresql-data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U temporal"]
interval: 10s
timeout: 5s
retries: 3
networks:
- moose-network
# Temporal server
# For initial setup, use temporalio/auto-setup image
# For production, switch to temporalio/server after first run
temporal:
container_name: temporal
depends_on:
postgresql:
condition: service_healthy
environment:
# Database configuration
- DB=postgres12
- DB_PORT=5432
- POSTGRES_USER=temporal
- POSTGRES_PWD=temporal
- POSTGRES_SEEDS=postgresql
# Namespace configuration
- DEFAULT_NAMESPACE=moose-workflows
- DEFAULT_NAMESPACE_RETENTION=72h
# Auto-setup options - set to false after initial setup
- AUTO_SETUP=true
- SKIP_SCHEMA_SETUP=false
# Service configuration - all services by default
# For high-scale deployments, run these as separate containers
# - SERVICES=history,matching,frontend,worker
# Logging and metrics
- LOG_LEVEL=info
# Addresses
- TEMPORAL_ADDRESS=temporal:7233
- DYNAMIC_CONFIG_FILE_PATH=/etc/temporal/config/dynamicconfig/development-sql.yaml
# For initial deployment, use the auto-setup image
image: temporalio/auto-setup:${TEMPORAL_VERSION}
# For production, after initial setup, switch to server image:
# image: temporalio/server:${TEMPORAL_VERSION}
restart: always
ports:
- 7233:7233
# Volume for dynamic configuration - essential for production
configs:
- source: temporal-config
target: /etc/temporal/config/dynamicconfig/development-sql.yaml
mode: 0444
networks:
- moose-network
healthcheck:
test: ["CMD", "tctl", "--ad", "temporal:7233", "cluster", "health", "|", "grep", "-q", "SERVING"]
interval: 30s
timeout: 5s
retries: 3
start_period: 30s
# Temporal Admin Tools - useful for maintenance and debugging
temporal-admin-tools:
container_name: temporal-admin-tools
depends_on:
- temporal
environment:
- TEMPORAL_ADDRESS=temporal:7233
- TEMPORAL_CLI_ADDRESS=temporal:7233
image: temporalio/admin-tools:${TEMPORAL_VERSION}
restart: "no"
networks:
- moose-network
stdin_open: true
tty: true
# Temporal Web UI
temporal-ui:
container_name: temporal-ui
depends_on:
- temporal
environment:
- TEMPORAL_ADDRESS=temporal:7233
- TEMPORAL_CORS_ORIGINS=http://localhost:3000
image: temporalio/ui:${TEMPORAL_UI_VERSION}
restart: always
ports:
- 8081:8080
networks:
- moose-network
healthcheck:
test: ["CMD", "wget", "--spider", "--quiet", "http://localhost:8080/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
# --- END TEMPORAL SERVICES ---
# Your Moose application
moose:
image: moose-df-deployment-x86_64-unknown-linux-gnu:latest # Update with your image name
depends_on:
# Required dependencies
- clickhouse-0
- redis-0
# Optional dependencies - remove if not using
- redpanda-0
- temporal
restart: always
environment:
# Logging and debugging
RUST_BACKTRACE: "1"
MOOSE_LOGGER__LEVEL: "Info"
MOOSE_LOGGER__STDOUT: "true"
# Required services configuration
# ClickHouse configuration
MOOSE_CLICKHOUSE_CONFIG__DB_NAME: "moose"
MOOSE_CLICKHOUSE_CONFIG__USER: "moose"
MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "your_moose_password"
MOOSE_CLICKHOUSE_CONFIG__HOST: "clickhouse-0"
MOOSE_CLICKHOUSE_CONFIG__HOST_PORT: "8123"
# Redis configuration
MOOSE_REDIS_CONFIG__URL: "redis://redis-0:6379"
MOOSE_REDIS_CONFIG__KEY_PREFIX: "moose"
# Optional services configuration
# Redpanda configuration (remove if not using Redpanda)
MOOSE_REDPANDA_CONFIG__BROKER: "redpanda-0:9092"
MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS: "1000"
MOOSE_REDPANDA_CONFIG__RETENTION_MS: "30000"
MOOSE_REDPANDA_CONFIG__NAMESPACE: "moose"
# Temporal configuration (remove if not using Temporal)
MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST: "temporal:7233"
MOOSE_TEMPORAL_CONFIG__NAMESPACE: "moose-workflows"
# HTTP Server configuration
MOOSE_HTTP_SERVER_CONFIG__HOST: 0.0.0.0
ports:
- 4000:4000
env_file:
- path: ./.env.prod
required: true
networks:
- moose-network
healthcheck:
test: ["CMD-SHELL", "curl -s http://localhost:4000/health | grep -q '\"unhealthy\": \\[\\]' && echo 'Healthy'"]
interval: 30s
timeout: 5s
retries: 10
start_period: 60s
# Define the network for all services
networks:
moose-network:
driver: bridge
```
At this point, don't start the services yet. First, we need to configure the individual services for production use as described in the following sections.
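Before moving on, you can confirm that the Compose file parses and that the versions from `.env` are substituted correctly; this renders the configuration without starting any containers:
```bash
# Render the fully-resolved configuration (does not start services)
docker compose config
# Or just confirm the image tags picked up the versions from .env
docker compose config | grep 'image:'
```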
## Configuring Services for Production
### Configuring ClickHouse Securely (Required)
For production ClickHouse deployment, we'll use environment variables to configure users and access control
(as recommended in the [official Docker image documentation](https://hub.docker.com/r/clickhouse/clickhouse-server)):
1. First, start the ClickHouse container:
```bash
# Start just the ClickHouse container
docker compose up -d clickhouse-0
```
2. After ClickHouse has started, connect to create additional users:
```bash
# Connect to ClickHouse with the admin user
docker exec -it clickhouse-0 clickhouse-client --user admin --password adminpassword
# Create moose application user
CREATE USER moose IDENTIFIED BY 'your_moose_password';
GRANT ALL ON moose.* TO moose;
# Create read-only user for BI tools (optional)
CREATE USER power_bi IDENTIFIED BY 'your_powerbi_password' SETTINGS PROFILE 'readonly';
GRANT SHOW TABLES, SELECT ON moose.* TO power_bi;
```
3. To exit the ClickHouse client, type `\q` and press Enter.
4. Update your Moose environment variables to use the new moose user:
```bash
vim docker-compose.yml
```
```yaml
MOOSE_CLICKHOUSE_CONFIG__USER: "moose"
MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "your_moose_password"
```
5. Remove the bootstrap admin credentials from the environment of the `clickhouse-0` service in the docker-compose.yml file, since the users you need are now managed inside ClickHouse:
```yaml
CLICKHOUSE_USER: "admin"
CLICKHOUSE_PASSWORD: "adminpassword"
```
6. For additional security in production, consider using Docker secrets for passwords.
7. Restart the ClickHouse container to apply the changes:
```bash
docker compose restart clickhouse-0
```
8. Verify that the new configuration works by connecting with the newly created user:
```bash
# Connect with the new moose user
docker exec -it clickhouse-0 clickhouse-client --user moose --password your_moose_password
# Test access by listing tables
SHOW TABLES FROM moose;
# Exit the clickhouse client
\q
```
If you can connect successfully and run commands with the new user, your ClickHouse configuration is working properly.
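For scripted checks (for example in a deployment pipeline), the same verification can be run non-interactively. This is a minimal sketch, assuming the `moose` user and password created above:
```bash
# Run a single query as the moose user without opening an interactive session
docker exec clickhouse-0 clickhouse-client \
  --user moose --password your_moose_password \
  --query "SHOW TABLES FROM moose"
```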
### Securing Redpanda (Optional)
For production, it's recommended to restrict external access to Redpanda:
1. Modify your Docker Compose file to remove external access:
- Use only internal network access for production
- If needed, use a reverse proxy with authentication for external access
2. For this simple deployment, we'll keep Redpanda closed to the external world with no authentication required,
as it's only accessible from within the Docker network.
### Configuring Temporal (Optional)
If your Moose application uses Temporal for workflow orchestration, the configuration above includes all necessary services based on the
[official Temporal Docker Compose examples](https://github.com/temporalio/docker-compose).
If you're not using Temporal, simply remove the Temporal-related services (postgresql, temporal, temporal-admin-tools, temporal-ui) and environment variables from the docker-compose.yml file.
#### Temporal Deployment Process: From Setup to Production
Deploying Temporal involves a two-phase process: initial setup followed by production operation. Here are step-by-step instructions for each phase:
##### Phase 1: Initial Setup
1. **Start the PostgreSQL database**:
```bash
docker compose up -d postgresql
```
2. **Wait for PostgreSQL to be healthy** (check the status):
```bash
docker compose ps postgresql
```
Look for `healthy` in the output before proceeding.
3. **Start Temporal with auto-setup**:
```bash
docker compose up -d temporal
```
During this phase, Temporal's auto-setup will:
- Create the necessary PostgreSQL databases
- Initialize the schema tables
- Register the default namespace (moose-workflows)
4. **Verify Temporal server is running**:
```bash
docker compose ps temporal
```
5. **Start the Admin Tools and UI**:
```bash
docker compose up -d temporal-admin-tools temporal-ui
```
6. **Create the namespace manually**:
```bash
# Register the moose-workflows namespace with a 3-day retention period
docker compose exec temporal-admin-tools tctl namespace register --retention 72h moose-workflows
```
Verify that the namespace was created:
```bash
# List all namespaces
docker compose exec temporal-admin-tools tctl namespace list
# Describe your namespace
docker compose exec temporal-admin-tools tctl namespace describe moose-workflows
```
You should see details about the namespace including its retention policy.
##### Phase 2: Transition to Production
After successful initialization, modify your configuration for production use:
1. **Stop Temporal services**:
```bash
docker compose stop temporal temporal-ui temporal-admin-tools
```
2. **Edit your docker-compose.yml file** to:
- Change image from `temporalio/auto-setup` to `temporalio/server`
- Set `SKIP_SCHEMA_SETUP=true`
Example change:
```yaml
# From:
image: temporalio/auto-setup:${TEMPORAL_VERSION}
# To:
image: temporalio/server:${TEMPORAL_VERSION}
# And change:
- AUTO_SETUP=true
- SKIP_SCHEMA_SETUP=false
# To:
- AUTO_SETUP=false
- SKIP_SCHEMA_SETUP=true
```
3. **Restart services with production settings**:
```bash
docker compose up -d temporal temporal-ui temporal-admin-tools
```
4. **Verify services are running with new configuration**:
```bash
docker compose ps
```
## Starting and Managing the Service
### Starting the Services
Start all services with Docker Compose:
```bash
docker compose up -d
```
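Once the stack is starting, you can check that every container reports a healthy status:
```bash
# List services with their current state and health
docker compose ps
# Tail recent logs from all services if anything looks unhealthy
docker compose logs --tail=50 -f
```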
### Setting Up Systemd Service for Docker Compose
For production, create a systemd service to ensure Docker Compose starts automatically on system boot:
1. Create a systemd service file:
```bash
sudo vim /etc/systemd/system/moose-stack.service
```
2. Add the following configuration (adjust paths as needed):
```
[Unit]
Description=Moose Stack
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/path/to/your/compose/directory
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
```
3. Enable and start the service:
```bash
sudo systemctl enable moose-stack.service
sudo systemctl start moose-stack.service
```
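You can then verify that systemd is managing the stack (a quick check):
```bash
# Confirm the unit started successfully
sudo systemctl status moose-stack.service
# Review its start/stop output in the journal
journalctl -u moose-stack.service --no-pager
```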
## Deployment Workflow
You can deploy updates to your Moose application in either of two ways:
### Automated Deployment with CI/CD
1. Set up a CI/CD pipeline using GitHub Actions (if runner is configured)
2. When code is pushed to your repository:
- The GitHub Actions runner builds your Moose application
- Updates the Docker image
- Deploys using Docker Compose
### Manual Deployment
Alternatively, for manual deployment (a sketch of these steps follows below):
1. Copy the latest version of the code to the machine
2. Run `moose build --docker` to rebuild the image for your server's architecture
3. Update the Docker image tag in your docker-compose.yml
4. Restart the stack with `docker compose up -d`
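A minimal sketch of this flow, assuming your project is cloned with git to `~/moose-app` on an x86_64 server and your Compose files live in `/path/to/your/compose/directory`:
```bash
# 1. Pull or copy the latest code onto the server
cd ~/moose-app
git pull

# 2. Rebuild the Docker image for the server's architecture
moose build --docker --amd64

# 3. After updating the image tag in docker-compose.yml, recreate the moose service
cd /path/to/your/compose/directory
docker compose up -d moose
```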
## Monitoring and Maintenance
Set up monitoring and routine maintenance so outages and performance issues are caught early:
- Set up log monitoring with a tool like [Loki](https://grafana.com/oss/loki/)
- Regularly back up your volumes, especially the ClickHouse data volume (see the sketch below)
- Monitor disk space usage
- Set up alerting for service health
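A minimal sketch of a volume backup, assuming the Compose project name `moose-stack` used above (Docker Compose prefixes volume names with the project name):
```bash
# Confirm the exact volume name first
docker volume ls | grep clickhouse

# Archive the ClickHouse data volume to the current directory
docker run --rm \
  -v moose-stack_clickhouse-0-data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/clickhouse-data-$(date +%F).tar.gz -C /data .
```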
---
## Monitoring your Moose App
Source: moose/deploying/monitoring.mdx
This content has moved to the unified Observability page
> This page has moved. See the unified [/moose/metrics](/moose/metrics) page for observability across development and production.
---
## Packaging Moose for deployment
Source: moose/deploying/packaging-moose-for-deployment.mdx
Packaging Moose for deployment
# Packaging Moose for Deployment
Once you've developed your Moose application locally, you can package it for deployment to your on-prem or cloud infrastructure.
The first step is to navigate (`cd`) to your moose project in your terminal.
```txt filename="Terminal" copy
cd my-moose-project
```
The Moose CLI you've used to build your Moose project also has a handy flag that will automate the packaging and building of your project into docker images.
```txt filename="Terminal" copy
moose build --docker
```
After the above command completes you can view your newly created docker files by running the `docker images` command:
```txt filename="Terminal" copy
>docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
moose-df-deployment-aarch64-unknown-linux-gnu latest c50674c7a68a About a minute ago 155MB
moose-df-deployment-x86_64-unknown-linux-gnu latest e5b449d3dea3 About a minute ago 163MB
```
> Notice that you get two `moose-df-deployment` containers, one for the `aarch64` (ARM64) architecture and another for the `x86_64` architecture. This is necessary to allow you to choose the version that matches your cloud or on-prem machine architecture.
You can then use standard docker commands to push your new project images to your container repository of choice.
First tag your local images:
```txt filename="Terminal" copy
docker tag moose-df-deployment-aarch64-unknown-linux-gnu:latest {your-repo-user-name}/moose-df-deployment-aarch64-unknown-linux-gnu:latest
docker tag moose-df-deployment-x86_64-unknown-linux-gnu:latest {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest
```
Then `push` your files to your container repository.
```txt filename="Terminal" copy
docker push {your-repo-user-name}/moose-df-deployment-aarch64-unknown-linux-gnu:latest
docker push {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest
```
You can also use the following handy shell script to automate the steps above.
```bash filename="push.sh" copy
#!/bin/bash
version=$2
if [ -z "$1" ]
then
echo "You must specify the dockerhub repository as an argument. Example: ./push.sh container-repo-name"
echo "Note: you can also provide a second argument to supply a specific version tag - otherwise this script will use the same version as the latest moose-cli on Github."
exit 1
fi
if [ -z "$2" ]
then
output=$(npx @514labs/moose-cli -V)
version=$(echo "$output" | sed -n '2p' | awk '{print $2}')
fi
echo "Using version: $version"
arch="moose-df-deployment-aarch64-unknown-linux-gnu"
docker tag $arch:$version $1/$arch:$version
docker push $1/$arch:$version
arch="moose-df-deployment-x86_64-unknown-linux-gnu"
docker tag $arch:$version $1/$arch:$version
docker push $1/$arch:$version
```
---
## Preparing access to ClickHouse, Redis, Temporal and Redpanda
Source: moose/deploying/preparing-clickhouse-redpanda.mdx
Preparing access to ClickHouse, Redis, Temporal and Redpanda
# Preparing access to ClickHouse, Redis, Temporal and Redpanda
Your hosted Moose application requires access to hosted ClickHouse and Redis service instances. You can also optionally use Redpanda for event streaming.
You can stand up open source versions of these applications within your environments or opt to use cloud-hosted versions available at:
- [ClickHouse Cloud](https://clickhouse.com)
- [Redis Cloud](https://redis.com)
- [Redpanda Cloud](https://redpanda.com)
- [Temporal Cloud](https://temporal.io)
## ClickHouse Configuration
If you're using `state_config.storage = "clickhouse"` in your config (serverless mode without Redis), your ClickHouse instance must support the **KeeperMap** table engine. This is used for migration state storage and distributed locking.
✅ **ClickHouse Cloud**: Supported by default
✅ **`moose dev` / `moose prod`**: Already configured in our Docker setup
⚠️ **Self-hosted ClickHouse**: See [ClickHouse KeeperMap documentation](https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map) for setup requirements
If you're using Redis for state storage (`state_config.storage = "redis"`), you don't need KeeperMap.
For ClickHouse, you'll need the following information:
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| DB_NAME | Database name to use | Your branch or application ID |
| USER | Username for authentication | - |
| PASSWORD | Password for authentication | - |
| HOST | Hostname or IP address | - |
| HOST_PORT | HTTPS port | 8443 |
| USE_SSL | Whether to use SSL (1 for true, 0 for false) | 1 |
| NATIVE_PORT | Native protocol port | 9440 |
These values are used to configure the Moose application's connection to ClickHouse through environment variables following this pattern:
```
MOOSE_CLICKHOUSE_CONFIG__<PARAMETER>=<VALUE>
```
For example:
```
MOOSE_CLICKHOUSE_CONFIG__DB_NAME=myappdb
MOOSE_CLICKHOUSE_CONFIG__HOST=myclickhouse.example.com
MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1
MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443
MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT=9440
```
## Redis Configuration
Moose requires Redis for caching and as a message broker. You'll need the following configuration:
| Parameter | Description |
|-----------|-------------|
| URL | Redis connection URL |
| KEY_PREFIX | Prefix for Redis keys to isolate namespaces |
These values are configured through:
```
MOOSE_REDIS_CONFIG__URL=redis://username:password@redis.example.com:6379
MOOSE_REDIS_CONFIG__KEY_PREFIX=myapp
```
## Temporal Configuration (Optional)
Temporal is an optional workflow orchestration platform that can be used with Moose. If you choose to use Temporal, you'll need the following configuration:
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| CA_CERT | Path to CA certificate | /etc/ssl/certs/ca-certificates.crt |
| API_KEY | Temporal Cloud API key | - |
| TEMPORAL_HOST | Temporal Cloud namespace host | Your namespace + .tmprl.cloud |
These values are configured through:
```
MOOSE_TEMPORAL_CONFIG__CA_CERT=/etc/ssl/certs/ca-certificates.crt
MOOSE_TEMPORAL_CONFIG__API_KEY=your-temporal-api-key
MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST=your-namespace.tmprl.cloud
```
## Redpanda Configuration (Optional)
Redpanda is an optional component that can be used for event streaming. If you choose to use Redpanda, you'll need the following information:
| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| BROKER | Bootstrap server address | - |
| NAMESPACE | Namespace for isolation (often same as branch or app ID) | - |
| MESSAGE_TIMEOUT_MS | Message timeout in milliseconds | 10043 |
| SASL_USERNAME | SASL username for authentication | - |
| SASL_PASSWORD | SASL password for authentication | - |
| SASL_MECHANISM | SASL mechanism | SCRAM-SHA-256 |
| SECURITY_PROTOCOL | Security protocol | SASL_SSL |
| REPLICATION_FACTOR | Topic replication factor | 3 |
These values are used to configure the Moose application's connection to Redpanda through environment variables following this pattern:
```
MOOSE_REDPANDA_CONFIG__<PARAMETER>=<VALUE>
```
For example:
```
MOOSE_REDPANDA_CONFIG__BROKER=seed-5fbcae97.example.redpanda.com:9092
MOOSE_REDPANDA_CONFIG__NAMESPACE=myapp
MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL=SASL_SSL
MOOSE_REDPANDA_CONFIG__SASL_MECHANISM=SCRAM-SHA-256
MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR=3
```
## Using Environment Variables in Deployment
When deploying your Moose application, you'll need to pass these configurations as environment variables.
Refer to the deployment guides for your specific platform (Kubernetes, ECS, etc.) for details on how to securely
provide these values to your application.
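As a minimal sketch (not a full deployment), the same variables can be passed to the packaged Moose container with `docker run`; the image name and all values below are placeholders for your own:
```bash
docker run -d --name moose \
  -p 4000:4000 \
  -e MOOSE_CLICKHOUSE_CONFIG__DB_NAME=myappdb \
  -e MOOSE_CLICKHOUSE_CONFIG__USER=myuser \
  -e MOOSE_CLICKHOUSE_CONFIG__PASSWORD=mypassword \
  -e MOOSE_CLICKHOUSE_CONFIG__HOST=myclickhouse.example.com \
  -e MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443 \
  -e MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1 \
  -e MOOSE_REDIS_CONFIG__URL=redis://username:password@redis.example.com:6379 \
  -e MOOSE_REDIS_CONFIG__KEY_PREFIX=myapp \
  {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest
```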
---
## moose/getting-started/from-clickhouse
Source: moose/getting-started/from-clickhouse.mdx
# Use Moose with Your Existing ClickHouse
## What This Guide Does
This guide sets up a local ClickHouse development environment that mirrors your production database and enables code-first schema management:
1. **Introspect** your remote ClickHouse tables and generate TypeScript/Python data models
2. **Create** a local ClickHouse instance with your exact table schemas
3. **Seed** your local database with production data (optional)
4. **Build** APIs and pipelines on top of your ClickHouse data in code
## How It Works
**Local Development:**
- Your production ClickHouse remains untouched
- You get a local ClickHouse instance that copies your remote table schemas
- All development happens locally with hot-reload
**Production Deployment:**
- When you deploy your code, it connects to your remote ClickHouse
- Any new tables, materialized views, or schema changes you create in code are automatically migrated to your target database
- Your existing data and tables remain intact
## Prerequisites
## Step 1: Install Moose
Install the Moose CLI globally to your system:
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
After installation, you'll use `moose init` to create a new project that automatically connects to your ClickHouse and generates all the code you need.
## Step 2: Create Your Project
Use the ClickHouse Playground tab to try it out!
```bash filename="Initialize new project" copy
# Option 1: Provide connection string directly
moose init my-project --from-remote <YOUR_CLICKHOUSE_CONNECTION_STRING> --language python
# Option 2: Run without a connection string for interactive setup
moose init my-project --from-remote --language python
```
**Connection String Format:**
```
https://username:password@host:port/?database=database_name
```
If you don't provide a connection string, Moose will guide you through an interactive setup process where you'll be prompted to enter:
- **Host and port** (e.g., `https://your-service-id.region.clickhouse.cloud:8443`)
- **Username** (usually `default`)
- **Password** (your ClickHouse password)
- **Database name** (optional, defaults to `default`)
This is perfect if you're not sure about your connection details or prefer a guided experience.
Moose will create a complete project structure with:
- **Data models**: TypeScript/Python classes for every table in your ClickHouse
- **Type definitions**: Full type safety for all your data
- **Development environment**: Local ClickHouse instance that mirrors your production schema
- **Build tools**: Everything configured and ready to go
- Make sure you are using the `HTTPS` connection string, not the `HTTP` connection string.
- Make sure the port is correct. For `HTTPS` the default is `8443`
- The default username is `default`
See the section: Connect to your remote ClickHouse.
```bash filename="Initialize new project" copy
# Generate code models from your existing ClickHouse tables
moose init my-project --from-remote https://explorer:@play.clickhouse.com:443/?database=default --language python
```
```bash filename="Install dependencies" copy
cd my-project
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
You should see: `Successfully generated X models from ClickHouse tables`
### Explore Your Generated Models
Check what Moose created from your tables in the `app/main.py` file:
Your generated table models are imported here so Moose can detect them.
### If your database includes ClickPipes/PeerDB (CDC) tables
As noted above, when you use `moose init --from-remote`, Moose introspects your database. If it detects CDC‑managed tables (e.g., PeerDB/ClickPipes with fields like `_peerdb_synced_at`, `_peerdb_is_deleted`, `_peerdb_version`), it marks those as `EXTERNALLY_MANAGED` and writes them into a dedicated external models file. Your root file is updated to load these models automatically.
This separation is a best‑effort by the CLI to keep clearly CDC‑owned tables external. For other tables you don’t want Moose to manage, set the lifecycle to external and move them into the external file. See:
- [External Tables](/moose/olap/external-tables) documentation for more information on how external tables work.
- [DB Pull](/moose/olap/db-pull) for keeping models in sync with the remote schema.
## Step 3: Start Development
Start your development server. This spins up a local ClickHouse instance that perfectly mirrors your production schema:
```bash filename="Start your dev server" copy
moose dev
```
**What happens when you run `moose dev`:**
- 🏗️ Creates a local ClickHouse instance with your exact table schemas in your project code
- 🔄 Hot-reloads migrations to your local infrastructure as you save code changes
- 🚀 Starts a web server for building APIs
Your production ClickHouse remains completely untouched. This is a separate, local development environment.
```txt
Created docker compose file
⡗ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
Node Id: my-analytics-app::b15efaca-0c23-42b2-9b0c-642105f9c437
Starting development mode
Watching "/path/to/my-analytics-app/app"
Started Webserver.
Next Steps
💻 Run the moose 👉 `ls` 👈 command for a bird's eye view of your application and infrastructure
📥 Send Data to Moose
Your local development server is running at: http://localhost:4000/ingest
```
Don't see this output? [Check out the troubleshooting section](#troubleshooting)
### Seed Your Local Database (Optional)
Copy real data from your production ClickHouse to your local development environment. This gives you realistic data to work with during development.
**Why seed?** Your local database starts empty. Seeding copies real data so you can:
- Test with realistic data volumes
- Debug with actual production data patterns
- Develop features that work with real data structures
```bash filename="Terminal" copy
moose seed clickhouse --connection-string <YOUR_CONNECTION_STRING> --limit 100000
```
**Connection String Format:**
The connection string must use ClickHouse native protocol:
```bash
# ClickHouse native protocol (secure connection)
clickhouse://username:password@host:9440/database
```
**Note:** Data transfer uses ClickHouse's native TCP protocol via `remoteSecure()`. The remote server must have the native TCP port accessible. The command automatically handles table mismatches gracefully.
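Before seeding, you can optionally confirm that the remote native TCP port is reachable from your machine, if `nc` (netcat) is available; swap in your own host and port:
```bash filename="Terminal" copy
# Succeeds quickly if the native port is open; times out or errors otherwise
nc -vz play.clickhouse.com 9440
```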
```bash filename="Terminal" copy
moose seed clickhouse --connection-string clickhouse://explorer:@play.clickhouse.com:9440/default --limit 100000
```
```bash filename="Terminal" copy
# You can omit --connection-string by setting an env var
export MOOSE_SEED_CLICKHOUSE_URL='clickhouse://username:password@host:9440/database'
# copy a limited number of rows (batched under the hood)
moose seed clickhouse --limit 100000
```
- `--limit` and `--all` are mutually exclusive
- `--all` can be used to copy the entire table(s), use with caution as it can be very slow and computationally intensive.
- Large copies are automatically batched to avoid remote limits; you’ll see per-batch progress.
- If you stop with Ctrl+C, the current batch finishes and the command exits gracefully.
**Expected Output:**
```bash
✓ Database seeding completed
Seeded 'local_db' from 'remote_db'
✓ table1: copied from remote
⚠️ table2: skipped (not found on remote)
✓ table3: copied from remote
```
**Troubleshooting:**
- Tables that don't exist on remote are automatically skipped with warnings
- Use `--table <table_name>` to seed a specific table that exists in both databases
- Check `moose ls table` to see your local tables
## Step 4: Build Your First API
Now that you have your data models, let's build something useful! You can create APIs, materialized views, and applications with full type safety.
- **REST APIs** that expose your ClickHouse data to frontend applications
- **Materialized Views** for faster queries and aggregations
- **Streaming pipelines** for real-time data processing
- **Full-stack applications** with your ClickHouse data as the backend
### Add APIs
Build REST APIs on top of your existing tables to expose your data to your user-facing apps. This is a great way to get started with Moose without changing any of your existing pipelines.
Check out the MooseAPI module for more information on building APIs with Moose.
### Build Materialized Views
Build materialized views on top of your existing tables to improve query performance. If you have Materialized Views in your ClickHouse, you can use Moose to build new Materialized Views on top of your existing tables, or to migrate your existing Materialized Views to Moose.
Check out the MooseOLAP module for more information on building Materialized Views with Moose.
## Known Limitations
Some advanced ClickHouse features may not be fully supported yet. Join the Moose Slack and let us know if you have any issues, feedback, or requests.
**What we're working on:**
- **Selective table import** (currently imports all tables)
- **Default value annotations**
## Troubleshooting
### Error: Failed to connect to ClickHouse
This guide shows exactly where to find your host, port, username, and password, and how to construct a valid HTTPS connection string.
1. Log into your [ClickHouse Cloud console](https://clickhouse.cloud/)
2. Open your service details page
3. Click "Connect" in the sidebar
4. Select the `HTTPS` tab and copy the values shown
- **Host**: e.g. `your-service-id.region.clickhouse.cloud`
- **Port**: usually `8443`
- **Username**: usually `default`
- **Password**: the password you configured
5. Build your connection string:
```txt
https://USERNAME:PASSWORD@HOST:PORT/?database=DATABASE_NAME
```
6. Example (with placeholders):
```txt
https://default:your_password@your-service-id.region.clickhouse.cloud:8443/?database=default
```
7. Optional: Test with curl
```bash
curl --user "USERNAME:PASSWORD" --data-binary "SELECT 1" https://HOST:PORT
```
### Self-hosted or Docker
- Check your server config (usually `/etc/clickhouse-server/config.xml`)
  - `<http_port>` default: `8123`
  - `<https_port>` default: `8443`
- Check users in `/etc/clickhouse-server/users.xml` or `users.d/`
- For Docker, check environment variables in your compose/run config:
- `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CLICKHOUSE_DB`
Build the HTTPS connection string with your values:
```txt
https://USERNAME:PASSWORD@HOST:8443/?database=DB
```
If you only have HTTP enabled, enable HTTPS or use an HTTPS proxy; Moose init expects an HTTPS URL for remote introspection.
### `moose dev` fails to start
Double check Docker is running and you do not have any port conflicts.
- ClickHouse local runs on port `18123`
- Your local webserver runs on port `4000`
- Your local management API runs on port `5001`
## What's Next?
---
## 5-Minute Quickstart
Source: moose/getting-started/quickstart.mdx
Build your first analytical backend with Moose in 5 minutes
# 5-Minute Quickstart
## Prerequisites
Check that your pre-requisites are installed by running the following commands:
```bash filename="Terminal" copy
python --version
```
```bash filename="Terminal" copy
docker ps
```
Make sure Docker Desktop has at least **2.5GB of memory allocated**. To check or change this setting, open Docker Desktop, go to Settings → Resources → Memory, and adjust the slider if needed. [Learn more about Docker Desktop settings →](https://docs.docker.com/desktop/settings/)
Skip the tutorial and add Moose as a layer on top of your existing database
## Step 1: Install Moose (30 seconds)
### Run the installation script
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
You should see this message: `Moose vX.X.X installed successfully!` (note that X.X.X is the actual version number)
If you see an error instead, check [Troubleshooting](#need-help) below.
### Reload your shell configuration
**This step is required.** Your current terminal doesn't know about the `moose` command yet.
```bash filename="Terminal" copy
source ~/.zshrc
```
If `echo $SHELL` showed `/bin/bash` or `/usr/bin/bash`:
```bash filename="Terminal" copy
source ~/.bashrc
```
### Verify moose command works
```bash filename="Terminal" copy
moose --version
```
You should see:
```txt
moose X.X.X
```
**Try these steps in order:**
1. Re-run the correct `source` command for your shell
2. Close this terminal completely and open a new terminal window
3. Run `moose --version` again
4. If still failing, see [Troubleshooting](#need-help)
You should see the moose version number. Do not proceed to Step 2 until `moose --version` works.
## Step 2: Create Your Project (1 minute)
### Initialize your project
```bash filename="Terminal" copy
moose init my-analytics-app python
```
You should see output like:
```txt
✓ Created my-analytics-app
✓ Initialized Python project
```
### Navigate to your project directory
```bash filename="Terminal" copy
cd my-analytics-app
```
A virtual environment isolates your project's dependencies. We recommend creating one for your project.
**Create a virtual environment (Recommended)**
```bash filename="Terminal" copy
python3 -m venv .venv
```
**Activate your virtual environment (Recommended)**
```bash filename="Terminal" copy
source .venv/bin/activate
```
This creates a `.venv` folder and activates it. Your terminal prompt should now look something like this:
```txt
(.venv) username@computer my-analytics-app %
```
### Install dependencies
```bash filename="Terminal" copy
pip install -r requirements.txt
```
**Wait for installation to complete.** You should see successful installation messages ending with:
```txt
Successfully installed [list of packages]
```
You should see `(.venv)` in your prompt and dependencies installed with no errors.
### Start your development environment
```bash filename="Terminal" copy
moose dev
```
Moose is:
- Downloading Docker images for ClickHouse, Redpanda, and Temporal
- Starting containers
- Initializing databases
- Starting the development server
Do not proceed until you see the "Started Webserver" message.
```txt
Created docker compose file
⡗ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
Node Id: my-analytics-app::b15efaca-0c23-42b2-9b0c-642105f9c437
Starting development mode
Watching "/path/to/my-analytics-app/app"
Started Webserver. 👈 WAIT FOR THIS
Next Steps
💻 Run the moose 👉 `ls` 👈 command for a bird's eye view of your application and infrastructure
📥 Send Data to Moose
Your local development server is running at: http://localhost:4000/ingest
```
Keep this terminal running. This is your Moose development server. You'll open a new terminal for the next step.
## Step 3: Understand Your Project (1 minute)
Your project includes a complete example pipeline:
**Important:** While your pipeline objects are defined in the child folders, they **must be imported** into the root `main.py` file for the Moose CLI to discover and use them.
```python filename="app/main.py"
from app.ingest.models import * # Data models & pipelines
from app.ingest.transform import * # Transformation logic
from app.apis.bar import * # API endpoints
from app.views.bar_aggregated import * # Materialized views
from app.workflows.generator import * # Background workflows
```
## Step 4: Test Your Pipeline (2 minutes)
**Keep your `moose dev` terminal running.** You need a second terminal for the next commands.
**macOS Terminal:**
- Press `Cmd+N` for a new window, or
- Right-click Terminal icon in dock → New Window
**VSCode:**
- Click the `+` button in the terminal panel, or
- Press `Ctrl+Shift+` ` (backtick)
**Linux Terminal:**
- Press `Ctrl+Shift+N`, or
- Use your terminal's File → New Window menu
### Navigate to your project in the new terminal
In your **new terminal window** (not the one running `moose dev`):
```bash filename="Terminal 2 (New Window)" copy
cd my-analytics-app
```
If not automatically activated, activate the virtual environment:
```bash filename="Terminal 2 (New Window)" copy
source .venv/bin/activate
```
### Run the data generator workflow
Your project comes with a pre-built [Workflow](../workflows) called `generator` that acts as a **data simulator**:
```bash filename="Terminal 2 (New Window)" copy
moose workflow run generator
```
You should see:
```txt
Workflow 'generator' triggered successfully
```
- Generates 1000 fake records with realistic data (using the Faker library)
- Sends each record to your ingestion API via HTTP POST
- Runs as a background task managed by Temporal
- Helps you test your entire pipeline without needing real data
You can see the code in the `app/workflows/generator.py` file.
### Watch for data processing logs
**Switch to your first terminal** (where `moose dev` is running). You should see new logs streaming:
```txt
POST ingest/Foo
[POST] Data received at ingest API sink for Foo
Received Foo_0_0 -> Bar_0_0 1 message(s)
[DB] 17 row(s) successfully written to DB table (Bar)
```
These logs show your pipeline working: Workflow generates data → Ingestion API receives it → Data transforms → Writes to ClickHouse
**If you don't see logs after 30 seconds:**
- Verify `moose dev` is still running in Terminal 1
- Check Terminal 2 for error messages from the workflow command
- Run `docker ps` to verify containers are running
The workflow runs in the background, powered by [Temporal](https://temporal.io). You can see workflow status at `http://localhost:8080`.
```bash filename="Terminal" copy
moose peek Bar --limit 5 # This queries your ClickHouse database to show raw data; useful for debugging / verification
```
You should see output like:
```txt
┌─primaryKey─────────────────────────┬─utcTimestamp────────┬─hasText─┬─textLength─┐
│ 123e4567-e89b-12d3-a456-426614174000 │ 2024-01-15 10:30:00 │ 1 │ 42 │
│ 987fcdeb-51a2-43d1-b789-123456789abc │ 2024-01-15 10:31:00 │ 0 │ 0 │
└────────────────────────────────────┴─────────────────────┴─────────┴────────────┘
```
If you see 0 rows, wait a few seconds for the workflow to process data, then try again.
### Query your data
Your application has a pre-built [API](../apis) that reads from your database. The API runs on `localhost:4000`.
**In Terminal 2**, call the API with `curl`:
```bash filename="Terminal 2 (New Window)" copy
curl "http://localhost:4000/api/bar"
```
You should see JSON data like:
```json
[
{
"dayOfMonth": 15,
"totalRows": 67,
"rowsWithText": 34,
"maxTextLength": 142,
"totalTextLength": 2847
},
{
"dayOfMonth": 14,
"totalRows": 43,
"rowsWithText": 21,
"maxTextLength": 98,
"totalTextLength": 1923
}
]
```
You should see JSON data with analytics results. Your complete data pipeline is working!
**Try query parameters:**
```bash filename="Terminal 2 - Add filters and limits" copy
curl "http://localhost:4000/api/bar?limit=5&orderBy=totalRows"
```
- **Port 4000**: Your Moose application webserver (all APIs are running on this port)
- **Port 8080**: Temporal UI dashboard (workflow management)
- **Port 18123**: ClickHouse HTTP interface (direct database access)
**If the workflow command doesn't work:**
- Make sure you're in the project directory (`cd my-analytics-app`)
- Verify `moose dev` is still running in your first terminal
- Check that Docker containers are running: `docker ps`
**If curl returns an error:**
- Verify the URL is `http://localhost:4000` (not 8080)
- Make sure the workflow has had time to generate data (wait 30-60 seconds)
- Check your `moose dev` terminal for error messages
**If you get HTML instead of JSON:**
- You might be hitting the wrong port - use 4000, not 8080
- Port 8080 serves the Temporal UI (workflow dashboard), not your API
**If `moose peek Bar` shows 0 rows:**
- Wait for the workflow to complete (it processes 1000 records)
- Check the workflow is running: look for "Ingested X records..." messages
- Verify no errors in your `moose dev` terminal logs
**If you see connection refused:**
- Restart `moose dev` and wait for "Started Webserver" message
- Check if another process is using port 4000: `lsof -i :4000`
1. Install the [OpenAPI (Swagger) Viewer extension](https://marketplace.cursorapi.com/items?itemName=42Crunch.vscode-openapi) in your IDE
2. Open `.moose/openapi.yaml` in your IDE
3. Click the "Preview" icon to launch the interactive API explorer
4. Test the `POST /ingest/Foo` and `GET /api/bar` endpoints
## Step 5: Hot Reload Schema Changes (1 minute)
1. Open `app/ingest/models.py`
2. Add a new field to your data model:
```python filename="app/ingest/models.py" {16} copy
from moose_lib import Key, StringToEnumMixin
from typing import Optional, Annotated
from datetime import datetime
from enum import IntEnum, auto
from pydantic import BaseModel
class Baz(StringToEnumMixin, IntEnum):
QUX = auto()
QUUX = auto()
class Bar(BaseModel):
primary_key: Key[str]
utc_timestamp: datetime
baz: Baz
has_text: bool
text_length: int
new_field: Optional[str] = None # New field
```
3. Save the file and watch your terminal
**Switch to Terminal 1** (where `moose dev` is running). You should see Moose automatically update your infrastructure:
```txt
⠋ Processing Infrastructure changes from file watcher
~ Table Bar:
Column changes:
+ new_field: String
```
You should see the column change logged. Your API, database schema, and streaming topic all updated automatically!
**Try it yourself:** Add another field with a different data type and watch the infrastructure update in real-time.
## Recap
You've built a complete analytical backend with:
## Need Help?
**Docker not running:**
```bash filename="Terminal" copy
# macOS
open -a Docker
# Linux
sudo systemctl start docker
# Verify Docker is running
docker ps
```
**Docker out of space:**
```bash filename="Terminal" copy
docker system prune -a
```
**Python version too old:**
```bash filename="Terminal" copy
# Check version
python3 --version
# Install Python 3.12+ with pyenv
curl https://pyenv.run | bash
pyenv install 3.12
pyenv local 3.12
```
**Port 4000 already in use:**
```bash filename="Terminal" copy
# Find what's using port 4000
lsof -i :4000
# Kill the process (replace PID)
kill -9 <PID>
# Or use a different port
moose dev --port 4001
```
**Permission denied:**
```bash filename="Terminal" copy
# Fix Docker permissions (Linux)
sudo usermod -aG docker $USER
newgrp docker
# Fix file permissions
chmod +x ~/.moose/bin/moose
```
**Still stuck?** Join our [Slack community](https://join.slack.com/t/moose-community/shared_invite/zt-2fjh5n3wz-cnOmM9Xe9DYAgQrNu8xKxg) or [open an issue](https://github.com/514-labs/moose/issues).
---
## Minimum Requirements
Source: moose/help/minimum-requirements.mdx
Minimum Requirements for Moose
## Development Setup
The development setup has higher requirements because Moose runs locally along with all its dependencies (Redpanda, ClickHouse, Temporal, Redis).
- **CPU:** 12 cores
- **Memory:** 18GB
- **Disk:** >500GB SSD
- **OS:**
- Windows with Linux subsystem (Ubuntu preferred)
- Linux (Debian 10+, Ubuntu 18.10+, Fedora 29+, CentOS/RHEL 8+)
- Mac OS 13+
- **Runtime:** Python 3.12+ or Node.js 20+, Docker 24.0.0+, and Docker Compose 2.23.1+
## Production Setup
The production setup has lower requirements, as external components (Redpanda, ClickHouse, Redis, and Temporal) are assumed to be deployed separately.
- **CPU:** 1vCPU
- **Memory:** 6GB
- **Disk:** >30GB SSD
- **OS:**
- Windows with Linux subsystem (Ubuntu preferred)
- Linux (Debian 10+, Ubuntu 18.10+, Fedora 29+, CentOS/RHEL 8+)
- Mac OS 13+
- **Runtime:** Python 3.12+ or Node.js 20+
---
## Troubleshooting
Source: moose/help/troubleshooting.mdx
Troubleshooting for Moose
# Troubleshooting
Common issues and their solutions when working with Moose.
## Development Environment
### Issue: `moose dev` fails to start
**Possible causes and solutions:**
1. **Port conflicts**
- Check if ports 4000-4002 are already in use
- Solution: Kill the conflicting processes or configure different ports
```bash
# Find processes using ports
lsof -i :4000-4002
# Kill process by PID
kill <PID>
```
2. **Missing dependencies**
- Solution: Ensure all dependencies are installed
```bash
pip install .
```
3. **Docker not running**
- Solution: Start Docker Desktop or Docker daemon
```bash
# Check Docker status
docker info
# Start Docker on Linux
sudo systemctl start docker
```
## Data Ingestion
### Issue: Data not appearing in tables
1. **Validation errors**
- Check logs for validation failures
- Solution: Ensure data matches schema
```bash filename="Terminal" copy
moose logs
```
2. **Stream processing errors**
- Solution: Check transform functions for errors
```bash filename="Terminal" copy
moose logs --filter functions
```
3. **Database connectivity**
- Solution: Verify database credentials in `.moose/config.toml`
```toml filename=".moose/config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```
## Stream Processing
### Issue: High processing latency
1. **Insufficient parallelism**
- Solution: Increase stream parallelism
```python
from moose_lib import Stream, StreamConfig
stream = Stream[Data]("high_volume", StreamConfig(parallelism=8) )
```
### Issue: Data transformations not working
1. **Transform function errors**
- Solution: Debug transformation logic
```python
# Add logging to transform
def transform(record: Data) -> Data:
print(f"Processing record: {record.id}")
try:
# Your transformation logic
return transformed_record
except Exception as e:
print(f"Transform error: {e}")
return None # Skip record on error
```
## Database Issues
### Issue: Slow queries
1. **Missing or improper indexes**
- Solution: Check orderByFields configuration
```typescript
const table = new OlapTable("slow_table", {
orderByFields: ["frequently_queried_field", "timestamp"]
});
```
2. **Large result sets**
- Solution: Add limits and pagination
```python
# In query API
results = client.query.execute(
# not an f-string, the values are provided in the dict
"""
SELECT * FROM large_table
WHERE category = {category}
LIMIT {limit}
""", {"category": "example", "limit": 100}
)
```
## Deployment Issues
### Issue: Deployment fails
1. **Configuration errors**
- Solution: Check deployment configuration
```bash
# Validate configuration
moose validate --config
```
2. **Resource limitations**
- Solution: Increase resource allocation
```yaml
# In kubernetes manifest
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
```
3. **Permission issues**
- Solution: Verify service account permissions
```bash
# Check permissions
moose auth check
```
### Issue: Migration stuck with "Migration already in progress"
**Cause:** A previous migration was interrupted without releasing its lock.
**Solution:**
1. **Wait 5 minutes** - locks expire automatically
2. **Or manually clear the lock:**
```sql
DELETE FROM _MOOSE_STATE WHERE key = 'migration_lock';
```
3. **Verify it worked:**
```sql
SELECT * FROM _MOOSE_STATE WHERE key = 'migration_lock';
-- Should return no rows
```
The `_MOOSE_STATE` table uses ClickHouse's KeeperMap engine for distributed locking, ensuring only one migration runs at a time across multiple deployments.
## Getting Help
If you can't resolve an issue:
1. Ask for help on the [Moose community Slack channel](https://join.slack.com/t/moose-community/shared_invite/zt-2fjh5n3wz-cnOmM9Xe9DYAgQrNu8xKxg)
2. Search existing [GitHub issues](https://github.com/514-labs/moose/issues)
3. Open a new issue with:
- Moose version (`moose --version`)
- Error messages and logs
- Steps to reproduce
- Expected vs. actual behavior
---
## moose/in-your-stack
Source: moose/in-your-stack.mdx
# Moose In Your Dev Stack
Moose handles the analytical layer of your application stack. The [Area Code](https://github.com/514-labs/area-code) repository contains two working implementations that show how to integrate Moose with existing applications.
## User Facing Analytics (UFA)
UFA shows how to add a dedicated analytics microservice to an existing application without impacting your primary database.
View the open source repository to see the full implementation and clone it on your own machine.
### Data Flow
1. Application writes to Supabase (transactional backend)
2. Supabase Realtime streams changes to Analytical Backend and Retrieval Backend
3. Moose ingest pipeline syncs change events from Redpanda into ClickHouse
4. Frontend queries analytics APIs for dashboards
### Architecture Components
The UFA template demonstrates a microservices architecture with specialized components for different data access patterns:
The user interface for dashboards and application interactions
Technologies: [Vite](https://vite.dev), [React](https://react.dev), [TanStack Query](https://tanstack.com/query), [TanStack Router](https://tanstack.com/router), [Tailwind CSS](https://tailwindcss.com)
Handles CRUD operations and maintains application state
Technologies: [Supabase](https://supabase.com), [Fastify](https://fastify.dev), [Drizzle ORM](https://orm.drizzle.team/)
Fast text search and complex queries across large datasets
Technologies: [Elasticsearch](https://www.elastic.co/) + [Fastify](https://fastify.dev)
High-performance analytical queries and aggregations
Technologies: [ClickHouse](https://clickhouse.com/) + [Moose OLAP](/moose/olap), [Redpanda](https://redpanda.com/) + [Moose Streaming](/moose/streaming), [Moose APIs](/moose/apis)
Keep data synchronized between transactional, retrieval, and analytics systems
Technologies: [Supabase Realtime](https://supabase.com/docs/guides/realtime), [Temporal](https://temporal.io/) + [Moose Workflows](/moose/workflows)
## Operational Data Warehouse (ODW)
ODW shows how to build a centralized data platform that ingests from multiple sources for business intelligence and reporting.
View the open source repository to see the full implementation and clone it on your own machine.
### Data Flow
1. Sources send data to Moose ingestion endpoints
2. Streaming functions validate and transform data
3. Data lands in ClickHouse tables
4. BI tools query via generated APIs or direct SQL
### Architecture Components
Handles incoming data from push-based sources (webhooks, application logs) with validation and transformation
Technologies: [Moose APIs](/moose/apis), [Redpanda](https://redpanda.com/) + [Moose Streaming](/moose/streaming)
Connects to your existing databases, object storage, or third-party APIs
Technologies: [Temporal](https://temporal.io/) + [Moose Workflows](/moose/workflows)
Centralized analytical database for raw and transformed data
Technologies: [ClickHouse](https://clickhouse.com/) + [Moose OLAP](/moose/olap)
Query interface for business intelligence and reporting
Technologies: [Streamlit](https://streamlit.io/) dashboards, [Moose APIs](/moose/apis), [ClickHouse Connect](https://clickhouse.com/docs/en/interfaces/http/connect)
---
## Overview
Source: moose/index.mdx
Modular toolkit for building real-time analytical backends
# MooseStack
Type-safe, code-first tooling for building real-time analytical backends: OLAP Databases, Data Streaming, ETL Workflows, Query APIs, and more.
## Get Started
```bash filename="Install Moose" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
## Everything as Code
Declare all infrastructure (e.g. ClickHouse tables, Redpanda streams, APIs, etc.) and pipelines in pure TypeScript or Python. Your code auto-wires everything together, so no integration boilerplate needed.
```ts filename="Complete Analytical Backend in 1 TS file" copy
import { Key, OlapTable, Stream, IngestApi, Api } from "@514labs/moose-lib";

interface DataModel {
  primaryKey: Key<string>;
  name: string;
}
// Create a ClickHouse table
const clickhouseTable = new OlapTable<DataModel>("TableName");
// Create a Redpanda streaming topic
const redpandaTopic = new Stream<DataModel>("TopicName", {
  destination: clickhouseTable,
});
// Create an ingest API endpoint
const ingestApi = new IngestApi<DataModel>("post-api-route", {
  destination: redpandaTopic,
});
// Create an analytics API endpoint
interface QueryParams {
  limit?: number;
}
const getData = new Api<QueryParams, DataModel[]>(
  "get-api-route",
  async ({ limit = 10 }: QueryParams, { client, sql }) => {
    const result = await client.query.execute(sql`SELECT * FROM ${clickhouseTable} LIMIT ${limit}`);
    return await result.json();
  }
);
```
```python filename="Complete Analytical Backend in 1 Python file" copy
from moose_lib import Key, OlapTable, Stream, StreamConfig, IngestApi, IngestConfig, Api
from pydantic import BaseModel
class DataModel(BaseModel):
primary_key: Key[str]
name: str
# Create a ClickHouse table
clickhouse_table = OlapTable[DataModel]("TableName")
# Create a Redpanda streaming topic
redpanda_topic = Stream[DataModel]("TopicName", StreamConfig(
destination=clickhouse_table,
))
# Create an ingest API endpoint
ingest_api = IngestApi[DataModel]("post-api-route", IngestConfig(
destination=redpanda_topic,
))
# Create an analytics API endpoint
class QueryParams(BaseModel):
limit: int = 10
def handler(client, params: QueryParams):
return client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", {
"table": clickhouse_table.name,
"limit": params.limit,
})
analytics_api = Api[QueryParams, DataModel]("get-api-route", query_function=handler)
```
## Core Concepts
```ts
interface Event {
id: Key;
name: string;
createdAt: Date;
}
interface AggregatedEvent {
count: number;
name: string;
}
```
```bash
# Start local dev server
moose dev
⡏ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
```
## Modules
```ts
const table = new OlapTable("events");
const mv = new MaterializedView<AggregatedEvent>({
  selectStatement: sql`
    SELECT count(*) as count, name
    FROM ${table}
    GROUP BY name
  `,
  selectTables: [table],
  tableName: "aggregated_events",
  materializedViewName: "aggregated_events_mv"
});
```
```ts
const stream = new Stream("events", {
destination: table,
});
stream.addConsumer((event) => {
console.log(event);
});
```
```ts
const etl = new Workflow("my_etl", {
startingTask: startEtl,
schedule: "@every 1h",
retries: 3,
});
```
```ts
const postEvent = new IngestApi("post-event", {
destination: stream,
});
const getEvents = new Api("get-events", {
async handler({limit = 10}, {client, sql}) {
// query database and return results
return await client.query.execute(sql`
SELECT * FROM events LIMIT ${limit}
`);
}
});
```
Each module is independent and can be used on its own. You can start with one capability and incrementally adopt more over time.
## Tooling
```bash
# Build for production
moose build
```
```bash
# Create a plan
moose plan
# Example plan output:
~ Table events with column changes: [
Added(
Column {
name: "status",
data_type: String,
required: true,
unique: false,
primary_key: false,
default: None
})]
and order by changes: OrderByChange {
before: [], after: []
}
```
## Technology Partners
- [ClickHouse](https://clickhouse.com/) (Online Analytical Processing (OLAP) Database)
- [Redpanda](https://redpanda.com/) (Streaming)
- [Temporal](https://temporal.io/) (Workflow Orchestration)
- [Redis](https://redis.io/) (Internal State Management)
Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).
---
## LLM-Optimized Documentation
Source: moose/llm-docs.mdx
Language-scoped documentation feeds for AI assistants
# LLM-Optimized Documentation
Moose now publishes lightweight documentation bundles so AI assistants can reason about your project without scraping the entire site. Each docs page includes **LLM View** links for TypeScript and Python, and the CLI exposes HTTP endpoints that deliver pre-compiled reference text.
## Quick links
- TypeScript bundle: `/llm-ts.txt`
- Python bundle: `/llm-py.txt`
- Scoped bundle: append `?path=relative/docs/section` to either endpoint to fetch a specific subsection
You can open these URLs in a browser, pipe them into tooling, or share them with agents such as Claude, Cursor, and Windsurf.
```bash filename="Terminal"
# Fetch the TypeScript bundle for the OLAP docs from the hosted site
curl "https://docs.fiveonefour.com/llm-ts.txt?path=moose/olap/model-table"
```
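You can do the same from Python if you want to cache a bundle locally or feed it into your own tooling; a small sketch:

```python
# Fetch the Python docs bundle scoped to the OLAP table section and save it locally
import urllib.request

url = "https://docs.fiveonefour.com/llm-py.txt?path=moose/olap/model-table"
with urllib.request.urlopen(url) as resp:
    bundle = resp.read().decode("utf-8")

with open("moose-olap-model-table.txt", "w") as f:
    f.write(bundle)
```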
For project-specific knowledge, combine these static bundles with live context from the [MCP server](/moose/mcp-dev-server).
---
## Development Mode
Source: moose/local-dev.mdx
Local development environment with hot reload and automatic infrastructure management
# Setting Up Your Development Environment
Development mode (`moose dev`) provides a full-featured local environment optimized for rapid iteration and debugging. It automatically manages Docker containers, provides hot reload capabilities, and includes enhanced debugging features.
## Getting Started
```bash
# Start development environment
moose dev
# View your running infrastructure
moose ls
```
## Container Management
Development mode automatically manages Docker containers for your infrastructure:
- **ClickHouse** (when `olap` feature is enabled)
- **Redpanda** (when `streaming_engine` feature is enabled)
- **Temporal** (when `workflows` feature is enabled)
- **Analytics APIs Server** (when `apis` feature is enabled)
- **Redis** (always enabled)
- **MCP Server** (always enabled) - Enables AI-assisted development. [Learn more](/moose/mcp-dev-server)
### Container Configuration
Control which containers start with feature flags:
```toml copy
# moose.config.toml
[features]
olap = true # Enables ClickHouse
streaming_engine = true # Enables Redpanda
workflows = false # Controls Temporal startup
apis = true # Enables Analytics APIs server
```
### Extending Docker Infrastructure
You can extend Moose's Docker Compose configuration with custom services by creating a `docker-compose.dev.override.yaml` file in your project root. This allows you to add additional infrastructure (databases, monitoring tools, etc.) that runs alongside your Moose development environment.
**Do not use docker-compose.dev.override.yaml to modify Moose-managed services** (ClickHouse, Redpanda, Redis, Temporal). The Docker Compose merge behavior makes it difficult to override existing configuration correctly, often leading to conflicts.
Instead, use `moose.config.toml` to configure Moose infrastructure. See [Configuration](/moose/configuration) for all available options including database connections, ports, volumes, and service-specific settings.
Use the override file **only for adding new services** that complement your Moose environment (e.g., PostgreSQL for application data, monitoring tools).
**How it works:**
When you run `moose dev`, Moose automatically detects and merges your override file with the generated Docker Compose configuration. The files are merged using Docker Compose's [standard merge behavior](https://docs.docker.com/compose/how-tos/multiple-compose-files/merge/).
**Example: Adding PostgreSQL for Application Data**
Create a `docker-compose.dev.override.yaml` file in your project root:
```yaml copy filename="docker-compose.dev.override.yaml"
services:
postgres:
image: postgres:16
environment:
POSTGRES_USER: myapp
POSTGRES_PASSWORD: mypassword
POSTGRES_DB: myapp_db
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
volumes:
postgres-data:
```
Now when you run `moose dev`, PostgreSQL will start alongside your other infrastructure. You'll see a message confirming the override file is being used:
```
[moose] Using docker-compose.dev.override.yaml for custom infrastructure
```
**Recommended Use Cases:**
- **Add databases**: PostgreSQL, MySQL, MongoDB for application data
- **Add monitoring**: Grafana, Prometheus for metrics visualization
- **Add custom services**: Additional message queues, caching layers, or development tools
**Not Recommended:**
- Modifying Moose-managed services (ClickHouse, Redpanda, Redis, Temporal)
- Overriding ports, volumes, or environment variables for Moose infrastructure
- Attempting to change database credentials or connection settings
For any Moose infrastructure configuration, use `moose.config.toml` instead. See [Configuration](/moose/configuration).
**Example: Adding Grafana for Monitoring**
```yaml copy filename="docker-compose.dev.override.yaml"
services:
grafana:
image: grafana/grafana:latest
ports:
- "3001:3000"
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_USERS_ALLOW_SIGN_UP=false
volumes:
- grafana-data:/var/lib/grafana
volumes:
grafana-data:
```
When merging files, Docker Compose follows these rules:
- **Services**: Merged by name with values from the override file taking precedence
- **Environment variables**: Appended (both files' values are used)
- **Volumes**: Appended
- **Ports**: Appended (use `!override` tag to replace instead of merge)
See [Docker's merge documentation](https://docs.docker.com/reference/compose-file/merge/) for complete details.
The override file is only used in development mode (`moose dev`). For production deployments, configure your infrastructure separately using your deployment platform's tools.
## Hot Reloading Development
The development runtime includes a file watcher that provides near-instantaneous feedback when you save code changes.
### Watched Files
The file watcher recursively monitors your entire `app/` directory structure and only rebuilds the components that actually changed.
Only the root file in your `app/` directory (`main.py`) is executed when changes are detected. For your tables, streams, APIs, and workflows to be picked up, they must be imported into that root file. If a changed file in your `app/` directory is a dependency of the root file, those changes WILL be detected.
## Quick Example
**❌ Doesn't work - Not imported in the root file:**
```py file="app/tables/users.py"
from moose_lib import OlapTable
from schemas.user import UserSchema

users_table = OlapTable[UserSchema]("Users")
# Moose can't see this - not imported in main.py
```
**✅ Works - Import in main file:**
```py file="app/tables/users.py" {4}
from moose_lib import OlapTable
from schemas.user import UserSchema

users_table = OlapTable[UserSchema]("Users")  # No export statement needed - just import it in main.py
```
```py file="app/main.py"
from tables.users import users_table # Moose sees this
```
Because the table is imported in the main file, Moose detects changes to it and rebuilds the table.
**✅ Works - Change dependency:**
```py file="app/schemas/user.py" {7}
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    age: int  # Adding this triggers migration
```
*Moose detects this because `UserSchema` is imported in the root file via the dependency chain.*
Learn more about [how Moose handles migrations](/moose/migrate).
## Script Execution Hooks
You can configure your dev server to run your own shell commands automatically during development. Use these hooks to keep generated artifacts in sync (e.g., refreshing external models, regenerating OpenAPI SDKs).
### Available hooks
- `on_first_start_script`: runs once when the dev server first starts in this process
- `on_reload_complete_script`: runs after each dev server reload when code/infra changes have been fully applied
Configure these in `moose.config.toml` under the `http_server_config` section:
```toml copy
# moose.config.toml
[http_server_config]
# One-time on first start
on_first_start_script = "echo 'dev started'"
# After every code/infra reload completes
on_reload_complete_script = "echo 'reload complete'"
```
Notes:
- Scripts run from your project root using your `$SHELL` (falls back to `/bin/sh`).
- Use `&&` to chain multiple commands, or point the hook at a custom script (see the sketch after these notes).
- Prefer passing credentials via environment variables or your secret manager.
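For anything beyond a one-liner, point the hook at a small script in your repository. A hypothetical sketch (the script path and its behavior are illustrative, not something Moose generates for you):

```python
# scripts/after_reload.py (hypothetical)
# Wire it up with: on_reload_complete_script = "python scripts/after_reload.py"
from pathlib import Path

spec = Path(".moose/openapi.yaml")  # kept fresh by the dev server after each reload
if spec.exists():
    print(f"OpenAPI spec refreshed: {spec} ({spec.stat().st_size} bytes)")
else:
    print("OpenAPI spec not found yet; skipping post-reload step")
```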
### Use case: keep external models in sync (DB Pull)
Refresh `EXTERNALLY_MANAGED` table models from a remote ClickHouse on dev start so your local code matches the live schema.
```bash filename="Terminal" copy
export REMOTE_CLICKHOUSE_URL="https://username:password@host:8443/?database=default"
```
```toml copy
# moose.config.toml
[http_server_config]
on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL"
```
See the full guide: [/moose/olap/db-pull](/moose/olap/db-pull)
### Use case: regenerate OpenAPI SDKs on reload
Automatically regenerate client SDKs after Moose finishes applying code/infra changes so `.moose/openapi.yaml` is fresh.
```toml copy
# moose.config.toml
[http_server_config]
on_first_start_script = "command -v openapi-generator-cli >/dev/null 2>&1 || npm i -g @openapitools/openapi-generator-cli"
on_reload_complete_script = "openapi-generator-cli generate -i .moose/openapi.yaml -g typescript-fetch -o ./generated/ts"
```
More examples: [/moose/apis/openapi-sdk](/moose/apis/openapi-sdk)
## Local Infrastructure
### Port Allocation
Development mode uses the following default ports:
- **4000**: Main API server
- **5001**: Management API (health checks, metrics, admin, OpenAPI docs)
### Service URLs
Access your development services at:
```bash
# Main application
http://localhost:4000
# Management interface
curl http://localhost:5001/metrics
# OpenAPI documentation
http://localhost:5001/openapi.yaml
```
### Container Networking
All containers run in an isolated Docker network with automatic service discovery:
- Containers communicate using service names
- Port mapping only for external access
- Automatic DNS resolution between services
### MCP Server for AI-Assisted Development
Development mode includes a built-in Model Context Protocol (MCP) server that lets AI assistants interact with your local infrastructure through natural language.
**What you can do:**
- Query your ClickHouse database with natural language
- Inspect streaming topics and messages
- Search and filter development logs
- Explore your infrastructure map
**Quick setup:**
The MCP server runs automatically at `http://localhost:4000/mcp`. For Claude Code, just run:
```bash copy
claude mcp add --transport http moose-dev http://localhost:4000/mcp
```
For other AI clients (Windsurf, VS Code, Cursor, Claude Desktop), see the [full setup guide](/moose/mcp-dev-server).
**Example prompts:**
- *"What errors are in the logs?"*
- *"What tables exist in my project?"*
- *"Show me the schema of all tables"*
- *"Sample 5 messages from the Foo stream"*
See the complete guide for all available tools, detailed configuration for each AI client, and example workflows.
## Troubleshooting
### Common Issues
**Container Startup Failures**
```bash
# Check Docker is running
docker info
# View container logs
moose logs
```
**Port Conflicts**
```bash
# Check what's using your ports
lsof -i :4000
lsof -i :5001
# Use custom ports
export MOOSE_HTTP_PORT=4040
export MOOSE_MANAGEMENT_PORT=5010
moose dev
```
---
## MCP Server
Source: moose/mcp-dev-server.mdx
Built-in Model Context Protocol server for AI-assisted development
# MCP Server for AI-Assisted Development
The Moose development server includes a built-in Model Context Protocol (MCP) server that enables AI agents and IDEs to interact directly with your local development infrastructure. This allows you to use natural language to query data, inspect logs, explore infrastructure, and debug your Moose project.
## What is MCP?
[Model Context Protocol (MCP)](https://modelcontextprotocol.io) is an open protocol that standardizes how AI assistants communicate with development tools and services. Moose's MCP server exposes your local development environment—including ClickHouse, Redpanda, logs, and infrastructure state—through a set of tools that AI agents can use.
## Quick Start
The MCP server runs automatically when you start development mode:
```bash
moose dev
```
The MCP server is available at: `http://localhost:4000/mcp`
The MCP server is enabled by default. To disable it, use `moose dev --mcp=false`.
## Configure Your AI Client
Connect your AI assistant to the Moose MCP server. Most clients now support native HTTP transport for easier setup.
**Setup**: Use the Claude Code CLI (easiest method)
```bash copy
claude mcp add --transport http moose-dev http://localhost:4000/mcp
```
That's it! Claude Code will automatically connect to your Moose dev server.
**Scope**: This command adds the MCP server to Claude Code's project configuration, making it available to your project when using Claude Code. Other AI clients (Cursor, Windsurf, etc.) require separate configuration - see the client-specific sections below.
Make sure `moose dev` is running before adding the server. The CLI will verify the connection.
**Alternative**: Manual configuration at `~/.claude/config.json`
```json filename="config.json" copy
{
"mcpServers": {
"moose-dev": {
"transport": "http",
"url": "http://localhost:4000/mcp"
}
}
}
```
**Location**: `~/.codeium/windsurf/mcp_config.json`
Windsurf supports native Streamable HTTP transport:
```json filename="mcp_config.json" copy
{
"mcpServers": {
"moose-dev": {
"serverUrl": "http://localhost:4000/mcp"
}
}
}
```
**Prerequisites**:
- VS Code 1.102+ (built-in MCP support)
- Or install the [Cline extension](https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev)
**Option 1: Native HTTP Support (VS Code 1.102+)**
Add to `.vscode/settings.json` or User Settings:
```json filename=".vscode/settings.json" copy
{
"mcp.servers": {
"moose-dev": {
"transport": "http",
"url": "http://localhost:4000/mcp"
}
}
}
```
**Option 2: Cline Extension**
Configure in Cline's MCP settings:
```json copy
{
"moose-dev": {
"transport": "sse",
"url": "http://localhost:4000/mcp"
}
}
```
**Location**: `.cursor/mcp.json` (project-level) or `~/.cursor/settings/mcp.json` (global)
Cursor currently uses stdio transport. Use `mcp-remote` to bridge to HTTP servers:
```json filename=".cursor/mcp.json" copy
{
"mcpServers": {
"moose-dev": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://localhost:4000/mcp"
]
}
}
}
```
**Location**: `~/Library/Application Support/Claude/claude_desktop_config.json`
Access via: Claude > Settings > Developer > Edit Config
```json filename="claude_desktop_config.json" copy
{
"servers": {
"moose-dev": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://localhost:4000/mcp"
]
}
}
}
```
The `-y` flag automatically installs `mcp-remote` if not already installed.
Make sure `moose dev` is running before using the MCP tools. The AI client will connect to `http://localhost:4000/mcp`.
## Available Tools
The Moose MCP server provides five tools for interacting with your local development environment:
### `get_logs`
Retrieve and filter Moose development server logs for debugging and monitoring.
**What you can ask for:**
- Filter by log level (ERROR, WARN, INFO, DEBUG, TRACE)
- Limit the number of log lines returned
- Search for specific text patterns in logs
**Example prompts:**
*"Show me the last 10 ERROR logs"*
```
Showing 10 most recent log entries from /Users/user/.moose/2025-10-10-cli.log
Filters applied:
- Level: ERROR
[2025-10-10T17:44:42Z ERROR] Foo -> Bar (worker 1): Unsupported SASL mechanism: undefined
[2025-10-10T17:44:43Z ERROR] FooDeadLetterQueue (consumer) (worker 1): Unsupported SASL mechanism
[2025-10-10T17:51:48Z ERROR] server error on API server (port 4000): connection closed
...
```
*"What WARN level logs do I have?"*
```
Showing 6 most recent log entries
Filters applied:
- Level: WARN
[2025-10-10T16:45:04Z WARN] HTTP client not configured - missing API_KEY
[2025-10-10T16:50:05Z WARN] HTTP client not configured - missing API_KEY
...
```
**Tip**: Combine filters for better results. For example: "Show me ERROR logs with 'ClickHouse' in them" combines level filtering with search.
**Use cases:**
- Debugging application errors
- Monitoring infrastructure health
- Tracking data processing issues
- Finding specific events or patterns
---
### `get_infra_map`
Retrieve and explore the infrastructure map showing all components in your Moose project.
**What you can ask for:**
- List specific component types (tables, topics, API endpoints, workflows, etc.)
- Get a complete overview of all infrastructure
- Search for components by name
- See detailed configuration or just a summary
**Example prompts:**
*"What tables exist in my project?"*
```
# Moose Infrastructure Map (Summary)
## Tables (28)
- MergeTreeTest
- ReplacingMergeTreeVersion
- Bar
- BasicTypes
- UserEvents_1_0
- UserEvents_2_0
- FooDeadLetter
- BarAggregated
- FooWorkflow
...
```
*"Give me an overview of my Moose infrastructure"*
```
# Moose Infrastructure Map (Summary)
## Topics (11)
- Bar, BasicTypes, Foo, FooDeadLetterQueue, SimpleArrays...
## API Endpoints (11)
- INGRESS_Foo (INGRESS -> topic: Foo)
- INGRESS_BasicTypes (INGRESS -> topic: BasicTypes)
- EGRESS_bar (EGRESS (4 params))
...
## Tables (28)
- MixedComplexTypes, Bar, UserEvents_1_0...
## Topic-to-Table Sync Processes (10)
- Bar_Bar, BasicTypes_BasicTypes...
## Function Processes (3)
- Foo__Bar_Foo_Bar, Foo_Foo...
```
*"Find all components with 'User' in the name"*
```
## Tables (2)
- UserEvents_1_0
- UserEvents_2_0
```
**Tip**: Search is case-sensitive by default. Use capital letters to match your component names, or ask the AI to search case-insensitively.
**Use cases:**
- Understanding project structure
- Discovering available components
- Debugging infrastructure issues
- Documenting your data pipeline
---
### `query_olap`
Execute read-only SQL queries against your local ClickHouse database.
**What you can ask for:**
- Query table data with filters, sorting, and aggregations
- Inspect table schemas and column information
- Count rows and calculate statistics
- List all tables in your database
- Results in table or JSON format
**Example prompts:**
*"What columns are in the UserEvents_1_0 table?"*
```
Query executed successfully. Rows returned: 4
| name | type | default_type | default_expression | comment | ...
|-----------|-------------------|--------------|-------------------|---------|
| userId | String | | | |
| eventType | String | | | |
| timestamp | Float64 | | | |
| metadata | Nullable(String) | | | |
```
*"List all tables and their engines"*
```
Query executed successfully. Rows returned: 29
| name | engine |
|-----------------------------|------------------------------|
| Bar | MergeTree |
| BasicTypes | MergeTree |
| UserEvents_1_0 | MergeTree |
| UserEvents_2_0 | ReplacingMergeTree |
| ReplicatedMergeTreeTest | ReplicatedMergeTree |
| BarAggregated_MV | MaterializedView |
...
```
*"Count the number of rows in Bar"*
```
Query executed successfully. Rows returned: 1
| total_rows |
|------------|
| 0 |
```
**Tip**: Ask the AI to discover table names first using "What tables exist in my project?" before querying them. Table names are case-sensitive in ClickHouse.
**Use cases:**
- Exploring data during development
- Validating data transformations
- Checking table schemas
- Debugging SQL queries
- Analyzing data patterns
**Safety:**
Only read-only operations are permitted (SELECT, SHOW, DESCRIBE, EXPLAIN). Write operations (INSERT, UPDATE, DELETE) and DDL statements (CREATE, ALTER, DROP) are blocked.
---
### `get_stream_sample`
Sample recent messages from Kafka/Redpanda streaming topics.
**What you can ask for:**
- View recent messages from any stream/topic
- Specify how many messages to sample
- Get results in JSON or pretty-printed format
- Inspect message structure and content
**Example prompts:**
*"Sample 5 messages from the Bar topic"*
```json
{
"stream_name": "Bar",
"message_count": 5,
"partition_count": 1,
"messages": [
{
"primaryKey": "e90c93be-d28b-47d6-b783-5725655c044f",
"utcTimestamp": "+057480-11-24T20:39:59.000Z",
"hasText": true,
"textLength": 107
},
{
"primaryKey": "b974f830-f28a-4a95-b61c-f65bfc607795",
"utcTimestamp": "+057370-11-04T17:11:51.000Z",
"hasText": true,
"textLength": 166
},
...
]
}
```
*"What data is flowing through the BasicTypes stream?"* (pretty format)
```markdown
# Stream Sample: BasicTypes
Retrieved 3 message(s) from 1 partition(s)
## Message 1
{
"id": "bt-001",
"timestamp": "2024-10-09T12:00:00Z",
"stringField": "hello world",
"numberField": 42,
"booleanField": true
}
## Message 2
{
"id": "bt-002",
"timestamp": "2024-10-09T12:05:00Z",
"stringField": "test",
"numberField": 100,
"booleanField": false
}
...
```
**Tip**: Use "List all streaming topics" first to discover available streams in your project.
**Use cases:**
- Debugging data flow issues
- Validating streaming transformations
- Inspecting message formats
- Troubleshooting pipeline failures
---
### `get_source`
Look up the file path where a Moose component (table, stream, API, workflow, etc.) is defined. The tool searches the infrastructure map and returns the relative path plus snippet metadata.
**What you can ask for:**
- "Where is the `UserActivity` table defined?"
- "Open the TypeScript file for the `orderEventPipeline` stream."
- "Show me the FastAPI app behind the `analytics_api` web route."
**Example prompt and response:**
```
You: "Find the source for the OrderItem stream"
AI:
Component: stream/OrderItem (TypeScript)
File: src/ingest/order-items.ts
Snippet:
);
```
**Use cases:**
- Jump from CLI output or infra map entries to the actual code
- Let AI assistants open the correct file before editing
- Generate documentation references that include file paths
The tool only reads paths within your workspace and never executes code. Pair it with `get_infra_map` for quick “find-then-open” workflows.
---
## Example Workflows
### Debugging a Data Pipeline
```
You: "My users table is empty, can you help me debug?"
AI: Let me check the infrastructure and data flow.
[Uses get_infra_map to see the pipeline]
[Uses get_logs to check for errors]
[Uses query_olap to verify table is empty]
[Uses get_stream_sample to check if data is in the source stream]
AI: I found the issue - there are errors in the logs showing the streaming
function is failing. The stream has data, but it's not reaching the table.
```
### Exploring a New Project
```
You: "Can you give me an overview of this Moose project?"
AI: [Uses get_infra_map with format=summary]
AI: This project has 3 tables (users, events, aggregated_metrics),
2 streaming topics, 4 API endpoints, and 1 workflow. The data flow
appears to be: raw events → transformation → aggregated metrics.
```
### Validating Data Transformations
```
You: "Did my streaming function correctly transform the user data?"
AI: [Uses get_stream_sample on source stream]
[Uses query_olap on destination table]
AI: Yes, the transformation is working. I compared the source stream
messages with the destination table records, and the enrichment
fields are being added correctly.
```
## Advanced Configuration
### Custom Port
If you're running Moose on a non-default port, update the MCP configuration:
```bash
export MOOSE_HTTP_PORT=8080
moose dev
```
Then update your MCP client configuration to use port 8080 instead of 4000.
### Disabling the MCP Server
To run development mode without the MCP server:
```bash
moose dev --mcp=false
```
### Production Considerations
The MCP server is designed for local development only. It provides direct access to your infrastructure and should **never** be exposed in production environments.
The MCP server:
- Runs only in development mode (`moose dev`)
- Does not run in production mode (`moose prod`)
- Provides read-only access to sensitive infrastructure
- Should not be exposed over networks or proxied externally
## LLM-Optimized Documentation Feeds
Before handing control to an AI assistant, prime it with a compact doc bundle so it understands Moose primitives and terminology. We publish TypeScript and Python versions at `/llm-ts.txt` and `/llm-py.txt`, with optional `?path=` filters for specific sections.
See [LLM-optimized docs](/moose/llm-docs) for instructions on embedding these feeds into Claude, Cursor, Windsurf, or MCP clients alongside the live tools described above.
## Troubleshooting
### MCP Tools Not Appearing
1. Verify `moose dev` is running: `curl http://localhost:4000/mcp`
2. Check your AI client's MCP configuration is correct
3. Restart your AI client after updating configuration
4. Check the Moose logs for MCP-related errors: `moose logs --filter mcp`
### Connection Errors
If your AI client can't connect to the MCP server:
```bash
# Check if the dev server is running
curl http://localhost:4000/health
# Check MCP endpoint specifically
curl -X POST http://localhost:4000/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize"}'
```
### Empty Results
If tools return no data:
- Verify your dev server has been running long enough to generate data
- Check that infrastructure has been created: `moose ls`
- Ingest some test data, then verify it arrived with `moose peek <table_name>`
## Related Documentation
- [Local Development](/moose/local-dev) - Development mode overview
- [Moose CLI Reference](/moose/moose-cli) - CLI commands and flags
- [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification
---
## Observability
Source: moose/metrics.mdx
Unified observability for Moose across development and production—metrics console, health checks, Prometheus, OpenTelemetry, logging, and error tracking
# Observability
This page consolidates Moose observability for both local development and production environments.
## Local Development
### Metrics Console
Moose provides a console to view live metrics from your Moose application. To launch the console, run:
```bash filename="Terminal" copy
moose metrics
```
Use the arrow keys to move up and down rows in the endpoint table and press Enter to view more details about that endpoint.
#### Endpoint Metrics
Aggregated metrics for all endpoints:
| Metric | Description |
| :-------------------- | :---------------------------------------------------------------------------------- |
| `AVERAGE LATENCY` | Average time in milliseconds it takes for a request to be processed by the endpoint |
| `TOTAL # OF REQUESTS` | Total number of requests made to the endpoint |
| `REQUESTS PER SECOND` | Average number of requests made per second to the endpoint |
| `DATA IN` | Average number of bytes of data sent to all `/ingest` endpoints per second |
| `DATA OUT` | Average number of bytes of data sent to all `/api` endpoints per second |
Individual endpoint metrics:
| Metric | Description |
| :---------------------------- | :---------------------------------------------------------------------------------- |
| `LATENCY` | Average time in milliseconds it takes for a request to be processed by the endpoint |
| `# OF REQUESTS RECEIVED` | Total number of requests made to the endpoint |
| `# OF MESSAGES SENT TO KAFKA` | Total number of messages sent to the Kafka topic |
#### Stream → Table Sync Metrics
| Metric | Description |
| :---------- | :-------------------------------------------------------------------------------------------------- |
| `MSG READ` | Total number of messages sent from `/ingest` API endpoint to the Kafka topic |
| `LAG` | The number of messages that have been sent to the consumer but not yet received |
| `MSG/SEC` | Average number of messages sent from `/ingest` API endpoint to the Kafka topic per second |
| `BYTES/SEC` | Average number of bytes of data received by the ClickHouse consumer from the Kafka topic per second |
#### Streaming Transformation Metrics
For each streaming transformation:
| Metric | Description |
| :------------ | :---------------------------------------------------------------------------- |
| `MSG IN` | Total number of messages passed into the streaming function |
| `MSG IN/SEC` | Average number of messages passed into the streaming function per second |
| `MSG OUT` | Total number of messages returned by the streaming function |
| `MSG OUT/SEC` | Average number of messages returned by the streaming function per second |
| `BYTES/SEC` | Average number of bytes of data returned by the streaming function per second |
---
## Production
### Health Monitoring
Moose applications expose a health check endpoint at `/health` that returns a 200 OK response when the application is operational. This endpoint is used by container orchestration systems like Kubernetes to determine the health of your application.
In production environments, we recommend configuring three types of probes:
1. Startup Probe: Gives Moose time to initialize before receiving traffic
2. Readiness Probe: Determines when the application is ready to receive traffic
3. Liveness Probe: Detects when the application is in a deadlocked state and needs to be restarted
Learn more about [how to configure health checks](/moose/deploying/deploying-on-kubernetes) in your Kubernetes deployment.
### Prometheus Metrics
Moose applications expose metrics in Prometheus format at the `/metrics` endpoint. These metrics include:
- HTTP request latency histograms for each endpoint
- Request counts and error rates
- System metrics for the Moose process
Example metrics output:
```
# HELP latency Latency of HTTP requests.
# TYPE latency histogram
latency_sum{method="POST",path="ingest/UserActivity"} 0.025
latency_count{method="POST",path="ingest/UserActivity"} 2
latency_bucket{le="0.001",method="POST",path="ingest/UserActivity"} 0
latency_bucket{le="0.01",method="POST",path="ingest/UserActivity"} 0
latency_bucket{le="0.02",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.05",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.1",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.25",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="0.5",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="1.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="5.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="10.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="30.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="60.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="120.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="240.0",method="POST",path="ingest/UserActivity"} 1
latency_bucket{le="+Inf",method="POST",path="ingest/UserActivity"} 1
```
You can scrape these metrics using a Prometheus server or any compatible monitoring system.
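For a quick look without a full Prometheus setup, the plain-text exposition format is easy to parse directly. A minimal sketch, assuming the development management endpoint at `http://localhost:5001/metrics`:

```python
# Compute average request latency per endpoint from the Prometheus text format
import urllib.request
from collections import defaultdict

text = urllib.request.urlopen("http://localhost:5001/metrics").read().decode("utf-8")

sums, counts = defaultdict(float), defaultdict(float)
for line in text.splitlines():
    if line.startswith("latency_sum{"):
        labels, value = line.rsplit(" ", 1)
        sums[labels.removeprefix("latency_sum")] += float(value)
    elif line.startswith("latency_count{"):
        labels, value = line.rsplit(" ", 1)
        counts[labels.removeprefix("latency_count")] += float(value)

for labels, total in sums.items():
    if counts[labels]:
        print(f"{labels} -> avg latency {total / counts[labels]:.4f}s")
```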
### OpenTelemetry Integration
In production deployments, Moose can export telemetry data using OpenTelemetry. Enable via environment variables:
```
MOOSE_TELEMETRY__ENABLED=true
MOOSE_TELEMETRY__EXPORT_METRICS=true
```
When running in Kubernetes with an OpenTelemetry operator, you can configure automatic sidecar injection by adding annotations to your deployment:
```yaml
metadata:
annotations:
"sidecar.opentelemetry.io/inject": "true"
```
### Logging
Configure structured logging via environment variables:
```
MOOSE_LOGGER__LEVEL=Info
MOOSE_LOGGER__STDOUT=true
MOOSE_LOGGER__FORMAT=Json
```
The JSON format is ideal for log aggregation systems (ELK Stack, Graylog, Loki, or cloud logging solutions).
### Production Monitoring Stack
Recommended components:
1. Metrics Collection: Prometheus or cloud-native monitoring services
2. Log Aggregation: ELK Stack, Loki, or cloud logging solutions
3. Distributed Tracing: Jaeger or other OpenTelemetry-compatible backends
4. Alerting: Alertmanager or cloud provider alerting
### Error Tracking
Integrate with systems like Sentry via environment variables:
```
SENTRY_DSN=https://your-sentry-dsn
RUST_BACKTRACE=1
```
Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).
## Feedback
Join our Slack community to share feedback and get help with Moose.
---
## Migrations & Planning
Source: moose/migrate.mdx
How Moose handles infrastructure migrations and planning
# Moose Migrate
Moose's migration system works like version control for your infrastructure. It automatically detects changes in your code and applies them to your data infrastructure with confidence.
Moose tracks changes across:
- OLAP Tables and Materialized Views
- Streaming Topics
- API Endpoints
- Workflows
## How It Works
Moose collects all objects defined in your main file (main.py) and automatically generates infrastructure operations to match your code:
```python file="app/main.py"
from pydantic import BaseModel
from moose_lib import OlapTable, Stream
class UserSchema(BaseModel):
id: str
name: str
email: str
users_table = OlapTable[UserSchema]("Users")
user_events = Stream[UserSchema]("Users")
```
When you add these objects, Moose automatically creates:
- A ClickHouse table named `Users` with the `UserSchema`
- A Redpanda topic named `Users` with the `UserSchema`
## Development Workflow
When running your code in development mode, Moose will automatically hot-reload migrations to your local infrastructure as you save code changes.
### Quick Start
Start your development environment:
```bash filename="Terminal" copy
moose dev
```
This automatically:
1. Recursively watches your `/app` directory for code changes
2. Parses objects defined in your main file
3. Compares the new objects with the current infrastructure state Moose stores internally
4. Generates and applies migrations in real-time based on the differences
5. Provides immediate feedback on any errors or warnings
6. Updates the internal state of your infrastructure to reflect the new state
### Example: Adding a New Table
```python file="app/main.py" {6} copy
# Before
users_table = OlapTable[UserSchema]("Users")
# After (add analytics table)
users_table = OlapTable[UserSchema]("Users")
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```
**What happens:**
- Moose detects the new `analytics_table` object
- Compares: "No Analytics table exists"
- Generates migration: "Create Analytics table"
- Applies migration automatically
- Updates internal state
In your terminal, you will see a log that shows the new table being created:
```bash
⠋ Processing Infrastructure changes from file watcher
+ Table: Analytics Version None - id: String, number: Int64, status: String - - deduplicate: false
```
### Example: Schema Changes
```python file="app/main.py" {9} copy
from pydantic import BaseModel
from moose_lib import Key

# After (add age field)
class UserSchema(BaseModel):
    id: Key[str]
    name: str
    email: str
    age: int  # New field
```
**What happens:**
- Moose detects the new `age` field
- Generates migration: "Add age column to Users table"
- Applies migration
- Existing rows get NULL/default values
## Production Workflow
Moose supports two deployment patterns: **Moose Server** and **Serverless**.
### Moose Server Deployments
For deployments with a running Moose server, preview changes before applying:
```bash filename="Terminal" copy
moose plan --url https://your-production-instance --token <token>
```
Remote planning requires authentication:
1. Generate a token: `moose generate hash-token`
2. Configure your server:
```toml filename="moose.config.toml" copy
[authentication]
admin_api_key = "your-hashed-token"
```
3. Use the token with the `--token` flag
**Deployment Flow:**
1. **Develop locally** with `moose dev`
2. **Test changes** in local environment
3. **Plan against production**: `moose plan --url <url> --token <token>`
4. **Review changes** carefully
5. **Deploy** - Moose applies migrations automatically on startup
### Serverless Deployments
For serverless deployments (no Moose server), use the ClickHouse connection directly:
```bash filename="Terminal" copy
# Step 1: Generate migration files
moose generate migration --clickhouse-url <url> --save
# Step 2: Preview changes in PR
moose plan --clickhouse-url clickhouse://user:pass@host:port/database
# Step 3: Execute migration after merge
moose migrate --clickhouse-url <url>
```
**Deployment Flow:**
1. **Develop locally** with `moose dev`
2. **Generate migration plan**: `moose generate migration --clickhouse-url <url> --save`
3. **Create PR** with `plan.yaml`, `remote_state.json`, `local_infra_map.json`
4. **PR validation**: Run `moose plan --clickhouse-url <url>` in CI to preview changes
5. **Review** migration files and plan output
6. **Merge PR**
7. **Execute migration**: Run `moose migrate --clickhouse-url <url>` in CI/CD
Requires `state_config.storage = "clickhouse"` in `moose.config.toml`:
```toml filename="moose.config.toml" copy
[state_config]
storage = "clickhouse"
[features]
olap = true
data_models_v2 = true
```
Your ClickHouse instance needs the KeeperMap engine for state storage and migration locking.
✅ **ClickHouse Cloud**: Works out of the box
✅ **`moose dev` or `moose prod`**: Already configured
⚠️ **Self-hosted ClickHouse**: See [ClickHouse KeeperMap documentation](https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map) for setup requirements
### State Storage Options
Moose migrations require storing infrastructure state and coordinating locks. You can choose between two backends:
**ClickHouse State Storage (Default)**
Uses the `_MOOSE_STATE` KeeperMap table. Best for:
- ClickHouse Cloud (works out of the box)
- Self-hosted with ClickHouse Keeper already configured
**Redis State Storage**
Uses Redis for state and locking. Best for:
- Existing Redis infrastructure
- Multi-tenant deployments (isolated by `key_prefix`)
- When ClickHouse Keeper isn't available
**Configuration:**
```toml filename="moose.config.toml" copy
[state_config]
storage = "redis" # or "clickhouse" (default)
```
**Usage with Redis:**
```bash filename="Terminal" copy
# With environment variable (recommended)
export MOOSE_REDIS_CONFIG__URL="redis://host:port"
moose migrate --clickhouse-url clickhouse://...
# Or with CLI flag
moose migrate \
--clickhouse-url clickhouse://... \
--redis-url redis://host:port
```
The ClickHouse URL is always required, even when using Redis for state storage.
### Understanding Plan Output
Moose shows exactly what will change:
```bash
+ Table: Analytics Version None - id: String, number: Int64, status: String - - deduplicate: false
+ Table: Users Version None - id: String, name: String, email: String - - deduplicate: false
```
## Migration Types
| Change Type | Infrastructure Impact | Data Impact |
|-------------|----------------------|-------------|
| **Add new object** | New table/stream/API created | No impact |
| **Remove object** | Table/stream/API dropped | All data lost |
| **Add field** | New column created | Existing rows get NULL/default |
| **Remove field** | Column dropped | Data permanently lost |
| **Change type** | Column altered | Data converted if compatible |
## Viewing Infrastructure State
### Via CLI
```bash
# Check current infrastructure objects
moose ls
# View migration logs
moose logs
```
### Via Direct Connection
Connect to your local infrastructure using details from `moose.config.toml`:
```toml file="moose.config.toml"
[features]
olap = true # ClickHouse for analytics
streaming_engine = true # Redpanda for streaming
workflows = false # Temporal for workflows
[clickhouse_config]
host = "localhost"
host_port = 18123
native_port = 9000
db_name = "local"
user = "panda"
password = "pandapass"
[redpanda_config]
broker = "localhost:19092"
message_timeout_ms = 1000
retention_ms = 30000
replication_factor = 1
```
## Best Practices
### Development
- Use `moose dev` for all local development
- Monitor plan outputs for warnings
- Test schema changes with sample data
### Production
- Always use remote planning before deployments
- Review changes carefully in production plans
- Maintain proper authentication
- Test migrations in staging first
### Managing TTL Outside Moose
If you're managing ClickHouse TTL settings through other tools or want to avoid migration failures from TTL drift, you can configure Moose to ignore TTL changes:
```toml filename="moose.config.toml" copy
[migration_config]
ignore_operations = ["ModifyTableTtl", "ModifyColumnTtl"]
```
This tells Moose to:
- Skip generating TTL change operations in migration plans
- Ignore TTL differences during drift detection
You'll still get migrations for all other schema changes (adding tables, modifying columns, etc.), but TTL changes won't block your deployments.
## Troubleshooting
### Authentication Errors
- Verify your authentication token
- Generate a new token: `moose generate hash-token`
- Check server configuration in `moose.config.toml`
### Migration Issues
- Check `moose logs` for detailed error messages
- Verify object definitions in your main file
- Ensure all required fields are properly typed
- **Stuck migration lock**: If you see "Migration already in progress" but no migration is running, wait 5 minutes for automatic expiry or manually clear it:
```sql
DELETE FROM _MOOSE_STATE WHERE key = 'migration_lock';
```
---
## LifeCycle Management
Source: moose/migrate/lifecycle.mdx
Control how Moose manages database and streaming resources when your code changes
# LifeCycle Management
## Overview
The `LifeCycle` enum controls how Moose manages the lifecycle of database/streaming resources when your code changes.
This feature gives you fine-grained control over whether Moose automatically updates your database schema or
leaves it under external/manual control.
## LifeCycle Modes
### `FULLY_MANAGED` (Default)
This is the default behavior where Moose has complete control over your database resources. When you change your data models, Moose will automatically:
- Add new columns or tables
- Remove columns or tables that no longer exist in your code
- Modify existing column types and constraints
This mode can perform destructive operations. Data may be lost if you remove fields from your data models, or if you make changes that require the table to be dropped and recreated, such as changing `order_by_fields`.
```py filename="FullyManagedExample.py" copy
from moose_lib import OlapTable, OlapConfig, LifeCycle
from pydantic import BaseModel
class UserData(BaseModel):
id: str
name: str
email: str
# Default behavior - fully managed
user_table = OlapTable[UserData]("users")
# Explicit fully managed configuration
explicit_table = OlapTable[UserData]("users", OlapConfig(
order_by_fields=["id"],
life_cycle=LifeCycle.FULLY_MANAGED
))
```
### `DELETION_PROTECTED`
This mode allows Moose to automatically add new database structures but prevents it from removing existing ones.
Perfect for production environments where you want to evolve your schema safely without risking data loss.
**What Moose will do:**
- Add new columns, tables
- Modify column types (if compatible)
- Update non-destructive configurations
**What Moose won't do:**
- Drop columns or tables
- Perform destructive schema changes
```py filename="DeletionProtectedExample.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig, OlapConfig, StreamConfig, LifeCycle, ClickHouseEngines
from pydantic import BaseModel
from datetime import datetime
class ProductEvent(BaseModel):
id: str
product_id: str
timestamp: datetime
action: str
product_analytics = IngestPipeline[ProductEvent]("product_analytics", IngestPipelineConfig(
table=OlapConfig(
order_by_fields=["timestamp", "product_id"],
engine=ClickHouseEngines.ReplacingMergeTree,
),
stream=StreamConfig(
parallelism=4,
),
ingest_api=True,
# automatically applied to the table and stream
life_cycle=LifeCycle.DELETION_PROTECTED
))
```
### `EXTERNALLY_MANAGED`
This mode tells Moose to take a completely hands-off approach to your resources.
You become responsible for creating and managing the database schema. This is useful when:
- You have existing database tables managed by another team
- You're integrating with another system (e.g. PeerDB)
- You have strict database change management processes
With externally managed resources, you must ensure your database schema matches your data models exactly, or you may encounter runtime errors.
```py filename="ExternallyManagedExample.py" copy
from moose_lib import Stream, OlapTable, OlapConfig, StreamConfig, LifeCycle, Key
from pydantic import BaseModel
from datetime import datetime
class ExternalUserData(BaseModel):
user_id: Key[str]
full_name: str
email_address: str
created_at: datetime
# Connect to existing database table
legacy_user_table = OlapTable[ExternalUserData]("legacy_users", OlapConfig(
life_cycle=LifeCycle.EXTERNALLY_MANAGED
))
# Connect to existing Kafka topic
legacy_stream = Stream[ExternalUserData]("legacy_user_stream", StreamConfig(
life_cycle=LifeCycle.EXTERNALLY_MANAGED,
destination=legacy_user_table
))
```
---
## Migration Examples & Advanced Development
Source: moose/migrate/migration-types.mdx
Detailed migration examples and advanced development topics
# Migration Types
This guide provides detailed examples of different migration types. For the complete workflow overview, see [Migrations & Planning](/moose/migrate).
## Adding New Infrastructure Components
Keep in mind that only the modules that you have enabled in your `moose.config.toml` will be included in your migrations.
```toml file="moose.config.toml"
[features]
olap = true
streaming_engine = true
workflows = true
```
### New OLAP Table or Materialized View
```python file="app/main.py"
from pydantic import BaseModel
from datetime import datetime
from moose_lib import OlapTable
class AnalyticsSchema(BaseModel):
id: str
event_type: str
timestamp: datetime
user_id: str
value: float
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```
**Migration Result:** Creates ClickHouse table `Analytics` with all fields from `AnalyticsSchema`
If you have not enabled the `olap` feature flag, you will not be able to create new OLAP tables.
```toml file="moose.config.toml"
[features]
olap = true
```
Check out the OLAP migrations guide to learn more about the different migration modes.
### New Stream
```python file="app/main.py"
user_events = Stream[UserSchema]("UserEvents")
system_events = Stream[SystemEventSchema]("SystemEvents")
```
**Migration Result:** Creates Redpanda topics `UserEvents` and `SystemEvents`
If you have not enabled the `streaming_engine` feature flag, you will not be able to create new streaming topics.
```toml file="moose.config.toml"
[features]
streaming_engine = true
```
## Schema Modifications
### Adding Fields
```python file="app/main.py"
# Before
class UserSchema(BaseModel):
id: str
name: str
email: str
# After
class UserSchema(BaseModel):
id: str
name: str
email: str
age: int
created_at: datetime
is_active: bool
```
**Migration Result:** Adds `age`, `created_at`, and `is_active` columns to existing table
### Removing Fields
```python file="app/main.py"
# Before
class UserSchema(BaseModel):
id: str
name: str
email: str
age: int
deprecated_field: str # Will be removed
# After
class UserSchema(BaseModel):
id: str
name: str
email: str
age: int
```
**Migration Result:** Drops `deprecated_field` column (data permanently lost)
### Type Changes
```python file="app/main.py"
# Before
class UserSchema(BaseModel):
id: str
name: str
email: str
score: float # Will change to str
# After
class UserSchema(BaseModel):
id: str
name: str
email: str
score: str # Changed from float
```
**Migration Result:** Alters `score` column type (data converted if compatible)
## Removing Infrastructure
```python file="app/main.py"
# Before
users_table = OlapTable[UserSchema]("Users")
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
deprecated_table = OlapTable[DeprecatedSchema]("Deprecated")
# After (remove deprecated table)
users_table = OlapTable[UserSchema]("Users")
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```
**Migration Result:** Drops `Deprecated` table (all data lost)
## Working with Local Infrastructure
There are two main ways to inspect your local infrastructure to see how your migrations are applied:
### Using the CLI
Run `moose ls` to see the current state of your infrastructure:
```bash
# Verify object definitions
moose ls
```
### Connecting to your local infrastructure
You can also connect directly to your local infrastructure to see the state of your infrastructure.
All credentials for your local infrastructure are located in your project config file (`moose.config.toml`).
#### Connecting to ClickHouse
```bash
# Using clickhouse-client
clickhouse-client --host localhost --port 18123 --user panda --password pandapass --database local
# Using connection string
clickhouse-client "clickhouse://panda:pandapass@localhost:18123/local"
```
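If you prefer Python over the CLI client, the same local instance is reachable over HTTP with the `clickhouse-connect` driver, using the default local credentials from `moose.config.toml`; a sketch:

```python
# Query the local dev ClickHouse over HTTP using clickhouse-connect
# (credentials match the defaults in moose.config.toml)
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="localhost",
    port=18123,  # host_port from [clickhouse_config]
    username="panda",
    password="pandapass",
    database="local",
)
print(client.query("SHOW TABLES").result_rows)
```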
#### Connecting to Redpanda
```bash
# Using kafka-console-consumer
kafka-console-consumer --bootstrap-server localhost:19092 --topic UserEvents --from-beginning
# Using kafka-console-producer
kafka-console-producer --bootstrap-server localhost:19092 --topic UserEvents
```
#### Viewing Temporal Workflows
Navigate to `http://localhost:8080` to view the Temporal UI and see registered workflows.
## Gotchas
Your dev server must be running to connect to your local infrastructure.
```bash
moose dev
```
Only the modules that you have enabled in your `moose.config.toml` will be included in your migrations:
```toml file="moose.config.toml"
[features]
olap = true # Required for OLAP Tables and Materialized Views
streaming_engine = true # Required for Streams
workflows = true # Required for Workflows and Tasks
```
---
## Moose CLI Reference
Source: moose/moose-cli.mdx
Moose CLI Reference
# Moose CLI Reference
## Installation
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
## Core Commands
### Init
Initializes a new Moose project.
```bash
moose init <name> --template <template> [--location <location>] [--no-fail-already-exists]
```
- `<name>`: Name of your app or service
- `<template>`: Template to use for the project
- `--location`: Optional location for your app or service
- `--no-fail-already-exists`: Skip the directory existence check
#### List Templates
Lists available templates for project initialization.
```bash
moose template list
```
### Build
Builds your Moose project.
```bash
moose build [--docker] [--amd64] [--arm64]
```
- `--docker`: Build for Docker
- `--amd64`: Build for AMD64 architecture
- `--arm64`: Build for ARM64 architecture
### Dev
Starts a local development environment with hot reload and automatic infrastructure management.
```bash
moose dev [--mcp] [--docker]
```
- `--mcp`: Enable or disable the MCP (Model Context Protocol) server (default: true). The MCP server provides AI-assisted development tools at `http://localhost:4000/mcp`. See [MCP Server documentation](/moose/mcp-dev-server) for details.
- `--docker`: Use Docker for infrastructure (default behavior in dev mode)
The development server includes:
- Hot reload for code changes
- Automatic Docker container management (ClickHouse, Redpanda, Temporal, Redis)
- Built-in MCP server for AI assistant integration
- Health monitoring and metrics endpoints
### Prod
Starts Moose in production mode for cloud deployments.
```bash
moose prod
```
### Check
Checks the project for non-runtime errors.
```bash
moose check [--write-infra-map]
```
### Clean
Clears temporary data and stops development infrastructure.
```bash
moose clean
```
### Seed (ClickHouse)
Seed your local ClickHouse from a remote ClickHouse instance.
```bash
moose seed clickhouse [--connection-string <connection-string>] [--table <table>] [--limit <n> | --all]
```
- `--connection-string`: Remote ClickHouse connection string. If omitted, the CLI uses `MOOSE_SEED_CLICKHOUSE_URL`.
- `--table`: Seed only the specified table (default: all Moose tables).
- `--limit`: Copy up to N rows (mutually exclusive with `--all`). Large limits are automatically batched.
- `--all`: Copy entire table(s) in batches (mutually exclusive with `--limit`).
**Connection String Format:**
The connection string must use ClickHouse native protocol:
```bash
# ClickHouse native protocol (secure connection)
clickhouse://username:password@host:9440/database
```
**Important:**
- The data transfer uses ClickHouse's native TCP protocol via `remoteSecure()` function. The remote ClickHouse server must have the native TCP port accessible (typically port 9440 for secure connections).
- **Smart table matching**: The command automatically validates tables between local and remote databases. Tables that don't exist on the remote are gracefully skipped with warnings.
- Use `--table <table_name>` to seed a specific table that exists in both local and remote databases.
**User Experience:**
- Progress indicator shows seeding status with spinner
- Tables that don't exist on remote are automatically skipped with clear warnings
- Final summary shows successful and skipped tables
- Clean, streamlined output focused on results
Notes:
- Seeding is batched automatically for large datasets; Ctrl+C finishes the current batch gracefully.
- Use env var fallback:
```bash
export MOOSE_SEED_CLICKHOUSE_URL='clickhouse://username:password@host:9440/database'
```
### Truncate
Truncate tables or delete the last N rows from local ClickHouse tables.
```bash
moose truncate [TABLE[,TABLE...]] [--all] [--rows <n>]
```
- `TABLE[,TABLE...]`: One or more table names (comma-separated). Omit to use `--all`.
- `--all`: Apply to all non-view tables in the current database (mutually exclusive with listing tables).
- `--rows <n>`: Delete the last N rows per table; omit to remove all rows (TRUNCATE).
Notes:
- For `--rows`, the command uses the table ORDER BY when available; otherwise it falls back to a timestamp heuristic.
## Monitoring Commands
### Logs
View Moose logs.
```bash
moose logs [--tail] [--filter <string>]
```
- `--tail`: Follow logs in real-time
- `--filter`: Filter logs by specific string
### Ps
View Moose processes.
```bash
moose ps
```
### Ls
View Moose primitives & infrastructure.
```bash
moose ls [--limit <n>] [--version <version>] [--streaming] [--type <type>] [--name <name>] [--json]
```
- `--limit`: Limit output (default: 10)
- `--version`: View specific version
- `--streaming`: View streaming topics
- `--type`: Filter by infrastructure type (tables, streams, ingestion, sql_resource, consumption)
- `--name`: Filter by name
- `--json`: Output in JSON format
### Metrics
View live metrics from your Moose application.
```bash
moose metrics
```
### Peek
View data from a table or stream.
```bash
moose peek <name> [--limit <n>] [--file <path>] [-t|--table] [-s|--stream]
```
- `<name>`: Name of the table or stream to peek
- `--limit`: Number of rows to view (default: 5)
- `--file`: Output to a file
- `-t, --table`: View data from a table (default if neither flag specified)
- `-s, --stream`: View data from a stream/topic
## Generation Commands
### Generate Hash Token
Generate authentication tokens for API access.
```bash
moose generate hash-token
```
Generates both a plain-text Bearer token and its corresponding hashed version for authentication.
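Clients send the plain-text token in an `Authorization: Bearer` header, while the hashed version is what you configure on the server. A hypothetical sketch of calling an analytics endpoint with the token (the endpoint name, path prefix, and port are assumptions carried over from earlier examples):

```python
# Call an analytics API with the generated Bearer token (illustrative values)
import json
import urllib.request

token = "your-plain-text-token"  # printed by `moose generate hash-token`
req = urllib.request.Request(
    "http://localhost:4000/api/get-api-route?limit=10",
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```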
### Generate Migration Plan (OLAP)
Create an ordered ClickHouse DDL plan by comparing a remote instance with your local code.
```bash
moose generate migration --url https://<remote-instance> --token <token> --save
```
- Writes `./migrations/plan.yaml` and snapshots `remote_state.json` and `local_infra_map.json`
- Omit `--save` to output to stdout without writing files
- Requires these feature flags in `moose.config.toml`:
```toml filename="moose.config.toml" copy
[features]
olap = true
ddl_plan = true
```
### DB Pull (External Tables)
Refresh `EXTERNALLY_MANAGED` table definitions from a remote ClickHouse instance.
```bash
moose db pull --connection-string <connection-string> [--file-path <path>]
```
- `--connection-string`: ClickHouse URL; native `clickhouse://` is auto-converted to HTTP(S). Include `?database=` or the CLI will query the current database.
- `--file-path`: Optional override for the generated external models file (defaults to `app/externalModels.ts` or `app/external_models.py`).
Notes:
- Only tables marked `EXTERNALLY_MANAGED` in your code are refreshed.
- The command writes a single external models file and overwrites the file on each run.
- See the full guide: [/moose/olap/db-pull](/moose/olap/db-pull)
### Kafka
#### Pull external topics and schemas
Discover topics from a Kafka/Redpanda cluster and optionally fetch JSON Schemas from Schema Registry to emit typed external models.
```bash
moose kafka pull <bootstrap-servers> [--path <dir>] [--include <pattern>] [--exclude <pattern>] [--schema-registry <url>]
```
- `<bootstrap-servers>`: Kafka bootstrap servers, e.g. `localhost:19092`
- `--path`: Output directory for generated files. Defaults to `app/external-topics` (TS) or `app/external_topics` (Python).
- `--include`: Include pattern (glob). Default: `*`
- `--exclude`: Exclude pattern (glob). Default: `{__consumer_offsets,_schemas}`
- `--schema-registry`: Base URL for Schema Registry, e.g. `http://localhost:8081`
Notes:
- JSON Schema is supported initially; Avro/Protobuf planned.
- Generated files will be overwritten on subsequent runs.
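For example, against a local Redpanda broker with Schema Registry (hosts/ports are illustrative):
```bash
moose kafka pull localhost:19092 --schema-registry http://localhost:8081 --exclude "{__consumer_offsets,_schemas}"
```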
## Workflow Management
### Workflow
```bash
moose workflow [options]
```
Available workflow commands:
- `init <name> [--tasks <task1,task2,...>] [--task <task> ...]`: Initialize a new workflow
- `run <name> [--input <input>]`: Run a workflow
- `resume <name> --from <task>`: Resume a workflow from a specific task
- `list [--json]`: List registered workflows
- `history [--status <status>] [--limit <n>] [--json]`: Show workflow history
- `terminate <name>`: Terminate a workflow
- `pause <name>`: Pause a workflow
- `unpause <name>`: Unpause a workflow
- `status <name> [--id <run-id>] [--verbose] [--json]`: Get workflow status
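For example (workflow name and input payload are illustrative):
```bash
# Run a workflow with a JSON input payload
moose workflow run daily_etl --input '{"date": "2025-01-01"}'
# Check its status and recent history
moose workflow status daily_etl --verbose
moose workflow history --status failed --limit 10 --json
```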
## Planning and Deployment
### Plan
Display infrastructure changes for the next production deployment.
**For Moose Server deployments:**
```bash
moose plan [--url <url>] [--token <token>]
```
- `--url`: Remote Moose instance URL (default: http://localhost:4000)
- `--token`: API token for authentication
**For Serverless deployments:**
```bash
moose plan --clickhouse-url <url>
```
- `--clickhouse-url`: ClickHouse connection URL (e.g., `clickhouse://user:pass@host:port/database`)
### Refresh
Integrate matching tables from a remote instance into the local project.
```bash
moose refresh [--url <url>] [--token <token>]
```
- `--url`: Remote Moose instance URL (default: http://localhost:4000)
- `--token`: API token for authentication
This reference reflects the current state of the Moose CLI based on the source code in the framework-cli directory. The commands are organized by their primary functions and include all available options and flags.
---
## Moose OLAP
Source: moose/olap.mdx
Create and manage ClickHouse tables, materialized views, and data migrations
# Moose OLAP
## Overview
The Moose OLAP module provides standalone ClickHouse database management with type-safe schemas. You can use this capability independently to create tables, materialized views, and manage your table schemas without requiring other MooseStack components.
### Basic Example
```py filename="FirstTable.py" copy
from pydantic import BaseModel
from moose_lib import Key, OlapTable

class MyFirstTable(BaseModel):
    id: Key[str]
    name: str
    age: int

# Create a table named "first_table"
my_table = OlapTable[MyFirstTable]("first_table")

# No export needed - Python modules are automatically discovered
```
## Getting started
If you’re new to Moose OLAP, choose one of these paths:
### Import your schema from an existing ClickHouse database
```bash filename="Terminal" copy
moose init your-project-name --from-remote <connection-string>
```
Review the full guide to learn more about how to bootstrap a new Moose OLAP project from an existing ClickHouse DB.
### Start from scratch
Create a blank project, then model your first table:
```bash filename="Terminal" copy
moose init your-project-name --language python
cd your-project-name
```
Review the guide to learn more about how to model your first table.
Working with ClickPipes/PeerDB? Read [External Tables](/moose/olap/external-tables) and keep them in sync with [DB Pull](/moose/olap/db-pull).
## Enabling Moose OLAP
In your `moose.config.toml` file, enable the OLAP Database capability:
```toml
[features]
olap = true
```
## Core Capabilities
- Model tables and views with TypeScript or Python
- Automatic type mapping and support for advanced ClickHouse column types (e.g. `JSON`, `LowCardinality`, `Nullable`, etc.)
- Create tables and views with one line of code
- In-database transformations/materialized views
- Migrations and schema evolution
- Query with templated SQL strings and type-safe table and column references
- Bulk insertion with failure handling and runtime validation
### Managing your database
### Accessing your data
These capabilities are available from other MooseStack modules or even from your own client applications.
### Connecting to your ClickHouse instance
You can connect to your ClickHouse instance with your favorite database client. Your credentials are located in your `moose.config.toml` file:
```toml filename="moose.config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```
These are the default credentials for your local ClickHouse instance running in dev mode.
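For example, with the `clickhouse client` CLI you could connect to the local dev instance using these defaults (a sketch; any ClickHouse-compatible client works, and the ports must match your `moose.config.toml`):
```bash filename="Terminal" copy
# Connect over the native protocol using the default local dev credentials
clickhouse client --host localhost --port 9000 --user panda --password pandapass --database local
```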
### Combining with other modules:
Although designed to work independently, Moose OLAP can be combined with other modules to add additional capabilities surrounding your database:
- Combine with Streaming and APIs to setup streaming ingestion into ClickHouse tables
- Combine with Workflows to build ETL pipelines and data transformations
- Combine with APIs to expose your ClickHouse queries to client applications via REST API
---
## Applying Migrations
Source: moose/olap/apply-migrations.mdx
How to apply migrations to your database
# Applying Migrations
This page covers OLAP migrations. For migrations across the MooseStack, see the Migrate docs.
## Overview
Migrations are designed for two complementary goals:
- Move fast locally by inferring changes from your code and applying them immediately to your local database.
- Be deliberate in production by executing a reviewed, versioned plan that matches your intent and protects data.
How to think about it:
- Development mode: You edit code, MooseStack infers the SQL and immediately applies it to local ClickHouse. Great for rapid iteration; not guaranteed to infer intent (e.g., renames).
- Production (planned) mode: You generate a plan from the target environment vs your code, review and commit the plan, and MooseStack executes it deterministically during deploy with drift checks.
What you need to do:
- In dev: just code. MooseStack handles local diffs automatically.
- In prod (OLAP):
- Generate and save a plan:
```bash
moose generate migration --url https://<remote-instance> --token <token> --save
```
- Review and edit the plan (`plan.yaml`) as needed
- Commit the plan to source control
- Deploy to production. MooseStack validates snapshots (current DB vs `remote_state.json`, desired code vs `local_infra_map.json`) and executes `plan.yaml` in order. If drift is detected, the deploy aborts; regenerate the plan and retry.
## Development Workflow
### Starting the Runtime
Use `moose dev` to start the MooseStack runtime with automatic migration detection:
```bash
moose dev
⡏ Starting local infrastructure
Successfully started containers
Validated clickhousedb-1 docker container
Validated redpanda-1 docker container
Successfully validated red panda cluster
Validated temporal docker container
Successfully ran local infrastructure
```
### Hot-Reloaded Migrations
MooseStack continuously monitors your code changes and applies migrations automatically. All changes are applied to your **local database only**.
```python filename="app/tables/events.py" copy
from datetime import datetime
from pydantic import BaseModel
from moose_lib import Key, OlapTable

class Event(BaseModel):
    id: Key[str]
    name: str
    created_at: datetime
    status: str  # New field - will trigger migration

table = OlapTable[Event]("events")
```
When you save changes, you'll see live logs in the terminal showing the diffs being applied to your local database:
```bash
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
+ status: String
```
## Production Workflow
Use planned migrations to generate, review, and apply OLAP DDL plans deterministically.
### Generating Migration Plans
When using planned migrations for OLAP, you need to generate a migration plan from the remote environment. This is done by running the following command:
```bash
moose generate migration --url https://<remote-instance> --token <token> --save
```
This generates a few files in the `migrations` directory:
- `plan.yaml`: The migration plan containing an ordered list of operations to apply to the remote database to bring it into alignment with your local code.
- `remote_state.json`: A snapshot of the remote database state at the time the plan was generated
- `local_infra_map.json`: A snapshot of the local database state at the time the plan was generated
The remote and local state are used to validate that the plan is still valid at the time of deployment. If there have been schema changes made to your live remote database since the plan was generated, the deployment will abort and you will need to regenerate the plan. This is to prevent you from dropping data unintentionally.
### Reviewing and Editing the Plan
You can review and edit the plan as needed. The plan is a YAML file that contains an ordered list of operations to apply to the remote database to bring it into alignment with your local code.
```yaml filename="migrations/plan.yaml" copy
```
### Applying the Plan
The plan is applied during deployment. MooseStack will validate that the remote database state matches the snapshot of the database state at the time the plan was generated, and applies `plan.yaml` in order; it aborts if snapshots don’t match current state.
## Migration Types
### Adding New Tables or Materialized Views
```python filename="main.py" {4-7} copy
from app.db import newTable, newMaterializedView
```
The dev mode will automatically detect the new table or materialized view and apply the changes to your local database. You will see a log like this in the terminal:
```bash filename="Terminal" copy
$ moose dev
⠋ Processing Infrastructure changes from file watcher
+ Table: new_table Version None - id: String, a_column: String, some_other_column: Float64 - - deduplicate: false
+ Table: target_table Version None - id: String, a_column: String, some_other_column: Float64 - id - deduplicate: false
+ SQL Resource: mv_to_target
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- CreateTable:
table:
name: new_table
columns:
- name: id
data_type: String
required: true
unique: false
primary_key: true
default: null
annotations: []
- name: a_column
data_type: String
required: true
unique: false
primary_key: false
default: null
annotations: []
- name: some_other_column
data_type: Float64
required: true
unique: false
primary_key: false
default: null
annotations: []
order_by:
- id
deduplicate: false
engine: MergeTree
version: null
metadata:
description: null
life_cycle: FULLY_MANAGED
- CreateTable:
table:
name: target_table
columns:
- name: id
data_type: String
required: true
unique: false
primary_key: true
default: null
annotations: []
- name: a_column
data_type: String
required: true
unique: false
primary_key: false
default: null
annotations: []
- name: some_other_column
data_type: Float64
required: true
unique: false
primary_key: false
default: null
annotations: []
order_by:
- id
deduplicate: false
engine: MergeTree
version: null
metadata:
description: null
life_cycle: FULLY_MANAGED
- RawSQL:
sql: "CREATE MATERIALIZED VIEW mv_to_target TO target_table AS SELECT * FROM source_table"
description: Running setup SQL for resource mv_to_target
```
### Column Additions
Adding new fields to your data models:
```python filename="Before.py" copy
class AddedColumn(BaseModel):
    id: Key[str]
    another_column: str
    some_column: str

table = OlapTable[AddedColumn]("events")
```
```python filename="After.py" copy
class AddedColumn(BaseModel):
    id: Key[str]
    another_column: str
    some_column: str
    new_column: int  # New field - migration applied

table = OlapTable[AddedColumn]("events")
```
In dev mode, you will see a log like this:
```bash filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
+ new_column: Int64
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
table: "events"
column:
name: "new_column"
data_type: "Int64"
```
### Column Removals
Removing fields from your data models:
```python filename="Before.py" copy
class RemovedColumn(BaseModel):
    id: Key[str]
    another_column: str
    some_column: str
    old_column: int

table = OlapTable[RemovedColumn]("events")
```
```python filename="After.py" copy
class RemovedColumn(BaseModel):
    id: Key[str]
    another_column: str
    some_column: str
    # old_column field removed

table = OlapTable[RemovedColumn]("events")
```
In dev mode, you will see a log like this:
```bash filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
- old_column: Int64
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- DropTableColumn:
table: "events"
column_name: "old_column"
```
### Changing Column Data Types (use with caution)
```python filename="Before.py" copy
class ChangedType(BaseModel):
    id: Key[str]
    some_column: str

table = OlapTable[ChangedType]("events")
```
```python filename="After.py" copy
class ChangedType(BaseModel):
    id: Key[str]
    some_column: Annotated[str, "LowCardinality"]  # Add LowCardinality for better performance

table = OlapTable[ChangedType]("events")
```
In dev mode, you will see a log like this:
```bash filename="Terminal" copy
$ moose dev
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
- some_column: String -> LowCardinality(String)
```
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
- ChangeTableColumn:
table: "events"
column_name: "some_column"
data_type: "LowCardinality(String)"
```
Some data type changes can be incompatible with existing data. Read the guide to learn more.
### Materialized View Changes
Modifying the `SELECT` statement of a materialized view:
```python filename="Before.py" copy
from datetime import datetime
from pydantic import BaseModel

class TargetSchema(BaseModel):
    day: datetime
    count: int
    sum: float

mv = MaterializedView[TargetSchema](MaterializedViewConfig(
    select_statement="""
    SELECT
      toStartOfDay(a_date) as day,
      uniq(id) as count,
      sum(a_number) as sum
    FROM table
    GROUP BY day
    """,
    target_table=OlapConfig(
        name="target_table",
        engine=ClickHouseEngines.MergeTree,
        order_by_fields=["day"],
    ),
    materialized_view_name="mv_table_to_target",
))
```
```python filename="After.py" copy
from datetime import datetime
from pydantic import BaseModel

class TargetSchema(BaseModel):
    day: datetime
    count: int
    sum: float
    avg: float

mv = MaterializedView[TargetSchema](MaterializedViewConfig(
    select_statement="""
    SELECT
      toStartOfDay(a_date) as day,
      uniq(id) as count,
      sum(a_number) as sum,
      avg(a_number) as avg
    FROM table
    GROUP BY day
    """,
    target_table=OlapConfig(
        name="target_table",
        engine=ClickHouseEngines.MergeTree,
        order_by_fields=["day"],
    ),
    materialized_view_name="mv_table_to_target",
))
```
The dev mode diff:
```bash filename="Terminal" copy
$ moose dev
⠋ Processing Infrastructure changes from file watcher
~ Table target_table:
Column changes:
+ avg: Float64
~ SQL Resource: mv_to_target
```
Notice that the materialized view generates both a target table and a SQL resource. The target table is a new table in the database that stores the results of the materialized view `SELECT` statement. The `SQL Resource` is the SQL statement that creates the materialized view itself.
The generated plan for this operation will look like this:
```yaml filename="migrations/plan.yaml" copy
created_at: 2025-08-20T05:35:31.668353Z
operations:
- RawSql:
sql:
- DROP VIEW IF EXISTS mv_table_to_target
description: Running teardown SQL for resource mv_table_to_target
- AddTableColumn:
table: target_table
column:
name: "avg"
data_type: "Float64"
- RawSql:
sql:
- "CREATE MATERIALIZED VIEW IF NOT EXISTS mv_table_to_target \n TO target_table\n AS \n SELECT \n toStartOfDay(`a_date`) as day, \n uniq(`id`) as count, \n sum(`a_number`) as sum, \n avg(`a_number`) as avg\n FROM `source_table` \n GROUP BY day"
- "INSERT INTO target_table\n \n SELECT \n toStartOfDay(`a_date`) as day, \n uniq(`id`) as count, \n sum(`a_number`) as sum, \n avg(`a_number`) as avg\n FROM `source_table` \n GROUP BY day"
description: Running setup SQL for resource mv_table_to_target
```
Changing a materialized view's SELECT statement will recreate the entire view and repopulate all data. This can be time-consuming for large datasets.
---
## Syncing External Tables
Source: moose/olap/db-pull.mdx
Refresh your external table models from an existing ClickHouse database
# Syncing External Tables
## What this is
Use `moose db pull` to refresh the definitions of tables you marked as `EXTERNALLY_MANAGED` from a live ClickHouse instance. It reads your code to find external tables, fetches their remote schemas, regenerates one external models file, and creates a small git commit if anything changed. If new external tables were added remotely (e.g., new CDC streams), they are added to the external models file as part of the same run.
## When to use it
- **External tables changed remotely**: a DBA, CDC, or ETL pipeline updated schema.
- **Keep types in sync**: update generated models without touching fully-managed tables.
- **Safe by design**: does not modify the database or your managed models.
This is a read-only sync for your code models. For concepts and modeling guidance, see [External Tables](/moose/olap/external-tables). To bootstrap a project from an existing DB, see [Initialize from ClickHouse](/moose/getting-started/from-clickhouse).
## Requirements
- Tables are defined with `life_cycle=LifeCycle.EXTERNALLY_MANAGED`
- A ClickHouse connection string (native or HTTP/S)
## Connection strings
`db pull` accepts both native and HTTP(S) URLs. Native strings are automatically converted to HTTP(S) with the appropriate ports.
Examples:
```bash filename="Terminal" copy
# Native (auto-converted to HTTPS + 8443)
moose db pull --connection-string "clickhouse://explorer@play.clickhouse.com:9440/default"
# HTTPS (explicit database via query param)
moose db pull --connection-string "https://play.clickhouse.com/?user=explorer&database=default"
# Local HTTP
moose db pull --connection-string "http://localhost:8123/?user=default&database=default"
```
## What gets written
`app/external_models.py`
`db pull` treats this file as the single source of truth for `EXTERNALLY_MANAGED` tables. It introspects the remote schema, updates existing external tables, and adds any newly detected external tables here. It does not modify models elsewhere in your codebase.
Keep all external tables in this file and import it once from your root (`app/main.py`).
Important:
- The file is overwritten on every run (or at the path passed via `--file-path`).
- If you customize the path, ensure your root file imports it so Moose loads your external models.
## How it works
When you run `db pull` the CLI does the following:
- Loads your project’s infrastructure map and identifies tables marked as `EXTERNALLY_MANAGED`.
- Connects to the remote ClickHouse specified by `--connection-string` and introspects the live schemas for those tables.
- Regenerates a single external models file that mirrors the remote schema.
- Adds any newly detected external tables from the remote database to the generated file so your code stays in sync as sources evolve.
- Does not change any fully managed tables, your `app/main.py`, or the database itself.
- Creates a small git commit if the generated file changed, so you can review and share the update.
### Example output
```py filename="app/external_models.py"
# AUTO-GENERATED FILE. DO NOT EDIT.
# This file will be replaced when you run `moose db pull`.
# ...pydantic models matching remote EXTERNALLY_MANAGED tables...
```
## Command
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string> [--file-path <path>]
```
- **--connection-string**: Required. ClickHouse URL (native or HTTP/S)
- **--file-path**: Optional. Override the default output file. The file at this path will be regenerated (overwritten) on each run.
## Typical Use Cases
### Remote schema changed; update local types
Your DBA, CDC pipeline (e.g., ClickPipes), or ETL job updated a table’s schema. To keep your code accurate and type-safe, refresh your external models so queries, APIs, and materialized views reference the correct columns and types.
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string>
```
This updates only `EXTERNALLY_MANAGED` models and leaves managed code untouched.
### Automatically run on dev startup (keep local fresh)
In active development, schemas can drift faster than you commit updates. Running `db pull` on dev startup helps ensure your local code matches the live schema you depend on.
```bash filename="Terminal" copy
export REMOTE_CLICKHOUSE_URL="clickhouse://<user>:<password>@<host>:<port>/<database>"
```
Add to `moose.config.toml`:
```toml filename="moose.config.toml" copy
[http_server_config]
on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL"
```
This runs once when the dev server first starts. To run after code reloads, use `on_reload_complete_script`. If you run this frequently, prefer HTTP(S) URLs and cache credentials via env/secrets to avoid friction.
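For example, to also refresh after every code reload, a config sketch using both hooks named above might look like this (use whichever hook fits your workflow):
```toml filename="moose.config.toml" copy
[http_server_config]
on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL"
on_reload_complete_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL"
```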
### New project from an existing DB
If you’re starting with an existing ClickHouse database, bootstrap code with `init --from-remote`, then use `db pull` over time to keep external models fresh:
```bash filename="Terminal" copy
moose init my-project --from-remote $REMOTE_CLICKHOUSE_URL --language python
```
Review the full getting started guide to learn more about how to bootstrap a new Moose OLAP project from an existing ClickHouse DB.
### A new CDC/external table appeared; add it to code
Your CDC pipeline created a new table (or exposed a new stream). Pull to add the new table to your external models file automatically.
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string>
```
The regenerated external models file will now include the newly discovered external table.
## Troubleshooting
- **No changes written**: Ensure tables are actually marked as `EXTERNALLY_MANAGED` and names match remote.
- **Unsupported types**: The CLI will list tables with unsupported types; they’re skipped in the generated file.
- **Auth/TLS errors**: Verify scheme/ports (8123 or 8443) and credentials; try HTTPS if native URL fails.
- **Git commit issues**: The command attempts a lightweight commit; commit manually if your working tree is dirty.
## Related
- **External Tables**: concepts and configuration
- **Initialize from ClickHouse**: bootstrap projects from an existing DB
- **Supported Types**: mapping and constraints
---
## External Tables
Source: moose/olap/external-tables.mdx
Connect to externally managed database tables and CDC services
# External Tables
## Overview
External tables allow you to connect Moose to database tables that are managed outside of your application. This is essential when working with:
- **CDC (Change Data Capture) services** like ClickPipes, Debezium, or AWS DMS
- **Legacy database tables** managed by other teams
- **Third-party data sources** with controlled schema evolution
## When to Use External Tables
## Configuration
Set `life_cycle=LifeCycle.EXTERNALLY_MANAGED` to tell Moose not to modify the table schema:
```py filename="ExternalTableExample.py" copy
from moose_lib import OlapTable, OlapConfig, LifeCycle
from pydantic import BaseModel
from datetime import datetime
class CdcUserData(BaseModel):
id: str
name: str
email: str
updated_at: datetime
# Connect to CDC-managed table
cdc_user_table = OlapTable[CdcUserData]("cdc_users", OlapConfig(
life_cycle=LifeCycle.EXTERNALLY_MANAGED
))
```
## Getting Models for External Tables
### New project: initialize from your existing ClickHouse
If you don’t yet have a Moose project, use init-from-remote to bootstrap models from your existing ClickHouse:
```bash filename="Terminal" copy
moose init my-project --from-remote <connection-string> --language python
```
What happens:
- Moose introspects your database and generates table models in your project.
- If Moose detects ClickPipes (or other CDC-managed) tables, it marks those as `EXTERNALLY_MANAGED` and writes them into a dedicated external models file:
- TypeScript: `app/externalModels.ts`
- Python: `app/external_models.py`
- This is a best-effort detection to separate CDC-managed tables from those you may want Moose to manage in code.
How detection works (ClickPipes/PeerDB example):
- Moose looks for PeerDB-specific fields that indicate CDC ownership and versions, such as `_peerdb_synced_at`, `_peerdb_is_deleted`, `_peerdb_version`, and related metadata columns.
- When these are present, the table will be marked `EXTERNALLY_MANAGED` and emitted into the external models file automatically.
### Existing project: mark additional external tables
If there are other tables in your DB that are not CDC-managed but you want Moose to treat as external (not managed by code):
1) Mark them as external in code
```py copy
table = OlapTable[MySchema](
    "my_table",
    OlapConfig(
        life_cycle=LifeCycle.EXTERNALLY_MANAGED
    )
)
```
2) Move them into the external models file
- Move the model definitions to your external file (`app/externalModels.ts` or `app/external_models.py`).
- Ensure your root file still loads only the external models via a single import:
- Add `from external_models import *` in your `app/main.py` file.
This keeps truly external tables out of your managed code path, while still making them available locally (and in tooling) without generating production DDL.
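A minimal root file sketch (the module path follows the import shown in the step above; adjust it to your project layout):
```py filename="app/main.py" copy
# Load external table models so Moose discovers them
from external_models import *
```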
## Important Considerations
`EXTERNALLY_MANAGED` tables reflect schemas owned by your CDC/DBA/ETL processes. Do not change their field shapes in code.
If you accidentally edited an external model, revert to the source of truth by running **DB Pull**: [/moose/olap/db-pull](/moose/olap/db-pull).
Locally, externally managed tables are created/kept in sync in your development ClickHouse so you can develop against them and **seed data**. See **Seed (ClickHouse)** in the CLI: [/moose/moose-cli#seed-clickhouse](/moose/moose-cli#seed-clickhouse).
Moose will **not** apply schema changes to `EXTERNALLY_MANAGED` tables in production. If you edit these table models in code, those edits will not produce DDL operations in the migration plan (they will not appear in `plan.yaml`).
For more on how migration plans are generated and what shows up in `plan.yaml`, see [/moose/olap/planned-migrations](/moose/olap/planned-migrations).
## Staying in sync with remote schema
For `EXTERNALLY_MANAGED` tables, keep your code in sync with the live database by running DB Pull. You can do it manually or automate it in dev.
```bash filename="Terminal" copy
moose db pull --connection-string <connection-string>
```
Use DB Pull to regenerate your external models file from the remote schema. To run it automatically during development, see the script hooks in [the local development guide](/moose/local-dev#script-execution-hooks).
---
## Secondary Indexes
Source: moose/olap/indexes.mdx
Specifying indexes with Moose OLAP
## Indexes for ClickHouse tables
Moose lets you declare secondary/data-skipping indexes directly in your table definitions.
Moose generates the ClickHouse `INDEX` clauses on create and
plans `ALTER TABLE ADD/DROP INDEX` operations when you change them later.
### When to use indexes
- Use indexes to optimize selective predicates on large tables, especially string and high-cardinality columns.
- Common types: `minmax`, `Set(max_rows)`, `ngrambf_v1(...)`, `bloom_filter`.
### TypeScript
```ts
interface Events {
id: string;
user: string;
message: string;
}
```
### Python
```python
from moose_lib.dmv2.olap_table import OlapTable, OlapConfig, MergeTreeEngine
from pydantic import BaseModel

class Events(BaseModel):
    id: str
    user: str
    message: str

events_table = OlapTable[Events](
    "Events",
    OlapConfig(
        engine=MergeTreeEngine(),
        order_by_fields=["id"],
        indexes=[
            OlapConfig.TableIndex(name="idx_user", expression="user", type="minmax", granularity=1),
            OlapConfig.TableIndex(name="idx_message_ngrams", expression="message", type="ngrambf_v1", arguments=["3", "256", "1", "123"], granularity=1),
        ],
    ),
)
```
### How Moose applies changes
- On create, Moose emits `INDEX ...` entries inside `CREATE TABLE`.
- On change, Moose plans `ALTER TABLE DROP INDEX ` then `ADD INDEX ...` if the definition changed; pure adds/drops are applied as single operations.
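As a rough sketch, the planned operations correspond to standard ClickHouse DDL along these lines (illustrative statements for the `idx_user` index defined above):
```sql
-- Applied when an existing index definition changes
ALTER TABLE Events DROP INDEX idx_user;
ALTER TABLE Events ADD INDEX idx_user user TYPE minmax GRANULARITY 1;
```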
---
## Inserting Data
Source: moose/olap/insert-data.mdx
Insert data into OLAP tables using various methods
# Inserting Data
Inserting data into your database is a common task. MooseStack provides a few different ways to insert data into your database.
If a table column is modeled as optional in your app type but has a ClickHouse default, Moose treats incoming records as optional at the API/stream boundary, but the ClickHouse table stores the column as required with a DEFAULT clause. If you omit the field in the payload, ClickHouse fills it with the default at insert time.
`Annotated[int, clickhouse_default("18")]`
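A minimal sketch of that pattern (field and table names are illustrative):
```py filename="OptionalWithDefault.py" copy
from typing import Annotated, Optional
from pydantic import BaseModel
from moose_lib import OlapTable, Key, clickhouse_default

class Signup(BaseModel):
    id: Key[str]
    # Optional at the API/stream boundary, but stored as a non-nullable column with DEFAULT 18
    age: Annotated[Optional[int], clickhouse_default("18")] = None

signups = OlapTable[Signup]("signups")
```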
## From a Stream (Streaming Ingest)
When you need to stream data into your ClickHouse tables, you can set the `Stream.destination` as a reference to the `OlapTable` you want to insert into. This will automatically provision a synchronization process that batches and inserts data into the table.
```py filename="StreamInsert.py" copy
from moose_lib import Stream, StreamConfig, OlapTable, Key
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

events_table = OlapTable[Event]("user_events")

events_pipeline = Stream[Event]("user_events", StreamConfig(
    destination=events_table  # Automatically syncs the stream to the table in ClickHouse-optimized batches
))
```
[ClickHouse inserts need to be batched for optimal performance](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance). Moose automatically batches your data into ClickHouse-optimized batches of up to 100,000 records, with automatic flushing every second. It also handles at-least-once delivery and retries on connection errors to ensure your data is never lost.
## From a Workflow (Batch Insert)
If you have a data source better suited to batch patterns, use a workflow and the direct `insert()` method to land data in your tables:
```py filename="WorkflowInsert.py" copy
from moose_lib import OlapTable, Key, InsertOptions
from pydantic import BaseModel
from datetime import datetime

class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

events_table = OlapTable[UserEvent]("user_events")

# Direct insertion for ETL workflows
result = events_table.insert([
    {"id": "evt_1", "user_id": "user_123", "timestamp": datetime.now(), "event_type": "click"},
    {"id": "evt_2", "user_id": "user_456", "timestamp": datetime.now(), "event_type": "view"}
])

print(f"Successfully inserted: {result.successful} records")
print(f"Failed: {result.failed} records")
```
## From a Client App
### Via REST API
In your Moose code, you can leverage the built-in [MooseAPI module](/moose/apis) to place a `POST` REST API endpoint in front of your streams and tables, allowing you to insert data from external applications.
```py filename="IngestApi.py" copy
from moose_lib import IngestApi, IngestConfig
ingest_api = IngestApi[Event]("user_events", IngestConfig(
destination=events_stream
))
```
Alternatively, use `IngestPipeline` instead of standalone `IngestApi`, `Stream`, and `OlapTable` components:
```py filename="IngestPipeline.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig
ingest_pipeline = IngestPipeline[Event]("user_events", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True,
))
```
With these APIs you can leverage the built-in OpenAPI client integration to generate API clients in your own language to connect to your pipelines from external applications.
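For example, once the pipeline above is running locally, an external application could post a record over HTTP (the endpoint path follows the `POST localhost:4000/ingest/<name>` convention; payload fields are illustrative):
```bash filename="Terminal" copy
curl -X POST http://localhost:4000/ingest/user_events \
  -H "Content-Type: application/json" \
  -d '{"id": "evt_1", "user_id": "user_123", "timestamp": "2025-01-01T00:00:00Z", "event_type": "click"}'
```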
### Coming Soon: MooseClient
We're working on a new client library that you can use to interact with your Moose pipelines from external applications.
Join the community slack to stay updated and let us know if you're interested in helping us build it.
## Direct Data Insertion
The `OlapTable` provides an `insert()` method that allows you to directly insert data into ClickHouse tables with validation and error handling.
### Inserting Arrays of Records
```py filename="DirectInsert.py" copy
from moose_lib import OlapTable, Key, InsertOptions
from pydantic import BaseModel
from datetime import datetime

class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

events_table = OlapTable[UserEvent]("user_events")

# Insert single record or array of records
result = events_table.insert([
    {"id": "evt_1", "user_id": "user_123", "timestamp": datetime.now(), "event_type": "click"},
    {"id": "evt_2", "user_id": "user_456", "timestamp": datetime.now(), "event_type": "view"}
])

print(f"Successfully inserted: {result.successful} records")
print(f"Failed: {result.failed} records")
```
ClickHouse strongly recommends batching inserts. Avoid inserting single records into tables; consider using Moose Streams and Ingest Pipelines if your data source sends events as individual records.
### Handling Large Batch Inserts
For large datasets, use Python generators for memory-efficient processing:
```py filename="StreamInsert.py" copy
def user_event_generator():
    """Generate user events for memory-efficient processing."""
    for i in range(10000):
        yield {
            "id": f"evt_{i}",
            "user_id": f"user_{i % 100}",
            "timestamp": datetime.now(),
            "event_type": "click" if i % 2 == 0 else "view"
        }

# Insert from generator (validation not available for streams)
result = events_table.insert(user_event_generator(), InsertOptions(strategy="fail-fast"))
```
### Validation Methods
Before inserting data, you can validate it using the following methods:
```py filename="ValidationMethods.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    event_type: str

events_table = OlapTable[UserEvent]("user_events")

# Validate a single record
validated_data, error = events_table.validate_record(unknown_data)
if validated_data is not None:
    print("Valid data:", validated_data)
else:
    print("Validation error:", error)

# Validate multiple records with detailed error reporting
validation_result = events_table.validate_records(data_array)
print(f"Valid records: {len(validation_result.valid)}")
print(f"Invalid records: {len(validation_result.invalid)}")
for error in validation_result.invalid:
    print(f"Record {error.index} failed: {error.error}")
```
### Error Handling Strategies
Choose from three error handling strategies based on your reliability requirements:
#### Fail-Fast Strategy (Default)
```py filename="FailFast.py" copy
from moose_lib import InsertOptions
# Stops immediately on any error
result = events_table.insert(data, InsertOptions(strategy="fail-fast"))
```
#### Discard Strategy
```py filename="Discard.py" copy
from moose_lib import InsertOptions
# Discards invalid records, continues with valid ones
result = events_table.insert(data, InsertOptions(
    strategy="discard",
    allow_errors=10,          # Allow up to 10 failed records
    allow_errors_ratio=0.05   # Allow up to 5% failure rate
))
```
#### Isolate Strategy
```py filename="Isolate.py" copy
from moose_lib import InsertOptions

# Retries individual records to isolate failures
result = events_table.insert(data, InsertOptions(
    strategy="isolate",
    allow_errors_ratio=0.1
))

# Access detailed failure information
if result.failed_records:
    for failed in result.failed_records:
        print(f"Record {failed.index} failed: {failed.error}")
```
### Performance Optimization
The insert API includes several performance optimizations:
- **Memoized connections**: ClickHouse clients are reused across insert calls
- **Batch processing**: Optimized batch sizes for large datasets
- **Async inserts**: Automatic async insert mode for datasets > 1000 records
- **Connection management**: Use `close_client()` when completely done
```py filename="Performance.py" copy
from moose_lib import InsertOptions

# For high-throughput scenarios
result = events_table.insert(large_dataset, InsertOptions(
    validate=False,      # Skip validation for performance
    strategy="discard"
))

# Clean up when completely done (optional)
events_table.close_client()
```
## Best Practices
---
## Creating Materialized Views
Source: moose/olap/model-materialized-view.mdx
Create and configure materialized views for data transformations
# Modeling Materialized Views
## Overview
Materialized views are write-time transformations in ClickHouse. A static `SELECT` populates a destination table from one or more sources. You query the destination like any other table. The `MaterializedView` class wraps [ClickHouse `MATERIALIZED VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view) and keeps the `SELECT` explicit. When you edit the destination schema in code and update the `SELECT` accordingly, Moose applies the corresponding DDL, orders dependent updates, and backfills as needed, so the pipeline stays consistent as you iterate.
In local dev, Moose Migrate generates and applies DDL to your local database.
Today, destination schemas are declared in code and kept in sync manually with your `SELECT`. Moose Migrate coordinates DDL and dependencies when you make those changes. A future enhancement will infer the destination schema from the `SELECT` and update it automatically.
This dependency awareness is critical for [cascading materialized views](https://clickhouse.com/docs/en/sql-reference/statements/create/view/#create-materialized-view-with-dependencies). Moose Migrate [orders DDL across views and tables](https://www.fiveonefour.com/blog/Moose-SQL-Getting-DDL-Dependencies-in-Order) to avoid failed migrations and partial states.
### Basic Usage
```python filename="BasicUsage.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, ClickHouseEngines
from source_table import source_table

# Define the schema of the transformed rows -- this is static and must match the results of your SELECT.
# It also represents the schema of your entire destination table.
class TargetSchema(BaseModel):
    id: str
    average_rating: float
    num_reviews: int

mv = MaterializedView[TargetSchema](MaterializedViewOptions(
    # The transformation to run on the source table
    select_statement="""
    SELECT
      {source_table.columns.id},
      avg({source_table.columns.rating}) AS average_rating,
      count(*) AS num_reviews
    FROM {source_table}
    GROUP BY {source_table.columns.id}
    """,
    # Reference to the source table(s) that the SELECT reads from
    select_tables=[source_table],
    # Creates a new OlapTable named "target_table" where the transformed rows are written
    table_name="target_table",
    order_by_fields=["id"],
    # The name of the materialized view in ClickHouse
    materialized_view_name="mv_to_target_table",
))
```
The ClickHouse `MATERIALIZED VIEW` object acts like a trigger: on new inserts into the source table(s), it runs the SELECT and writes the transformed rows to the destination.
### Quick Reference
```python filename="ViewOptions.py" copy
from moose_lib import MaterializedView, sql
from source_table import source_table
class MaterializedViewOptions(BaseModel):
    select_statement: str
    table_name: str
    materialized_view_name: str
    select_tables: List[OlapTable | View]
    engine: ClickHouseEngines = ClickHouseEngines.MergeTree
    order_by_fields: List[str] = []
```
## Modeling the Target Table
The destination table is where the transformed rows are written by the materialized view. You can model it in two ways:
### Option 1 — Define target table inside the MaterializedView (most cases)
- Simple, co-located lifecycle: the destination table is created/updated/dropped with the MV.
- Best for: projection/denormalization, filtered serving tables, enrichment joins, and most rollups.
```python filename="InlineTarget.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions

class TargetSchema(BaseModel):
    id: str
    value: int

mv = MaterializedView[TargetSchema](MaterializedViewOptions(
    select_statement="""
    SELECT {source_table.columns.id}, toInt32({source_table.columns.value}) AS value FROM {source_table}
    """,
    select_tables=[source_table],
    table_name="serving_table",
    order_by_fields=["id"],
    materialized_view_name="mv_to_serving_table",
))
```
### Option 2 — Decoupled: reference a standalone `OlapTable`
Certain use cases may benefit from a separate lifecycle for the target table that is managed independently from the MV.
```python filename="DecoupledTarget.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, OlapTable, OlapConfig, ClickHouseEngines

class TargetSchema(BaseModel):
    id: str
    value: int

# Create the standalone table
target_table = OlapTable[TargetSchema](OlapConfig(
    name="target_table",
    engine=ClickHouseEngines.MergeTree,
    order_by_fields=["id"],
))

mv = MaterializedView[TargetSchema](MaterializedViewOptions(
    select_statement="""
    SELECT {source_table.columns.id}, toInt32({source_table.columns.value}) AS value FROM {source_table}
    """,
    select_tables=[source_table],
    materialized_view_name="mv_to_target_table",
), target_table=target_table)
```
### Basic Transformation, Cleaning, Filtering, Denormalization
Create a narrower, query-optimized table from a wide source. Apply light transforms (cast, rename, parse) at write time.
```python filename="Denormalization.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions

class Dest(BaseModel):
    id: str
    value: int
    created_at: str

mv = MaterializedView[Dest](MaterializedViewOptions(
    select_statement="""
    SELECT {source_table.columns.id}, toInt32({source_table.columns.value}) AS value, {source_table.columns.created_at} AS created_at FROM {source_table} WHERE active = 1
    """,
    select_tables=[source_table],
    table_name="proj_table",
    order_by_fields=["id"],
    materialized_view_name="mv_to_proj_table",
))
```
### Aggregations
### Simple Additive Rollups
When you want to maintain running sums (counts, totals) that are additive per key, use the `SummingMergeTree` engine:
```python filename="Summing.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, ClickHouseEngines

class DailyCounts(BaseModel):
    day: str
    user_id: str
    events: int

STMT = """
SELECT
  toDate({events.columns.timestamp}) AS day,
  {events.columns.user_id} AS user_id,
  count(*) AS events
FROM {events}
GROUP BY day, user_id
"""

mv = MaterializedView[DailyCounts](MaterializedViewOptions(
    select_statement=STMT,
    select_tables=[events],
    table_name="daily_counts",
    engine=ClickHouseEngines.SummingMergeTree,
    order_by_fields=["day", "user_id"],
    materialized_view_name="mv_to_daily_counts",
))
```
#### Complex Aggregations
When you want to compute complex aggregation metrics that are not simple additive operations (sum, count, avg, etc.) but instead use more advanced analytical functions (topK, percentile, etc.), create a target table with the `AggregatingMergeTree` engine.
```python filename="AggTransform.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import MaterializedView, AggregateFunction, MaterializedViewOptions, ClickHouseEngines

class MetricsById(BaseModel):
    id: str
    avg_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
    daily_uniques: Annotated[int, AggregateFunction(agg_func="uniqExact", param_types=[str])]

# The SELECT must output aggregate states
STMT = """
SELECT
  id,
  avgState({events.columns.rating}) AS avg_rating,
  uniqExactState({events.columns.user_id}) AS daily_uniques
FROM {events}
GROUP BY {events.columns.id}
"""

# Create the MV
mv = MaterializedView[MetricsById](MaterializedViewOptions(
    select_statement=STMT,
    table_name="metrics_by_id",
    materialized_view_name="mv_metrics_by_id",
    engine=ClickHouseEngines.AggregatingMergeTree,
    order_by_fields=["id"],
    select_tables=[events],
))
```
Jump to the [Advanced: AggregatingMergeTree transformations](#advanced-aggregatingmergetree-transformations) section for more details.
### Fan-in Patterns
When you have multiple sources that you want to merge into a single destination table, it's best to create an OlapTable and reference it in each MV that needs to write to it:
```python filename="FanIn.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, OlapConfig, ClickHouseEngines
class DailyCounts(BaseModel):
day: str
user_id: str
events: int
# Create the destination table explicitly
daily = OlapTable[DailyCounts]("daily_counts", OlapConfig(
engine=ClickHouseEngines.SummingMergeTree,
order_by_fields=["day", "user_id"],
))
# MV 1 - write to the daily_counts table
mv1 = MaterializedView[DailyCounts](MaterializedViewOptions(
select_statement="SELECT toDate(ts) AS day, user_id, 1 AS events FROM {webEvents}",
select_tables=[webEvents],
materialized_view_name="mv_web_to_daily_counts",
), target_table=daily)
# MV 2 - write to the daily_counts table
mv2 = MaterializedView[DailyCounts](MaterializedViewOptions(
select_statement="SELECT toDate(ts) AS day, user_id, 1 AS events FROM {mobileEvents}",
select_tables=[mobileEvents],
materialized_view_name="mv_mobile_to_daily_counts",
), target_table=daily)
```
### Blue/green schema migrations
Create a new table for a breaking schema change and use an MV to copy data from the old table; when complete, switch reads to the new table and drop just the MV and old table.
For more information on how to use materialized views to perform blue/green schema migrations, see the [Schema Versioning](./schema-versioning) guide.
## Defining the transformation
The `select_statement` is a static SQL query that Moose runs to transform data from your source table(s) into rows for the destination table.
Transformations are defined as ClickHouse SQL queries. We strongly recommend using the ClickHouse SQL reference and functions overview to help you develop your transformations.
You can use f-strings to interpolate table and column identifiers into your queries. Since these are static, you don't need to worry about SQL injection.
```python filename="Transformation.py" copy
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions, OlapConfig

class Dest(BaseModel):
    id: str
    name: str
    day: str

mv = MaterializedView[Dest](MaterializedViewOptions(
    select_statement="""
    SELECT
      {events.columns.id} AS id,
      {events.columns.name} AS name,
      toDate({events.columns.ts}) AS day
    FROM {events}
    JOIN {users} ON {events.columns.user_id} = {users.columns.id}
    WHERE {events.columns.active} = 1
    """,
    select_tables=[events, users],
    order_by_fields=["id"],
    table_name="user_activity_by_day",
    materialized_view_name="mv_user_activity_by_day",
))
```
The columns returned by your `SELECT` must exactly match the destination table schema.
- Use column aliases (`AS target_column_name`) to align names.
- All destination columns must be present in the `SELECT`, or the materialized view won't be created. Adjust your transformation or table schema so they match.
Go to the [Advanced: Writing SELECT statements to Aggregated tables](#writing-select-statements-to-aggregated-tables) section for more details.
## Backfill Destination Tables
When the MaterializedView is created, Moose backfills the destination once by running your `SELECT` (so you start with a fully populated table).
Materialized views that source from S3Queue tables are **not backfilled** automatically. S3Queue tables only process new files added to S3 after the table is created - there is no historical data to backfill from. The MV will start populating as new files arrive in S3.
You can see the SQL that Moose will run to backfill the destination table when you generate the [Migration Plan](./migration-plan).
During dev mode, as soon as you save the MaterializedView, Moose will run the backfill and you can see the results in the destination table by querying it in your local ClickHouse instance.
## Query Destination Tables
You can query the destination table like any other table.
For inline or decoupled target tables, you can reference target table columns and tables directly in your queries:
```python filename="Query.py" copy
# Query inline destination table by name
QUERY = """
SELECT {mv.target_table.columns.id}, {mv.target_table.columns.value}
FROM {mv.target_table}
ORDER BY {mv.target_table.columns.id}
LIMIT 10
"""
```
If you define your target table outside of the MaterializedView, you can also just reference the table by its variable name in your queries:
```python filename="QueryDecoupled.py" copy
# Query the standalone destination table by name
target_table = OlapTable[TargetTable](OlapConfig(
    name="target_table",
    engine=ClickHouseEngines.MergeTree,
    order_by_fields=["id"],
))

QUERY = """
SELECT
  {target_table.columns.id},
  {target_table.columns.average_rating}
FROM {target_table}
WHERE {target_table.columns.id} = 'abc'
"""
```
Go to the [Querying Aggregated tables](#querying-aggregated-tables) section for more details on how to query Aggregated tables.
## Advanced: Aggregations + Materialized Views
This section dives deeper into advanced patterns and tradeoffs when building aggregated materialized views.
### Target Tables with `AggregatingMergeTree`
When using an `AggregatingMergeTree` target table, you must use the `AggregateFunction` type to model the result of the aggregation functions:
```python filename="AggTransform.py" copy
from typing import Annotated, TypedDict
from moose_lib import MaterializedView, AggregateFunction, MaterializedViewOptions, Key

class MetricsById(TypedDict):
    id: Key[str]
    # avg_rating stores the result of avgState(events.rating)
    # - avg returns a float, so the read-time type is float
    # - The column being aggregated (events.rating) is a float, so the argument type is [float]
    avg_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
    # daily_uniques stores the result of uniqExactState(events.user_id)
    # - uniqExact returns an integer, so the read-time type is int
    # - The aggregate function name is "uniqExact"
    # - The column being aggregated (events.user_id) is a string, so the argument type is [str]
    daily_uniques: Annotated[int, AggregateFunction(agg_func="uniqExact", param_types=[str])]

# The SELECT must output aggregate states
STMT = """
SELECT
  id,
  avgState({events.columns.rating}) AS avg_rating,
  uniqExactState({events.columns.user_id}) AS daily_uniques
FROM {events}
GROUP BY {events.columns.id}
"""

# Create the MV
mv = MaterializedView[MetricsById](MaterializedViewOptions(
    select_statement=STMT,
    table_name="metrics_by_id",
    materialized_view_name="mv_metrics_by_id",
    select_tables=[events],
))
```
Common pitfalls to avoid:
- Using `avg()`/`uniqExact()` in the SELECT instead of `avgState()`/`uniqExactState()`
- Forgetting to annotate the schema with `AggregateFunction(...)` so the target table can be created correctly
- A mismatch between the `GROUP BY` keys in your `SELECT` and the `order_by_fields` of your target table
### Modeling columns with `AggregateFunction`
- Pattern: `Annotated[U, AggregateFunction(agg_func="avg", param_types=[float])]`
- `U` is the read-time type (e.g., `float`, `int`)
- `agg_func` is the aggregation name (e.g., `avg`, `uniqExact`)
- `param_types` are the argument types. These are the types of the columns that are being aggregated.
```python filename="FunctionToTypeMapping.py" copy
Annotated[float, AggregateFunction(agg_func="avg", param_types=[int])]               # avgState(col: int)
Annotated[int, AggregateFunction(agg_func="uniqExact", param_types=[str])]           # uniqExactState(col: str)
Annotated[int, AggregateFunction(agg_func="count", param_types=[])]                  # countState(col: any)
Annotated[str, AggregateFunction(agg_func="argMax", param_types=[str, datetime])]    # argMaxState(col: str, value: datetime)
Annotated[str, AggregateFunction(agg_func="argMin", param_types=[str, datetime])]    # argMinState(col: str, value: datetime)
Annotated[float, AggregateFunction(agg_func="corr", param_types=[float, float])]     # corrState(col1: float, col2: float)
Annotated[float, AggregateFunction(agg_func="quantiles", param_types=[float])]       # quantilesState(levels: float, value: float)
```
### Writing SELECT statements to Aggregated tables
When you write to an `AggregatingMergeTree` table, you must add a `State` suffix to the aggregation functions in your `SELECT` statement.
```python filename="AggTransform.py" copy
from pydantic import BaseModel
from typing import Annotated
from moose_lib import MaterializedView, ClickHouseEngines, AggregateFunction, MaterializedViewOptions

class MetricsById(BaseModel):
    id: str
    avg_rating: Annotated[float, AggregateFunction(agg_func="avg", param_types=[float])]
    total_reviews: Annotated[int, AggregateFunction(agg_func="count", param_types=[str])]

agg_stmt = """
SELECT
  {reviews.columns.id} AS id,
  avgState({reviews.columns.rating}) AS avg_rating,
  countState({reviews.columns.id}) AS total_reviews
FROM {reviews}
GROUP BY {reviews.columns.id}
"""

mv = MaterializedView[MetricsById](MaterializedViewOptions(
    select_statement=agg_stmt,
    select_tables=[reviews],
    table_name="metrics_by_id",
    engine=ClickHouseEngines.AggregatingMergeTree,
    order_by_fields=["id"],
    materialized_view_name="mv_metrics_by_id",
))
```
Why states? Finalized values (e.g., `avg()`) are not incrementally mergeable. Storing states lets ClickHouse maintain results efficiently as new data arrives. Docs: https://clickhouse.com/docs/en/sql-reference/aggregate-functions/index and https://clickhouse.com/docs/en/sql-reference/aggregate-functions/combinators#-state
### Querying Aggregated Tables
When you query a table with an `AggregatingMergeTree` engine, you must use aggregate functions with the `Merge` suffix (e.g., `avgMerge`)
```python filename="QueryAgg.py" copy
# Manual finalization using ...Merge
QUERY = """
SELECT
avgMerge(avg_rating) AS avg_rating,
countMerge(total_reviews) AS total_reviews
FROM metrics_by_id
WHERE id = '123'
"""
```
## Choosing the right engine
- Use `MergeTree` for copies/filters/enrichment without aggregation semantics.
- Use `SummingMergeTree` when all measures are additive, and you want compact, eventually-consistent sums.
- Use `AggregatingMergeTree` for non-additive metrics and advanced functions; store states and finalize on read.
- Use `ReplacingMergeTree` for dedup/upserts or as an idempotent staging layer before rollups.
---
## Modeling Tables
Source: moose/olap/model-table.mdx
Model your database schema in code using native TypeScript/Python typing
# Modeling Tables
## Overview
Tables in Moose let you define your database schema entirely in code using native TypeScript/Python typing.
You can integrate tables into your pipelines as destinations for new data or as sources for analytics queries in your downstream transformations, APIs, and more.
```py filename="FirstTable.py" copy
from pydantic import BaseModel
from moose_lib import Key, OlapTable

class MyFirstTable(BaseModel):
    id: Key[str]
    name: str
    age: int

# Create a table named "first_table"
my_table = OlapTable[MyFirstTable]("first_table")

# No export needed - Python modules are automatically discovered
```
## Basic Usage
### Standalone Tables
Create a table directly for custom data flows or when you need fine-grained control:
```py filename="StandaloneTable.py"
from datetime import date
from pydantic import BaseModel
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine

class ExampleSchema(BaseModel):
    id: Key[str]
    date_field: date
    numeric_field: float
    boolean_field: bool

# Create a standalone table named "example_table"
example_table = OlapTable[ExampleSchema]("example_table", OlapConfig(
    order_by_fields=["id", "date_field"],
    engine=ReplacingMergeTreeEngine()
))
```
### Creating Tables in Ingestion Pipelines
For end-to-end data flows, create tables as part of an ingestion pipeline:
```py filename="PipelineTable.py"
from datetime import datetime
from moose_lib import IngestPipeline, IngestPipelineConfig, Key, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

events_pipeline = IngestPipeline[UserEvent]("user_events", IngestPipelineConfig(
ingest_api=True, # Creates a REST API endpoint at POST localhost:4000/ingest/user_events
stream=True, # Creates a Kafka/Redpanda topic
table=OlapConfig( # Creates and configures the table named "user_events"
order_by_fields=["id", "timestamp"],
engine=ReplacingMergeTreeEngine()
)
))
# Access the table component when needed:
events_table = events_pipeline.get_table()
```
## Data Modeling
### Special ClickHouse Types (LowCardinality, Nullable, etc)
```py filename="ClickHouseTypes.py" copy
from moose_lib import Key, clickhouse_decimal
from typing import Annotated, Literal
from pydantic import BaseModel
from datetime import datetime
class Customer(BaseModel):
name: str
address: str
class Order(BaseModel):
order_id: Key[str]
amount: clickhouse_decimal(10, 2)
status: Literal["Paid", "Shipped", "Delivered"] # translated to LowCardinality(String) in ClickHouse
created_at: datetime
customer: Annotated[Customer, "ClickHouseNamedTuple"]
```
### Default values
Use defaults instead of nullable columns to keep queries fast and schemas simple. You can specify defaults at the column level so Moose generates ClickHouse defaults in your table DDL.
```py filename="Defaults.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import OlapTable, OlapConfig, Key, clickhouse_default, clickhouse_decimal
from datetime import datetime
class Event(BaseModel):
id: Key[str]
# Static defaults
status: Annotated[str, clickhouse_default("'pending'")] # DEFAULT 'pending'
retries: Annotated[int, clickhouse_default("0")] # DEFAULT 0
# Server-side timestamps
created_at: Annotated[datetime, clickhouse_default("now()")]
# Decimal with default
amount: Annotated[float, clickhouse_decimal(10, 2)] = 0
events = OlapTable[Event]("events", {
"orderByFields": ["id", "created_at"],
})
```
The value passed into the `clickhouse_default` function can either be a string literal or a stringified ClickHouse SQL expression.
If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.
- Optional without default → ClickHouse Nullable type.
- Optional with default (using `clickhouse_default("18")` in annotations) → non-nullable column with default `18`.
This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists.
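A minimal sketch of both cases, using a hypothetical `Profile` model (the exact ClickHouse column types depend on your annotations):
```py filename="OptionalDefaults.py" copy
from typing import Annotated, Optional
from pydantic import BaseModel
from moose_lib import OlapTable, Key, clickhouse_default

class Profile(BaseModel):
    id: Key[str]
    # Optional without a default -> Nullable column in ClickHouse
    nickname: Optional[str] = None
    # Optional with a ClickHouse default -> non-nullable column with DEFAULT 18
    age: Annotated[Optional[int], clickhouse_default("18")] = None

profiles = OlapTable[Profile]("profiles")
```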
### Database Selection
By default, tables are created in the database specified in your `moose.config.toml` ClickHouse configuration. You can override this on a per-table basis using the `database` field:
```py filename="DatabaseOverride.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
class UserData(BaseModel):
id: Key[str]
name: str
email: str
# Table in default database (from moose.config.toml)
default_table = OlapTable[UserData]("users")
# Table in specific database (e.g., "analytics")
analytics_table = OlapTable[UserData]("users", OlapConfig(
database="analytics",
order_by_fields=["id"]
))
```
To use custom databases, configure them in your `moose.config.toml`:
```toml
[clickhouse_config]
db_name = "local"
additional_databases = ["analytics", "staging"]
```
The databases in `additional_databases` will be created automatically when you start your Moose application.
### Primary Keys and Sorting
You must configure table indexing using one of these approaches:
1. Define at least one `Key` in your table schema
2. Specify `order_by_fields` in the table config
3. Use both (all `Key` fields must come first in the `order_by_fields` array)
```py filename="PrimaryKeyConfig.py" copy
from moose_lib import Key, OlapTable
from pydantic import BaseModel
class Record1(BaseModel):
id: Key[str] # Primary key field
field1: str
field2: int
table1 = OlapTable[Record1]("table1") # id is the primary key
```
### Order By Fields Only
Leverage the `OlapConfig` class to configure your table:
```py filename="OrderByFieldsOnly.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
from datetime import datetime
class SchemaWithoutPrimaryKey(BaseModel):
field1: str
field2: int
field3: datetime
table2 = OlapTable[SchemaWithoutPrimaryKey]("table2", OlapConfig(
order_by_fields=["field1", "field2"] # Specify ordering without primary key
))
```
### Order By Expression
Use a ClickHouse SQL expression to control ordering directly. This is useful for advanced patterns (functions, transformations) or when you want to disable sorting entirely.
```py filename="OrderByExpression.py" copy
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel
from datetime import datetime
class Events(BaseModel):
user_id: str
created_at: datetime
event_type: str
# Equivalent to order_by_fields=["user_id", "created_at", "event_type"]
events = OlapTable[Events]("events", OlapConfig(
order_by_expression="(user_id, created_at, event_type)",
))
# Advanced: functions inside expression
events_by_month = OlapTable[Events]("events_by_month", OlapConfig(
order_by_expression="(user_id, toYYYYMM(created_at))",
))
# No sorting
unsorted = OlapTable[Events]("events_unsorted", OlapConfig(
order_by_expression="tuple()",
))
```
### Using Both Primary Key and Order By Fields
```py filename="ComboKeyAndOrderByFields.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
class SchemaWithKey(BaseModel):
id: Key[str]
field1: str
field2: int
table3 = OlapTable[SchemaWithKey]("table3", OlapConfig(
order_by_fields=["id", "field1"] # Primary key must be first
))
```
### Using Multiple Primary Keys
```py filename="MultiKeyTable.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
class MultiKeyRecord(BaseModel):
key1: Key[str]
key2: Key[int]
field1: str
multi_key_table = OlapTable[MultiKeyRecord]("multi_key_table", OlapConfig(
order_by_fields=["key1", "key2", "field1"] # Multiple keys must come first
))
```
### Table engines
By default, Moose will create tables with the `MergeTree` engine. You can use different engines by setting the `engine` in the table configuration.
```py filename="TableEngine.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import MergeTreeEngine, ReplacingMergeTreeEngine
# Default MergeTree engine
table = OlapTable[Record]("table", OlapConfig(
order_by_fields=["id"]
))
# Explicitly specify engine
dedup_table = OlapTable[Record]("table", OlapConfig(
order_by_fields=["id"],
engine=ReplacingMergeTreeEngine()
))
```
#### Deduplication (`ReplacingMergeTree`)
Use the `ReplacingMergeTree` engine to keep only the latest record for your designated sort key:
```py filename="DeduplicatedTable.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
from pydantic import BaseModel
class Record(BaseModel):
id: Key[str]
updated_at: str # Version column
deleted: int = 0 # Soft delete marker (UInt8)
# Basic deduplication
table = OlapTable[Record]("table", OlapConfig(
order_by_fields=["id"],
engine=ReplacingMergeTreeEngine()
))
# With version column (keeps record with highest version)
versioned_table = OlapTable[Record]("table", OlapConfig(
order_by_fields=["id"],
engine=ReplacingMergeTreeEngine(ver="updated_at")
))
# With soft deletes (requires ver parameter)
soft_delete_table = OlapTable[Record]("table", OlapConfig(
order_by_fields=["id"],
engine=ReplacingMergeTreeEngine(
ver="updated_at",
is_deleted="deleted" # UInt8 column: 1 marks row for deletion
)
))
```
ClickHouse's ReplacingMergeTree engine runs deduplication in the background AFTER data is inserted into the table. This means that duplicate records may not be removed immediately.
**Version Column (`ver`)**: When specified, ClickHouse keeps the row with the maximum version value for each unique sort key.
**Soft Deletes (`is_deleted`)**: When specified along with `ver`, rows where this column equals 1 are deleted during merges. This column must be UInt8 type.
For more details, see the [ClickHouse documentation](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree).
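Until merges run, reads may still return duplicates. A common pattern is to force deduplication at read time with `FINAL` (a minimal sketch using `MooseClient`, assuming a ReplacingMergeTree table named `records`; note that `FINAL` adds query-time cost on large tables):
```py filename="QueryDeduplicated.py" copy
from moose_lib import MooseClient

client = MooseClient()

# FINAL applies ReplacingMergeTree deduplication at query time
query = """
SELECT id, updated_at
FROM records FINAL
WHERE id = {id:String}
"""
rows = client.query.execute_raw(query, {"id": "abc"})
```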
#### Streaming from S3 (`S3Queue`)
Use the `S3Queue` engine to automatically ingest data from S3 buckets as files are added:
S3Queue tables only process **new files** added to S3 after table creation. When used as a source for materialized views, **no backfill occurs** - the MV will only start populating as new files arrive. See the [Materialized Views documentation](./model-materialized-view#backfill-destination-tables) for more details.
```py filename="S3StreamingTable.py" copy
from datetime import datetime
from pydantic import BaseModel
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import S3QueueEngine

class S3Event(BaseModel):
id: str
timestamp: datetime
data: dict
# Modern API using engine configuration
s3_events = OlapTable[S3Event]("s3_events", OlapConfig(
engine=S3QueueEngine(
s3_path="s3://my-bucket/data/*.json",
format="JSONEachRow",
# ⚠️ WARNING: See security callout below about credentials
aws_access_key_id="AKIA...",
aws_secret_access_key="secret..."
),
settings={
"mode": "unordered",
"keeper_path": "/clickhouse/s3queue/events"
}
))
```
**Security Risk**: Hardcoding credentials in your code embeds them in Docker images and deployment artifacts, creating serious security vulnerabilities.
**Solution**: Use `moose_runtime_env` for runtime credential resolution:
```py filename="SecureS3Streaming.py" copy
from moose_lib import OlapTable, OlapConfig, moose_runtime_env
from moose_lib.blocks import S3QueueEngine
# ✅ RECOMMENDED: Runtime environment variable resolution
secure_s3_events = OlapTable[S3Event]("s3_events", OlapConfig(
engine=S3QueueEngine(
s3_path="s3://my-bucket/data/*.json",
format="JSONEachRow",
aws_access_key_id=moose_runtime_env.get("AWS_ACCESS_KEY_ID"),
aws_secret_access_key=moose_runtime_env.get("AWS_SECRET_ACCESS_KEY")
),
settings={
"mode": "unordered",
"keeper_path": "/clickhouse/s3queue/events"
}
))
```
**Then set environment variables:**
```bash filename="Terminal" copy
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="your-secret-key"
moose prod up
```
**Benefits:**
- Credentials never embedded in Docker images
- Supports credential rotation (changing passwords triggers table recreation)
- Different credentials per environment (dev/staging/prod)
- Clear error messages if environment variables are missing
S3Queue requires ClickHouse 24.7+ and proper ZooKeeper/ClickHouse Keeper configuration for coordination between replicas. Files are processed exactly once across all replicas.
#### Direct S3 Access (`S3`)
Use the `S3` engine for direct read/write access to S3 storage without streaming semantics:
```py filename="S3Table.py" copy
from moose_lib import OlapTable, OlapConfig, moose_runtime_env
from moose_lib.blocks import S3Engine
# S3 table with credentials (recommended with moose_runtime_env)
s3_data = OlapTable[DataRecord]("s3_data", OlapConfig(
engine=S3Engine(
path="s3://my-bucket/data/file.json",
format="JSONEachRow",
aws_access_key_id=moose_runtime_env.get("AWS_ACCESS_KEY_ID"),
aws_secret_access_key=moose_runtime_env.get("AWS_SECRET_ACCESS_KEY"),
compression="gzip"
)
))
# Public S3 bucket (no authentication needed - just omit credentials)
public_s3 = OlapTable[DataRecord]("public_s3", OlapConfig(
engine=S3Engine(
path="s3://public-bucket/data/*.parquet",
format="Parquet"
)
))
```
- **S3**: Direct read/write access to S3 files. Use for batch processing or querying static data.
- **S3Queue**: Streaming engine that automatically processes new files as they arrive. Use for continuous data ingestion.
Both engines support the same credential management and format options.
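For example, an S3-backed table can be queried like any other table for batch analysis (a minimal sketch using `MooseClient` and the `s3_data` table defined above):
```py filename="QueryS3Table.py" copy
from moose_lib import MooseClient

client = MooseClient()

# Batch read over the S3-backed table; ClickHouse reads the files on demand
query = """
SELECT count() AS total_rows
FROM s3_data
"""
rows = client.query.execute_raw(query, {})
```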
#### In-Memory Buffer (`Buffer`)
The `Buffer` engine provides an in-memory buffer that flushes data to a destination table based on time, row count, or size thresholds:
```py filename="BufferTable.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import MergeTreeEngine, BufferEngine
# First create the destination table
destination_table = OlapTable[Record]("destination", OlapConfig(
engine=MergeTreeEngine(),
order_by_fields=["id", "timestamp"]
))
# Then create buffer that writes to it
buffer_table = OlapTable[Record]("buffer", OlapConfig(
engine=BufferEngine(
target_database="local",
target_table="destination",
num_layers=16,
min_time=10, # Min 10 seconds before flush
max_time=100, # Max 100 seconds before flush
min_rows=10000, # Min 10k rows before flush
max_rows=1000000, # Max 1M rows before flush
min_bytes=10485760, # Min 10MB before flush
max_bytes=104857600 # Max 100MB before flush
)
))
```
- Data in buffer is **lost if server crashes** before flush
- Not suitable for critical data that must be durable
- Best for high-throughput scenarios where minor data loss is acceptable
- Buffer and destination table must have identical schemas
- Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on buffer tables
For more details, see the [ClickHouse Buffer documentation](https://clickhouse.com/docs/en/engines/table-engines/special/buffer).
#### Distributed Tables (`Distributed`)
The `Distributed` engine creates a distributed table across a ClickHouse cluster for horizontal scaling:
```py filename="DistributedTable.py" copy
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import DistributedEngine
# Distributed table across cluster
distributed_table = OlapTable[Record]("distributed_data", OlapConfig(
engine=DistributedEngine(
cluster="my_cluster",
target_database="default",
target_table="local_table",
sharding_key="cityHash64(id)" # Optional: how to distribute data
)
))
```
- Requires a configured ClickHouse cluster with remote_servers configuration
- The local table must exist on all cluster nodes
- Distributed tables are virtual - data is stored in local tables
- Cannot use `orderByFields`, `partitionBy`, or `sampleByExpression` on distributed tables
- The `cluster` name must match a cluster defined in your ClickHouse configuration
For more details, see the [ClickHouse Distributed documentation](https://clickhouse.com/docs/en/engines/table-engines/special/distributed).
#### Replicated Engines
Replicated engines provide high availability and data replication across multiple ClickHouse nodes. Moose supports all standard replicated MergeTree variants:
- `ReplicatedMergeTree` - Replicated version of MergeTree
- `ReplicatedReplacingMergeTree` - Replicated with deduplication
- `ReplicatedAggregatingMergeTree` - Replicated with aggregation
- `ReplicatedSummingMergeTree` - Replicated with summation
```py filename="ReplicatedEngines.py" copy
from datetime import datetime
from pydantic import BaseModel
from moose_lib import Key, OlapTable, OlapConfig
from moose_lib.blocks import (
    ReplicatedMergeTreeEngine,
    ReplicatedReplacingMergeTreeEngine,
    ReplicatedAggregatingMergeTreeEngine,
    ReplicatedSummingMergeTreeEngine
)
class Record(BaseModel):
id: Key[str]
updated_at: datetime
deleted: int = 0
# Basic replicated table with explicit paths
replicated_table = OlapTable[Record]("records", OlapConfig(
order_by_fields=["id"],
engine=ReplicatedMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/records",
replica_name="{replica}"
)
))
# Replicated with deduplication
replicated_dedup = OlapTable[Record]("dedup_records", OlapConfig(
order_by_fields=["id"],
engine=ReplicatedReplacingMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/dedup_records",
replica_name="{replica}",
ver="updated_at",
is_deleted="deleted"
)
))
# For ClickHouse Cloud or Boreal (no parameters needed)
cloud_replicated = OlapTable[Record]("cloud_records", OlapConfig(
order_by_fields=["id"],
engine=ReplicatedMergeTreeEngine()
))
```
The `keeper_path` and `replica_name` parameters are **optional** for replicated engines:
- **Omit both parameters** (recommended): Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments. The default path pattern `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}` works automatically with Atomic databases (default in modern ClickHouse).
- **Provide custom paths**: You can still specify both parameters explicitly if you need custom replication paths for your self-managed cluster.
**Note**: Both parameters must be provided together, or both omitted. The `{uuid}`, `{shard}`, and `{replica}` macros are automatically substituted by ClickHouse at runtime.
For more details, see the [ClickHouse documentation on data replication](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication).
### Irregular column names and Python Aliases
If a ClickHouse column name isn't a valid Python identifier or starts with an underscore,
you can use a safe Python field name and set a Pydantic alias to the real column name.
Moose then uses the alias for the ClickHouse DDL and data mapping,
so your model remains valid while preserving the true column name.
```python
from pydantic import BaseModel, Field
class CHUser(BaseModel):
# ClickHouse: "_id" → safe Python attribute with alias
UNDERSCORE_PREFIXED_id: str = Field(alias="_id")
# ClickHouse: "user name" → replace spaces, keep alias
user_name: str = Field(alias="user name")
```
## Externally Managed Tables
If you have a table that is managed by an external system (e.g. a Change Data Capture service like ClickPipes), you can still use Moose to query it. Set `life_cycle` in the table config to `LifeCycle.EXTERNALLY_MANAGED`.
```py filename="ExternallyManagedTable.py" copy
from moose_lib import OlapTable, OlapConfig, LifeCycle
# Table managed by external system
external_table = OlapTable[UserData]("external_users", OlapConfig(
order_by_fields=["id", "timestamp"],
life_cycle=LifeCycle.EXTERNALLY_MANAGED # Moose won't create or modify this table in prod mode
))
```
Learn more about the different lifecycle options and how to use them in the [LifeCycle Management](/stack/olap/lifecycle) documentation.
## Invalid Configurations
```py filename="InvalidConfig.py" copy
from moose_lib import Key, OlapTable, OlapConfig
from pydantic import BaseModel
from typing import Optional
class BadRecord1(BaseModel):
field1: str
field2: int
bad_table1 = OlapTable[BadRecord1]("bad_table1") ## No primary key or orderByFields
class BadRecord2(BaseModel):
id: Key[str]
field1: str
bad_table2 = OlapTable[BadRecord2]("bad_table2", OlapConfig(
order_by_fields=["field1", "id"] # Wrong order - primary key must be first
))
class BadRecord3(BaseModel):
id: Key[str]
field1: str
field2: Optional[int]
bad_table3 = OlapTable[BadRecord3]("bad_table3", OlapConfig(
order_by_fields=["id", "field2"] # Can't have nullable field in orderByFields
))
```
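For comparison, here is a sketch of corrected versions of the configurations above:
```py filename="ValidConfig.py" copy
from pydantic import BaseModel
from moose_lib import Key, OlapTable, OlapConfig

class GoodRecord1(BaseModel):
    field1: str
    field2: int

# No Key field, so ordering is provided via the config
good_table1 = OlapTable[GoodRecord1]("good_table1", OlapConfig(
    order_by_fields=["field1"]
))

class GoodRecord2(BaseModel):
    id: Key[str]
    field1: str

# Key fields come first in order_by_fields
good_table2 = OlapTable[GoodRecord2]("good_table2", OlapConfig(
    order_by_fields=["id", "field1"]
))

class GoodRecord3(BaseModel):
    id: Key[str]
    field1: str
    field2: int  # non-nullable, so it can be used in order_by_fields

good_table3 = OlapTable[GoodRecord3]("good_table3", OlapConfig(
    order_by_fields=["id", "field2"]
))
```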
## Development Workflow
### Local Development with Hot Reloading
One of the powerful features of Moose is its integration with the local development server:
1. Start your local development server with `moose dev`
2. When you define or modify an `OlapTable` in your code and save the file:
- The changes are automatically detected
- Your Pydantic schema definitions are processed
- The infrastructure is updated in real-time to match your code changes
- Your tables are immediately available for testing
For example, if you add a new field to your schema:
```py filename="HotReloading.py" copy
# Before
class BasicSchema(BaseModel):
id: Key[str]
name: str
# After adding a field
class BasicSchema(BaseModel):
id: Key[str]
name: str
created_at: datetime
```
The Moose framework will:
1. Detect the change when you save the file
2. Update the table schema in the local ClickHouse instance
3. Make the new field immediately available for use
### Verifying Your Tables
You can verify your tables were created correctly using:
```bash filename="Terminal" copy
# List all tables in your local environment
moose ls
```
#### Connecting to your local ClickHouse instance
You can connect to your local ClickHouse instance with your favorite database client. Your credentials are located in your `moose.config.toml` file:
```toml filename="moose.config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```
---
## Modeling Views
Source: moose/olap/model-view.mdx
Define standard ClickHouse Views for read-time projections
# Modeling Views
## Overview
Views are read-time projections in ClickHouse. A static `SELECT` defines the view over one or more base tables or other views. Moose wraps [ClickHouse `VIEW`](https://clickhouse.com/docs/en/sql-reference/statements/create/view) with a simple `View` class in Python. You provide the view name, the `SELECT`, and the list of source tables/views so Moose can order DDL correctly during migrations.
Use `View` when you want a virtual read-time projection and don’t need write-time transformation or a separate storage table. For write-time pipelines and backfills, use a Materialized View instead.
## Basic Usage
```python filename="BasicUsage.py" copy
from moose_lib import View
from tables import users, events
active_user_events = View(
"active_user_events",
"""
SELECT
{events.columns.id} AS event_id,
{users.columns.id} AS user_id,
{users.columns.name} AS user_name,
{events.columns.ts} AS ts
FROM {events}
JOIN {users} ON {events.columns.user_id} = {users.columns.id}
WHERE {users.columns.active} = 1
""",
[events, users],
)
```
## Quick Reference
```python filename="Signature.py" copy
# View(name: str, select_statement: str, base_tables: list[OlapTable | View])
View(
"view_name",
"SELECT ... FROM {someTable}",
[someTable],
)
```
The `SELECT` should be static (no runtime parameters). Use string templates with `{table.columns.col}` for safe table/column interpolation.
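Once defined, the view can be queried like any other table (a minimal sketch using `MooseClient` against the `active_user_events` view above):
```py filename="QueryView.py" copy
from moose_lib import MooseClient

client = MooseClient()

query = """
SELECT user_id, count() AS events
FROM active_user_events
WHERE ts > {since:DateTime}
GROUP BY user_id
"""
rows = client.query.execute_raw(query, {"since": "2024-01-01 00:00:00"})
```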
---
## Planned Migrations (OLAP)
Source: moose/olap/planned-migrations.mdx
Generate, review, and safely execute ClickHouse DDL plans
# Planned Migrations
Migration planning gives you fine-grained control over how database schema changes are applied when you deploy your code to production.
## Why planned migrations?
Most database migrations are designed under the assumption that your code is the sole owner of the database schema. In OLAP databases, we have to be more careful and assume that schema changes can happen at any time:
- The database schema is shared with other services (e.g. Change Data Capture services like ClickPipes)
- Other users (e.g. analysts) of the database may have credentials that let them change the schema
This is why the plan is generated from the remote environment and validated against the live state of the database at deployment time. If Moose detects drift, it aborts the deployment and requires you to regenerate the plan, so you don't drop data unintentionally.
Planned migrations apply only to OLAP (ClickHouse) schema changes. Streaming, APIs, and processes are unaffected by this flow.
## What this does
- Generates an ordered set of ClickHouse operations and writes them to `./migrations/plan.yaml`
- Saves two validation snapshots for drift detection:
- `./migrations/remote_state.json` (state when plan was created)
- `./migrations/local_infra_map.json` (desired state from your local code)
- When enabled, validates state and executes the exact reviewed operations
## Prerequisites
```toml filename="moose.config.toml"
[features]
olap = true
ddl_plan = true
```
## Generating a Plan
Once done editing your code in your feature branch, you can generate a plan that diffs your local code against your live remote database:
**For Moose server deployments:**
```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```
**For serverless deployments:**
```bash filename="Terminal" copy
moose generate migration --clickhouse-url clickhouse://user:pass@host:port/db --save
```
Outputs:
```text
./migrations/plan.yaml
./migrations/remote_state.json
./migrations/local_infra_map.json
```
What each file contains:
- `remote_state.json`: The state of the remote database when the plan was generated.
- `local_infra_map.json`: The state of the local code when the plan was generated.
- `plan.yaml`: The plan to apply to the remote database based on the diff between the two states.
You will commit the entire `migrations/` directory to version control, and Moose will automatically apply the plan when you deploy the code to production.
## Review and edit the plan
Moose makes some assumptions about your schema changes, such as renaming a column instead of dropping and adding. You can modify the plan to override these assumptions.
Open `plan.yaml` in your PR. Operations are ordered (teardown first, then setup) to avoid dependency issues. Review it like regular code.
```yaml filename="migrations/plan.yaml" copy
# Drop a deprecated column
- DropTableColumn:
table: "events"
column_name: "deprecated_field"
# Rename a column to match code
- RenameTableColumn:
table: "events"
before_column_name: "createdAt"
after_column_name: "created_at"
# Add a new nullable column after created_at
- AddTableColumn:
table: "events"
column:
name: "status"
data_type: "String"
required: false
unique: false
primary_key: false
default: null
annotations: []
comment: null
after_column: "created_at"
# Change a column type to Nullable(Float64)
- ModifyTableColumn:
table: "events"
before_column:
name: "value"
data_type: "Float64"
required: false
unique: false
primary_key: false
default: null
annotations: []
comment: null
after_column:
name: "value"
data_type:
Nullable:
nullable: "Float64"
required: false
unique: false
primary_key: false
default: null
annotations: []
comment: null
# Create a simple view via raw SQL
- RawSql:
sql:
- "CREATE VIEW IF NOT EXISTS `events_by_user` AS SELECT user_id, count() AS c FROM events GROUP BY user_id"
description: "Creating view events_by_user"
```
### When to edit the plan
There are two main reasons to edit the plan:
1. To "override" the default assumptions Moose makes when it cannot infer the intent of your schema changes, such as renaming a column instead of dropping and adding.
2. To add new operations that are not covered by the default assumptions, such as adding a backfill operation to a new column.
#### Rename a column instead of drop/add
When you rename a column, Moose defaults to dropping the old column and adding a new one:
```yaml filename="migrations/plan.yaml" copy
- DropTableColumn:
table: source_table
column_name: created_at
- AddTableColumn:
table: source_table
column:
name: createdAt
data_type: DateTime
required: true
unique: false
primary_key: false
default: null
annotations: []
after_column: color
```
In the plan, you can override this by using the `RenameTableColumn` operation:
```yaml filename="migrations/plan.yaml" copy
- RenameTableColumn:
table: source_table
before_column_name: created_at
after_column_name: createdAt
```
#### Add a backfill operation to a new column
When you add a new column, Moose will default to backfilling the column based on the value in the `default` field.
If your field is a `DateTime`, you can edit the plan to set the default value to the current timestamp:
```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
table: "source_table"
column:
name: "created_at"
data_type: "DateTime"
required: false
unique: false
default: NOW ## Specify the default value to the current timestamp
```
You can also override the default behavior by using the `RawSql` operation to define your own custom backfill logic:
```yaml filename="migrations/plan.yaml" copy
- AddTableColumn:
table: "source_table"
column:
name: "created_at"
data_type: "DateTime"
required: false
unique: false
default: null
- RawSql:
sql:
- "UPDATE events SET created_at = toDateTime(created_at_ms / 1000) WHERE created_at IS NULL"
description: "Backfill created_at from created_at_ms"
```
## Deployment Flows
### Moose Server Deployments
For Moose server deployments (with `moose prod` running), migrations are applied automatically on startup. Generate plans using:
```bash filename="Terminal" copy
moose generate migration --url https:// --token --save
```
When you deploy, Moose validates the plan and executes it automatically.
### Serverless Deployments
For serverless deployments (no Moose server), you manage migrations manually using the ClickHouse connection directly:
```toml filename="moose.config.toml"
[state_config]
storage = "clickhouse"
[features]
olap = true
data_model_v2 = true
```
**Workflow:**
1. **Generate the plan** from your ClickHouse database:
```bash filename="Terminal" copy
moose generate migration --clickhouse-url --save
```
2. **Review** the generated `./migrations/` files in your PR
3. **Execute the plan** against your ClickHouse with CI/CD or manually:
```bash filename="Terminal" copy
moose migrate --clickhouse-url
```
Before applying the plan, Moose first validates that the snapshot of your database taken when you generated the plan still matches the current database state. If it doesn't, Moose aborts the deployment; if it does, Moose executes the plan in `plan.yaml` against your production database.
Execution rules:
- If current tables in your live production database differ from `remote_state.json`, Moose aborts (remote drift since planning).
- If desired tables in your local code differ from `local_infra_map.json`, Moose aborts (code changed since planning).
- If both match, `plan.yaml` operations are executed in order against ClickHouse.
## Troubleshooting
- Can't connect to the remote database? Make sure you have [your admin API key set up correctly](./apis/auth#admin-endpoints)
- Plan rejected due to drift: Re-generate a plan against the current remote, review, and retry.
- No execution in moose server deployments: Ensure `ddl_plan = true` and `./migrations/plan.yaml` exists.
- OLAP disabled: Ensure `[features].olap = true`.
---
## Querying Data
Source: moose/olap/read-data.mdx
Query OLAP tables using SQL with type safety
# Querying Data
Moose provides type-safe SQL querying for your `OlapTable` and `MaterializedView` instances. Use cases include:
- Building APIs to expose your data to client/frontend applications
- Building transformation pipelines inside your database with materialized views
## Querying with MooseClient
Use `MooseClient` to query data from existing tables and materialized views.
### Basic Querying
You can use a formatted string with `execute`:
```py filename="BasicQuerying.py"
from moose_lib import MooseClient
from app.UserTable import UserTable
client = MooseClient()
status = "active"
limit = 10
query = """
SELECT id, name, email
FROM {table}
WHERE status = {status}
LIMIT {limit}
"""
rows = client.query.execute(query, {"table": UserTable, "status": status, "limit": limit})
rows = client.query.execute(query)
```
This allows you to safely interpolate the table and column names while still using your Moose OlapTables and columns.
If you'd rather just use the raw ClickHouse python driver with server-side parameter binding, you can use `execute_raw`:
```py filename="BasicQuerying.py"
from moose_lib import MooseClient
client = MooseClient()
# Query existing table using execute_raw with explicit ClickHouse types
query = """
SELECT id, name, email
FROM users
WHERE status = {status:String}
LIMIT {limit:UInt32}
"""
rows = client.query.execute_raw(query, {
"status": "active",
"limit": 10
})
```
### Querying Materialized Views
You can use a formatted string with `execute`:
```py filename="QueryMaterializedView.py"
from moose_lib import MooseClient
client = MooseClient()
min_orders = 10
query = """
SELECT user_id, total_orders, average_order_value
FROM user_stats_view
WHERE total_orders > {min_orders}
ORDER BY average_order_value DESC
"""
rows = client.query.execute(query, {"min_orders": min_orders})
```
Use `execute_raw` with parameter binding:
```py filename="QueryMaterializedView.py"
from moose_lib import MooseClient
client = MooseClient()
min_orders = 10
# Query existing materialized view
query = """
SELECT user_id, total_orders, average_order_value
FROM user_stats_view
WHERE total_orders > {min_orders:UInt32}
ORDER BY average_order_value DESC
"""
rows = client.query.execute_raw(query, {"min_orders": min_orders})
```
## Select With Column and Table References
```py filename="TypedReferences.py"
from moose_lib import MooseClient
from app.UserTable import UserTable
client = MooseClient()
status = "active"
query = """
SELECT
{column}
FROM {table}
WHERE status = {status}
"""
rows = client.query.execute(query, {"column": UserTable.cols.id, "table": UserTable, "status": status})
```
```python copy
from moose_lib import MooseClient
from app.UserTable import UserTable

client = MooseClient()

# Use parameter binding with explicit identifiers
query = """
SELECT
    id,
    name,
    email
FROM {table:Identifier}
WHERE status = {status:String}
"""
rows = client.query.execute_raw(query, {"table": UserTable.name, "status": "active"})
```
## Filtering with WHERE Clauses
```py copy
from moose_lib import MooseClient
from app.UserTable import UserTable

client = MooseClient()
status = "active"
start_date = "2024-01-01"
search_pattern = "%example%"
min_age = 18
max_age = 65
user_ids = [1, 2, 3, 4, 5]
# Multiple WHERE conditions
filter_query = """
SELECT id, name
FROM {table}
WHERE status = {status}
AND created_at > {start_date}
AND email ILIKE {search_pattern}
"""
# Using BETWEEN
range_query = """
SELECT * FROM {table}
WHERE age BETWEEN {min_age} AND {max_age}
"""
# Using IN
in_query = """
SELECT * FROM {table}
WHERE id IN {user_ids}
"""
# Execute examples (parameter keys match the placeholders above)
filter_rows = client.query.execute(filter_query, {"table": UserTable, "status": status, "start_date": start_date, "search_pattern": search_pattern})
range_rows = client.query.execute(range_query, {"table": UserTable, "min_age": min_age, "max_age": max_age})
in_rows = client.query.execute(in_query, {"table": UserTable, "user_ids": user_ids})
```
```py filename="WhereClauses.py"
from moose_lib import MooseClient
client = MooseClient()
# Multiple WHERE conditions
filter_query = """
SELECT id, name
FROM users
WHERE status = {status:String}
AND created_at > {startDate:DateTime}
AND email ILIKE {searchPattern:String}
"""
# Using BETWEEN
range_query = """
SELECT * FROM users
WHERE age BETWEEN {minAge:UInt32} AND {maxAge:UInt32}
"""
# Using IN with typed arrays
in_query = """
SELECT * FROM users
WHERE id IN {userIds:Array(UInt32)}
"""
# Execute examples
filter_rows = client.query.execute_raw(filter_query, {
"status": "active",
"startDate": "2024-01-01",
"searchPattern": "%example%"
})
range_rows = client.query.execute_raw(range_query, {
"minAge": 18,
"maxAge": 65
})
in_rows = client.query.execute_raw(in_query, {
"userIds": [1, 2, 3, 4, 5]
})
```
## Dynamic Query Building
Moose provides two distinct approaches for executing queries in Python. Choose the right one for your use case:
- Option 1: Use formatted strings with `execute`
- Option 2: Use `execute_raw` with parameter binding (lowest level of abstraction)
```py filename="execute.py"
from moose_lib import MooseClient
from pydantic import BaseModel, Field
from app.UserTable import UserTable

client = MooseClient()
# Example: Static query with validated parameters
def get_active_users(status: str, limit: int):
# Static table/column names, validated parameters
query = """
SELECT id, name, email
FROM {table}
WHERE status = {status}
LIMIT {limit}
"""
return client.query.execute(query, {"table": UserTable, "status": status, "limit": limit})
# Usage with validated input
active_users = get_active_users("active", 10)
class UserQueryParams(BaseModel):
status: str = Field(..., pattern=r"^(active|inactive|pending)$")
limit: int = Field(default=10, ge=1, le=1000)
def build_validated_query(params: UserQueryParams):
# All parameters are validated by Pydantic
query = """
SELECT id, name, email
FROM {table}
WHERE status = {status}
LIMIT {limit}
"""
return client.query.execute(query, {"table": UserTable, "status": params.status, "limit": params.limit})
```
```py filename="ParameterBinding.py"
from typing import Optional
from pydantic import BaseModel
from moose_lib import MooseClient

client = MooseClient()
# Example: Dynamic table and column selection with server-side parameter binding
def query_user_data(table_name: str, status_filter: str, limit: int):
# Dynamic identifiers in query structure, bound parameters for values
query = """
SELECT id, name, email
FROM {table_name:Identifier}
WHERE status = {status:String}
AND created_at > {startDate:DateTime}
LIMIT {limit:UInt32}
"""
return client.query.execute_raw(query, {
"table_name": table_name, # Bound parameter
"status": status_filter, # Bound parameter
"startDate": "2024-01-01T00:00:00", # Bound parameter
"limit": limit # Bound parameter
})
# Usage with different tables
users_data = query_user_data("users", "active", 10)
admins_data = query_user_data("admin_users", "pending", 5)
# Conditional WHERE clauses
class FilterParams(BaseModel):
    min_age: Optional[int] = None
    status: Optional[str] = None
    search_text: Optional[str] = None

def build_conditional_query(client: MooseClient, params: FilterParams):
conditions: list[str] = []
parameters: dict = {}
if params.min_age is not None:
conditions.append("age >= {minAge:UInt32}")
parameters["minAge"] = params.min_age
if params.status:
conditions.append("status = {status:String}")
parameters["status"] = params.status
if params.search_text:
conditions.append("(name ILIKE {searchPattern:String} OR email ILIKE {searchPattern:String})")
parameters["searchPattern"] = f"%{params.search_text}%"
query = "SELECT * FROM users"
if conditions:
query += " WHERE " + " AND ".join(conditions)
query += " ORDER BY created_at DESC"
return client.query.execute_raw(query, parameters)
```
## Building APIs
To build REST APIs that expose your data, see the [Bring Your Own API Framework documentation](/moose/app-api-frameworks) for comprehensive examples and patterns using Express, Koa, Fastify, or FastAPI.
## Performance Optimization
If your query is slower than expected, there are a few things you can check:
- If using filters, try to filter on a column that is defined in the `order_by_fields` of the table
- For common queries, consider [creating a materialized view](/stack/olap/create-materialized-view) to pre-compute the result set
---
## Handling Failed Migrations
Source: moose/olap/schema-change.mdx
# Handling Failed Migrations
One of the main benefits of the Moose local development environment is that you can detect breaking schema changes before they reach production. This is especially useful for catching incompatible data type changes, where you change a column's data type and the generated migration cannot cast the existing data to the new type.
This page describes how to recover from a failed migration in dev and gives a playbook for safely achieving the desired type change.
## What happened
You changed a column’s data type on a table that already has data. The dev migration tried to run an in-place ALTER and ClickHouse created a mutation that failed (incompatible cast, nullability, defaults, etc.).
Symptoms:
- Failed migration in dev
- A stuck mutation on the table
- Reverting your code type alone doesn’t help until the mutation is cleared
## Quick recovery (dev)
Follow these steps to get unblocked quickly.
### View the terminal logs to see the failing mutation
In your terminal, you should see a message like this:
```txt
⢹ Processing Infrastructure changes from file watcher
~ Table events:
Column changes:
~ value: String -> Float64
Applying: ALTER TABLE events MODIFY COLUMN value Float64
ClickHouse mutation created: mutation_id='00000001-0000-4000-8000-000000000123'
Error: Code: 368. Conversion failed: cannot parse 'abc' as Float64 (column: value)
Status: mutation failed; table may be partially transformed
```
Copy the mutation ID from the terminal logs and run the following command to kill the mutation.
### Kill the mutation
- If you have the `mutation_id`:
```sql
KILL MUTATION WHERE mutation_id = '';
```
- If you didn’t capture the ID, find it and kill by table:
```sql
SELECT mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE database = currentDatabase() AND table = ''
ORDER BY create_time DESC;
KILL MUTATION WHERE database = currentDatabase() AND table = '';
```
ClickHouse ALTERs are implemented as asynchronous mutations, not transactional. If a mutation fails mid-way, some parts may have been rewritten while others were not, leaving the table partially transformed. The failed mutation also remains queued until you kill it. Clear the mutation first, then proceed.
Soon, Moose will automatically generate a local DDL plan that kills the failed mutation and rolls back the partial transformation applied before the failure.
### Revert your code to match the current DB schema
- Change the column type in code back to the previous (working) type
- Save your changes; let `moose dev` resync. You should be able to query the table again
If the table only has disposable dev data, you can also `TRUNCATE TABLE .` or drop/recreate the table and let `moose dev` rebuild it. Only do this in dev.
## Safely achieving the desired type change
Instead of editing the column type in place, you can add a new column with the target type and backfill the data. This is the recommended approach.
### Add a new column + backfill
Generate a plan to add the new column and backfill the data:
```bash
moose generate migration --url --save --token
```
Open the generated `/migrations/plan.yaml` file. You'll see the `AddTableColumn` operation to add the new column. Right after it, you can add a `RawSql` operation to backfill the data. Here you can write an `ALTER TABLE` statement to update the new column with the data from the old column:
```yaml filename="migrations/plan.yaml"
- AddTableColumn:
table: "events"
column:
name: "status_v2"
data_type:
Nullable:
nullable: "StatusEnum"
default: null
- RawSql:
sql:
- "ALTER TABLE events UPDATE status_v2 = toStatusEnumOrNull(status) WHERE status_v2 IS NULL"
description: "Backfill status_v2 from status"
```
Then, when writing to the table, double write to both columns.
This allows for all surrounding processes and applications that rely on the old column to continue working, and you can later deprecate the old column and rename the new column when you are ready.
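A minimal sketch of what double-writing can look like during the transition (the model and helper names here are illustrative, not generated by Moose):
```python filename="app/tables/events_transition.py" copy
from pydantic import BaseModel
from moose_lib import Key, OlapTable

class Event(BaseModel):
    id: Key[str]
    name: str
    status: str     # old column, still read by existing consumers
    status_v2: str  # new column with the target type

table = OlapTable[Event]("events")

# Wherever you construct rows, populate both columns until the cutover
def to_event_row(raw: dict) -> Event:
    return Event(
        id=raw["id"],
        name=raw["name"],
        status=raw["status"],
        status_v2=raw["status"],  # same value written to the new column
    )
```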
### Later, deprecate the old column and rename the new column
Once the column backfill is complete and you are ready to deprecate the old column, you can rename the new column to the old column name and apply this in a new, subsequent PR.
In your code, you can rename the column and deprecate the old column:
```python filename="app/tables/events.py" copy
from datetime import datetime
from pydantic import BaseModel
from moose_lib import Key, OlapTable

# StatusEnum is the target enum type, defined elsewhere in your app
class Event(BaseModel):
id: Key[str]
name: str
created_at: datetime
status_old: str
status: StatusEnum
table = OlapTable[Event]("events")
```
Initially you'll see two `DeleteTableColumn` operations, followed by two `AddTableColumn` operations.
*IMPORTANT*: DELETE ALL FOUR GENERATED `DeleteTableColumn` AND `AddTableColumn` OPERATIONS WITH THE FOLLOWING:
```yaml filename="migrations/plan.yaml"
- RenameTableColumn:
table: "events"
before_column_name: "status"
after_column_name: "status_old"
- RenameTableColumn:
table: "events"
before_column_name: "status_v2"
after_column_name: "status"
```
Once the old column is no longer needed, you can drop it in a third PR.
```yaml filename="migrations/plan.yaml"
- DropTableColumn:
table: "events"
column_name: "status_old"
```
## Common breaking cases
- String -> Int/Float: can fail on non-numeric rows; prefer `toInt64OrNull(...)`/`toFloat64OrNull(...)` + backfill
- Nullable(T) -> T (NOT NULL): fails if any NULLs exist and no default is provided; backfill then drop nullability
- Narrowing types (e.g., Int64 -> Int32): fails if values overflow; validate and transform first
Read about migration planning and how to use it to safely manage schema changes in production.
---
## Schema Optimization
Source: moose/olap/schema-optimization.mdx
# Schema Optimization
Choosing the right data types and column ordering for your tables is crucial for ClickHouse performance and storage efficiency. Poor schema design can lead to 10-100x slower queries and 2-5x larger storage requirements.
## Data Types
Keep the following best practices in mind when defining your column types:
### Avoid Nullable Columns
Nullable columns in ClickHouse have significant performance overhead.
Instead of using `| None` or `Optional[type]`, give the column a default value (or use `Annotated[type, clickhouse_default("...")]`).
```py filename="AvoidNullable.py"
from datetime import datetime
from moose_lib import OlapTable
from pydantic import BaseModel, Field

# ❌ Bad: Using nullable columns
class BadUserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    description: str | None = None  # Nullable
    metadata: dict | None = None  # Nullable
    created_at: datetime

# ✅ Good: Use default values instead
class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    description: str = ""  # Default empty string
    metadata: dict = Field(default_factory=dict)  # Default empty dict
    created_at: datetime

user_events_table = OlapTable[UserEvent]("user_events", {
    "orderByFields": ["id", "created_at"]
})
```
### Use `LowCardinality` where possible
`LowCardinality` is ClickHouse's most efficient string type for columns with limited unique values.
```py filename="LowCardinality.py"
from datetime import datetime
from typing import Annotated, Literal
from moose_lib import OlapTable
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: Annotated[str, "LowCardinality"]  # ✅ Good for limited values
    status: Literal["active", "inactive", "pending"]  # ✅ Literals become LowCardinality automatically
    country: Annotated[str, "LowCardinality"]  # ✅ Good for country codes
    user_agent: str  # ❌ Keep as String for high cardinality
    created_at: datetime

user_events_table = OlapTable[UserEvent]("user_events", {
    "orderByFields": ["id", "created_at"]
})
```
### Pick the right Integer types
Choose the smallest integer type that fits your data range to save storage and improve performance.
```py filename="IntegerTypes.py"
from datetime import datetime
from typing import Annotated
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    age: Annotated[int, "uint8"]  # ✅ 0 to 255 (1 byte)
    score: Annotated[int, "int16"]  # ✅ -32,768 to 32,767 (2 bytes)
    view_count: Annotated[int, "uint32"]  # ✅ 0 to ~4 billion (4 bytes)
    timestamp: Annotated[int, "int64"]  # ✅ Unix timestamp (8 bytes)
    event_type: str
    created_at: datetime

# Integer type ranges:
# UInt8: 0 to 255
# UInt16: 0 to 65,535
# UInt32: 0 to 4,294,967,295
# UInt64: 0 to 18,446,744,073,709,551,615
# Int8: -128 to 127
# Int16: -32,768 to 32,767
# Int32: -2,147,483,648 to 2,147,483,647
# Int64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
```
### Use the right precision for `DateTime`
Choose appropriate DateTime precision based on your use case to balance storage and precision.
```py filename="DateTimePrecision.py"
from datetime import datetime
from moose_lib import OlapTable, clickhouse_datetime64
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    created_at: datetime  # ✅ Second precision (default)
    updated_at: clickhouse_datetime64(3)  # ✅ Millisecond precision
    processed_at: clickhouse_datetime64(6)  # ✅ Microsecond precision
    logged_at: clickhouse_datetime64(9)  # ✅ Nanosecond precision

user_events_table = OlapTable[UserEvent]("user_events", {
"orderByFields": ["id", "created_at"]
})
```
### Use Decimal over Float
Use `Decimal` for financial and precise calculations to avoid floating-point precision issues.
```py filename="DecimalOverFloat.py"
from datetime import datetime
from moose_lib import OlapTable, clickhouse_decimal
from pydantic import BaseModel

class Order(BaseModel):
    id: str
    user_id: str
    amount: clickhouse_decimal(10, 2)  # ✅ 10 total digits, 2 decimal places
    tax: clickhouse_decimal(8, 2)  # ✅ 8 total digits, 2 decimal places
    discount: clickhouse_decimal(5, 2)  # ✅ 5 total digits, 2 decimal places
    total: clickhouse_decimal(12, 2)  # ✅ 12 total digits, 2 decimal places
    created_at: datetime

# ❌ Bad: Using float for financial data
class BadOrder(BaseModel):
    id: str
    amount: float  # Float - can cause precision issues
    tax: float  # Float - can cause precision issues

orders_table = OlapTable[Order]("orders", {
"orderByFields": ["id", "created_at"]
})
```
### Use `NamedTuple` over `Nested`
`NamedTuple` is more efficient than `Nested` for structured data in ClickHouse.
```py filename="NamedTupleOverNested.py"
from datetime import datetime
from typing import Annotated
from moose_lib import OlapTable
from pydantic import BaseModel

class Location(BaseModel):
    latitude: float
    longitude: float
    city: str
    country: str

class Metadata(BaseModel):
    version: str
    source: str
    priority: int

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    location: Annotated[Location, "ClickHouseNamedTuple"]  # lat, lon, city, country
    metadata: Annotated[Metadata, "ClickHouseNamedTuple"]  # version, source, priority
    created_at: datetime

# ❌ Bad: Using nested objects (less efficient)
class BadUserEvent(BaseModel):
    id: str
    location: Location  # Nested - less efficient
    metadata: Metadata  # Nested - less efficient

user_events_table = OlapTable[UserEvent]("user_events", {
"orderByFields": ["id", "created_at"]
})
```
## Ordering
### Choose columns that you will use in WHERE and GROUP BY clauses
Optimize your `order_by_fields` (or `order_by_expression`) for your most common query patterns.
```py filename="OrderByOptimization.py"
from datetime import datetime
from moose_lib import OlapTable
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: str
    status: str
    created_at: datetime
    country: str

# ✅ Good: Optimized for common query patterns
user_events_table = OlapTable[UserEvent]("user_events", {
"orderByFields": ["user_id", "created_at", "event_type"] # Most common filters first
})
# Common queries this optimizes for:
# - WHERE user_id = ? AND created_at > ?
# - WHERE user_id = ? AND event_type = ?
# - GROUP BY user_id, event_type
```
### `ORDER BY` should prioritize LowCardinality columns first
Place `LowCardinality` columns first in your `order_by_fields` (or reflect this priority in your `order_by_expression`) for better compression and query performance.
```py filename="LowCardinalityOrdering.py"
from datetime import datetime
from typing import Annotated
from moose_lib import OlapTable
from pydantic import BaseModel

class UserEvent(BaseModel):
    id: str
    user_id: str
    event_type: Annotated[str, "LowCardinality"]  # ✅ Low cardinality
    status: Annotated[str, "LowCardinality"]  # ✅ Low cardinality
    country: Annotated[str, "LowCardinality"]  # ✅ Low cardinality
    created_at: datetime  # High cardinality
    session_id: str  # High cardinality

# ✅ Good: LowCardinality columns first
user_events_table = OlapTable[UserEvent]("user_events", {
"orderByFields": ["event_type", "status", "country", "created_at", "session_id"]
})
# ❌ Bad: High cardinality columns first
bad_user_events_table = OlapTable[UserEvent]("user_events", {
"orderByFields": ["created_at", "session_id", "event_type", "status"] # Less efficient
})
```
---
## Schema Versioning with Materialized Views
Source: moose/olap/schema-versioning.mdx
Use table versions and materialized views to migrate breaking schema changes safely
# Table Versioning & Blue/Green Migrations
## Overview
Changing a table's storage layout (engine or sorting key) in ClickHouse requires a full table rewrite. Doing it in-place can block or slow concurrent reads and writes due to heavy merges and metadata changes, creating real risk for production workloads. Blue/Green avoids this by creating a new versioned table and migrating data live via a materialized view, so traffic continues uninterrupted.
**When to use it**:
- Change the **table engine** (e.g., MergeTree → ReplacingMergeTree)
- Update **ORDER BY fields** (sorting keys) to better match query patterns
- Reshape **primary keys** or perform type changes that require a rewrite
**How Moose does it**:
1. Define a new table with the same logical name and a bumped `version`, setting the new `order_by_fields` and/or `engine` ([Table modeling](/moose/olap/model-table)).
2. Create a [Materialized view](/moose/olap/model-materialized-view) that selects from the old table and writes to the new one; Moose backfills once and keeps the view live for new inserts.
3. Later on, cut over readers/writers to the new export and clean up old resources ([Applying migrations](/moose/olap/apply-migrations)).
Setting `config.version` on an `OlapTable` changes only the underlying physical table name (the version suffix is appended, with dots replaced by underscores). Your code still refers to the logical table you defined.
## Example: change sorting key (ORDER BY)
Assume the original `events` table orders by `id` only. We want to update the
sorting key to optimize reads by ordering on `id, created_at`.
### Original table (version 0.0)
```python filename="app/tables/events.py" copy
from typing import Annotated
from pydantic import BaseModel
from moose_lib import OlapTable, Key, OlapConfig
class EventV0(BaseModel):
id: str
name: str
created_at: str # datetime in your format
events = OlapTable[EventV0]("events", config=OlapConfig(version="0.0", order_by_fields=["id"]))
```
### New table (bump to version 0.1)
Create a new table with the same logical name, but set `version: "0.1"` and update the ordering to `id, created_at`. Moose will create `events_0_1` in ClickHouse.
```python filename="app/tables/events_v01.py" copy
from pydantic import BaseModel
from moose_lib import OlapTable, Key, OlapConfig

class EventV1(BaseModel):
id: Key[str]
name: str
created_at: str
events_v1 = OlapTable[EventV1]("events", config=OlapConfig(version="0.1", order_by_fields=["id", "created_at"]))
```
### Create the materialized view to migrate data
Create a materialized view that:
- SELECTs from the old table (`events`)
- copies fields 1:1 to the new table
- writes into the versioned target table (`events_0_1`)
Pass the versioned `OlapTable` instance as `target_table`. If you only pass a table name, Moose will create an unversioned target.
```python filename="app/views/migrate_events_to_v01.py" copy
from moose_lib import MaterializedView, MaterializedViewOptions
from app.tables.events import events
from app.tables.events_v01 import events_v1, EventV1
migrate_events_to_v01 = MaterializedView[EventV1](
MaterializedViewOptions(
materialized_view_name="mv_events_to_0_1",
select_statement=(
f"SELECT * FROM {events.name}"
),
select_tables=[events],
),
target_table=events_v1,
)
```
What happens when you export this view:
- Moose creates the versioned table if needed
- Moose creates the MATERIALIZED VIEW and immediately runs a one-time backfill (`INSERT INTO ... SELECT ...`)
- ClickHouse keeps the view active: any new inserts into `events` automatically flow into `events_0_1`
## Cutover and cleanup
- Update readers to query the new table (`events_v1`).
- Update writers/streams to produce to the new table if applicable.
- After verifying parity and retention windows, drop the old table and the migration view.
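For example, a reader that previously queried `events` can be pointed at the versioned physical table after cutover (a minimal sketch; the physical name `events_0_1` comes from version `0.1`):
```python filename="app/queries/read_events_v01.py" copy
from moose_lib import MooseClient

client = MooseClient()

query = """
SELECT id, name, created_at
FROM events_0_1
ORDER BY id, created_at
LIMIT {limit:UInt32}
"""
rows = client.query.execute_raw(query, {"limit": 100})
```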
## Notes and tips
- Use semantic versions like `0.1`, `1.0`, `1.1`. Moose will render `events_1_1` as the physical name.
- Keep the migration view simple and deterministic. If you need complex transforms, prefer explicit SQL in the `select_statement`.
- Very large backfills can take time. Consider deploying during low-traffic windows.
---
## Supported Column Types
Source: moose/olap/supported-types.mdx
Complete guide to defining columns for ClickHouse tables in Moose
# Supported Column Types
Moose supports a comprehensive set of ClickHouse column types across both TypeScript and Python libraries. This guide covers all supported types, their syntax, and best practices for defining table schemas.
## Basic Types
### String Types
```python
from typing import Annotated, Literal
from uuid import UUID
class User(BaseModel):
string: str # String
low_cardinality: Annotated[str, "LowCardinality"] # LowCardinality(String)
uuid: UUID # UUID
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `String` | `str` | Variable-length string |
| `LowCardinality(String)` | `Annotated[str, "LowCardinality"]` or `Literal` | Optimized for repeated values |
| `UUID` | `UUID` | UUID format strings |
### Numeric Types
### Integer Types
```python
from typing import Annotated
class Metrics(BaseModel):
user_id: Annotated[int, "int32"] # Int32
count: Annotated[int, "int64"] # Int64
small_value: Annotated[int, "uint8"] # UInt8
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Int8` | `Annotated[int, "int8"]` | -128 to 127 |
| `Int16` | `Annotated[int, "int16"]` | -32,768 to 32,767 |
| `Int32` | `Annotated[int, "int32"]` | -2,147,483,648 to 2,147,483,647 |
| `Int64` | `Annotated[int, "int64"]` | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| `UInt8` | `Annotated[int, "uint8"]` | 0 to 255 |
| `UInt16` | `Annotated[int, "uint16"]` | 0 to 65,535 |
| `UInt32` | `Annotated[int, "uint32"]` | 0 to 4,294,967,295 |
| `UInt64` | `Annotated[int, "uint64"]` | 0 to 18,446,744,073,709,551,615 |
### Floating Point Types
```python
from typing import Annotated
from moose_lib import ClickhouseSize
from pydantic import BaseModel

class SensorData(BaseModel):
    temperature: float  # Float64
    humidity: Annotated[float, ClickhouseSize(4)]  # Float32
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Float64` | `float` | 64-bit floating point number |
| `Float32` | `Annotated[float, ClickhouseSize(4)]` | 32-bit floating point number |
### Decimal Types
```python
from moose_lib import clickhouse_decimal
from pydantic import BaseModel

class FinancialData(BaseModel):
    amount: clickhouse_decimal(10, 2)  # Decimal(10,2)
    rate: clickhouse_decimal(5, 4)  # Decimal(5,4)
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Decimal(P,S)` | `clickhouse_decimal(P,S)` | Fixed-point decimal |
### Boolean Type
```python
from pydantic import BaseModel

class User(BaseModel):
    is_active: bool
    verified: bool
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Boolean` | `bool` | True/False values |
### Date and Time Types
```python
from typing import Annotated
from datetime import date, datetime
from moose_lib import ClickhouseSize, clickhouse_datetime64
from pydantic import BaseModel

class Event(BaseModel):
    created_at: datetime  # DateTime
    updated_at: clickhouse_datetime64(3)  # DateTime64(3)
    birth_date: date  # Date
    compact_date: Annotated[date, ClickhouseSize(2)]  # Date16
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `Date` | `date` | Date only |
| `Date16` | `Annotated[date, ClickhouseSize(2)]` | Compact date format |
| `DateTime` | `datetime` | Date and time |
| `DateTime64(P)` | `clickhouse_datetime64(P)` | Date and time with sub-second precision |
### Network Types
```python
from ipaddress import IPv4Address, IPv6Address
from pydantic import BaseModel

class NetworkEvent(BaseModel):
    source_ip: IPv4Address
    dest_ip: IPv6Address
```
| ClickHouse Type | Python | Description |
|------|------------|--------|
| `IPv4` | `ipaddress.IPv4Address` | IPv4 addresses |
| `IPv6` | `ipaddress.IPv6Address` | IPv6 addresses |
## Complex Types
### Geometry Types
Moose supports ClickHouse geometry types. Use the helpers in each language to get type-safe models and correct ClickHouse mappings.
```python
from moose_lib import Point, Ring, LineString, MultiLineString, Polygon, MultiPolygon
from pydantic import BaseModel

class GeoTypes(BaseModel):
    point: Point  # tuple[float, float]
    ring: Ring  # list[tuple[float, float]]
    line_string: LineString  # list[tuple[float, float]]
    multi_line_string: MultiLineString  # list[list[tuple[float, float]]]
    polygon: Polygon  # list[list[tuple[float, float]]]
    multi_polygon: MultiPolygon  # list[list[list[tuple[float, float]]]]
```
| ClickHouse Type | Python |
|------|------------|
| `Point` | `Point` (tuple[float, float]) |
| `Ring` | `Ring` (list[tuple[float, float]]) |
| `LineString` | `LineString` (list[tuple[float, float]]) |
| `MultiLineString` | `MultiLineString` (list[list[tuple[float, float]]]) |
| `Polygon` | `Polygon` (list[list[tuple[float, float]]]) |
| `MultiPolygon` | `MultiPolygon` (list[list[list[tuple[float, float]]]]) |
Geometry coordinates are represented as numeric pairs `[x, y]` (TypeScript) or `tuple[float, float]` (Python).
### Array Types
Arrays are supported for all basic types and some complex types.
```python
from typing import Any, Dict, List, Tuple
from pydantic import BaseModel

class User(BaseModel):
    tags: List[str]  # Array(String)
    scores: List[float]  # Array(Float64)
    metadata: List[Dict[str, Any]]  # Array(Json)
    tuple: List[Tuple[str, int]]  # Array(Tuple(String, Int32))
```
### Map Types
Maps store key-value pairs with specified key and value types.
```python
from typing import Dict
from pydantic import BaseModel

class User(BaseModel):
    preferences: Dict[str, str]  # Map(String, String)
    metrics: Dict[str, float]  # Map(String, Float64)
```
### Nested Types
Nested types allow embedding complex objects within tables.
```python
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    zip: str

class User(BaseModel):
    name: str
    address: Address  # Nested type
```
### Named Tuple Types
Named tuples provide structured data with named fields.
```python
from typing import Annotated
from pydantic import BaseModel

class Point(BaseModel):
    x: float
    y: float

class Shape(BaseModel):
    center: Annotated[Point, "ClickHouseNamedTuple"]  # Named tuple
    radius: float
```
### Enum Types
Enums map to ClickHouse enums with string or integer values.
```python
from enum import Enum
from pydantic import BaseModel

class UserRole(str, Enum):
    ADMIN = "admin"
    USER = "user"
    GUEST = "guest"

class User(BaseModel):
    role: UserRole  # Enum with string values
```
## Special Types
### JSON Type
The `Json` type stores arbitrary JSON data with optional schema configuration for performance and type safety.
#### Basic JSON (Unstructured)
For completely dynamic JSON data without any schema:
```python
from typing import Any, Dict
from pydantic import BaseModel

class Event(BaseModel):
    metadata: Dict[str, Any]  # Basic JSON - accepts any structure
    config: Any  # Basic JSON - fully dynamic
```
#### Rich JSON with Type Configuration
For better performance and validation, you can define typed fields within your JSON using `ClickHouseJson`. This creates a ClickHouse `JSON` column with explicit type hints for specific paths.
```python
from typing import Annotated, Optional
from datetime import datetime
from pydantic import BaseModel, ConfigDict
from moose_lib.data_models import ClickHouseJson

# Define the structure for your JSON payload
class PayloadStructure(BaseModel):
    model_config = ConfigDict(extra='allow')  # Required for JSON types
    name: str
    count: int
    timestamp: Optional[datetime] = None

class Event(BaseModel):
    id: str
    # JSON with typed paths - better performance, allows extra fields
    payload: Annotated[PayloadStructure, ClickHouseJson()]
    # JSON with performance tuning options
    metadata: Annotated[PayloadStructure, ClickHouseJson(
        max_dynamic_paths=256,      # Limit tracked paths
        max_dynamic_types=16,       # Limit type variations
        skip_paths=("skip.me",),    # Exclude specific paths
        skip_regexes=(r"^tmp\.",)   # Exclude paths matching regex
    )]
```
#### Configuration Options
| Option | Type | Description |
|--------|------|-------------|
| `max_dynamic_paths` | `int` | Maximum number of unique JSON paths to track. Helps control memory usage for highly variable JSON structures. |
| `max_dynamic_types` | `int` | Maximum number of type variations allowed per path. Useful when paths may contain different types. |
| `skip_paths` | `Sequence[str]` | Exact JSON paths to ignore during ingestion (e.g., `("temp", "debug.info")`). |
| `skip_regexes` | `Sequence[str]` | Regex patterns for paths to exclude (e.g., `(r"^tmp\.", r".*_internal$")`). |
#### Benefits of Typed JSON
1. **Better Performance**: ClickHouse can optimize storage and queries for known paths
2. **Type Safety**: Validates that specified paths match expected types
3. **Flexible Schema**: Allows additional fields beyond typed paths
4. **Memory Control**: Configure limits to prevent unbounded resource usage
- **Basic JSON** (`Any`, `Dict[str, Any]`): Use when JSON structure is completely unknown or rarely queried
- **Rich JSON** (`ClickHouseJson`): Use when you have known fields that need indexing/querying, but want to allow additional dynamic fields
#### Example: Product Event Tracking
```python
from typing import Annotated, Optional
from pydantic import BaseModel, ConfigDict
from moose_lib import Key, ClickHouseJson
from datetime import datetime
class ProductProperties(BaseModel):
    model_config = ConfigDict(extra='allow')
    category: str
    price: float
    in_stock: bool

class ProductEvent(BaseModel):
    event_id: Key[str]
    timestamp: datetime
    # Typed paths for common fields, but allows custom properties
    properties: Annotated[ProductProperties, ClickHouseJson(
        max_dynamic_paths=128,      # Track up to 128 unique paths
        max_dynamic_types=8,        # Allow up to 8 type variations per path
        skip_paths=("_internal",),  # Ignore internal fields
        skip_regexes=(r"^debug_",)  # Ignore debug fields
    )]
```
With this schema, you can send events like:
```python
{
"event_id": "evt_123",
"timestamp": "2025-10-22T12:00:00Z",
"properties": {
"category": "electronics", # Typed field ✓
"price": 99.99, # Typed field ✓
"in_stock": True, # Typed field ✓
"custom_tag": "holiday-sale", # Extra field - accepted ✓
"brand_id": 42, # Extra field - accepted ✓
"_internal": "ignored" # Skipped by skip_paths ✓
}
}
```
### Nullable Types
All types support nullable variants using optional types.
```python
from typing import Optional
from pydantic import BaseModel

class User(BaseModel):
    name: str  # Required
    email: Optional[str] = None  # Nullable
    age: Optional[int] = None  # Nullable
```
If a field is optional in your app model but you provide a ClickHouse default, Moose infers a non-nullable ClickHouse column with a DEFAULT clause.
- Optional without default → ClickHouse Nullable type.
- Optional with default (using `clickhouse_default("18")` in annotations) → non-nullable column with default `18`.
This lets you keep optional fields at the application layer while avoiding Nullable columns in ClickHouse when a server-side default exists, as sketched below.
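A minimal sketch of both cases, assuming the `clickhouse_default` helper mentioned above is imported from `moose_lib` alongside the other annotation helpers:
```python
from typing import Annotated, Optional
from moose_lib import clickhouse_default
from pydantic import BaseModel

class User(BaseModel):
    name: str
    email: Optional[str] = None  # Optional, no default -> Nullable(String)
    # Optional with a server-side default -> non-nullable column with DEFAULT 18
    age: Annotated[Optional[int], clickhouse_default("18")] = None
```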
### SimpleAggregateFunction
`SimpleAggregateFunction` is designed for use with `AggregatingMergeTree` tables. It stores pre-aggregated values that are automatically merged when ClickHouse combines rows with the same primary key.
```python
from moose_lib import simple_aggregated, Key, OlapTable, OlapConfig, AggregatingMergeTreeEngine
from pydantic import BaseModel
from datetime import datetime

class DailyStats(BaseModel):
    date: datetime
    user_id: str
    total_views: simple_aggregated('sum', int)
    max_score: simple_aggregated('max', float)
    last_seen: simple_aggregated('anyLast', datetime)

stats_table = OlapTable[DailyStats](
    "daily_stats",
    OlapConfig(
        engine=AggregatingMergeTreeEngine(),
        order_by_fields=["date", "user_id"]
    )
)
```
See [ClickHouse docs](https://clickhouse.com/docs/en/sql-reference/data-types/simpleaggregatefunction) for the complete list of functions.
## Table Engines
Moose supports all common ClickHouse table engines:
| Engine | Python | Description |
|--------|------------|-------------|
| `MergeTree` | `MergeTreeEngine` | Default engine |
| `ReplacingMergeTree` | `ReplacingMergeTreeEngine` | Deduplication |
| `SummingMergeTree` | `SummingMergeTreeEngine` | Aggregates numeric columns |
| `AggregatingMergeTree` | `AggregatingMergeTreeEngine` | Advanced aggregation |
| `ReplicatedMergeTree` | `ReplicatedMergeTreeEngine` | Replicated version of MergeTree |
| `ReplicatedReplacingMergeTree` | `ReplicatedReplacingMergeTreeEngine` | Replicated with deduplication |
| `ReplicatedSummingMergeTree` | `ReplicatedSummingMergeTreeEngine` | Replicated with aggregation |
| `ReplicatedAggregatingMergeTree` | `ReplicatedAggregatingMergeTreeEngine` | Replicated with advanced aggregation |
```python
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine

# Assuming a User Pydantic model defined elsewhere
user_table = OlapTable[User]("users", OlapConfig(
    engine=ReplacingMergeTreeEngine(),
    order_by_fields=["id", "updated_at"]
))
```
## Best Practices
### Type Selection
- **Use specific integer types** when you know the value ranges to save storage
- **Prefer `Float64`** for most floating-point calculations unless storage is critical
- **Use `LowCardinality`** for string columns with repeated values
- **Choose appropriate DateTime precision** based on your accuracy needs
### Performance Considerations
- **Order columns by cardinality** (low to high) for better compression
- **Use `ReplacingMergeTree`** for tables with frequent updates
- **Specify `order_by_fields` or `order_by_expression`** for optimal query performance (a sketch combining these practices follows this list)
- **Consider `LowCardinality`** for string columns with < 10,000 unique values
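A minimal sketch that applies these practices (the `PageView` model and its columns are hypothetical):
```python
from typing import Annotated
from datetime import datetime
from moose_lib import OlapTable, OlapConfig
from pydantic import BaseModel

class PageView(BaseModel):
    country: Annotated[str, "LowCardinality"]     # repeated values -> LowCardinality(String)
    event_type: Annotated[str, "LowCardinality"]
    user_id: Annotated[int, "uint32"]             # known value range -> smaller integer type
    session_id: str
    timestamp: datetime
    duration_ms: Annotated[int, "uint16"]

# Order by low-cardinality columns first, then higher-cardinality ones
page_views = OlapTable[PageView]("page_views", OlapConfig(
    order_by_fields=["country", "event_type", "user_id", "timestamp"]
))
```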
---
## TTL (Time-to-Live)
Source: moose/olap/ttl.mdx
# TTL (Time-to-Live) for ClickHouse Tables
Moose lets you declare ClickHouse TTL directly in your data model:
- Table-level TTL via the `ttl` option on `OlapTable` config
- Column-level TTL via `ClickHouseTTL` on individual fields
### When to use TTL
- Automatically expire old rows to control storage cost
- Mask or drop sensitive columns earlier than the full row expiry
### TypeScript
```ts
interface Event {
  id: Key<string>;
  timestamp: DateTime;
  email: string & ClickHouseTTL<"timestamp + INTERVAL 30 DAY">; // column TTL
}

const events = new OlapTable<Event>("Events", {
  orderByFields: ["id", "timestamp"],
  ttl: "timestamp + INTERVAL 90 DAY DELETE", // table TTL
});
```
### Python
```python
from typing import Annotated
from moose_lib import OlapTable, OlapConfig, Key, ClickHouseTTL
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    timestamp: datetime
    email: Annotated[str, ClickHouseTTL("timestamp + INTERVAL 30 DAY")]

events = OlapTable[Event](
    "Events",
    OlapConfig(
        order_by_fields=["id", "timestamp"],
        ttl="timestamp + INTERVAL 90 DAY DELETE",
    ),
)
```
### Notes
- Expressions must be valid ClickHouse TTL expressions, but do not include the leading `TTL` keyword.
- Column TTLs are independent from the table TTL and can be used together.
- Moose will apply TTL changes via migrations using `ALTER TABLE ... MODIFY TTL` and `MODIFY COLUMN ... TTL`.
### Related
- See `Modeling Tables` for defining your schema
- See `Applying Migrations` to roll out TTL changes
---
## Python Moose Lib Reference
Source: moose/reference/py-moose-lib.mdx
Python Moose Lib Reference
# API Reference
This is a comprehensive reference for the Python `moose_lib`, detailing all exported components, types, and utilities.
## Core Types
### `Key[T]`
A type annotation for marking fields as primary keys in data models. Used with Pydantic.
```python
from moose_lib import Key
from pydantic import BaseModel

class MyModel(BaseModel):
    id: Key[str]  # Marks 'id' as a primary key of type string
```
### `BaseModel`
Pydantic base model used for data modeling in Moose.
```python
from pydantic import BaseModel

class MyDataModel(BaseModel):
    id: str
    name: str
    count: int
```
### `MooseClient`
Client for interacting with ClickHouse and Temporal.
```python
class MooseClient:
    query: QueryClient  # For database queries
    workflow: Optional[WorkflowClient]  # For workflow operations
```
### `ApiResult`
Class representing the result of an analytics API call.
```python
@dataclass
class ApiResult:
    status: int  # HTTP status code
    body: Any  # Response body
```
## Configuration Types
### `OlapConfig`
Configuration for OLAP tables.
```python
from typing import Union, Optional
from moose_lib.blocks import EngineConfig

class OlapConfig(BaseModel):
    database: Optional[str] = None  # Optional database name (defaults to moose.config.toml clickhouse_config.db_name)
    order_by_fields: list[str] = []  # Fields to order by
    engine: Optional[EngineConfig] = None  # Table engine configuration
```
### `EngineConfig` Classes
Base class and implementations for table engine configurations.
```python
# Base class
class EngineConfig:
    pass

# Available engine implementations
class MergeTreeEngine(EngineConfig):
    pass

class ReplacingMergeTreeEngine(EngineConfig):
    ver: Optional[str] = None  # Version column for keeping latest
    is_deleted: Optional[str] = None  # Soft delete marker (requires ver)

class AggregatingMergeTreeEngine(EngineConfig):
    pass

class SummingMergeTreeEngine(EngineConfig):
    columns: Optional[List[str]] = None  # Columns to sum

# Replicated engines
class ReplicatedMergeTreeEngine(EngineConfig):
    keeper_path: Optional[str] = None  # ZooKeeper/Keeper path (optional for Cloud)
    replica_name: Optional[str] = None  # Replica name (optional for Cloud)

class ReplicatedReplacingMergeTreeEngine(EngineConfig):
    keeper_path: Optional[str] = None  # ZooKeeper/Keeper path (optional for Cloud)
    replica_name: Optional[str] = None  # Replica name (optional for Cloud)
    ver: Optional[str] = None  # Version column for keeping latest
    is_deleted: Optional[str] = None  # Soft delete marker (requires ver)

class ReplicatedAggregatingMergeTreeEngine(EngineConfig):
    keeper_path: Optional[str] = None  # ZooKeeper/Keeper path (optional for Cloud)
    replica_name: Optional[str] = None  # Replica name (optional for Cloud)

class ReplicatedSummingMergeTreeEngine(EngineConfig):
    keeper_path: Optional[str] = None  # ZooKeeper/Keeper path (optional for Cloud)
    replica_name: Optional[str] = None  # Replica name (optional for Cloud)
    columns: Optional[List[str]] = None  # Columns to sum
```
### `StreamConfig`
Configuration for data streams.
```python
class StreamConfig(BaseModel):
    parallelism: int = 1
    retention_period: int = 60 * 60 * 24 * 7  # 7 days
    destination: Optional[OlapTable[Any]] = None
```
### `IngestConfig`
Configuration for data ingestion.
```python
class IngestConfig(BaseModel):
    destination: Optional[OlapTable[Any]] = None
```
### `IngestPipelineConfig`
Configuration for creating a complete data pipeline.
```python
class IngestPipelineConfig(BaseModel):
    table: bool | OlapConfig = True
    stream: bool | StreamConfig = True
    ingest_api: bool | IngestConfig = True
```
## Infrastructure Components
### `OlapTable[T]`
Creates a ClickHouse table with the schema of type T.
```python
from moose_lib import OlapTable, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
# Basic usage
my_table = OlapTable[UserProfile]("user_profiles")
# With configuration (fields)
my_table = OlapTable[UserProfile]("user_profiles", OlapConfig(
order_by_fields=["id", "timestamp"],
engine=ReplacingMergeTreeEngine(),
))
# With configuration (expression)
my_table_expr = OlapTable[UserProfile]("user_profiles_expr", OlapConfig(
order_by_expression="(id, timestamp)",
engine=ReplacingMergeTreeEngine(),
))
# With custom database override
analytics_table = OlapTable[UserProfile]("user_profiles", OlapConfig(
database="analytics", # Override default database
order_by_fields=["id", "timestamp"]
))
# Disable sorting entirely
my_table_unsorted = OlapTable[UserProfile]("user_profiles_unsorted", OlapConfig(
order_by_expression="tuple()",
))
```
### `Stream[T]`
Creates a Redpanda topic with the schema of type T.
```python
# Basic usage
my_stream = Stream[UserEvent]("user_events")
# With configuration
my_stream = Stream[UserEvent]("user_events", StreamConfig(
parallelism=3,
retention_period=86400 # 1 day in seconds
))
# Adding transformations
def transform_user_event(event: UserEvent) -> ProfileUpdate:
    return ProfileUpdate(user_id=event.user_id, update_type="event")
my_stream.add_transform(profile_stream, transform_user_event)
```
### `IngestApi[T]`
Creates an HTTP endpoint for ingesting data of type T.
```python
# Basic usage with destination stream
my_ingest_api = IngestApi[UserEvent]("user_events", IngestConfigWithDestination(
destination=my_user_event_stream
))
```
### `Api[T, U]`
Creates an HTTP endpoint for querying data with request type T and response type U.
```python
# Basic usage
def get_user_profiles(params: UserQuery) -> list[UserProfile]:
    # Query implementation
    return [UserProfile(...), UserProfile(...)]
my_api = Api[UserQuery, list[UserProfile]](
"get_user_profiles",
get_user_profiles
)
```
### `IngestPipeline[T]`
Combines ingest API, stream, and table creation in a single component.
```python
from moose_lib import IngestPipeline, IngestPipelineConfig, StreamConfig, OlapConfig
from moose_lib.blocks import ReplacingMergeTreeEngine
# Basic usage
pipeline = IngestPipeline[UserEvent]("user_pipeline", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=True
))
# With advanced configuration
pipeline = IngestPipeline[UserEvent]("user_pipeline", IngestPipelineConfig(
ingest_api=True,
stream=StreamConfig(parallelism=3),
table=OlapConfig(
order_by_fields=["id", "timestamp"],
engine=ReplacingMergeTreeEngine(),
)
))
```
### `MaterializedView[T]`
Creates a materialized view in ClickHouse.
```python
# Basic usage
view = MaterializedView[UserStatistics](MaterializedViewOptions(
select_statement="SELECT user_id, COUNT(*) as event_count FROM user_events GROUP BY user_id",
table_name="user_events",
materialized_view_name="user_statistics",
order_by_fields=["user_id"]
))
```
## ClickHouse Utilities
### Engine Configuration Classes
Type-safe configuration classes for table engines:
```python
from moose_lib.blocks import (
    MergeTreeEngine,
    ReplacingMergeTreeEngine,
    AggregatingMergeTreeEngine,
    SummingMergeTreeEngine,
    ReplicatedMergeTreeEngine,
    ReplicatedReplacingMergeTreeEngine,
    ReplicatedAggregatingMergeTreeEngine,
    ReplicatedSummingMergeTreeEngine,
    S3QueueEngine,
    S3Engine,
    BufferEngine,
    DistributedEngine,
)
# ReplacingMergeTree with version control and soft deletes
dedup_engine = ReplacingMergeTreeEngine(
ver="updated_at", # Optional: version column for keeping latest
is_deleted="deleted" # Optional: soft delete marker (requires ver)
)
# ReplicatedMergeTree with explicit keeper paths (self-managed ClickHouse)
replicated_engine = ReplicatedMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/my_table",
replica_name="{replica}"
)
# ReplicatedReplacingMergeTree with deduplication
replicated_dedup_engine = ReplicatedReplacingMergeTreeEngine(
keeper_path="/clickhouse/tables/{database}/{shard}/my_dedup_table",
replica_name="{replica}",
ver="updated_at",
is_deleted="deleted"
)
# For ClickHouse Cloud or Boreal - omit keeper parameters
cloud_replicated = ReplicatedMergeTreeEngine() # No parameters needed
# S3Queue configuration for streaming from S3
s3_engine = S3QueueEngine(
s3_path="s3://bucket/data/*.json",
format="JSONEachRow",
aws_access_key_id="AKIA...", # Optional
aws_secret_access_key="secret...", # Optional
compression="gzip", # Optional
headers={"X-Custom": "value"} # Optional
)
# Use with OlapTable
s3_table = OlapTable[MyData]("s3_events", OlapConfig(
engine=s3_engine,
settings={
"mode": "unordered",
"keeper_path": "/clickhouse/s3queue/events",
"loading_retries": "3"
}
))
# S3 engine for direct S3 access (not streaming)
s3_direct_engine = S3Engine(
path="s3://bucket/data/file.json",
format="JSONEachRow",
aws_access_key_id="AKIA...", # Optional
aws_secret_access_key="secret...", # Optional
compression="gzip" # Optional
)
s3_direct_table = OlapTable[MyData]("s3_data", OlapConfig(
engine=s3_direct_engine
))
# Buffer engine for high-throughput buffered writes
buffer_engine = BufferEngine(
target_database="local",
target_table="destination_table",
num_layers=16,
min_time=10,
max_time=100,
min_rows=10000,
max_rows=1000000,
min_bytes=10485760,
max_bytes=104857600
)
buffer_table = OlapTable[MyData]("buffer", OlapConfig(
engine=buffer_engine
))
# Distributed engine for cluster-wide distributed tables
distributed_engine = DistributedEngine(
cluster="my_cluster",
target_database="default",
target_table="local_table",
sharding_key="cityHash64(id)" # Optional
)
distributed_table = OlapTable[MyData]("distributed", OlapConfig(
engine=distributed_engine
))
```
## Task Management
### `Task[T, U]`
A class that represents a single task within a workflow system, with typed input and output.
```python
from moose_lib import Task, TaskConfig, TaskContext
from pydantic import BaseModel
# Define input and output models
class InputData(BaseModel):
    user_id: str

class OutputData(BaseModel):
    result: str
    status: bool
# Task with input and output
def process_user(ctx: TaskContext[InputData]) -> OutputData:
    # Process the user data
    return OutputData(result=f"Processed {ctx.input.user_id}", status=True)
user_task = Task[InputData, OutputData](
name="process_user",
config=TaskConfig(
run=process_user,
retries=3,
timeout="30s"
)
)
# Task with no input, but with output
def fetch_data(ctx: TaskContext[None]) -> OutputData:
    return OutputData(result="Fetched data", status=True)
fetch_task = Task[None, OutputData](
name="fetch_data",
config=TaskConfig(run=fetch_data)
)
# Task with input but no output
def log_event(ctx: TaskContext[InputData]) -> None:
    print(f"Event logged for: {ctx.input.user_id}")
log_task = Task[InputData, None](
name="log_event",
config=TaskConfig(run=log_event)
)
# Task with neither input nor output
def cleanup(ctx: TaskContext[None]) -> None:
    print("Cleanup complete")
cleanup_task = Task[None, None](
name="cleanup",
config=TaskConfig(run=cleanup)
)
```
### `TaskConfig[T, U]`
Configuration for a Task.
```python
@dataclasses.dataclass
class TaskConfig(Generic[T, U]):
    # The handler function that executes the task logic
    # Can be any of: () -> None, () -> U, (T) -> None, or (T) -> U depending on input/output types
    run: TaskRunFunc[T, U]
    # Optional list of tasks to run after this task completes
    on_complete: Optional[list[Task[U, Any]]] = None
    # Optional function that is called when the task is cancelled
    on_cancel: Optional[Callable[[TaskContext[T_none]], Union[None, Awaitable[None]]]] = None
    # Optional timeout string (e.g. "5m", "1h", "never")
    timeout: Optional[str] = None
    # Optional number of retry attempts
    retries: Optional[int] = None
```
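A minimal sketch of chaining and cancellation hooks with `TaskConfig`, reusing `process_user`, `InputData`, and `OutputData` from the `Task` examples above (the task names are hypothetical):
```python
from moose_lib import Task, TaskConfig, TaskContext

def notify(ctx: TaskContext[OutputData]) -> None:
    print(f"Notifying about: {ctx.input.result}")

notify_task = Task[OutputData, None](
    name="notify",
    config=TaskConfig(run=notify),
)

def handle_cancel(ctx: TaskContext[InputData]) -> None:
    print("process_user_with_hooks was cancelled; cleaning up")

user_task_with_hooks = Task[InputData, OutputData](
    name="process_user_with_hooks",
    config=TaskConfig(
        run=process_user,           # handler defined above
        on_complete=[notify_task],  # runs after this task completes
        on_cancel=handle_cancel,    # runs if the task is cancelled
        retries=3,
        timeout="30s",
    ),
)
```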
### `Workflow`
Represents a workflow composed of one or more tasks.
```python
from moose_lib import Workflow, WorkflowConfig
# Create a workflow that starts with the fetch_task
data_workflow = Workflow(
name="data_processing",
config=WorkflowConfig(
starting_task=fetch_task,
schedule="@every 1h", # Run every hour
timeout="10m", # Timeout after 10 minutes
retries=2 # Retry up to 2 times if it fails
)
)
```
### `WorkflowConfig`
Configuration for a workflow.
```python
@dataclasses.dataclass
class WorkflowConfig:
    # The first task to execute in the workflow
    starting_task: Task[Any, Any]
    # Optional number of retry attempts for the entire workflow
    retries: Optional[int] = None
    # Optional timeout string for the entire workflow
    timeout: Optional[str] = None
    # Optional cron-like schedule string for recurring execution
    schedule: Optional[str] = None
```
---
## TypeScript Moose Lib Reference
Source: moose/reference/ts-moose-lib.mdx
TypeScript Moose Lib Reference
# API Reference
This is a comprehensive reference for the TypeScript `moose-lib`, detailing all exported components, types, and utilities.
## Core Types
### `Key`
A type for marking fields as primary keys in data models.
```ts
// Example
interface MyModel {
id: Key<string>; // Marks 'id' as a primary key of type string
}
```
### `JWT`
A type for working with JSON Web Tokens.
```ts
// Example
type UserJWT = JWT<{ userId: string, role: string }>;
```
### `ApiUtil`
Interface providing utilities for analytics APIs.
```ts
interface ApiUtil {
client: MooseClient; // Client for interacting with the database
sql: typeof sql; // SQL template tag function
jwt: JWTPayload | undefined; // Current JWT if available
}
```
## Infrastructure Components
### `OlapTable`
Creates a ClickHouse table with the schema of type T.
```ts
// Basic usage with MergeTree (default)
);
// With sorting configuration (expression)
);
// Disable sorting entirely
);
// For deduplication, explicitly set the ReplacingMergeTree engine
);
```
### `BaseOlapConfig`
Base configuration interface for `OlapTable` with common table configuration options.
```ts
interface BaseOlapConfig<T> {
// Optional database name (defaults to moose.config.toml clickhouse_config.db_name)
database?: string;
// Optional array of field names to order by
orderByFields?: (keyof T & string)[];
// Optional SQL expression for ORDER BY clause (alternative to orderByFields)
orderByExpression?: string;
// Optional table engine (defaults to MergeTree)
engine?: ClickHouseEngines;
// Optional settings for table configuration
settings?: { [key: string]: string };
// Optional lifecycle mode (defaults to MOOSE_MANAGED)
lifeCycle?: LifeCycle;
// Additional engine-specific fields (ver, isDeleted, keeperPath, etc.)
// depend on the engine type
}
```
Example with database override:
```ts
// Table in custom database
);
// Default database (from moose.config.toml)
);
```
### `Stream`
Creates a Redpanda topic with the schema of type T.
```ts
// Basic usage
);
// Adding transformations
myConfiguredStream.addTransform(
destinationStream,
(record) => transformFunction(record)
);
```
### `IngestApi`
Creates an HTTP endpoint for ingesting data of type T.
```ts
// Basic usage with destination stream
);
```
### `Api`
Creates an HTTP endpoint for querying data with request type T and response type R.
```ts
// Basic usage
) => {
const result = await client.query.execute(
sql`SELECT * FROM user_profiles WHERE age > ${params.minAge} LIMIT 10`
);
return result;
}
);
```
### `IngestPipeline`
Combines ingest API, stream, and table creation in a single component.
```ts
// Basic usage
);
// With advanced configuration
);
```
### `MaterializedView`
Creates a materialized view in ClickHouse.
```ts
// Basic usage
);
```
## SQL Utilities
### `sql` Template Tag
Template tag for creating type-safe SQL queries with parameters.
```ts
// Basic usage
const query = sql`SELECT * FROM users WHERE id = ${userId}`;
// With multiple parameters
const query = sql`
SELECT * FROM users
WHERE age > ${minAge}
AND country = ${country}
LIMIT ${limit}
`;
```
### `MooseClient`
Client for interacting with ClickHouse and Temporal.
```ts
class MooseClient {
query: QueryClient; // For database queries
workflow: WorkflowClient; // For workflow operations
}
```
## ClickHouse Utilities
### Table Engine Configurations
#### `ClickHouseEngines` Enum
Available table engines:
```ts
enum ClickHouseEngines {
MergeTree = "MergeTree",
ReplacingMergeTree = "ReplacingMergeTree",
AggregatingMergeTree = "AggregatingMergeTree",
SummingMergeTree = "SummingMergeTree",
ReplicatedMergeTree = "ReplicatedMergeTree",
ReplicatedReplacingMergeTree = "ReplicatedReplacingMergeTree",
ReplicatedAggregatingMergeTree = "ReplicatedAggregatingMergeTree",
ReplicatedSummingMergeTree = "ReplicatedSummingMergeTree",
S3Queue = "S3Queue"
}
```
#### `ReplacingMergeTreeConfig`
Configuration for ReplacingMergeTree tables:
```ts
type ReplacingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplacingMergeTree;
orderByFields?: (keyof T & string)[];
ver?: keyof T & string; // Optional: version column for keeping latest
isDeleted?: keyof T & string; // Optional: soft delete marker (requires ver)
settings?: { [key: string]: string };
}
```
#### Replicated Engine Configurations
Configuration for replicated table engines:
```ts
// ReplicatedMergeTree
type ReplicatedMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
// ReplicatedReplacingMergeTree
type ReplicatedReplacingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedReplacingMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
ver?: keyof T & string; // Optional: version column
isDeleted?: keyof T & string; // Optional: soft delete marker
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
// ReplicatedAggregatingMergeTree
type ReplicatedAggregatingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedAggregatingMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
// ReplicatedSummingMergeTree
type ReplicatedSummingMergeTreeConfig<T> = {
engine: ClickHouseEngines.ReplicatedSummingMergeTree;
keeperPath?: string; // Optional: ZooKeeper/Keeper path (omit for Cloud)
replicaName?: string; // Optional: replica name (omit for Cloud)
columns?: string[]; // Optional: columns to sum
orderByFields?: (keyof T & string)[];
settings?: { [key: string]: string };
}
```
**Note**: The `keeperPath` and `replicaName` parameters are optional. When omitted, Moose uses smart defaults that work in both ClickHouse Cloud and self-managed environments (default path: `/clickhouse/tables/{uuid}/{shard}` with replica `{replica}`). You can still provide both parameters explicitly if you need custom replication paths.
### `S3QueueTableSettings`
Type-safe interface for S3Queue-specific table settings (ClickHouse 24.7+).
```ts
interface S3QueueTableSettings {
mode?: "ordered" | "unordered"; // Processing mode
after_processing?: "keep" | "delete"; // File handling after processing
keeper_path?: string; // ZooKeeper path for coordination
loading_retries?: string; // Number of retry attempts
processing_threads_num?: string; // Parallel processing threads
// ... and many more settings
}
```
### S3Queue Configuration
Configure S3Queue tables for streaming data from S3 buckets (ORDER BY is not supported):
```ts
);
```
### S3 Configuration
Configure S3 tables for direct read/write access to S3 storage:
```ts
);
// Public bucket (no authentication) - omit credentials for NOSIGN
);
```
### Buffer Configuration
Configure Buffer tables for high-throughput buffered writes (ORDER BY is not supported):
```ts
// First create destination table
);
// Then create buffer table
);
```
### Distributed Configuration
Configure Distributed tables for cluster-wide distributed queries (ORDER BY is not supported):
```ts
);
```
## Task Management
### `Task`
A class that represents a single task within a workflow system.
```ts
// No input, no output
);
// With input and output
);
```
### `TaskContext`
A context object that includes input & state passed between the task's run/cancel functions.
```ts
export type TaskContext<T> = T extends null ? { state: any; input?: null } : { state: any; input: T };
```
### `TaskConfig`
Configuration options for tasks.
```ts
interface TaskConfig<T, U> {
// The main function that executes the task logic
run: (context: TaskContext<T>) => Promise<U>;
// Optional array of tasks to execute after this task completes
onComplete?: Task<U, any>[];
// Optional function that is called when the task is cancelled.
onCancel?: (context: TaskContext<T>) => Promise<void>;
// Optional timeout duration (e.g., "30s", "5m", "never")
timeout?: string;
// Optional number of retry attempts
retries?: number;
}
```
### `Workflow`
A class that represents a complete workflow composed of interconnected tasks.
```ts
const myWorkflow = new Workflow("getData", {
startingTask: callAPI,
schedule: "@every 5s", // Run every 5 seconds
timeout: "1h",
retries: 3
});
```
### `WorkflowConfig`
Configuration options for defining a workflow.
```ts
interface WorkflowConfig {
// The initial task that begins the workflow execution
startingTask: Task<any, any>;
// Optional number of retry attempts
retries?: number;
// Optional timeout duration (e.g., "10m", "1h", "never")
timeout?: string;
// Optional cron-style schedule string
schedule?: string;
}
```
---
**Important:** The following components must be exported from your `app/index.ts` file for Moose to detect them:
- `OlapTable` instances
- `Stream` instances
- `IngestApi` instances
- `Api` instances
- `IngestPipeline` instances
- `MaterializedView` instances
- `Task` instances
- `Workflow` instances
**Configuration objects and utilities** (like `DeadLetterQueue`, `Key`, `sql`) do not need to be exported as they are used as dependencies of the main components.
---
## Moose Streaming
Source: moose/streaming.mdx
Build real-time data pipelines with Redpanda/Kafka streams, transformations, and event processing
# Moose Streaming
## Overview
The Streaming module provides standalone real-time data processing with Kafka/Redpanda topics. You can use this capability independently to build event-driven architectures, data transformations, and real-time pipelines without requiring other MooseStack components.
## Basic Usage
```py filename="Stream.py" copy
from moose_lib import Stream
from pydantic import BaseModel
from datetime import datetime
class ExampleEvent(BaseModel):
    id: str
    user_id: str
    timestamp: datetime
    event_type: str

# Create a standalone stream for user events
example_stream = Stream[ExampleEvent]("streaming-topic-name")

# Add consumers for real-time processing
def process_event(event: ExampleEvent):
    print(f"Processing event: {event}")
    # Custom processing logic here
example_stream.add_consumer(process_event)
# No export needed - Python modules are automatically discovered
```
### Enabling Streaming
To enable streaming, you need to ensure that the `streaming_engine` feature flag is set to `true` in your `moose.config.toml` file:
```toml
[features]
streaming_engine = true
```
## Core Capabilities
## Integration with Other Capabilities
The Streaming capability can be used independently, or in conjunction with other MooseStack modules:
---
## Connect to CDC Services
Source: moose/streaming/connect-cdc.mdx
# Connect to CDC Services
Coming Soon!
---
## Streaming Consumer Functions
Source: moose/streaming/consumer-functions.mdx
Read and process data from streams with consumers and processors
# Streaming Consumer Functions
## Overview
Consuming data from streams allows you to read and process data from Kafka/Redpanda topics. This is essential for building real-time applications, analytics, and event-driven architectures.
## Basic Usage
Consumers are just functions that are called when new data is available in a stream. You add them to a stream like this:
```py filename="StreamConsumer.py" copy
from moose_lib import Stream
from pydantic import BaseModel
from datetime import datetime
class UserEvent(BaseModel):
    id: str
    user_id: str
    timestamp: datetime
    event_type: str

user_events_stream = Stream[UserEvent]("user-events")

# Add a consumer to process events
def process_event(event: UserEvent):
    print(f"Processing event: {event.id}")
    print(f"User: {event.user_id}, Type: {event.event_type}")
    # Your processing logic here
    # e.g., update analytics, send notifications, etc.

user_events_stream.add_consumer(process_event)

# Add multiple consumers for different purposes
def analytics_consumer(event: UserEvent):
    # Analytics processing
    if event.event_type == 'purchase':
        update_purchase_analytics(event)

def notification_consumer(event: UserEvent):
    # Notification processing
    if event.event_type == 'signup':
        send_welcome_email(event.user_id)
user_events_stream.add_consumer(analytics_consumer)
user_events_stream.add_consumer(notification_consumer)
```
## Processing Patterns
### Stateful Processing with MooseCache
Maintain state across event processing using MooseCache for distributed state management:
```py filename="StatefulProcessing.py" copy
from datetime import datetime
from typing import Dict, Any
from moose_lib import MooseCache, Stream
from pydantic import BaseModel
# State container for accumulating data
class AccumulatorState(BaseModel):
id: str
counter: int
sum: float
last_modified: datetime
attributes: Dict[str, Any]
# Input message structure
class InputMessage(BaseModel):
id: str
group_id: str
numeric_value: float
message_type: str
timestamp: datetime
payload: Dict[str, Any]
message_stream = Stream[InputMessage]("input-stream")
# Initialize distributed cache
cache = MooseCache()
def process_message(message: InputMessage):
cache_key = f"state:{message.group_id}"
# Load existing state or create new one
state = cache.get(cache_key, AccumulatorState)
if not state:
# Initialize new state
state = AccumulatorState(
id=message.group_id,
counter=0,
sum=0.0,
last_modified=datetime.now(),
attributes={}
)
# Apply message to state
state.counter += 1
state.sum += message.numeric_value
state.last_modified = message.timestamp
state.attributes.update(message.payload)
# Determine cache lifetime based on message type
ttl_seconds = 60 if message.message_type == 'complete' else 3600
if message.message_type == 'complete' or should_finalize(state):
# Finalize and remove state
finalize_state(state)
cache.delete(cache_key)
else:
# Persist updated state
cache.set(cache_key, state, ttl_seconds=ttl_seconds)
def should_finalize(state: AccumulatorState) -> bool:
"""Condition for automatic state finalization"""
threshold = 100
time_limit_seconds = 30 * 60 # 30 minutes
elapsed = (datetime.now() - state.last_modified).total_seconds()
return state.counter >= threshold or elapsed > time_limit_seconds
def finalize_state(state: AccumulatorState):
print(f"Finalizing state {state.id}: counter={state.counter}, sum={state.sum}")
message_stream.add_consumer(process_message)
```
## Propagating Events to External Systems
You can use consumer functions to trigger actions across external systems - send notifications, sync databases, update caches, or integrate with any other service when events occur:
### HTTP API Calls
Send processed data to external APIs:
```py filename="HttpIntegration.py" copy
import os
from datetime import datetime
from typing import Dict, Any

import httpx
from moose_lib import Stream
from pydantic import BaseModel
class WebhookPayload(BaseModel):
id: str
data: Dict[str, Any]
timestamp: datetime
webhook_stream = Stream[WebhookPayload]("webhook-events")
async def send_to_external_api(payload: WebhookPayload):
try:
async with httpx.AsyncClient() as client:
response = await client.post(
'https://external-api.com/webhook',
headers={
'Content-Type': 'application/json',
'Authorization': f'Bearer {os.getenv("API_TOKEN")}'
},
json={
'event_id': payload.id,
'event_data': payload.data,
'processed_at': datetime.now().isoformat()
}
)
if response.status_code != 200:
raise Exception(f"HTTP {response.status_code}: {response.text}")
print(f"Successfully sent event {payload.id} to external API")
except Exception as error:
print(f"Failed to send event {payload.id}: {error}")
# Could implement retry logic or dead letter queue here
webhook_stream.add_consumer(send_to_external_api)
```
#### Database Operations
Write processed data to external databases:
```py filename="DatabaseIntegration.py" copy
import json
import os
from datetime import datetime
from typing import Dict, Any

import asyncpg
from moose_lib import Stream
from pydantic import BaseModel
class DatabaseRecord(BaseModel):
id: str
category: str
value: float
metadata: Dict[str, Any]
timestamp: datetime
db_stream = Stream[DatabaseRecord]("database-events")
async def insert_to_database(record: DatabaseRecord):
try:
# Connect to PostgreSQL database
conn = await asyncpg.connect(
host=os.getenv('DB_HOST'),
user=os.getenv('DB_USER'),
password=os.getenv('DB_PASSWORD'),
database=os.getenv('DB_NAME')
)
# Insert record into external database
await conn.execute(
'''
INSERT INTO processed_events (id, category, value, metadata, created_at)
VALUES ($1, $2, $3, $4, $5)
''',
record.id,
record.category,
record.value,
json.dumps(record.metadata),
record.timestamp
)
print(f"Inserted record {record.id} into database")
except Exception as error:
print(f"Database insert failed for record {record.id}: {error}")
finally:
if 'conn' in locals():
await conn.close()
db_stream.add_consumer(insert_to_database)
```
#### File System Operations
Write processed data to files or cloud storage:
```py filename="FileSystemIntegration.py" copy
import json
import os
from datetime import datetime
from typing import Literal

import aiofiles
from moose_lib import Stream
from pydantic import BaseModel
class FileOutput(BaseModel):
id: str
filename: str
content: str
directory: str
format: Literal['json', 'csv', 'txt']
file_stream = Stream[FileOutput]("file-events")
async def write_to_file(output: FileOutput):
try:
# Ensure directory exists
os.makedirs(output.directory, exist_ok=True)
# Format content based on type
if output.format == 'json':
formatted_content = json.dumps(json.loads(output.content), indent=2)
else:
formatted_content = output.content
# Write file with timestamp
timestamp = datetime.now().isoformat().replace(':', '-').replace('.', '-')
filename = f"{output.filename}_{timestamp}.{output.format}"
filepath = os.path.join(output.directory, filename)
async with aiofiles.open(filepath, 'w', encoding='utf-8') as f:
await f.write(formatted_content)
print(f"Written file: {filepath}")
except Exception as error:
print(f"Failed to write file for output {output.id}: {error}")
file_stream.add_consumer(write_to_file)
```
#### Email and Notifications
Send alerts and notifications based on processed events:
```py filename="NotificationIntegration.py" copy
import os
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from typing import Dict, Any, Literal

import httpx
from moose_lib import Stream
from pydantic import BaseModel
class NotificationEvent(BaseModel):
id: str
type: Literal['email', 'slack', 'webhook']
recipient: str
subject: str
message: str
priority: Literal['low', 'medium', 'high']
metadata: Dict[str, Any]
notification_stream = Stream[NotificationEvent]("notifications")
async def send_notification(notification: NotificationEvent):
try:
if notification.type == 'email':
# Send email
msg = MIMEMultipart()
msg['From'] = os.getenv('SMTP_FROM')
msg['To'] = notification.recipient
msg['Subject'] = notification.subject
body = f"""
{notification.message}
Priority: {notification.priority}
"""
msg.attach(MIMEText(body, 'plain'))
server = smtplib.SMTP(os.getenv('SMTP_HOST'), int(os.getenv('SMTP_PORT', '587')))
server.starttls()
server.login(os.getenv('SMTP_USER'), os.getenv('SMTP_PASS'))
server.send_message(msg)
server.quit()
elif notification.type == 'slack':
# Send to Slack
async with httpx.AsyncClient() as client:
await client.post(
f"https://hooks.slack.com/services/{os.getenv('SLACK_WEBHOOK')}",
json={
'text': notification.message,
'channel': notification.recipient,
'username': 'Moose Alert',
'icon_emoji': ':warning:' if notification.priority == 'high' else ':information_source:'
}
)
elif notification.type == 'webhook':
# Send to webhook
async with httpx.AsyncClient() as client:
await client.post(
notification.recipient,
json={
'id': notification.id,
'subject': notification.subject,
'message': notification.message,
'priority': notification.priority,
'metadata': notification.metadata
}
)
print(f"Sent {notification.type} notification {notification.id}")
except Exception as error:
print(f"Failed to send notification {notification.id}: {error}")
notification_stream.add_consumer(send_notification)
```
---
## Create Streams
Source: moose/streaming/create-stream.mdx
Define and create Kafka/Redpanda topics with type-safe schemas
# Creating Streams
## Overview
Streams serve as the transport layer between your data sources and database tables. Built on Kafka/Redpanda topics, they provide a way to implement real-time pipelines for ingesting and processing incoming data.
## Creating Streams
You can create streams in two ways:
- High-level: Using the `IngestPipeline` class (recommended)
- Low-level: Manually configuring the `Stream` component
### Streams for Ingestion
The `IngestPipeline` class provides a convenient way to set up streams with ingestion APIs and tables. This is the recommended way to create streams for ingestion:
```py filename="IngestionStream.py" copy {10}
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
class RawData(BaseModel):
    id: Key[str]
    value: int
raw_ingestion_stream = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
ingest_api = True, # Creates an ingestion API endpoint at `POST /ingest/raw_data`
stream = True, # Buffers data between the ingestion API and the database table
table = True, # Creates an OLAP table named `raw_data`
))
```
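To send a test record to that endpoint, here is a minimal sketch using `requests` (assuming the default local dev server at `http://localhost:4000`):
```py filename="SendTestRecord.py" copy
import requests

# POST one record to the ingest endpoint created by the pipeline above
response = requests.post(
    "http://localhost:4000/ingest/raw_data",
    json={"id": "example-1", "value": 42},
)
print(response.status_code)
```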
While the `IngestPipeline` provides a convenient way to set up streams with ingestion APIs and tables, you can also configure these components individually for more granular control:
```py filename="StreamObject.py" copy {8-12}
from moose_lib import Stream, StreamConfig, Key, IngestApi, IngestConfig, OlapTable
from pydantic import BaseModel
class RawData(BaseModel):
    id: Key[str]
    value: int
raw_table = OlapTable[RawData]("raw_data")
raw_stream = Stream[RawData]("raw_data", StreamConfig(
    destination=raw_table  # Optional: Specify a destination table for the stream, sets up a process to sync data from the stream to the table
))
raw_ingest_api = IngestApi[RawData]("raw_data", IngestConfig(
    destination=raw_stream  # Configure Moose to write all validated data to the stream
))
```
### Streams for Transformations
If the raw data needs to be transformed before landing in the database, you can define a transform destination stream and a transform function to process the data:
#### Single Stream Transformation
```py filename="TransformDestinationStream.py" copy
# Import required libraries
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
from datetime import datetime

# Define schema for raw incoming data
class RawData(BaseModel):
    id: Key[str]  # Primary key
    value: int  # Value to be transformed

# Define schema for transformed data
class TransformedData(BaseModel):
    id: Key[str]  # Primary key (preserved from raw data)
    transformedValue: int  # Transformed value
    transformedAt: datetime  # Timestamp of transformation
# Create pipeline for raw data - only for ingestion and streaming
raw_data = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
ingest_api = True, # Enable API endpoint
stream = True, # Create stream for buffering
table = False # No table needed for raw data
))
# Create pipeline for transformed data - for storage only
transformed_data = IngestPipeline[TransformedData]("transformed_data", IngestPipelineConfig(
ingest_api = False, # No direct API endpoint
stream = True, # Create stream to receive transformed data
table = True # Store transformed data in table
))
# Define a named transformation function
def transform_function(record: RawData) -> TransformedData:
    return TransformedData(
        id=record.id,
        transformedValue=record.value * 2,
        transformedAt=datetime.now()
    )
# Connect the streams with the transformation function
raw_data.get_stream().add_transform(
destination=transformed_data.get_stream(), # Use the get_stream() method to get the stream object from the IngestPipeline
transformation=transform_function # Can also define a lambda function
)
```
Use the `get_stream()` method to get the stream object from the IngestPipeline to avoid errors when referencing the stream object.
You can use lambda functions to define transformations:
```py filename="TransformDestinationStream.py" copy
from moose_lib import Key, IngestApi, IngestConfig, OlapTable, Stream, StreamConfig
from pydantic import BaseModel
from datetime import datetime

class RawData(BaseModel):
    id: Key[str]
    value: int

class TransformedData(BaseModel):
    id: Key[str]
    transformedValue: int
    transformedAt: datetime
# Create pipeline components for raw data - only for ingestion and streaming
raw_stream = Stream[RawData]("raw_data") ## No destination table since we're not storing the raw data
raw_api = IngestApi[RawData]("raw_data", IngestConfig(
destination=raw_stream ## Connect the ingestion API to the raw data stream
))
# Create pipeline components for transformed data - no ingestion API since we're not ingesting the transformed data
transformed_table = OlapTable[TransformedData]("transformed_data") ## Store the transformed data in a table
transformed_stream = Stream[TransformedData]("transformed_data", StreamConfig(destination=transformed_table)) ## Connect the transformed data stream to the destination table
## Example transformation using a lambda function
raw_stream.add_transform(
destination=transformed_stream,
transformation=lambda record: TransformedData(
id=record.id,
transformedValue=record.value * 2,
transformedAt=datetime.now()
)
)
```
#### Chaining Transformations
For more complex transformations, you can chain multiple transformations together. This is a use case where using a standalone Stream for intermediate stages of your pipeline may be useful:
```py filename="ChainedTransformations.py" copy
from moose_lib import IngestPipeline, Key, Stream, IngestPipelineConfig
from pydantic import BaseModel
from datetime import datetime

# Define the schema for raw input data
class RawData(BaseModel):
    id: Key[str]
    value: int

# Define the schema for intermediate transformed data
class IntermediateData(BaseModel):
    id: Key[str]
    transformedValue: int
    transformedAt: datetime

# Define the schema for final transformed data
class FinalData(BaseModel):
    id: Key[str]
    transformedValue: int
    anotherTransformedValue: int
    transformedAt: datetime
# Create the first pipeline for raw data ingestion
# Only create an API and a stream (no table) since we're ingesting the raw data
raw_data = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
ingest_api=True,
stream=True,
table=False
))
# Create an intermediate stream to hold data between transformations (no api or table needed)
intermediate_stream = Stream[IntermediateData]("intermediate_stream")
# First transformation: double the value and add timestamp
raw_data.get_stream().add_transform(destination=intermediate_stream, transformation=lambda record: IntermediateData(
id=record.id,
transformedValue=record.value * 2,
transformedAt=datetime.now()
))
# Create the final pipeline that will store the fully transformed data
final_data = IngestPipeline[FinalData]("final_stream", IngestPipelineConfig(
ingest_api=False,
stream=True,
table=True
))
# Second transformation: further transform the intermediate data
intermediate_stream.add_transform(destination=final_data.get_stream(), transformation=lambda record: FinalData(
id=record.id,
transformedValue=record.transformedValue * 2,
anotherTransformedValue=record.transformedValue * 3,
transformedAt=datetime.now()
))
```
## Stream Configurations
### Parallelism and Retention
```py filename="StreamConfig.py" copy
from moose_lib import Stream, StreamConfig
high_throughput_stream = Stream[Data]("high_throughput", StreamConfig(
parallelism=4, # Process 4 records simultaneously
retention_period=86400, # Keep data for 1 day
))
```
### LifeCycle Management
Control how Moose manages your stream resources when your code changes. See the [LifeCycle Management guide](./lifecycle) for detailed information.
```py filename="LifeCycleStreamConfig.py" copy
from moose_lib import Stream, StreamConfig, LifeCycle
# Production stream with external management
prod_stream = Stream[Data]("prod_stream", StreamConfig(
life_cycle=LifeCycle.EXTERNALLY_MANAGED
))
# Development stream with full management
dev_stream = Stream[Data]("dev_stream", StreamConfig(
life_cycle=LifeCycle.FULLY_MANAGED
))
```
See the [API Reference](/moose/reference/py-moose-lib) for complete configuration options.
---
## Dead Letter Queues
Source: moose/streaming/dead-letter-queues.mdx
Handle failed stream processing with dead letter queues
# Dead Letter Queues
## Overview
Dead Letter Queues (DLQs) provide a robust error handling mechanism for stream processing in Moose. When streaming functions fail during transformation or consumption, failed messages are automatically routed to a configured dead letter queue for later analysis and recovery.
## Dead Letter Record Structure
When a message fails processing, Moose creates a dead letter record with the following structure:
```py
class DeadLetterModel(BaseModel, Generic[T]):
    original_record: Any  # The original message that failed
    error_message: str  # Error description
    error_type: str  # Error class/type name
    failed_at: datetime.datetime  # Timestamp when failure occurred
    source: Literal["api", "transform", "table"]  # Where the failure happened

    def as_typed(self) -> T:  # Type-safe access to original record
        return self._t.model_validate(self.original_record)
```
## Creating Dead Letter Queues
### Basic Setup
```py filename="dead-letter-setup.py" copy
from moose_lib import DeadLetterQueue
from pydantic import BaseModel
# Define your data model
class UserEvent(BaseModel):
    user_id: str
    action: str
    timestamp: float
# Create a dead letter queue for UserEvent failures
user_event_dlq = DeadLetterQueue[UserEvent](name="UserEventDLQ")
```
### Configuring Transformations with Dead Letter Queues
Add a dead letter queue to your Transformation Function configuration, and any errors thrown in the transformation will trigger the event to be routed to the dead letter queue.
```py filename="transform-with-dlq.py" copy
from moose_lib import DeadLetterQueue, TransformConfig
# Create dead letter queue
event_dlq = DeadLetterQueue[RawEvent](name="EventDLQ")
# Define transformation function, including errors to trigger DLQ
def process_event(event: RawEvent) -> ProcessedEvent:
    # This transform might fail for invalid data
    if not event.user_id or len(event.user_id) == 0:
        raise ValueError("Invalid user_id: cannot be empty")
    if event.timestamp < 0:
        raise ValueError("Invalid timestamp: cannot be negative")
    return ProcessedEvent(
        user_id=event.user_id,
        action=event.action,
        processed_at=datetime.now(),
        is_valid=True
    )
# Add transform with DLQ configuration
raw_events.get_stream().add_transform(
destination=processed_events.get_stream(),
transformation=process_event,
config=TransformConfig(
dead_letter_queue=event_dlq # Configure DLQ for this transform
)
)
```
### Configuring Consumers with Dead Letter Queues
Add a dead letter queue to your Consumer Function configuration, and any errors thrown in the function will trigger the event to be routed to the dead letter queue.
```py filename="consumer-with-dlq.py" copy
from moose_lib import ConsumerConfig
# Define consumer function with errors to trigger DLQ
def process_event_consumer(event: RawEvent) -> None:
    # This consumer might fail for certain events
    if event.action == "forbidden_action":
        raise ValueError("Forbidden action detected")
    # Process the event (e.g., send to external API)
    print(f"Processing event for user {event.user_id}")
# Add consumer with DLQ configuration
raw_events.get_stream().add_consumer(
consumer=process_event_consumer,
config=ConsumerConfig(
dead_letter_queue=event_dlq # Configure DLQ for this consumer
)
)
```
### Configuring Ingest APIs with Dead Letter Queues
Add a dead letter queue to your Ingest API configuration, and any runtime data validation failures at the API will trigger the event to be routed to the dead letter queue.
```python filename="ValidationExample.py" copy
from typing import Optional
from datetime import datetime
from moose_lib import IngestApi, IngestConfig, Stream, DeadLetterQueue
from pydantic import BaseModel

class Properties(BaseModel):
    device: Optional[str]
    version: Optional[int]

class ExampleModel(BaseModel):
    id: str
    userId: str
    timestamp: datetime
    properties: Properties

api = IngestApi[ExampleModel]("your-api-route", IngestConfig(
    destination=Stream[ExampleModel]("your-stream-name"),
    dead_letter_queue=DeadLetterQueue[ExampleModel]("your-dlq-name")
))
```
## Processing Dead Letter Messages
### Monitoring Dead Letter Queues
```py filename="dlq-monitoring.py" copy
def monitor_dead_letters(dead_letter: DeadLetterModel[RawEvent]) -> None:
    print("Dead letter received:")
    print(f"Error: {dead_letter.error_message}")
    print(f"Error Type: {dead_letter.error_type}")
    print(f"Failed At: {dead_letter.failed_at}")
    print(f"Source: {dead_letter.source}")

    # Access the original typed data
    original_event: RawEvent = dead_letter.as_typed()
    print(f"Original User ID: {original_event.user_id}")

# Add consumer to monitor dead letter messages
event_dlq.add_consumer(monitor_dead_letters)
```
### Recovery and Retry Logic
```py filename="dlq-recovery.py" copy
from moose_lib import Stream, StreamConfig
from typing import Optional
from datetime import datetime

# Create a recovery stream for fixed messages
recovered_events = Stream[ProcessedEvent]("recovered_events", StreamConfig(
    destination=processed_events.get_table()  # Send recovered data to main table
))

def recover_event(dead_letter: DeadLetterModel[RawEvent]) -> Optional[ProcessedEvent]:
    try:
        original_event = dead_letter.as_typed()

        # Apply fixes based on error type
        if "Invalid user_id" in dead_letter.error_message:
            # Skip events with invalid user IDs
            return None

        if "Invalid timestamp" in dead_letter.error_message:
            # Recover events whose only problem was a bad timestamp
            return ProcessedEvent(
                user_id=original_event.user_id,
                action=original_event.action,
                processed_at=datetime.now(),
                is_valid=True
            )

        return None  # Skip other errors
    except Exception as error:
        print(f"Recovery failed: {error}")
        return None

# Add recovery logic to the DLQ
event_dlq.add_transform(
    destination=recovered_events,
    transformation=recover_event
)
```
## Best Practices
Dead letter queues add overhead to stream processing. Use them judiciously and monitor their impact on throughput. Consider implementing sampling for high-volume streams where occasional message loss is acceptable.
Dead letter queue events can also be integrated with monitoring systems like Prometheus, DataDog, or CloudWatch for alerting and dashboards. Consider tracking metrics like DLQ message rate, error types, and recovery success rates.
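As a starting point for that kind of monitoring, here is a minimal sketch of a DLQ consumer that tallies failures by error type and source (it assumes the `event_dlq` queue and `RawEvent` model from the examples above; exporting the counts to Prometheus, DataDog, or CloudWatch is left to your metrics client of choice):
```py filename="dlq-metrics.py" copy
from collections import Counter
from moose_lib import DeadLetterModel

# In-process tallies; swap these for your metrics client in production
dlq_error_counts: Counter = Counter()

def track_dlq_metrics(dead_letter: DeadLetterModel[RawEvent]) -> None:
    # Count failures by error type and by source so spikes are easy to spot
    dlq_error_counts[dead_letter.error_type] += 1
    dlq_error_counts[f"source:{dead_letter.source}"] += 1
    print(f"DLQ totals so far: {dict(dlq_error_counts)}")

event_dlq.add_consumer(track_dlq_metrics)
```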
## Using Dead Letter Queues in Ingestion Pipelines
Dead Letter Queues (DLQs) can be directly integrated with your ingestion pipelines to capture records that fail validation or processing at the API entry point. This ensures that no data is lost, even if it cannot be immediately processed.
```python filename="IngestPipelineWithDLQ.py" copy
from moose_lib import IngestPipeline, DeadLetterQueue, IngestPipelineConfig
from pydantic import BaseModel
from datetime import datetime

class ExampleSchema(BaseModel):
    id: str
    name: str
    value: int
    timestamp: datetime

example_dlq = DeadLetterQueue[ExampleSchema](name="exampleDLQ")

pipeline = IngestPipeline[ExampleSchema](
    name="example",
    config=IngestPipelineConfig(
        ingest_api=True,
        stream=True,
        table=True,
        dead_letter_queue=True  # Route failed ingestions to DLQ
    )
)
```
See the [Ingestion API documentation](/moose/apis/ingest-api#validation) for more details and best practices on configuring DLQs for ingestion.
---
## Publish Data
Source: moose/streaming/from-your-code.mdx
Write data to streams from applications, APIs, or external sources
# Publishing Data to Streams
## Overview
Publishing data to streams allows you to write data from various sources into your Kafka/Redpanda topics. This is the first step in building real-time data pipelines.
## Publishing Methods
### Using REST APIs
The most common way to publish data is through Moose's built-in ingestion APIs. These are configured to automatically sit in front of your streams and publish data to them whenever a request is made to the endpoint:
```py filename="PublishViaAPI.py" copy
import requests
from moose_lib import IngestPipeline, IngestPipelineConfig

# When you create an IngestPipeline with ingest_api=True, Moose automatically creates an API endpoint
# RawData is your Pydantic model for this pipeline
raw_data = IngestPipeline[RawData]("raw_data", IngestPipelineConfig(
    ingest_api=True,  # Creates POST /ingest/raw_data endpoint
    stream=True,
    table=True
))

# You can then publish data via HTTP POST requests
response = requests.post('http://localhost:4000/ingest/raw_data', json={
    'id': '123',
    'value': 42
})
```
See the [OpenAPI documentation](/stack/open-api) to learn more about how to generate type-safe client SDKs in your language of choice for all of your Moose APIs.
### Direct Stream Publishing
You can publish directly to a stream from your Moose code using the stream's `send` method.
This is useful when emitting events from workflows or other backend logic.
`send` accepts a single record or an array of records.
If your `Stream` is configured with a `schema_config` of kind `"JSON"`,
Moose automatically produces messages using the Confluent envelope (0x00 + schema id + JSON);
no code changes are needed beyond setting `schema_config`. See the [Schema Registry guide](/moose/streaming/schema-registry).
```py filename="DirectPublish.py" copy
from moose_lib import Stream, StreamConfig, Key
from pydantic import BaseModel
from datetime import datetime

class UserEvent(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

# Create a stream (optionally pass StreamConfig with destination/table settings)
events = Stream[UserEvent]("user_events", StreamConfig())

# Publish a single record
events.send(UserEvent(
    id="evt_1",
    user_id="user_123",
    timestamp=datetime.now(),
    event_type="click"
))

# Publish multiple records
events.send([
    UserEvent(id="evt_2", user_id="user_456", timestamp=datetime.now(), event_type="view"),
    UserEvent(id="evt_3", user_id="user_789", timestamp=datetime.now(), event_type="signup"),
])
```
Moose builds the Kafka topic name from your stream name,
optional namespace, and optional version (dots become underscores).
For example, a stream named `events` with version `1.2.0` becomes `events_1_2_0`
(or `my_ns.events_1_2_0` when the namespace is `"my_ns"`).
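As an illustration of that naming rule only (this helper is not part of the Moose API), the topic name can be derived like this:
```py filename="topic_name.py" copy
from typing import Optional

def derive_topic_name(stream_name: str, version: Optional[str] = None, namespace: Optional[str] = None) -> str:
    # Dots in the version become underscores; the namespace is prefixed with a dot separator
    topic = stream_name
    if version:
        topic = f"{topic}_{version.replace('.', '_')}"
    if namespace:
        topic = f"{namespace}.{topic}"
    return topic

print(derive_topic_name("events", version="1.2.0"))                     # events_1_2_0
print(derive_topic_name("events", version="1.2.0", namespace="my_ns"))  # my_ns.events_1_2_0
```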
### Using the Kafka/Redpanda Client from External Applications
You can also publish to streams from external applications using Kafka/Redpanda clients:
```py filename="ExternalPublish.py" copy
import json
from kafka import KafkaProducer
from datetime import datetime

producer = KafkaProducer(
    bootstrap_servers=['localhost:19092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Publish to the stream topic
producer.send('user-events', {  # Stream name becomes the topic name
    'id': 'event-123',
    'user_id': 'user-456',
    'timestamp': datetime.now().isoformat(),
    'event_type': 'page_view'
})
```
#### Locating Redpanda Connection Details
When running your Moose backend within your local dev environment, you can find the connection details for your Redpanda cluster in the `moose.config.toml` file in the root of your project:
```toml filename="moose.config.toml" copy
[redpanda_config]
broker = "localhost:19092"
message_timeout_ms = 1000
retention_ms = 30000
replication_factor = 1
```
---
## Schema Registry
Source: moose/streaming/schema-registry.mdx
Use Confluent Schema Registry with Moose streams (JSON Schema first)
# Schema Registry Integration
## Overview
Moose can publish and consume Kafka/Redpanda messages using Confluent Schema Registry. The first supported encoding is JSON Schema; Avro and Protobuf are planned.
## Configure Schema Registry URL
Set the Schema Registry URL in `moose.config.toml` under `redpanda_config` (aliased as `kafka_config`). You can also override with environment variables.
```toml filename="moose.config.toml" copy
[redpanda_config]
broker = "localhost:19092"
schema_registry_url = "http://localhost:8081"
```
Environment overrides (either key works):
```bash filename="Terminal" copy
export MOOSE_REDPANDA_CONFIG__SCHEMA_REGISTRY_URL=http://localhost:8081
# or
export MOOSE_KAFKA_CONFIG__SCHEMA_REGISTRY_URL=http://localhost:8081
```
## Referencing Schemas
You can attach a Schema Registry reference to any `Stream` via `schema_config`. The reference can point to one of:
- The latest schema for a subject (e.g. `SubjectLatest(name="event-value")`)
- A specific subject and version
- A fixed schema id
```py filename="sr_stream.py" copy {13-21}
from moose_lib import Stream, StreamConfig
from moose_lib.dmv2.stream import KafkaSchemaConfig, SubjectLatest
from pydantic import BaseModel
class Event(BaseModel):
id: str
value: int
schema_config = KafkaSchemaConfig(
kind="JSON",
reference=SubjectLatest(name="event-value"),
)
events = Stream[Event](
"events",
StreamConfig(schema_config=schema_config),
)
events.send(Event(id="e1", value=42))
```
## Consuming SR JSON in Runners
Moose streaming runners automatically detect the Confluent JSON envelope
when consuming and strip the header before parsing the JSON.
Your transformation code continues to work unchanged.
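If you consume these topics from an external application rather than through Moose runners, you have to strip the envelope yourself. A minimal sketch, assuming the wire format described above (magic byte `0x00`, a 4-byte big-endian schema id, then the JSON payload):
```py filename="decode_envelope.py" copy
import json
import struct

def decode_confluent_json(message: bytes) -> dict:
    # Confluent wire format: 0x00 magic byte, 4-byte big-endian schema id, JSON payload
    if len(message) > 5 and message[0] == 0:
        schema_id = struct.unpack(">I", message[1:5])[0]  # available if you need to look up the schema
        return json.loads(message[5:])
    # Plain JSON message without an envelope
    return json.loads(message)
```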
## Ingestion APIs and SR
When an Ingest API routes to a topic that has a `schema_config` of kind `"JSON"`,
Moose resolves the schema id and publishes requests using the Schema Registry envelope.
You can also set the reference to a fixed `id` to skip lookups.
## Discover existing topics and schemas
Use the CLI to pull external topics and optionally fetch JSON Schemas from Schema Registry to emit typed models.
```bash filename="Terminal" copy
moose kafka pull \
--schema-registry http://localhost:8081 \
--path app/external-topics \
--include "*" \
--exclude "{__consumer_offsets,_schemas}"
```
This writes external topic declarations under the provided path based on language (default path is inferred).
## Current limitations
- JSON Schema only (Avro/Protobuf planned)
- The Ingest API schema declared in code may not match the actual schema in the registry.
---
## Sync to Table
Source: moose/streaming/sync-to-table.mdx
Automatically sync stream data to OLAP tables with intelligent batching
# Sync to Table
## Overview
Moose automatically handles batch writes between streams and OLAP tables through a **destination configuration**. When you specify a `destination` OLAP table for a stream, Moose provisions a background synchronization process that batches and writes data from the stream to the table.
### Basic Usage
```py filename="SyncToTable.py" copy {12}
from moose_lib import Stream, OlapTable, Key
class Event(BaseModel):
id: Key[str]
user_id: str
timestamp: datetime
event_type: str
events_table = OlapTable[Event]("events")
events_stream = Stream[Event]("events", StreamConfig(
destination=events_table # This configures automatic batching
))
```
## Setting Up Automatic Sync
### Using IngestPipeline (Easiest)
The simplest way to set up automatic syncing is with an `IngestPipeline`, which creates all components and wires them together:
```py filename="AutoSync.py" copy {14-15}
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

# Creates stream, table, API, and automatic sync
events_pipeline = IngestPipeline[Event]("events", IngestPipelineConfig(
    ingest_api=True,
    stream=True,  # Creates stream
    table=True    # Creates destination table + auto-sync process
))
```
### Standalone Components
For more granular control, you can configure components individually:
```py filename="ManualSync.py" copy
from moose_lib import Stream, OlapTable, IngestApi, IngestConfig, StreamConfig, Key
from pydantic import BaseModel
from datetime import datetime

class Event(BaseModel):
    id: Key[str]
    user_id: str
    timestamp: datetime
    event_type: str

# Create table first
events_table = OlapTable[Event]("events")

# Create stream with destination table (enables auto-sync)
events_stream = Stream[Event]("events", StreamConfig(
    destination=events_table  # This configures automatic batching
))

# Create API that writes to the stream
events_api = IngestApi[Event]("events", IngestConfig(
    destination=events_stream
))
```
## How Automatic Syncing Works
When you configure a stream with a `destination` table, Moose automatically handles the synchronization by managing a **Rust background process** that:
1. **Consumes** messages from the stream (Kafka/Redpanda topic)
2. **Batches** records up to 100,000 or flushes every second (whichever comes first)
3. **Executes** optimized ClickHouse `INSERT` statements
4. **Commits** stream offsets after successful writes
5. **Retries** failed batches with exponential backoff
Default batching parameters:
| Parameter | Value | Description |
|-----------|-------|-------------|
| `MAX_BATCH_SIZE` | 100,000 records | Maximum records per batch insert |
| `FLUSH_INTERVAL` | 1 second | Automatic flush regardless of batch size |
Currently, you cannot configure the batching parameters, but we're interested in adding this feature. If you need this capability, let us know on Slack!
[ClickHouse inserts need to be batched for optimal performance](https://clickhouse.com/blog/asynchronous-data-inserts-in-clickhouse#data-needs-to-be-batched-for-optimal-performance). Moose automatically handles this optimization internally, ensuring your data is efficiently written to ClickHouse without any configuration required.
## Data Flow Example
Here's how data flows through the automatic sync process:
```py filename="DataFlow.py" copy
import requests

# 1. Data sent to ingestion API
requests.post('http://localhost:4000/ingest/events', json={
    "id": "evt_123",
    "user_id": "user_456",
    "timestamp": "2024-01-15T10:30:00Z",
    "event_type": "click"
})

# 2. API validates and writes to stream
# 3. Background sync process batches stream data
# 4. Batch automatically written to ClickHouse table when:
#    - Batch reaches 100,000 records, OR
#    - 1 second has elapsed since last flush
# 5. Data available for queries in events table
#    SELECT * FROM events WHERE user_id = 'user_456';
```
## Monitoring and Observability
The sync process provides built-in observability within the Moose runtime:
- **Batch Insert Logs**: Records successful batch insertions with sizes and offsets
- **Error Handling**: Logs transient failures with retry information
- **Metrics**: Tracks throughput, batch sizes, and error rates
- **Offset Tracking**: Maintains Kafka consumer group offsets for reliability
---
## Transformation Functions
Source: moose/streaming/transform-functions.mdx
Process and transform data in-flight between streams
# Transformation Functions
## Overview
Transformations allow you to process and reshape data as it flows between streams. You can filter, enrich, reshape, and combine data in-flight before it reaches its destination.
## Implementing Transformations
### Reshape and Enrich Data
Transform data shape or enrich records:
```py filename="DataTransform.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from datetime import datetime

class EventProperties(BaseModel):
    user_id: str
    platform: str
    app_version: str
    ip_address: str

class RawEvent(BaseModel):
    id: Key[str]
    timestamp: str
    data: EventProperties

class EnrichedEventProperties(BaseModel):
    platform: str
    version: str
    country: str

class EnrichedEventMetadata(BaseModel):
    originalTimestamp: str
    processedAt: datetime

class EnrichedEvent(BaseModel):
    eventId: Key[str]
    timestamp: datetime
    userId: Key[str]
    properties: EnrichedEventProperties
    metadata: EnrichedEventMetadata

raw_stream = Stream[RawEvent]("raw_events")
enriched_stream = Stream[EnrichedEvent]("enriched_events")

raw_stream.add_transform(destination=enriched_stream, transformation=lambda record: EnrichedEvent(
    eventId=record.id,
    timestamp=datetime.fromisoformat(record.timestamp),
    userId=record.data.user_id,
    properties=EnrichedEventProperties(
        platform=record.data.platform,
        version=record.data.app_version,
        country=lookupCountry(record.data.ip_address)  # lookupCountry is your own enrichment helper
    ),
    metadata=EnrichedEventMetadata(
        originalTimestamp=record.timestamp,
        processedAt=datetime.now()
    )
))
```
### Filtering
Remove or filter records based on conditions:
```py filename="FilterStream.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from datetime import datetime

class MetricRecord(BaseModel):
    id: Key[str]
    name: str
    value: float
    timestamp: datetime

class ValidMetrics(BaseModel):
    id: Key[str]
    name: str
    value: float
    timestamp: datetime

input_stream = Stream[MetricRecord]("input_metrics")
valid_metrics = Stream[ValidMetrics]("valid_metrics")

def filter_function(record: MetricRecord) -> ValidMetrics | None:
    # getStartOfDay is your own helper returning today's midnight as a datetime
    if record.value > 0 and record.timestamp > getStartOfDay() and not record.name.startswith('debug_'):
        return ValidMetrics(
            id=record.id,
            name=record.name,
            value=record.value,
            timestamp=record.timestamp
        )
    return None

input_stream.add_transform(destination=valid_metrics, transformation=filter_function)
```
### Fan Out (1:N)
Send data to multiple downstream processors:
```py filename="FanOut.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from typing import List
from datetime import datetime

# Define data models
class Order(BaseModel):
    orderId: Key[str]
    userId: Key[str]
    amount: float
    items: List[str]

class HighPriorityOrder(Order):
    priority: str = 'high'

class ArchivedOrder(Order):
    archivedAt: datetime

# Create source and destination streams
order_stream = Stream[Order]("orders")
analytics_stream = Stream[Order]("order_analytics")
notification_stream = Stream[HighPriorityOrder]("order_notifications")
archive_stream = Stream[ArchivedOrder]("order_archive")

# Send all orders to analytics
def analytics_transform(order: Order) -> Order:
    return order

order_stream.add_transform(destination=analytics_stream, transformation=analytics_transform)

# Send large orders to notifications
def high_priority_transform(order: Order) -> HighPriorityOrder | None:
    if order.amount > 1000:
        return HighPriorityOrder(
            orderId=order.orderId,
            userId=order.userId,
            amount=order.amount,
            items=order.items,
            priority='high'
        )
    return None  # Skip small orders

order_stream.add_transform(destination=notification_stream, transformation=high_priority_transform)

# Archive all orders with timestamp
def archive_transform(order: Order) -> ArchivedOrder | None:
    return ArchivedOrder(
        orderId=order.orderId,
        userId=order.userId,
        amount=order.amount,
        items=order.items,
        archivedAt=datetime.now()
    )

order_stream.add_transform(destination=archive_stream, transformation=archive_transform)
```
### Fan In (N:1)
Combine data from multiple sources:
```py filename="FanIn.py" copy
from moose_lib import Stream, OlapTable, Key, StreamConfig
from pydantic import BaseModel
from datetime import datetime

class UserEvent(BaseModel):
    userId: Key[str]
    eventType: str
    timestamp: datetime
    source: str

# Create source and destination streams
web_events = Stream[UserEvent]("web_events")
mobile_events = Stream[UserEvent]("mobile_events")
api_events = Stream[UserEvent]("api_events")

# Create a stream and table for the combined events
events_table = OlapTable[UserEvent]("all_events")
all_events = Stream[UserEvent]("all_events", StreamConfig(
    destination=events_table
))

# Fan in from web
def web_transform(event: UserEvent) -> UserEvent:
    return UserEvent(
        userId=event.userId,
        eventType=event.eventType,
        timestamp=event.timestamp,
        source='web'
    )

web_events.add_transform(destination=all_events, transformation=web_transform)

# Fan in from mobile
def mobile_transform(event: UserEvent) -> UserEvent:
    return UserEvent(
        userId=event.userId,
        eventType=event.eventType,
        timestamp=event.timestamp,
        source='mobile'
    )

mobile_events.add_transform(destination=all_events, transformation=mobile_transform)

# Fan in from API
def api_transform(event: UserEvent) -> UserEvent:
    return UserEvent(
        userId=event.userId,
        eventType=event.eventType,
        timestamp=event.timestamp,
        source='api'
    )

api_events.add_transform(destination=all_events, transformation=api_transform)
```
### Unnesting
Flatten nested records:
```py filename="Unnest.py" copy
from moose_lib import Stream, Key
from pydantic import BaseModel
from typing import List

class NestedValue(BaseModel):
    value: int

class NestedRecord(BaseModel):
    id: Key[str]
    nested: List[NestedValue]

class FlattenedRecord(BaseModel):
    id: Key[str]
    value: int

nested_stream = Stream[NestedRecord]("nested_records")
flattened_stream = Stream[FlattenedRecord]("flattened_records")

def unnest_transform(record: NestedRecord) -> List[FlattenedRecord]:
    result = []
    for nested in record.nested:
        result.append(FlattenedRecord(
            id=record.id,
            value=nested.value
        ))
    return result

nested_stream.add_transform(flattened_stream, unnest_transform)
```
You cannot have multiple transforms between the same source and destination stream. If you need multiple transformation routes, you must either:
- Use conditional logic inside a single streaming function to handle different cases (see the sketch after this list), or
- Implement a fan-out/fan-in pattern, where you route records to different intermediate streams and then merge them back into the destination stream.
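For the first option, here is a minimal sketch of a single transformation function that branches internally instead of registering two transforms between the same pair of streams (it reuses the `MetricRecord` and `ValidMetrics` models from the filtering example above):
```py filename="ConditionalTransform.py" copy
def route_metric(record: MetricRecord) -> ValidMetrics | None:
    # Case 1: drop debug metrics entirely
    if record.name.startswith('debug_'):
        return None
    # Case 2: clamp negative values instead of routing them through a second transform
    value = max(record.value, 0.0)
    # Default case: pass the record through
    return ValidMetrics(
        id=record.id,
        name=record.name,
        value=value,
        timestamp=record.timestamp
    )

# Registered once, in place of the separate filter_function transform shown earlier
input_stream.add_transform(destination=valid_metrics, transformation=route_metric)
```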
## Error Handling with Dead Letter Queues
When stream processing fails, you can configure dead letter queues to capture failed messages for later analysis and recovery. This prevents single message failures from stopping your entire pipeline.
```py filename="DeadLetterQueue.py" copy
from moose_lib import DeadLetterQueue, IngestPipeline, IngestPipelineConfig, TransformConfig, DeadLetterModel
from pydantic import BaseModel
from datetime import datetime

class UserEvent(BaseModel):
    user_id: str
    action: str
    timestamp: float

class ProcessedEvent(BaseModel):
    user_id: str
    action: str
    processed_at: datetime
    is_valid: bool

# Create pipelines
raw_events = IngestPipeline[UserEvent]("raw_events", IngestPipelineConfig(
    ingest_api=True,
    stream=True,
    table=False
))

processed_events = IngestPipeline[ProcessedEvent]("processed_events", IngestPipelineConfig(
    ingest_api=False,
    stream=True,
    table=True
))

# Create dead letter queue for failed transformations
event_dlq = DeadLetterQueue[UserEvent](name="EventDLQ")

def process_event(event: UserEvent) -> ProcessedEvent:
    # This might fail for invalid data
    if not event.user_id or len(event.user_id) == 0:
        raise ValueError("Invalid user_id: cannot be empty")

    return ProcessedEvent(
        user_id=event.user_id,
        action=event.action,
        processed_at=datetime.now(),
        is_valid=True
    )

# Add transform with error handling
raw_events.get_stream().add_transform(
    destination=processed_events.get_stream(),
    transformation=process_event,
    config=TransformConfig(
        dead_letter_queue=event_dlq  # Failed messages go here
    )
)

def monitor_dead_letters(dead_letter: DeadLetterModel[UserEvent]) -> None:
    print(f"Error: {dead_letter.error_message}")
    print(f"Failed at: {dead_letter.failed_at}")

    # Access original typed data
    original_event: UserEvent = dead_letter.as_typed()
    print(f"Original User ID: {original_event.user_id}")

# Monitor dead letter messages
event_dlq.add_consumer(monitor_dead_letters)
```
For comprehensive dead letter queue patterns, recovery strategies, and best practices, see the [Dead Letter Queues guide](./dead-letter-queues).
---
## Templates & Apps
Source: moose/templates-examples.mdx
Browse templates and demo apps for MooseStack
# Templates & Apps
Moose provides two ways to get started: **templates** and **demo apps**. Templates are simple skeleton applications that you can initialize with `moose init`, while demo apps are more advanced examples available on GitHub that showcase real-world use cases and integrations.
**Initialize a template:**
```bash filename="Terminal" copy
moose init PROJECT_NAME TEMPLATE_NAME
```
**List available templates:**
```bash filename="Terminal" copy
moose template list
```
## Popular Apps
---
## Browse Apps
### Next.js + Express + MCP demo app: Aircraft data [#plane-transponder-demo]
Complete demo application featuring real-time aircraft transponder data with MCP chat integration.
**Repository:** [https://github.com/514-labs/planes](https://github.com/514-labs/planes)
Key Features:
Next.js
Express
MCP
Moose OLAP
ClickHouse
---
### Postgres to ClickHouse CDC with Debezium [#postgres-clickhouse-cdc]
Easy-to-run demo of a CDC pipeline using Debezium, PostgreSQL, Redpanda, and ClickHouse.
**Repository:** [https://github.com/514-labs/debezium-cdc](https://github.com/514-labs/debezium-cdc)
**Blog Post:** [Code-First CDC to ClickHouse with Debezium, Redpanda, and MooseStack](https://www.fiveonefour.com/blog/cdc-postgres-to-clickhouse-debezium-drizzle)
Key Features:
CDC
Debezium
PostgreSQL
Redpanda
ClickHouse
Drizzle ORM
---
### User-facing analytics reference app (Postgres + ClickHouse + React) [#foobar-ufa]
A complete reference architecture showing how to add a dedicated analytics microservice to an existing application without impacting your primary database. Features Postgres + ClickHouse + React frontend with chat analytics.
**Repository:** [https://github.com/514-labs/area-code/tree/main/ufa](https://github.com/514-labs/area-code/tree/main/ufa)
Key Features:
PostgreSQL
ClickHouse
React
TanStack Query
Supabase
Moose OLAP
Moose Streaming
Moose APIs
Elasticsearch
Temporal
---
### User-facing analytics reference app (ClickHouse Cloud + React) [#foobar-ufa-lite]
A simplified version of the UFA architecture using ClickHouse Cloud + React frontend with chat analytics. This version demonstrates a cloud-native approach without local infrastructure dependencies.
**Repository:** [https://github.com/514-labs/area-code/tree/main/ufa-lite](https://github.com/514-labs/area-code/tree/main/ufa-lite)
Key Features:
ClickHouse Cloud
React
Moose OLAP
Moose APIs
---
## Browse Templates
### TypeScript (Default) [#typescript-default]
Default TypeScript project, seeded with foobar example components.
```bash filename="Terminal" copy
moose init PROJECT_NAME typescript
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/typescript](https://github.com/514-labs/moosestack/tree/main/templates/typescript)
Key Features:
Moose APIs
Moose OLAP
Moose Streaming
Moose Workflows
---
### Python (Default) [#python-default]
Default Python project, seeded with foobar example components.
```bash filename="Terminal" copy
moose init PROJECT_NAME python
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/python](https://github.com/514-labs/moosestack/tree/main/templates/python)
Key Features:
Moose APIs
Moose OLAP
Moose Streaming
Moose Workflows
---
### TypeScript (Empty) [#typescript-empty]
Empty TypeScript project with minimal structure.
```bash filename="Terminal" copy
moose init PROJECT_NAME typescript-empty
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/typescript-empty](https://github.com/514-labs/moosestack/tree/main/templates/typescript-empty)
Key Features:
TypeScript
Moose OLAP
---
### Python (Empty) [#python-empty]
Empty Python project with minimal structure.
```bash filename="Terminal" copy
moose init PROJECT_NAME python-empty
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/python-empty](https://github.com/514-labs/moosestack/tree/main/templates/python-empty)
Key Features:
Python
Moose OLAP
---
### Next.js (Empty) [#nextjs-empty]
TypeScript project with a Next.js frontend (empty).
```bash filename="Terminal" copy
moose init PROJECT_NAME next-app-empty
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/next-app-empty](https://github.com/514-labs/moosestack/tree/main/templates/next-app-empty)
Key Features:
Next.js
TypeScript
Moose APIs
Moose OLAP
Moose Streaming
Moose Workflows
---
### Express.js [#express]
TypeScript project using Express for serving analytical APIs.
```bash filename="Terminal" copy
moose init PROJECT_NAME typescript-express
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/typescript-express](https://github.com/514-labs/moosestack/tree/main/templates/typescript-express)
Key Features:
Express.js
TypeScript
Moose OLAP
Moose Streaming
Moose Workflows
---
### TypeScript MCP [#typescript-mcp]
TypeScript project with an MCP (Model Context Protocol) implementation using Express. The included example tool enables LLMs to query ClickHouse.
```bash filename="Terminal" copy
moose init PROJECT_NAME typescript-mcp
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/typescript-mcp](https://github.com/514-labs/moosestack/tree/main/templates/typescript-mcp)
Key Features:
TypeScript
MCP
Express
Moose OLAP
Moose Streaming
Moose Workflows
---
### FastAPI [#fastapi]
Python project using FastAPI for serving analytical APIs.
```bash filename="Terminal" copy
moose init PROJECT_NAME python-fastapi
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/python-fastapi](https://github.com/514-labs/moosestack/tree/main/templates/python-fastapi)
Key Features:
FastAPI
Python
Moose OLAP
Moose Streaming
Moose Workflows
---
### FastAPI (Client-Only) [#fastapi-client-only]
FastAPI client-only project using MooseStack libraries without requiring the Moose runtime.
```bash filename="Terminal" copy
moose init PROJECT_NAME python-fastapi-client-only
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/python-fastapi-client-only](https://github.com/514-labs/moosestack/tree/main/templates/python-fastapi-client-only)
Key Features:
FastAPI
Python
Moose OLAP
Moose Streaming
Moose Workflows
---
### ADS-B (Aircraft Tracking) [#adsb]
Real-time aircraft transponder data tracking.
```bash filename="Terminal" copy
moose init PROJECT_NAME ads-b
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/ads-b](https://github.com/514-labs/moosestack/tree/main/templates/ads-b)
Key Features:
Moose APIs
Moose OLAP
Moose Streaming
Moose Workflows
---
### ADS-B with Frontend [#adsb-frontend]
Real-time aircraft transponder data with a React frontend.
```bash filename="Terminal" copy
moose init PROJECT_NAME ads-b-frontend
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/ads-b-frontend](https://github.com/514-labs/moosestack/tree/main/templates/ads-b-frontend)
Key Features:
Next.js
React
Moose APIs
Moose OLAP
Moose Workflows
---
### Live Heart Rate Leaderboard [#heartrate]
Live heart rate leaderboard inspired by F45 with Streamlit frontend.
```bash filename="Terminal" copy
moose init PROJECT_NAME live-heartrate-leaderboard
```
**Repository:** [https://github.com/514-labs/moosestack/tree/main/templates/live-heartrate-leaderboard](https://github.com/514-labs/moosestack/tree/main/templates/live-heartrate-leaderboard)
Key Features:
Streamlit
Python
Moose APIs
Moose OLAP
Moose Streaming
Moose Workflows
---
## Moose Workflows
Source: moose/workflows.mdx
Build ETL pipelines, scheduled jobs, and long-running tasks with orchestration
# Moose Workflows
## Overview
The Workflows module provides standalone task orchestration and automation. You can use this capability independently to build ETL pipelines, run scheduled jobs, trigger background tasks, and manage long-running tasks without requiring other MooseStack components like databases or streams.
### Basic Usage
```python filename="DataFlow.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
from pydantic import BaseModel

class Foo(BaseModel):
    name: str

class Bar(BaseModel):
    name: str
    greeting: str
    counter: int

def run_task1(ctx: TaskContext[Foo]) -> Bar:
    greeting = f"hello, {ctx.input.name}!"
    return Bar(
        name=ctx.input.name,
        greeting=greeting,
        counter=1
    )

def run_task2(ctx: TaskContext[Bar]) -> None:
    print(f"{ctx.input.greeting} (count: {ctx.input.counter})")

# Define task2 first so task1 can reference it in on_complete
task2 = Task[Bar, None](
    name="task2",
    config=TaskConfig(run=run_task2)
)

task1 = Task[Foo, Bar](
    name="task1",
    config=TaskConfig(run=run_task1, on_complete=[task2])
)

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1)
)

# No export needed - Python modules are automatically discovered
```
### Enabling Workflows
To enable workflows, you need to add the `workflows` feature to your `moose.config.toml` file:
```toml filename="moose.config.toml" copy
[features]
workflows = true
```
## Core Capabilities
## Integration with Other Capabilities
While the Workflows capability works independently, it is designed to be used in conjunction with other MooseStack capabilities such as OLAP tables, streams, and APIs.
---
## moose/workflows/cancel-workflow
Source: moose/workflows/cancel-workflow.mdx
# Cancel a Running Workflow
To stop a workflow before it has finished running, use the `workflow cancel` command.
```bash filename="Terminal" copy
moose workflow cancel
```
### Implementing Cancellation Callbacks
For running workflows that have cleanup operations to perform, you can implement a cancellation callback.
This is especially useful for long-running tasks that hold open connections or subscriptions to other services that need to be closed.
You may also use the `state` within the run/cancel context to supplement your business logic.
```python filename="workflows/workflows.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig

def run_task1(ctx: TaskContext[Foo]) -> None:
    connection.open()  # `connection` stands in for any resource your task holds open

def on_cancel(ctx: TaskContext[Foo]) -> None:
    # Clean up any resources
    connection.close()

task1 = Task[Foo, None](
    name="task1",
    config=TaskConfig(run=run_task1, on_cancel=on_cancel)
)

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1, retries=3)
)
```
---
## Define Workflows
Source: moose/workflows/define-workflow.mdx
Create workflow definitions with task sequences and data flow
# Define Workflows
## Overview
Workflows automate task sequences with built-in reliability and monitoring. Tasks execute in order, passing data between steps.
Built on Temporal for reliability, retries, and monitoring via GUI dashboard.
## Writing Workflow Tasks
```python filename="app/main.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
from pydantic import BaseModel

class Foo(BaseModel):
    name: str

def run_task1(ctx: TaskContext[Foo]) -> None:
    name = ctx.input.name or "world"
    greeting = f"hello, {name}!"
    print(greeting)

task1 = Task[Foo, None](
    name="task1",
    config=TaskConfig(run=run_task1)
)

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1)
)
```
Define your `Task` and `Workflow` objects at module level; Moose discovers them automatically, so no explicit export is needed. Specify the `starting_task` in the `WorkflowConfig`.
## Data Flow Between Tasks
Tasks communicate through their return values. Each task can return an object that is automatically passed as input to the next task in the workflow.
- Only values inside the object are passed to the next task.
- The object must be JSON-serializable.
```python filename="app/main.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig, Logger
from pydantic import BaseModel

class Foo(BaseModel):
    name: str

class Bar(BaseModel):
    name: str
    greeting: str
    counter: int

def run_task2(ctx: TaskContext[Bar]) -> None:
    logger = Logger(action="run_task2")
    logger.info(f"task2 input: {ctx.input.model_dump_json()}")

task2 = Task[Bar, None](
    name="task2",
    config=TaskConfig(run=run_task2)
)

def run_task1(ctx: TaskContext[Foo]) -> Bar:
    name = ctx.input.name or "world"
    greeting = f"hello, {name}!"
    return Bar(
        name=name,
        greeting=greeting,
        counter=1
    )

task1 = Task[Foo, Bar](
    name="task1",
    config=TaskConfig(
        run=run_task1,
        on_complete=[task2]
    )
)

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1)
)
```
## Debugging Workflows
While the Temporal dashboard is a helpful tool for debugging, you can also leverage the Moose CLI to monitor and debug workflows. This is useful if you want to monitor a workflow without having to leave your terminal.
Use the `moose workflow status` command to monitor a workflow:
```bash filename="Terminal" copy
moose workflow status example
```
This will print high level information about the workflow run:
```txt filename="Terminal"
Workflow Workflow Status: example
Run ID: 446eab6e-663d-4913-93fe-f79d6109391f
Status: WORKFLOW_EXECUTION_STATUS_COMPLETED ✅
Execution Time: 66s
```
If you want more detailed information about the workflow's status, including task level logs and inputs/outputs, you can use the `--verbose` flag:
```bash filename="Terminal" copy
moose workflow status example --verbose
```
```txt filename="Terminal"
Workflow Workflow Status: example
Run ID: 446eab6e-663d-4913-93fe-f79d6109391f
Status: WORKFLOW_EXECUTION_STATUS_COMPLETED ✅
Execution Time: 66s
Request: GetWorkflowExecutionHistoryRequest { namespace: "default", execution: Some(WorkflowExecution { workflow_id: "example", run_id: "446eab6e-663d-4913-93fe-f79d6109391f" }), maximum_page_size: 0, next_page_token: [], wait_new_event: false, history_event_filter_type: Unspecified, skip_archival: false }
Found 17 events
Event History:
• [2025-02-21T14:16:56.234808764+00:00] EVENT_TYPE_WORKFLOW_EXECUTION_STARTED
• [2025-02-21T14:16:56.235132389+00:00] EVENT_TYPE_WORKFLOW_TASK_SCHEDULED
• [2025-02-21T14:16:56.259341847+00:00] EVENT_TYPE_WORKFLOW_TASK_STARTED
• [2025-02-21T14:16:56.329856180+00:00] EVENT_TYPE_WORKFLOW_TASK_COMPLETED
• [2025-02-21T14:16:56.329951889+00:00] EVENT_TYPE_ACTIVITY_TASK_SCHEDULED
Activity: example/task1
• [2025-02-21T14:16:56.333761680+00:00] EVENT_TYPE_ACTIVITY_TASK_STARTED
• [2025-02-21T14:16:56.497156055+00:00] EVENT_TYPE_ACTIVITY_TASK_COMPLETED
Result:
{
"counter": 1,
"greeting": "hello, no name!",
"name": "no name",
}
```
With this more detailed output, you can see the exact sequence of events and the inputs and outputs of each task. This is useful for debugging and understanding the workflow's behavior.
The result of each task is included in the output, allowing you to inspect the data that was passed between tasks for debugging purposes.
If your workflow fails due to some runtime error, you can use the event history timeline to identify the task that failed.
---
## moose/workflows/retries-and-timeouts
Source: moose/workflows/retries-and-timeouts.mdx
# Error Detection and Handling
Moose provides multiple layers of error protection, both at the workflow and task level:
### Workflow-Level Retries and Timeouts
Moose automatically catches any runtime errors during workflow execution. Errors are logged for debugging, and the orchestrator will retry failed tasks according to the `retries` option.
In your `Workflow`, you can configure the following options to control workflow behavior, including timeouts and retries:
```python filename="app/main.py" {5} copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1, retries=1, timeout="10m")
)
```
### Task-Level Errors and Retries
For more granular control over task-level errors and retries, you can configure your individual tasks to have their own retry behavior.
For workflows & tasks that may not have a predefined timeout, you may set `never` as the timeout.
```python filename="app/main.py" {8} copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig

def run_task1(ctx: TaskContext[Foo]) -> None:
    pass

task1 = Task[Foo, None](
    name="task1",
    config=TaskConfig(run=run_task1, retries=1, timeout="5m")
)

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1, retries=2, timeout="10m")
)
```
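For a task with no bounded runtime, a minimal sketch of the same task configured with `never` as its timeout (assuming the string `"never"` is passed where a duration would otherwise go):
```python filename="app/main.py" copy
# A task that is retried but never timed out
task1 = Task[Foo, None](
    name="task1",
    config=TaskConfig(run=run_task1, retries=1, timeout="never")
)
```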
### Example: Workflow and Task Retry Interplay
When configuring retries, it's important to understand how workflow-level and task-level retries interact. Consider the following scenario:
```python filename="app/main.py" {8,13} copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig

def run_task1(ctx: TaskContext[Foo]) -> None:
    pass

task1 = Task[Foo, None](
    name="task1",
    config=TaskConfig(run=run_task1, retries=2)
)

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1, retries=3)
)
```
If the execution of the workflow encounters an error, the retry sequence proceeds as follows:
1. **Workflow Attempt 1**
   - **Task Attempt 1**: Task fails
   - **Task Attempt 2**: Task fails
   - **Task Attempt 3**: Task fails
   - Workflow attempt fails after exhausting task retries
2. **Workflow Attempt 2**
   - **Task Attempt 1**: Task fails
   - **Task Attempt 2**: Task fails
   - **Task Attempt 3**: Task fails
   - Workflow attempt fails after exhausting task retries
3. **...and so on**, up to the workflow-level `retries` limit
In this example, each workflow attempt runs the task up to 3 times (one initial attempt plus the 2 configured task retries); only after those are exhausted does the workflow itself retry, up to its own `retries` setting of 3.
---
## Schedule Workflows
Source: moose/workflows/schedule-workflow.mdx
Set up recurring and scheduled workflow execution
# Schedule Workflows
## Overview
Moose workflows can be configured to run automatically on a schedule using cron expressions or interval-based scheduling. This enables you to automate recurring tasks, data processing jobs, and maintenance operations.
## Scheduling Workflows
Workflows can be configured to run on a schedule using the `schedule` field in `Workflow`. This field is optional and blank by default.
### Cron Expressions
```python filename="app/scheduled_workflow.py" copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1, schedule="0 12 * * *")  # Runs at 12:00 PM every day
)
```
#### Cron Expression Format
```text
|------------------------------- Minute (0-59)
| |------------------------- Hour (0-23)
| | |------------------- Day of the month (1-31)
| | | |------------- Month (1-12; or JAN to DEC)
| | | | |------- Day of the week (0-6; or SUN to SAT; or 7 for Sunday)
| | | | |
| | | | |
* * * * *
```
#### Common Cron Examples
| Cron Expression | Description |
|-----------------|-------------|
| 0 12 * * * | Runs at 12:00 PM every day |
| 0 0 * * 0 | Runs at 12:00 AM every Sunday |
| 0 8 * * 1-5 | Runs at 8:00 AM on weekdays (Monday to Friday) |
| * * * * * | Runs every minute |
| 0 */6 * * * | Runs every 6 hours |
| 0 9 1 * * | Runs at 9:00 AM on the first day of every month |
| 0 0 1 1 * | Runs at midnight on January 1st every year |
Use an online cron expression visualizer like [crontab.guru](https://crontab.guru/) to help you understand how the cron expression will schedule your workflow.
### Interval Schedules
Interval schedules can be specified as a string of the form `"@every <interval>"`. The interval follows standard duration format:
```python filename="app/interval_workflow.py" copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig

myworkflow = Workflow(
    name="myworkflow",
    config=WorkflowConfig(starting_task=task1, schedule="@every 1h")  # Runs every hour
)
```
#### Interval Examples
| Interval | Description |
|----------|-------------|
| `@every 30s` | Every 30 seconds |
| `@every 5m` | Every 5 minutes |
| `@every 1h` | Every hour |
| `@every 12h` | Every 12 hours |
| `@every 24h` | Every 24 hours |
| `@every 7d` | Every 7 days |
## Practical Scheduling Examples
### Daily Data Processing
```python filename="app/daily_etl.py" copy
from moose_lib import Workflow, WorkflowConfig

daily_data_processing = Workflow(
    name="daily-data-processing",
    config=WorkflowConfig(
        starting_task=extract_data_task,
        schedule="0 2 * * *",  # Run at 2 AM every day
        retries=2,
        timeout="2h"
    )
)
```
### Weekly Reports
```python filename="app/weekly_reports.py" copy
weekly_reports = Workflow(
    name="weekly-reports",
    config=WorkflowConfig(
        starting_task=generate_report_task,
        schedule="0 9 * * 1",  # Run at 9 AM every Monday
        retries=1,
        timeout="1h"
    )
)
```
### High-Frequency Monitoring
```python filename="app/monitoring.py" copy
system_monitoring = Workflow(
    name="system-monitoring",
    config=WorkflowConfig(
        starting_task=check_system_health_task,
        schedule="@every 5m",  # Check every 5 minutes
        retries=0,             # Don't retry monitoring checks
        timeout="30s"
    )
)
```
## Monitoring Scheduled Workflows
### Development Environment
If your dev server is running, you should see logs in the terminal when your scheduled workflow is executed:
```bash filename="Terminal" copy
moose dev
```
```txt filename="Terminal"
[2024-01-15 12:00:00] Scheduled workflow 'daily-data-processing' started
[2024-01-15 12:00:01] Task 'extract' completed successfully
[2024-01-15 12:00:15] Task 'transform' completed successfully
[2024-01-15 12:00:30] Task 'load' completed successfully
[2024-01-15 12:00:30] Workflow 'daily-data-processing' completed successfully
```
### Checking Workflow Status
You can check the status of scheduled workflows using the CLI:
```bash filename="Terminal" copy
# List all workflows defined in your project
moose workflow list
# Alternative command to list all workflows
moose ls --type workflows
# View workflow execution history
moose workflow history
# Check specific workflow status
moose workflow status daily-data-processing
# Get detailed execution history
moose workflow status daily-data-processing --verbose
```
### Temporal Dashboard
Access the Temporal dashboard to view scheduled workflow executions:
```bash filename="Terminal" copy
# Open Temporal dashboard (typically at http://localhost:8080)
open http://localhost:8080
```
The dashboard shows:
- Scheduled workflow definitions
- Execution history and timing
- Success/failure rates
- Retry attempts and errors
## Best Practices for Scheduled Workflows
### Timeout and Retry Configuration
Configure appropriate timeouts and retries for scheduled workflows:
```python filename="app/robust_scheduled_workflow.py" copy
from moose_lib import Task, TaskConfig, Workflow, WorkflowConfig

def run_main_task() -> None:
    # Long-running task logic
    pass

main_task = Task[None, None](
    name="main",
    config=TaskConfig(
        run=run_main_task,
        retries=3,     # Retry individual tasks
        timeout="1h"   # Task-level timeout
    )
)

robust_scheduled_workflow = Workflow(
    name="robust-scheduled",
    config=WorkflowConfig(
        starting_task=main_task,
        schedule="0 3 * * *",  # Run at 3 AM daily
        retries=2,             # Retry failed workflows
        timeout="4h"           # Allow sufficient time
    )
)
```
## Troubleshooting Scheduled Workflows
### Common Issues
- **Timezone considerations**: Cron schedules use UTC by default
- **Resource conflicts**: Ensure scheduled workflows don't compete for resources
- **Long-running tasks**: Set appropriate timeouts for lengthy operations
- **Error handling**: Implement proper error handling and logging
---
## Trigger Workflows
Source: moose/workflows/trigger-workflow.mdx
Start workflows from events, APIs, or external triggers
# Trigger Workflows
## Overview
Moose workflows can be triggered programmatically from various sources including APIs, events, external systems, or manual execution. This enables you to build reactive data processing pipelines and on-demand task execution.
## Manual Workflow Execution
The simplest way to trigger a workflow is using the Moose CLI:
```bash filename="Terminal" copy
# Run a workflow manually
moose workflow run example
# Run with input parameters
moose workflow run example --input '{"name": "John", "email": "john@example.com"}'
```
### Passing Input to Workflows
When triggering workflows, you can pass input data that will be passed to the starting task:
```bash filename="Terminal" copy
moose workflow run data-processing --input '{
"sourceUrl": "https://api.example.com/data",
"apiKey": "your-api-key",
"batchSize": 100
}'
```
The input is parsed as JSON and passed to the workflow's starting task.
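For context, a minimal sketch of what the `data-processing` workflow's starting task might look like on the receiving end (the model and task names here are illustrative, not part of Moose):
```python filename="app/data_processing.py" copy
from moose_lib import Task, TaskConfig, TaskContext, Workflow, WorkflowConfig
from pydantic import BaseModel

# Shape of the JSON passed via --input
class DataProcessingInput(BaseModel):
    sourceUrl: str
    apiKey: str
    batchSize: int

def run_extract(ctx: TaskContext[DataProcessingInput]) -> None:
    print(f"Fetching {ctx.input.sourceUrl} in batches of {ctx.input.batchSize}")

extract_task = Task[DataProcessingInput, None](
    name="extract",
    config=TaskConfig(run=run_extract)
)

data_processing = Workflow(
    name="data-processing",
    config=WorkflowConfig(starting_task=extract_task)
)
```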
## API-Triggered Workflows
Trigger workflows directly via an HTTP POST endpoint exposed by the webserver.
- Endpoint: `/workflows/{workflowName}/trigger`
### Request
- Body: optional JSON payload passed to the workflow's starting task.
Example:
```bash filename="Terminal" copy
curl -X POST 'http://localhost:4000/workflows/data-processing/trigger' \
-H 'Content-Type: application/json' \
-d '{
"inputValue": "process-user-data",
"priority": "high"
}'
```
### Authentication
- Local development: no auth required.
- Production: protect the endpoint using an API key. Follow these steps:
1. Generate a token and hashed key (see the Token Generation section in the API Auth docs):
```bash filename="Terminal" copy
moose generate hash-token
# Outputs:
# - ENV API Key (hashed) → for environment/config
# - Bearer Token (plain) → for Authorization header
```
2. Configure the server with the hashed key:
```bash copy
MOOSE_CONSUMPTION_API_KEY="<hashed-api-key>"
```
3. Call the endpoint using the plain Bearer token from step 1:
```bash filename="Terminal" copy
curl -X POST 'https://your-host/workflows/data-processing/trigger' \
  -H 'Authorization: Bearer <bearer-token>' \
  -H 'Content-Type: application/json' \
  -d '{"inputValue":"process-user-data"}'
```
For details, see the API Auth page under “Token Generation” and “API Endpoints”.
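Equivalently, from Python application code (a sketch using the `requests` library; the host, token, and payload here are placeholders):
```python filename="trigger_workflow.py" copy
import requests

response = requests.post(
    "https://your-host/workflows/data-processing/trigger",
    headers={"Authorization": "Bearer <bearer-token>"},
    json={"inputValue": "process-user-data", "priority": "high"},
)
print(response.status_code, response.json())
```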
### Response
```json filename="Response"
{
"workflowId": "data-processing-",
"runId": "",
}
```
In local development, the response also includes a `dashboardUrl` to Temporal UI:
```json filename="Response (dev)"
{
"workflowId": "data-processing-",
"runId": "",
"dashboardUrl": "http://localhost:8080/namespaces//workflows/data-processing-//history"
}
```
## Terminate a Running Workflow
After triggering a workflow, you can terminate it via an HTTP endpoint.
- Endpoint: `POST /workflows/{workflowId}/terminate`
### Request
- Local development (no auth):
```bash filename="Terminal" copy
curl -X POST 'http://localhost:4000/workflows/data-processing-<id>/terminate'
```
- Production (Bearer token required):
```bash filename="Terminal" copy
curl -X POST 'https://your-host/workflows/data-processing-<id>/terminate' \
  -H 'Authorization: Bearer <bearer-token>'
```
---
## October 24, 2025
Source: release-notes/2025-10-24.mdx
Release notes for October 24, 2025
# October 24, 2025
* **New:** `moose migrate --clickhouse-url` enables serverless ClickHouse schema deploys
* **Improved:** JSON columns now accept dynamic payloads with fine-grained controls
## Serverless ClickHouse migrations with `moose migrate`
Run schema diffs and applies straight against a ClickHouse endpoint—perfect for OLAP-only or CI/CD environments that don’t boot the full Moose runtime.
```bash
# Detect changes and persist the migration plan
moose generate migration \
--clickhouse-url "https://user:pass@ch.serverless.dev/main" \
--save
# Apply the plan directly to ClickHouse
moose migrate --clickhouse-url "https://user:pass@ch.serverless.dev/main"
```
🐙 PR: ([#2872](https://github.com/514labs/moose/pull/2872)) | 📘 Docs: [Serverless ClickHouse migrations guide](/moose/migrate#serverless-deployments)
## Adaptive JSON columns with `__mooseJsonOptions`
Model semi-structured payloads while locking in typed paths for the fields you care about.
```typescript
export interface UserActivity {
  id: Key<string>;
  metadata: {
    userId: string;
    sessionId: string;
    __mooseJsonOptions: {
      maxDynamicPaths: 256;
      typedPaths: [
        ["userId", "String"],
        ["sessionId", "String"]
      ];
      skipRegexps: ["^debug\\."];
    };
  };
}
```
🐙 PR: ([#2887](https://github.com/514labs/moose/pull/2887)) | 📘 Docs: [Configurable JSON columns reference](/moose/data-modeling#configurable-json-columns)
## Moose
- **`moose migrate --clickhouse-url`** – Generate and apply migrations directly against hosted ClickHouse, ideal for OLAP-only or CI/CD workflows that run without the full Moose runtime. [Docs: Serverless ClickHouse migrations](/moose/migrate#serverless-deployments) | PRs [#2872](https://github.com/514labs/moose/pull/2872).
- **LLM-friendly docs & endpoints** – Framework pages expose TS/Py “LLM view” links and the CLI now serves `/llm-ts.txt` + `/llm-py.txt` for assistants that need scoped context. [Docs: LLM docs](/moose/llm-docs) | PRs [#2892](https://github.com/514labs/moose/pull/2892).
- **Flexible JSON columns** – `__mooseJsonOptions` lets models cap dynamic paths, pin typed paths, or skip keys/regexes so ingestion can accept evolving payloads without breaking typed reads. [Docs: Configurable JSON columns](/moose/data-modeling#configurable-json-columns) | PRs [#2887](https://github.com/514labs/moose/pull/2887).
- **Configurable `source_dir`** – `moose.config.toml` can point at `src/` (or any folder) instead of the default `app/`, simplifying adoption inside existing repos. [Docs: Custom source directory](/moose/configuration#custom-source-directory) | PRs [#2886](https://github.com/514labs/moose/pull/2886).
- **Array transforms can fan out events** – Transform functions that return arrays automatically emit one Kafka message per element, covering explode/normalize patterns without extra producers. [Docs: Array transforms](/moose/streaming/transform-functions#array-transforms) | PRs [#2882](https://github.com/514labs/moose/pull/2882).
- **ClickHouse modeling controls** – Table DSL now covers TTL per table/column, `sampleByExpression`, and fully configurable secondary indexes (type, args, granularity) so you can encode retention + performance plans directly in code. [Docs: TTL](/moose/olap/ttl) • [Docs: Sample BY](/moose/olap/schema-optimization#sample-by-expressions) • [Docs: Secondary indexes](/moose/olap/indexes) | PRs [#2845](https://github.com/514labs/moose/pull/2845), [#2867](https://github.com/514labs/moose/pull/2867), [#2869](https://github.com/514labs/moose/pull/2869).
- **`get_source` MCP tool** – AI assistants can resolve a Moose component (tables, APIs, streams) back to its source file for faster code navigation. [Docs: MCP get_source tool](/moose/mcp-dev-server#get_source) | PRs [#2848](https://github.com/514labs/moose/pull/2848).
- **Google Analytics v4 connector** – Service-account authenticated connector streams GA4 reports and realtime metrics into Moose pipelines so marketing data lands without bespoke ETL. [Docs: Connector reference](https://registry.moosestack.com/connectors/google-analytics) | PRs [registry#121](https://github.com/514-labs/registry/pull/121).
- **Connector registry APIs** – Public REST endpoints expose connector metadata, docs, schemas, and versions for catalogs or automation. [Docs: Registry API docs](https://registry.moosestack.com/docs/api) | PRs [registry#120](https://github.com/514-labs/registry/pull/120).
- **Onboarding & docs polish** – Quickstart, auth, materialized view, and config guides now call out install checkpoints, nullable column behavior, and when to prefer `moose.config.toml` over Docker overrides. [Docs: Quickstart](/moose/getting-started/quickstart) | PRs [#2903](https://github.com/514labs/moose/pull/2903), [#2894](https://github.com/514labs/moose/pull/2894), [#2893](https://github.com/514labs/moose/pull/2893), [#2890](https://github.com/514labs/moose/pull/2890).
- **Integer validation parity** – The ingest API enforces every ClickHouse integer range (Int8–UInt256) with clear errors, preventing silent overflows. [Docs: Ingest API](/moose/apis/ingest-api) | PRs [#2861](https://github.com/514labs/moose/pull/2861).
- **MCP watcher stability** – The MCP server now waits for file-system changes to settle before responding so IDE bots always read consistent artifacts. [Docs: MCP server](/moose/mcp-dev-server) | PRs [#2884](https://github.com/514labs/moose/pull/2884).
- **Release + schema compiler hardening** – Version detection ignores CI-only tags, and ClickHouse parsing handles ORDER BY around `PARTITION BY`, `TTL`, `SAMPLE BY`, and secondary indexes even when optional arguments are omitted. PRs [#2902](https://github.com/514labs/moose/pull/2902), [#2898](https://github.com/514labs/moose/pull/2898), [#2897](https://github.com/514labs/moose/pull/2897), [#2889](https://github.com/514labs/moose/pull/2889).
- **Proxy request fidelity** – Consumption APIs now preserve headers/body metadata end-to-end, keeping auth tokens and content negotiation intact. PRs [#2881](https://github.com/514labs/moose/pull/2881).
## Boreal
- **Navigation slug fix** – Visiting a 404 no longer strips the org ID from subsequent links, so multi-tenant operators stay on the right workspace. ([commercial#1014](https://github.com/514-labs/commercial/pull/1014))
---
## November 1, 2025
Source: release-notes/2025-11-01.mdx
Release notes for November 1, 2025
# November 1, 2025
- **New:** Support for ClickHouse table engines:
- [Buffer table engine](/moose/olap/model-table#in-memory-buffer-buffer): smooth ingest spikes before data lands in MergeTree ([ClickHouse docs](https://clickhouse.com/docs/en/engines/table-engines/special/buffer))
- [S3 table engine](/moose/olap/model-table#direct-s3-access-s3): keep data in S3 while ClickHouse reads and writes it on demand ([ClickHouse docs](https://clickhouse.com/docs/en/engines/table-engines/integrations/s3))
- Beta (self-hosted only): [Distributed table engine](/moose/olap/model-table#distributed-tables-distributed): serve cluster-wide queries through Moose-managed tables ([ClickHouse docs](https://clickhouse.com/docs/en/engines/table-engines/special/distributed))
- **Improved:** Serverless migrations gain Redis-backed state storage plus per-table databases.
- **Feature that will make a small number of people very happy:** Moose now has a `flake.nix` so you can install the CLI via `nix run github:514-labs/moosestack` (it also provides a dev shell to make contributing easier!)
## Buffer tables for burst protection
You can now model ClickHouse Buffer engines in MooseOLAP TypeScript and Python projects.
```ts filename="bufferTable.ts" copy
import { Key, OlapTable } from "@514labs/moose-lib";

interface PaymentEvent {
  eventId: Key<string>;
  amount: number;
  capturedAt: Date;
}

// Buffer engine settings (target table and flush thresholds) go in the table config;
// the exact fields are in the Buffer engine docs linked below.
const paymentEvents = new OlapTable<PaymentEvent>("PaymentEvents", {
  // engine: Buffer configuration per the linked docs
});
```
PR: [#2908](https://github.com/514-labs/moosestack/pull/2908) | Docs: [Buffer engine](/moose/olap/model-table#in-memory-buffer-buffer) • [ClickHouse Buffer engine docs](https://clickhouse.com/docs/en/engines/table-engines/special/buffer)
## S3 tables for object storage
You can now model ClickHouse S3 engines in MooseOLAP TypeScript and Python projects. The CLI serializes engine settings, resolves runtime credentials, and enforces the S3 rule set (`PARTITION BY` allowed, `ORDER BY` rejected) so you can read or write datasets that live entirely in S3.
```ts filename="s3Archive.ts" copy
import { OlapTable } from "@514labs/moose-lib";

interface ArchivedOrder {
  orderId: string;
  status: string;
  processedAt: Date;
}

// S3 engine settings (bucket path, format, credentials) go in the table config;
// the exact fields are in the S3 engine docs linked below.
const archivedOrders = new OlapTable<ArchivedOrder>("ArchivedOrders", {
  // engine: S3 configuration per the linked docs
});
```
PR: [#2908](https://github.com/514-labs/moosestack/pull/2908) | Docs: [S3 engine](/moose/olap/model-table#direct-s3-access-s3) • [ClickHouse S3 docs](https://clickhouse.com/docs/en/engines/table-engines/integrations/s3)
## Distributed tables for cluster fan-out
Beta (self-hosted only): Not supported on Boreal or ClickHouse Cloud.
You can now model ClickHouse Distributed engines in MooseOLAP TypeScript and Python projects. Plans capture the cluster, target database/table, and optional sharding key, while validation checks that the referenced local tables exist on every node before executing migrations.
```ts filename="distributedEvents.ts" copy
import { Key, OlapTable } from "@514labs/moose-lib";

interface UserEvent {
  userId: Key<string>;
  eventType: string;
  occurredAt: Date;
}

// Distributed engine settings (cluster, target database/table, optional sharding key)
// go in the table config; the exact fields are in the Distributed engine docs linked below.
const userEvents = new OlapTable<UserEvent>("UserEvents", {
  // engine: Distributed configuration per the linked docs
});
```
PR: [#2908](https://github.com/514-labs/moosestack/pull/2908) | Docs: [Distributed engine](/moose/olap/model-table#distributed-tables-distributed) • [ClickHouse Distributed docs](https://clickhouse.com/docs/en/engines/table-engines/special/distributed)
## Redis state storage for serverless migrations
`moose generate migration` and `moose migrate` accept a Redis URL (flag or `MOOSE_REDIS_CONFIG__URL`) whenever `state_config.storage = "redis"`. The CLI resolves ClickHouse + Redis endpoints, acquires migration locks in Redis, and reuses the same builder across serverless tooling.
```bash filename="Terminal" copy
export MOOSE_CLICKHOUSE_CONFIG__URL="https://user:pass@ch.serverless.dev/main"
export MOOSE_REDIS_CONFIG__URL="redis://redis.example.com:6379"
moose migrate --clickhouse-url "$MOOSE_CLICKHOUSE_CONFIG__URL" --redis-url "$MOOSE_REDIS_CONFIG__URL"
```
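If you prefer to manage this through project config rather than flags, the setting referenced above maps to a `moose.config.toml` entry along these lines (a minimal sketch based on the `state_config.storage` key named in this note; see the Redis state storage docs for the full set of options):
```toml filename="moose.config.toml" copy
# Minimal sketch: opt migration state storage into Redis (key name from the note above).
[state_config]
storage = "redis"
```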
PR: [#2907](https://github.com/514-labs/moosestack/pull/2907) | Docs: [Redis state storage](/moose/migrate#state-storage-options)
## Multi-database tables
Tables now carry a `database` field through the CLI, codegen, and infrastructure map. Moose will create any `additional_databases`, validate plans that attempt to move an existing table, and surface fully qualified names in `moose ls`.
```ts filename="auditLogs.ts" copy
import { Key, OlapTable } from "@514labs/moose-lib";

interface AuditLog {
  id: Key<string>;
  recordedAt: Date;
  message: string;
}

// The `database` field places this table outside the default database; Moose creates
// the database if it is listed in `additional_databases` (see the docs link below).
const auditLogs = new OlapTable<AuditLog>("AuditLogs", {
  database: "audit", // example database name
});
```
PR: [#2876](https://github.com/514-labs/moosestack/pull/2876) | Docs: [Multi-database setup](/moose/olap/model-table#multi-database-setup)
## Moose
- **Buffer tables** – Burst-friendly Buffer engines ship with typed config, CLI validation, and template coverage. Docs: [Buffer engine](/moose/olap/model-table#in-memory-buffer-buffer) • [ClickHouse Buffer docs](https://clickhouse.com/docs/en/engines/table-engines/special/buffer) | PR [#2908](https://github.com/514-labs/moosestack/pull/2908)
- **S3 tables** – Direct object storage workflows stay in code with S3 engine support and credential handling. Docs: [S3 engine](/moose/olap/model-table#direct-s3-access-s3) • [ClickHouse S3 docs](https://clickhouse.com/docs/en/engines/table-engines/integrations/s3) | PR [#2908](https://github.com/514-labs/moosestack/pull/2908)
- **Distributed tables** – Cluster fan-out models emit the correct ClickHouse DDL and guard against missing local tables. Docs: [Distributed engine](/moose/olap/model-table#distributed-tables-distributed) • [ClickHouse Distributed docs](https://clickhouse.com/docs/en/engines/table-engines/special/distributed) | PR [#2908](https://github.com/514-labs/moosestack/pull/2908)
- **Serverless migrations stay coordinated** – Redis-backed locks and state storage plug into the existing `moose migrate` flow with env-var overrides for CI/CD. Docs: [Redis state storage](/moose/migrate#state-storage-options) | PR [#2907](https://github.com/514-labs/moosestack/pull/2907)
- **Per-table databases** – The migration planner now respects `database` overrides, auto-creates configured databases, and blocks accidental moves between them. Docs: [Multi-database setup](/moose/olap/model-table#multi-database-setup) | PR [#2876](https://github.com/514-labs/moosestack/pull/2876)
- **Runtime S3Queue credentials** – Environment variable markers resolve at deploy time for S3Queue sources, keeping AWS keys out of source. Docs: [Streaming from S3](/moose/olap/model-table#streaming-from-s3-s3queue) | PR [#2875](https://github.com/514-labs/moosestack/pull/2875)
## Boreal
- Blog redesign.
- Fixed the redirect loop after deleting an organization so users land back on the create-org screen instead of bouncing between routes.
## Nix development environment
`flake.nix` now bootstraps a full MooseStack build environment: `nix develop` drops you into a development shell with everything you need to build all the components of MooseStack. If you use Nix, let us know!
Give it a go if you have Nix installed:
```bash
nix run github:514-labs/moosestack # append "-- <args>" to pass additional arguments to the CLI
```
PR: [#2920](https://github.com/514-labs/moosestack/pull/2920)
---
## Release Notes
Source: release-notes/index.mdx
Moose / Sloan / Boreal Release Notes
# Release Notes
Welcome to the Moose, Sloan, and Boreal release notes. Here you'll find information about new features, improvements, and bug fixes for all our products.
## Latest Updates
* [November 1, 2025](/release-notes/2025-11-01)
* [October 24, 2025](/release-notes/2025-10-24)
## Installation
To get the latest versions of Moose and Sloan:
```bash
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose,sloan
```
## Products
Our release notes cover updates across three main products:
**Moose** - Build analytical backends with code-first infrastructure, including OLAP tables, streaming pipelines, and APIs.
**Sloan** - AI-powered MCP tools for automated data engineering and project setup.
**Boreal** - Our hosting platform for deploying and managing Moose applications.
Select a date from the sidebar to view detailed release notes for that period.
---
## release-notes/upcoming
Source: release-notes/upcoming.mdx
### Highlights
* **Moose and Sloan users** can now embed metadata into their Moose primitives, for use by users, their agents, and their metadata tools.
* **Sloan users** can read from and write to **Databricks** (more coming here soon).
### Sloan MCP
#### [Experimental] Wrangler Agent—Databricks tools.
We've had a bunch of our users (especially in the enterprise) request deeper integration with Databricks. We've created MCP tooling to allow you to read from Databricks, and create new derivative Databricks managed tables.
Turn it on by running `sloan config tools` and adding `experimental-databricks-tools`. [Docs](https://docs.fiveonefour.com/sloan/tool-reference). To connect with your Databricks instance, you'll need to modify the relevant `MCP.json` file to add:
```JSON
"DATABRICKS_HOST": "[-].databricks.com",
"DATABRICKS_PATH": "/sql/1.0/warehouses/[-]",
"DATABRICKS_TOKEN": "[-]",
```
**This allows you to:**
* Read from Databricks tables
* Create queries and run them against Databricks tables
* Create new Databricks managed tables
**Future of the feature:**
* **Workflows V2**: We're working on bringing schemas into our workflow creation tools, our Databricks integration will be able to leverage these in interacting with Databricks.
* **DatabricksTable**: We're working on a new primitive that will allow you to create a Databricks table from a Moose primitive.
### Moose + Sloan
#### Descriptions in your Moose primitives
**Store context about why you are building what you are building, for you and your agents.**
Moose primitives can now include descriptions.
```ts
const acPipeline = new IngestPipeline(
  "AircraftTrackingProcessed",
  {
    table: true,
    stream: true,
    ingest: false,
  },
  { description: "Pipeline for ingesting raw aircraft data" } // new description field!
);
```
**Where is this used?** Sloan tools that create Moose primitives now write the intention of the primitive into the description field, giving tools and agents that work with that primitive later more context. Practically, every description is also served to the tools as context when the infra map is loaded.
**Future of the feature:** Two main extensions of this feature are planned:
* Embedding the descriptions into the underlying infrastructure (e.g. as table level comments in ClickHouse)
* Extending the metadata:
* To be on a field level as well as a primitive level
* To include any arbitrary key value pair, not just a description (use this for managing labels like PII!)
---
## Data Collection Policy
Source: sloan/data-collection-policy.mdx
Data Collection Policy for Sloan
# Sloan Data Collection Policy
Sloan is in research preview. Accordingly, we collect usage data to improve the product. By default, we collect the most granular data on "ready to use" templates (`comprehensive` data collection), and we collect less information from the other templates (`standard` data collection).
You can change the data you share with us by changing the following:
```toml filename="sloan.config.toml"
enable_telemetry = "standard"
```
The available options are:
- `standard` - collects tool usage and success or error status.
- `comprehensive` - collects tool usage, success or error status, and parameters used by the tools.
- `off` - disables telemetry.
Example standard data collection:
```javascript filename="standard data collection" copy
{
  distinctId: distinctIdForEvent,
  event: 'sloan_mcp_tool_usage',
  properties: {
    tool_name: toolName,
    success: !error,
    telemetry_level: 'standard',
    source: 'sloan_mcp_execution',
    timestamp: new Date().toISOString()
  }
}
```
Example comprehensive data collection:
```javascript filename="comprehensive data collection" copy
{
  distinctId: distinctIdForEvent,
  event: 'sloan_mcp_tool_usage',
  properties: {
    tool_name: toolName,
    success: !error,
    telemetry_level: 'comprehensive',
    source: 'sloan_mcp_execution',
    timestamp: new Date().toISOString(),
    parameters: args,              // Full tool parameters
    error_message: error?.message, // Detailed error information
    error_stack: error?.stack,     // Error stack trace
    machine_id: machineId,         // Machine identifier
    os: process.platform,          // Operating system
    arch: process.arch,            // Architecture
    node_version: process.version  // Node.js version
  }
}
```
Our privacy policy is available [here](https://www.fiveonefour.com/legal/privacy.pdf).
---
## Sloan Demos
Source: sloan/demos.mdx
Templates and instructions for achieving common tasks with Sloan
# Sloan Demos
This page will provide a series of templates and instructions for achieving common tasks with Sloan.
All demo flows will assume that you have already installed the Sloan CLI and Moose CLI.
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan,moose
```
---
## Context Demo -- Aircraft Metrics Definition
Source: sloan/demos/context.mdx
Learn how to use Sloan's context management to build knowledge-driven data applications
# Aircraft Metrics Definition
## Airspeed Metrics
**Ground Speed vs True Airspeed**: Ground speed (`gs`) represents the aircraft's speed relative to the ground, while true airspeed (`tas`) accounts for air density and temperature conditions. True airspeed calculation requires outside air temperature (`oat`) and pressure altitude data not currently available in our model.
**Indicated Airspeed (IAS)**: The airspeed reading from the aircraft's pitot-static system (`ias`), which differs from true airspeed based on altitude and atmospheric conditions. This metric requires direct airspeed sensor data not present in our current ADS-B feed.
## Climb/Descent Performance Metrics
**Vertical Speed**: Calculated using `baro_rate` (barometric rate) and `geom_rate` (geometric rate) to determine climb or descent performance. Positive values indicate climb, negative values indicate descent.
**Climb Efficiency**: Ratio of altitude gained to ground distance covered, calculated using altitude change (`alt_baro` or `alt_geom`) and position changes (`lat`, `lon`).
## Flight Phase Detection Metrics
**Takeoff Phase**: Identified by rapid altitude gain (`alt_baro` increasing) combined with increasing ground speed (`gs`) and high climb rate (`baro_rate` > 500 ft/min).
**Cruise Phase**: Characterized by stable altitude (minimal `baro_rate`), consistent ground speed (`gs`), and straight track (`track` changes < 5°).
**Approach Phase**: Detected by decreasing altitude (`baro_rate` < -300 ft/min), decreasing ground speed, and altitude below typical cruise levels.
**Landing Phase**: Final approach with very low altitude (`alt_baro` < 1000 ft), decreasing speed, and stable track toward runway.
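As a rough illustration of how the thresholds above could be applied to these fields, here is a small TypeScript sketch (the function and type names are illustrative, and the ±100 ft/min cruise band is an assumption; the other thresholds come from the definitions above):

```ts
// Illustrative sketch: classify a flight phase from ADS-B fields using the
// thresholds described above (baro_rate in ft/min, alt_baro in ft, gs in knots).
type FlightPhase = "takeoff" | "cruise" | "approach" | "landing" | "unknown";

interface PhaseInputs {
  alt_baro: number;  // barometric altitude (ft)
  baro_rate: number; // barometric climb/descent rate (ft/min)
  gs: number;        // ground speed (knots)
}

function classifyPhase(p: PhaseInputs): FlightPhase {
  if (p.alt_baro < 1000 && p.baro_rate < 0) return "landing"; // very low and descending
  if (p.baro_rate < -300) return "approach";                  // descending toward the field
  if (p.baro_rate > 500) return "takeoff";                    // rapid climb
  if (Math.abs(p.baro_rate) < 100) return "cruise";           // assumed "stable altitude" band
  return "unknown";
}
```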
## Signal Quality Metrics
**Signal Strength**: Direct measurement using `rssi` (Received Signal Strength Indicator) to assess reception quality.
**Data Freshness**: Calculated using `seen` (seconds since last message) and `seen_pos` (seconds since last position update) to determine data reliability.
**Message Frequency**: Messages per minute calculated from `messages` count and time window to assess tracking consistency.
## Position Accuracy Metrics
**Navigation Accuracy**: Composite score using `nic` (Navigation Integrity Category), `nac_p` (Navigation Accuracy Category - Position), and `nac_v` (Navigation Accuracy Category - Velocity) to determine positional reliability.
**Surveillance Accuracy**: Assessment using `sil` (Surveillance Integrity Level) and `sda` (System Design Assurance) to evaluate overall tracking quality.
## Flight Efficiency Metrics
**Great Circle Deviation**: Comparison of actual flight path (derived from sequential `lat`, `lon` coordinates) against the shortest great circle distance between origin and destination.
**Altitude Optimization**: Analysis of altitude profile against optimal flight levels for given aircraft type and distance.
**Speed Consistency**: Variance in ground speed (`gs`) throughout different flight phases to assess flight smoothness.
**Fuel Efficiency**: Calculated using fuel flow rate (`fuel_flow`) and ground speed to determine nautical miles per gallon. Requires engine performance data not available in our current dataset.
## Environmental & Weather Metrics
**Wind Speed & Direction**: Calculated by comparing true airspeed (`tas`) with ground speed (`gs`) and track changes. Requires true airspeed data and wind vector information (`wind_speed`, `wind_direction`) not present in our model.
**Turbulence Detection**: Identified through rapid changes in altitude (`alt_baro`) and track (`track`) combined with accelerometer data (`vertical_g_force`, `lateral_g_force`) not available in ADS-B transmissions.
**Weather Avoidance**: Analysis of flight path deviations around weather systems using onboard weather radar data (`weather_radar_returns`) and precipitation intensity (`precip_intensity`) not included in our current data model.
## Traffic Density & Separation Metrics
**Aircraft Density**: Count of aircraft within defined geographical boundaries using `lat`, `lon` coordinates and configurable radius.
**Separation Metrics**: Minimum distances between aircraft calculated using position data and altitude differences.
**Airspace Utilization**: Percentage of available airspace occupied by tracked aircraft at different altitude bands.
## Operational Metrics
**Emergency Detection**: Identification of emergency situations using `emergency` codes, `squawk` codes (7500, 7600, 7700), and `alert` flags.
**Autopilot Usage**: Analysis of autopilot engagement using navigation modes (`nav_modes`) and flight path consistency.
**Communication Quality**: Assessment based on transponder performance, message consistency, and data completeness across all available fields.
---
## Egress Demo
Source: sloan/demos/egress.mdx
Learn how to create data egress APIs using Sloan with a step-by-step TypeScript example
# Data Egress
## Create analytics APIs to serve your data
This demo will walk you through using Sloan tools to prompt your way to creating analytics APIs that serve the aircraft telemetry data you've ingested.
[Skip to prompt](#prompt-the-llm-to-create-a-geolocation-api) if you started with the ads-b template.
- **Sloan and Moose CLIs**: [Install them here](/sloan/getting-started/cursor)
- **OS**: macOS or Linux (WSL supported for Windows)
- **Docker Desktop/Engine**: [24.0.0+](https://docs.docker.com/get-started/get-docker/)
- **Node**: [version 20+](https://nodejs.org/en/download) (LTS recommended)
- **Anthropic API Key**: [Get one here](https://docs.anthropic.com/en/docs/initial-setup)
- **Client**: [Cursor](https://www.cursor.com/)
- **Completed**: [Ingest Demo](/sloan/demos/ingest) (or have data already in your system)
### Start with the ads-b template
```bash filename="Terminal" copy
sloan init egress-demo ads-b
```
```bash filename="Terminal" copy
cd egress-demo
npm install
```
### Run the Moose Dev Server
```bash filename="Terminal" copy
moose dev
```
### Open the project in Cursor
### Initialize the Sloan MCP
Navigate to `Cursor > Settings > Cursor Settings > Tools and Integrations` then toggle on the `Sloan MCP` tool.
### For best results, set the LLM to `claude-4-sonnet`
Gemini 2.5 and o3 are also reasonably good, but claude-4-sonnet has the most consistent results.
### Prompt the LLM to create a geolocation API
> Can you create an API that returns every aircraft within X miles of Y,Z coordinates.
### Action
The LLM should now use sloan tools to:
1. create the analytics API endpoint that calculates distance and filters aircraft
2. test the API to ensure it's working correctly with the existing aircraft data
You'll know it is succeeding if:
1. the LLM successfully tests the API
2. you can curl the generated API (see the sketch after this list)
3. you can see the generated openapi spec in `path/to/your/project/.moose/openapi.yaml`
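A sketch of what that curl could look like, assuming the generated API was named `getAircraftWithinRadius` and the Moose dev server is on its default local port (4000); check `.moose/openapi.yaml` for the real route and parameters:

```bash filename="Terminal" copy
# Illustrative only – confirm the path and query parameters in .moose/openapi.yaml
curl "http://localhost:4000/api/getAircraftWithinRadius?center_lat=40.7&center_lon=-74.0&radius_miles=50"
```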
### Continue this demo with:
* [Creating Materialized Views](/sloan/demos/mvs)
---
## Ingest Demo
Source: sloan/demos/ingest.mdx
Learn how to ingest data using Sloan with a step-by-step TypeScript example
# Ingest
## Ingest data periodically from an API
This demo will walk you through using Sloan tools to prompt your way to ingesting a whole bunch of aircraft telemetry data from adsb.lol.
[Skip to prompt](#prompt-the-llm-to-create-an-ingest) if you started with the ads-b template.
- **Sloan and Moose CLIs**: [Install them here](/sloan/getting-started/cursor)
- **OS**: macOS or Linux (WSL supported for Windows)
- **Docker Desktop/Engine**: [24.0.0+](https://docs.docker.com/get-started/get-docker/)
- **Node**: [version 20+](https://nodejs.org/en/download) (LTS recommended)
- **Anthropic API Key**: [Get one here](https://docs.anthropic.com/en/docs/initial-setup)
- **Client**: [Cursor](https://www.cursor.com/)
### Create and initialize a new typescript project
```bash filename="Terminal" copy
sloan init ingest-demo typescript-empty
```
```bash filename="Terminal" copy
cd ingest-demo
npm install
```
### Open the project in Cursor
### Run the Moose Dev Servers
```bash filename="Terminal" copy
moose dev
```
### Initialize the Sloan MCP
Navigate to `Cursor > Settings > Cursor Settings > Tools and Integrations` then toggle on the `Sloan MCP` tool.
### For best results, set the LLM to `claude-4-sonnet`
Gemini 2.5 and o3 are also reasonably good, but claude-4-sonnet has the most consistent results.
### Prompt the LLM to create an ingest
> I want to ingest data from the following aircraft transponder data api every 5 seconds for the purpose of creating a set of visualizations.
>
> API: @https://api.adsb.lol/v2/mil
>
> Docs: @https://api.adsb.lol/docs#/v2/v2_mil_v2_mil_get (note, I think the schema here is inaccurate, check out the data before you trust the docs).
>
> Can you execute on this?
The LLM might do a planning step, in which case, you can ask it to execute on the plan.
> Go for it!
### Action
The LLM should now use sloan tools to:
1. get sample data from the API using a temporary script
2. use that sample to create a schema that's used to create an ingest pipeline (ingest API, Redpanda stream, ClickHouse table)
3. create a Moose managed temporal workflow to periodically ingest the data
You'll know it is succeeding if you see dozens of events per second hit your Moose dev console.
### Continue this demo with:
* [Creating Egress APIs](/sloan/demos/egress)
* [Creating Materialized Views](/sloan/demos/mvs)
---
## Materialized Views Demo
Source: sloan/demos/mvs.mdx
Learn how to create materialized views using Sloan with a step-by-step TypeScript example
# Materialized Views
## Create materialized views to pre-aggregate your data, based on your egress API
This demo will walk you through using Sloan tools to prompt your way to creating materialized views that pre-aggregate the aircraft telemetry data you've ingested.
[Skip to prompt](#prompt-the-llm-to-create-a-materialized-view) if you started with the ads-b template.
- **Sloan and Moose CLIs**: [Install them here](/sloan/getting-started/cursor)
- **OS**: macOS or Linux (WSL supported for Windows)
- **Docker Desktop/Engine**: [24.0.0+](https://docs.docker.com/get-started/get-docker/)
- **Node**: [version 20+](https://nodejs.org/en/download) (LTS recommended)
- **Anthropic API Key**: [Get one here](https://docs.anthropic.com/en/docs/initial-setup)
- **Client**: [Cursor](https://www.cursor.com/)
- **Completed**: [Ingest Demo](/sloan/demos/ingest) (or have data already in your system)
### Start with the ads-b template
```bash filename="Terminal" copy
sloan init mvs-demo ads-b
```
```bash filename="Terminal" copy
cd mvs-demo
npm install
```
### Run the Moose Dev Server
```bash filename="Terminal" copy
moose dev
```
### Open the project in Cursor
### Initialize the Sloan MCP
Navigate to `Cursor > Settings > Cursor Settings > Tools and Integrations` then toggle on the `Sloan MCP` tool.
### For best results, set the LLM to `claude-4-sonnet`
Gemini 2.5 and o3 are also reasonably good, but claude-4-sonnet has the most consistent results.
### Create an egress API
You can skip this step if you've already completed the [Egress Demo](/sloan/demos/egress).
> Can you create an API that returns every aircraft within X miles of Y,Z coordinates.
If you prefer to implement the egress API manually, you can create an analytics API along the following lines (the imports and `ConsumptionApi` wrapper shown here assume the `@514labs/moose-lib` TypeScript API; adjust them to match your project):
```typescript filename="app/index.ts"
...
export * from './apis/getAircraftWithinRadius';
```
```typescript filename="app/apis/getAircraftWithinRadius.ts"
import { ConsumptionApi } from "@514labs/moose-lib";
import { tags } from "typia";

/**
 * Parameters for the getAircraftWithinRadius API
 */
interface AircraftRadiusParams {
  /** Latitude of the center point */
  center_lat: number;
  /** Longitude of the center point */
  center_lon: number;
  /** Radius in miles from the center point */
  radius_miles: number;
  /** Maximum number of results to return (optional) */
  limit?: number & tags.Type<"int64"> & tags.Minimum<1> & tags.Maximum<1000>;
}

/**
 * API to retrieve aircraft within a specified radius from given coordinates
 * Uses ClickHouse's geoDistance function to calculate great circle distance
 */
export const getAircraftWithinRadius = new ConsumptionApi<AircraftRadiusParams>(
  "getAircraftWithinRadius",
  async (params, { client, sql }) => {
    // Execute the query with proper parameter handling
    const result = await client.query.execute(sql`
      SELECT
        hex,
        flight,
        aircraft_type,
        lat,
        lon,
        alt_baro,
        gs,
        track,
        timestamp,
        round(geoDistance(lon, lat, ${params.center_lon}, ${params.center_lat}) * 0.000621371, 2) as distance_miles
      FROM AircraftTrackingProcessed
      WHERE geoDistance(lon, lat, ${params.center_lon}, ${params.center_lat}) * 0.000621371 <= ${params.radius_miles}
      ORDER BY distance_miles ASC
      LIMIT ${params.limit || 100}
    `);
    return result;
  },
  {
    metadata: {
      description: "Returns all aircraft within a specified distance (in miles) from given coordinates"
    }
  }
);
```
### Prompt the LLM to create a materialized view
> Given the egress API, can you create a materialized view that improves the performance of the query?
### Action
The LLM should now use sloan tools to:
1. analyze the Moose project and the egress API
2. create a Materialized View that pre-aggregates the data
You'll know it is succeeding if:
1. the LLM successfully creates the materialized view
2. the LLM sees that the materialized view was created in the ClickHouse database
### Optional further prompts
> Can you create a new egress API that leverages the materialized view to improve the performance of the query?
> Can you test both APIs to see what the performance difference is?
> I want to see the difference in speed, number of rows read, amount of data read, compute, and other vectors you think are pertinent.
---
## Getting Started
Source: sloan/getting-started.mdx
Getting started guide for Sloan
## Getting Started
These guides will walk you through setting up Sloan MCP with your client of choice, using either the CLI or the MCP.JSON configuration file.
---
## Getting Started
Source: sloan/getting-started/claude.mdx
Getting started guide for Sloan
## Claude Desktop
### Install Claude Desktop
[Install the Claude Desktop application here](https://claude.ai/download). Note, the Pro version appears to work much more stably with MCPs.
### Install Moose and Sloan CLI
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan,moose
```
### Configure Sloan MCP
Create a new project with Claude Desktop MCP preconfigured:
```bash filename="Terminal" copy
sloan init --mcp claude-desktop
```
For other options, see [Sloan CLI docs](/sloan/getting-started/sloan-cli).
```json filename="~/Library/Application Support/Claude/claude_desktop_config.json" copy
{
  "mcpServers": {
    "sloan": {
      "args": [
        "@514labs/sloan-mcp@latest",
        "--moose-read-tools",
        "path/to/your/moose/project"
      ],
      "command": "npx",
      "env": {
        "ANTHROPIC_API_KEY": "",
        "MOOSE_PATH": "path/to/your/moose/installation",
        "NODE_PATH": "path/to/your/node/installation",
        "PYTHON_PATH": "path/to/your/python/installation"
      }
    }
  }
}
```
### Adding other toolsets
For more information on available toolsets, see [Sloan MCP toolsets](/sloan/reference/tool-reference). For Claude Desktop, we recommend the following toolsets:
* [Moose Read Tools](../sloan/reference/tool-reference#moose-read-tools): gives you chat access to your Moose project and the data within it (enabled by default)
* [Remote ClickHouse Tools](../sloan/reference/tool-reference#remote-clickhouse) (read only): gives you chat access to your remote ClickHouse data
### Using the MCP
1. Open the Claude Desktop application (note, you often have to reload the application after adding a new MCP)
2. If you are using Moose tools, you will need to run your moose dev server
### Warnings / Peculiarities
* You shouldn't use "write"/generative tools with Claude Desktop.
* Every time you add an MCP or change its configuration, you will need to reload the application.
* If you want to change the Moose Project that the Sloan MCP is referring to, manually edit the MCP.JSON file or run `sloan config focus` and select a new project.
### Common issues / troubleshooting
* The MCP is running, but you aren't able to get your data? Look at the tool call response, it will tell you if your Moose dev server is running. If it is not, run `moose dev` in your Moose project directory.
* The MCP is not running. Check your configuration and then restart the application.
## Reference
---
## Getting Started
Source: sloan/getting-started/cursor.mdx
Getting started guide for Sloan
## Cursor
### Install
[Install Cursor here](https://www.cursor.com/).
### Install Moose and Sloan CLI
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan,moose
```
### Configure Sloan MCP
Create a new project with Cursor MCP preconfigured:
```bash filename="Terminal" copy
sloan init --mcp cursor-project
```
If you want to use this as a global Cursor MCP, use `cursor-global` instead of `cursor-project`.
For other options, see [Sloan CLI docs](/sloan/getting-started/sloan-cli).
```json filename="/path/to/your/project/.cursor/mcp.json" copy
{
  "mcpServers": {
    "sloan": {
      "args": [
        "@514labs/sloan-mcp@latest",
        "--moose-read-tools",
        "path/to/your/moose/project"
      ],
      "command": "npx",
      "env": {
        "ANTHROPIC_API_KEY": "",
        "MOOSE_PATH": "path/to/your/moose/installation",
        "NODE_PATH": "path/to/your/node/installation",
        "PYTHON_PATH": "path/to/your/python/installation"
      }
    }
  }
}
```
### Adding other toolsets
For more information on available toolsets, see [Sloan MCP toolsets](/sloan/reference/tool-reference). All toolsets are available for Cursor.
### Using the MCP
1. Open Cursor
2. You will see a popup saying that an MCP is detected, and can be enabled. Our experience is that this is not always reliable, and the MCP is more stably launched if you go to `cursor > settings > cursor settings > tools and integrations` and enable the MCP there.
3. If you are using Moose tools, you will need to run your moose dev server with `moose dev`.
### Warnings / Peculiarities
* Every time you add an MCP or change its configuration, you will need to reload the MCP. You can do this by going to `cursor > settings > cursor settings > tools and integrations` and toggling the MCP off and on. If this doesn't work, you can also restart the Cursor application.
* If you have configured the MCP globally, and want to change the Moose Project that the Sloan MCP is referring to, manually edit the MCP.JSON file or run `sloan config focus` and select a new project.
### Common issues / troubleshooting
* The MCP is running, but you aren't able to get your data? Look at the tool call response, it will tell you if your Moose dev server is running. If it is not, run `moose dev` in your Moose project directory.
* The MCP is not running. Check your configuration and then restart the application.
## Reference
---
## Getting Started
Source: sloan/getting-started/other-clients.mdx
Getting started guide for Sloan
### Other clients
We are working on adding MCP support for other clients. If you are interested in using other clients, please [contact us](mailto:sloan@fiveonefour.com).
To set up your client with Sloan MCP, use the MCP.JSON file for your client of choice.
### Configure Sloan MCP
**Configuration Object Naming**: Different MCP clients use different naming conventions for the server configuration object:
- **Cursor, Windsurf, Claude Desktop**: use `"mcpServers"`
- **VS Code**: uses `"servers"` instead
Make sure to check your specific client's documentation for the correct naming convention.
```json filename="MCP configuration" copy
{
  "mcpServers": {
    "sloan": {
      "args": [
        "@514labs/sloan-mcp@latest",
        "--moose-read-tools",
        "path/to/your/moose/project"
      ],
      "command": "npx",
      "env": {
        "ANTHROPIC_API_KEY": "",
        "MOOSE_PATH": "path/to/your/moose/installation",
        "NODE_PATH": "path/to/your/node/installation",
        "PYTHON_PATH": "path/to/your/python/installation"
      }
    }
  }
}
```
```json filename="MCP configuration" copy
{
  "servers": {
    "sloan": {
      "args": [
        "@514labs/sloan-mcp@latest",
        "--moose-read-tools",
        "path/to/your/moose/project"
      ],
      "command": "npx",
      "env": {
        "ANTHROPIC_API_KEY": "",
        "MOOSE_PATH": "path/to/your/moose/installation",
        "NODE_PATH": "path/to/your/node/installation",
        "PYTHON_PATH": "path/to/your/python/installation"
      }
    }
  }
}
```
### Adding other toolsets
For more information on available toolsets, see [Sloan MCP toolsets](/sloan/reference/tool-reference). If the client you are using is a chat client, we recommend the following toolsets:
* [Moose Read Tools](../sloan/reference/tool-reference#moose-read-tools): gives you chat access to your Moose project and the data within it (enabled by default)
* [Remote ClickHouse Tools](../sloan/reference/tool-reference#remote-clickhouse) (read only): gives you chat access to your remote ClickHouse data
If it is an IDE type client, all toolsets are available.
### Using the MCP
1. Open your client application (note, you often have to reload the application after adding a new MCP)
2. If you are using Moose tools, you will need to run your moose dev server
### Common issues / troubleshooting
* The MCP is running, but you aren't able to get your data? Look at the tool call response, it will tell you if your Moose dev server is running. If it is not, run `moose dev` in your Moose project directory.
* The MCP is not running. Check your configuration and then restart the application.
## Reference
---
## Getting Started
Source: sloan/getting-started/vs-code.mdx
Getting started guide for Sloan
## VS Code
- **VS Code**: [Install VS Code here](https://code.visualstudio.com/).
- **GitHub Copilot in VS Code**: [See VS Code docs](https://code.visualstudio.com/docs/copilot/setup)
### Install VS Code and enable MCPs
[Install VS Code here](https://code.visualstudio.com/).
[Enable MCPs](vscode://settings/chat.mcp.enabled) by toggling on the `chat.mcp.enabled` setting.
### Install Moose and Sloan CLI
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan,moose
```
### Configure Sloan MCP
Create a new project with VS Code MCP preconfigured:
```bash filename="Terminal" copy
sloan init --mcp vscode-project
```
If you want to use this as a global VS Code MCP, use `vscode-global` instead of `vscode-project`.
For other options, see [Sloan CLI docs](/sloan/getting-started/sloan-cli).
```json filename="/path/to/your/project/.vscode/settings.json" copy
{
  "mcp": {
    "input": [],
    "servers": {
      "sloan": {
        "args": [
          "@514labs/sloan-mcp@latest",
          "--moose-read-tools",
          "path/to/your/moose/project"
        ],
        "command": "npx",
        "env": {
          "ANTHROPIC_API_KEY": "",
          "MOOSE_PATH": "path/to/your/moose/installation",
          "NODE_PATH": "path/to/your/node/installation",
          "PYTHON_PATH": "path/to/your/python/installation"
        }
      }
    }
  }
}
```
### Adding other toolsets
For more information on available toolsets, see [Sloan MCP toolsets](/sloan/reference/tool-reference). All toolsets are available for VS Code.
### Using the MCP
1. Open VS Code; with `chat.mcp.enabled` turned on, the Sloan MCP configured above should be available to Copilot Chat (you may need to start or refresh the server from VS Code's MCP settings).
2. If you are using Moose tools, you will need to run your moose dev server with `moose dev`.
### Warnings / Peculiarities
**Recommended Configuration Method**: While VS Code has a feature that allows you to use MCPs from other clients (like Claude Desktop), we strongly recommend using either the **Sloan CLI** or the **settings.json file** method shown above instead. These methods provide better reliability and configuration control specifically for VS Code environments.
### Common issues / troubleshooting
* The MCP is running, but you aren't able to get your data? Look at the tool call response, it will tell you if your Moose dev server is running. If it is not, run `moose dev` in your Moose project directory.
* The MCP is not running. Check your configuration and then restart the application.
## Reference
---
## Getting Started
Source: sloan/getting-started/windsurf.mdx
Getting started guide for Sloan
## Windsurf
### Install Windsurf
[Install Windsurf here](https://windsurf.com/).
### Install Moose and Sloan CLI
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan,moose
```
### Configure Sloan MCP
Create a new project with Windsurf MCP preconfigured:
```bash filename="Terminal" copy
sloan init --mcp windsurf-global
```
For other options, see [Sloan CLI docs](/sloan/getting-started/sloan-cli).
```json filename="~/.codeium/windsurf/mcp_config.json" copy
{
  "mcpServers": {
    "sloan": {
      "args": [
        "@514labs/sloan-mcp@latest",
        "--moose-read-tools",
        "path/to/your/moose/project"
      ],
      "command": "npx",
      "env": {
        "ANTHROPIC_API_KEY": "",
        "MOOSE_PATH": "path/to/your/moose/installation",
        "NODE_PATH": "path/to/your/node/installation",
        "PYTHON_PATH": "path/to/your/python/installation"
      }
    }
  }
}
```
### Adding other toolsets
For more information on available toolsets, see [Sloan MCP toolsets](/sloan/reference/tool-reference). All toolsets are available for Windsurf.
### Using the MCP
1. Open Windsurf
2. Run the MCP by going to `windsurf > settings > windsurf settings > cascade > Model Context Protocol (MCP) Servers` and enable the MCP there.
3. If you are using Moose tools, you will need to run your moose dev server with `moose dev`.
### Warnings / Peculiarities
* Every time you add an MCP or change its configuration, you will need to reload the MCP. You can do this by going to `windsurf > settings > windsurf settings > cascade > Model Context Protocol (MCP) Servers` and toggling the MCP off and on or refreshing the server. If this doesn't work, you can also restart the Windsurf application.
* If you have configured the MCP globally, and want to change the Moose Project that the Sloan MCP is referring to, manually edit the MCP.JSON file or run `sloan config focus` and select a new project.
### Common issues / troubleshooting
* The MCP is running, but you aren't able to get your data? Look at the tool call response, it will tell you if your Moose dev server is running. If it is not, run `moose dev` in your Moose project directory.
* The MCP is not running. Check your configuration and then restart the application.
## Reference
---
## Quickstart
Source: sloan/guides.mdx
Quickstart guide for Sloan
# Sloan Quickstart Guides
Sloan exposes tools to your LLM client for reading, exploring, and building on ClickHouse data—locally or in production. Get started with one of our quickstart guides below:
---
## sloan/guides/clickhouse-chat
Source: sloan/guides/clickhouse-chat.mdx
## Quickstart: AI Chat with ClickHouse
*Use your LLM client to explore your ClickHouse with Sloan MCP tools.*
This will walk you through using Sloan CLI to connect Sloan MCP tools to your ClickHouse database, allowing you to chat with your data in your client of choice. We'll use the ClickHouse Playground as our example database, but you can use any ClickHouse database.
- **OS**: macOS or Linux (WSL supported for Windows)
- **Node**: [version 20+](https://nodejs.org/en/download) (LTS recommended)
- **Client**: [Cursor](https://www.cursor.com/) or [Claude Desktop](https://claude.ai/download) or [Windsurf](https://windsurf.ai/download). For this particular use-case, we recommend Claude Desktop.
### Install Sloan CLI
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan
```
### Configure your Sloan MCP
```bash filename="Terminal" copy
sloan connect clickhouse --connection-string "https://explorer:@play.clickhouse.com:443/?database=default" --mcp cursor-project
```
You need a ClickHouse connection URL. Format looks like this:
```
http://username:password@host:port/?database=database_name
```
Want to test without your own ClickHouse? Use the [ClickHouse Playground](https://clickhouse.com/docs/getting-started/playground) with the connection string above. It has sample datasets (read-only) you can experiment with.
```txt copy
https://explorer:@play.clickhouse.com:443/?database=default
```
1. Log into your [ClickHouse Cloud console](https://clickhouse.cloud/)
2. Go to your service details page
3. Find "Connect" or "Connection Details" section
4. Copy the HTTPS endpoint and your username/password
- Check your ClickHouse config file (usually `/etc/clickhouse-server/config.xml`)
- Look for `http_port` (default: 8123) and `https_port` (default: 8443)
- Check users config in `/etc/clickhouse-server/users.xml` or users.d/ directory
- Default user is often `default` with no password
- Check your docker-compose.yml or docker run command for environment variables
- Look for `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CLICKHOUSE_DB`
- Default is usually `http://default:@localhost:8123/?database=default`
- **Can't connect?** Try `curl http://your-host:8123/ping` to test connectivity
- **Authentication failed?** Verify username/password with `clickhouse-client --user=username --password=password`
- **Database not found?** Run `SHOW DATABASES` to see available databases
- **Permission denied?** Check user permissions with `SHOW GRANTS FOR username`
**Still stuck?** Check the [ClickHouse documentation](https://clickhouse.com/docs/en/getting-started/install) for your specific deployment method.
### Chat
Open your Claude Desktop client. We recommend starting the chat with a context setting question like "tell me about the data I have available to me in ClickHouse".
You can check that the MCP is correctly configured by looking at `claude > settings > developer > sloan`. It should say "running".
You can also look at `search and tools` beneath the chat window; you should see `sloan` in the list of MCPs—if you click into it, you should see the tools that are enabled.
### What's next?
Try [creating a Moose Project from your ClickHouse database](https://docs.fiveonefour.com/sloan/quickstart/clickhouse-proj). That way, you can use Sloan MCP tools to create new primitives, like ingestion paths, data models, egress APIs, and more!
Or try [deploying your project to Boreal](https://www.fiveonefour.com/boreal), our hosting platform for Moose projects.
## Other Quickstart Guides
---
## sloan/guides/clickhouse-proj
Source: sloan/guides/clickhouse-proj.mdx
## Quickstart: AI analytics engineering from your ClickHouse
*Generate a local OLAP project from your ClickHouse deployment; Sloan MCP pre-configured for analytics engineering.*
This will walk you through creating a new local Moose project reflecting the structure of your ClickHouse database. It will allow you to add data to your local dev environment from your remote ClickHouse database, and use Sloan MCP tools to enrich your project with metadata or create new Moose primitives that you can use in your project (e.g. egress APIs). We'll use the ClickHouse Playground as our example database, but you can use any ClickHouse database.
- **OS**: macOS or Linux (WSL supported for Windows)
- **Docker Desktop/Engine**: [24.0.0+](https://docs.docker.com/get-started/get-docker/)
- **Node**: [version 20+](https://nodejs.org/en/download) (LTS recommended)
- **Anthropic API Key**: [Get one here](https://docs.anthropic.com/en/docs/initial-setup)
- **Client**: [Cursor](https://www.cursor.com/) or [Claude Desktop](https://claude.ai/download) or [Windsurf](https://windsurf.ai/download). For this particular use-case, we recommend Claude Desktop.
### Install Moose and Sloan CLIs
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose,sloan
```
We'll be using generative MCP tools here, so make sure you add your Anthropic API key during install. If you installed without adding it, you can add it later with `sloan config keys anthropic `. If you need to create one, see: https://docs.anthropic.com/en/docs/initial-setup.
### Create a new Moose project from your ClickHouse database
```bash filename="Terminal" copy
sloan init my_project_name --from-remote 'https://explorer:@play.clickhouse.com:443/?database=default' --language python --mcp cursor-project
```
Want to test without your own ClickHouse? Use the [ClickHouse Playground](https://clickhouse.com/docs/getting-started/playground) with the connection string above. It has sample datasets (read-only) you can experiment with.
```txt copy
https://explorer:@play.clickhouse.com:443/?database=default
```
1. Log into your [ClickHouse Cloud console](https://clickhouse.cloud/)
2. Go to your service details page
3. Find "Connect" or "Connection Details" section
4. Copy the HTTPS endpoint and your username/password
- Check your ClickHouse config file (usually `/etc/clickhouse-server/config.xml`)
- Look for `http_port` (default: 8123) and `https_port` (default: 8443)
- Check users config in `/etc/clickhouse-server/users.xml` or users.d/ directory
- Default user is often `default` with no password
- Check your docker-compose.yml or docker run command for environment variables
- Look for `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CLICKHOUSE_DB`
- Default is usually `http://default:@localhost:8123/?database=default`
- **Can't connect?** Try `curl http://your-host:8123/ping` to test connectivity
- **Authentication failed?** Verify username/password with `clickhouse-client --user=username --password=password`
- **Database not found?** Run `SHOW DATABASES` to see available databases
- **Permission denied?** Check user permissions with `SHOW GRANTS FOR username`
**Still stuck?** Check the [ClickHouse documentation](https://clickhouse.com/docs/en/getting-started/install) for your specific deployment method.
This will create a new Moose project from your ClickHouse database. [See Moose docs](https://docs.fiveonefour.com/moose) for more information about the project structure, and how it spins up your local development environment (including a local ClickHouse database).
The new project is called "my_project_name" and is created in the current directory. The string after `--from-remote` is the connection string to your ClickHouse database, structured as `clickhouse://user:password@host:port/database` (note, the ClickHouse Playground has no password).
### Install dependencies and run the dev server
Before you can run Moose's local dev server, Docker Desktop must be running.
Navigate into the project directory:
```bash filename="Terminal" copy
cd my_project_name
```
Install the dependencies:
```bash filename="Terminal" copy
npm i
```
Run the dev server:
```bash filename="Terminal" copy
moose dev
```
### Get sample data
```bash filename="Terminal" copy
moose seed clickhouse --connection-string clickhouse://explorer:@play.clickhouse.com:9440/default --limit 100
```
This will seed your local ClickHouse database with 100 rows of sample data from your remote ClickHouse database—here, the ClickHouse Playground. You can change the number of rows with the `--limit` flag.
This will improve the context provided to Sloan's MCP tools, and make it easier to validate analytic engineering tasks.
### Set up your Client
The `sloan init` command above configured Cursor to use Sloan MCP tools. You can check this by opening Cursor and looking at `cursor > settings > cursor settings > MCP` menu. You should see `sloan` in the list of MCPs, alongside a list of tools.
You may need to enable the MCP. Once you do so, you should see a green 🟢 status indicator next to it.
If you would like to use a different client, you can use the following command from within the project directory:
```bash filename="Terminal" copy
sloan setup --mcp
```
### Enrich project with metadata [coming soon]
Since we have a Moose project with sample data and some metadata, we can use this to create more metadata!
If we ask our client "Can you add a description to each Moose primitive in this project?", the LLM will use the `write_metadata` tool to add a description to each Moose primitive.
```TypeScript filename="my_project_name/index.ts"
const acPipeline = new IngestPipeline(
  "AircraftTrackingProcessed",
  {
    table: true,
    stream: true,
    ingestApi: false,
    metadata: {
      description: "Pipeline for ingesting raw aircraft data", // new description field!
    },
  }
);
```
### Chat with your data
You can also now just chat with your client about your data! Try asking "Look at my MTA data in ClickHouse, tell me about the trains that ran in the last 24 hours."
The client will use `read_moose_project`, `read_clickhouse_tables` and maybe `read_production_clickhouse` to answer your question.
### Create new Egress APIs with Sloan MCP tools
If you find a thread that you find interesting enough to want to productionize, try asking the client "can you create an egress API to furnish that data?"
The client will use `create_egress_api` and `test_egress_api` to create an egress API primitive in Moose that will automatically deploy in your local dev environment when you save.
### What's next?
Try adding new [ingestion scripts, data models, or materialized views to your project using Sloan's experimental tools](https://docs.fiveonefour.com/sloan/reference/tool-reference#experimental-moose-tools)!
## Other Quickstart Guides
---
## sloan/guides/from-template
Source: sloan/guides/from-template.mdx
## Quickstart: AI powered OLAP templates
*Bootstrap a complete OLAP pipeline with a Moose template; with Sloan's AI tools already set up for you.*
This will get you started with a Moose data engineering project ingesting Aircraft Transponder data that you can use to learn about Sloan's Analytics Engineering MCP toolset.
- **OS**: macOS or Linux (WSL supported for Windows)
- **Docker Desktop/Engine**: [24.0.0+](https://docs.docker.com/get-started/get-docker/)
- **Node**: [version 20+](https://nodejs.org/en/download) (LTS recommended)
- **Anthropic API Key**: [Get one here](https://docs.anthropic.com/en/docs/initial-setup)
- **Client**: [Cursor](https://www.cursor.com/) or [Claude Desktop](https://claude.ai/download) or [Windsurf](https://windsurf.ai/download). For this particular use-case, we recommend Claude Desktop.
### Install Sloan and Moose CLIs
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan,moose
```
We'll be using generative MCP tools here, so make sure you add your Anthropic API key during install. If you installed without adding it, you can add it later with `sloan config keys anthropic `. If you need to create one, see: https://docs.anthropic.com/en/docs/initial-setup.
### Create a new Moose project from the ADS-B template
```bash filename="Terminal" copy
sloan init ads-b
```
This will create a new Moose project using the ADS-B template to gather ADS-B (aircraft transponder) data that you can use to explore Sloan's MCP offerings. By default, it will create the project configured for use with Cursor (by creating `~/.cursor/mcp.config`), but if you would like to use Claude Desktop, append `--mcp claude-desktop`.
If you want to create an empty project and build your own data models and ingestion, try `sloan init typescript-empty` or `sloan init python-empty`.
### Install dependencies and run the dev server
Navigate into the created project directory:
```bash filename="Terminal" copy
cd
```
Install the dependencies:
```bash filename="Terminal" copy
npm i
```
Run the dev server:
```bash filename="Terminal" copy
moose dev
```
### Set up your client: open Cursor and Enable the MCPs
Open your code editor (e.g. by running `cursor .`).
Cursor should prompt you to enable the MCP. If it doesn't, go to `cursor > settings > cursor settings > MCP` and enable the MCP called "sloan". Note that not all tools will work until the dev server is running locally, and you might need to refresh the MCP until its status indicator shows 🟢.
### Start Ingesting Data
Run the following command to start ingesting data with the configured ingest scripts:
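```bash filename="Terminal" copy
moose workflow run military_aircraft_tracking
```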
You should start to see hundreds of live datapoints ingesting instantly!
### Enrich project with metadata [coming soon]
Since we have a Moose project with sample data and some metadata, we can use this to create more metadata!
If we ask our client "Can you add a description to each Moose primitive in this project?", the LLM will use the `write_metadata` tool to add a description to each Moose primitive.
```TypeScript filename="my_project_name/index.ts"
const acPipeline = new IngestPipeline(
  "AircraftTrackingProcessed",
  {
    table: true,
    stream: true,
    ingestApi: false,
    metadata: {
      description: "Pipeline for ingesting raw aircraft data", // new description field!
    },
  }
);
```
### Chat with your data
You can also now just chat with your client about your data! Try asking "What aircraft are listed in the data I have available."
The client will use `read_moose_project`, `read_clickhouse_tables` and maybe `read_production_clickhouse` to answer your question.
### Create new Egress APIs with Sloan MCP tools
If you find a thread that you find interesting enough to want to productionize, try asking the client "can you create an egress API to furnish that data?"
The client will use `create_egress_api` and `test_egress_api` to create an egress API primitive in Moose that will automatically deploy in your local dev environment when you save.
## Other Quickstart Guides
---
## Sloan - Automated Data Engineering
Source: sloan/index.mdx
AI-powered tools and agents exposed through an easy to configure MCP server for data engineering and infrastructure
# Welcome to Sloan
Install [Sloan](/sloan/reference/cli-reference) and [Moose](/moose/reference/cli-reference) CLIs:
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan,moose
```
## What is Sloan?
Sloan is a set of tools to make your chat, copilot or BI fluent in data engineering. Use the CLI to set up MCPs with the tools you need in the clients you use. Create new data engineering projects with a Moose-managed, ClickHouse-based infrastructure, or use these agents with your existing data infrastructure.
## Quickstart Guides
## Core features
Sloan offers a comprehensive suite of MCP tools and agents designed to streamline data engineering workflows, enabling faster and more efficient deployment and management.
- 3 steps in CLI to chat with ClickHouse
- 5 minutes to build a full stack OLAP project
- Template based new projects for building your own infrastructure

- Data engineering in your IDE
- Data analytics and analytics engineering in your chat client
- BI in your BI tool of choice

- Opinionated OLAP deployments with Moose: Optimized ClickHouse development and deployment
- Direct integration with your architecture: DuckDB, Snowflake, Databricks
- Integration with your enterprise: metadata, CICD, logging and more

- Full-stack context: code, logs, data, docs
- Self-improving feedback loops
- Embedded metadata for continuity

- Each agent: context gathering → implementation → testing → doc workflows
- Governance defaults, easily configurable: SDLC, data quality, reporting, privacy and security default practices
- Learns your policies with minimal context, enforces them automatically
## Why Sloan exists
### The DIY Approach
### The Sloan Approach
## What jobs can you do with Sloan?
- **Ad hoc analytics**: Give your LLM client a way to chat with your data, in ClickHouse, Databricks, Snowflake, or wherever it lives
- **Analytics Engineering**: Agents that can build new data products, materialized views and egress methods
- **Data Engineering**: Have agents build and test end to end data pipelines
- **Data Wrangling**: Agents that interact with your data systems like DuckDB, Databricks, Snowflake and more to create scripts, clean data, and prepare data for use
- **Data Migration**: Automated creation of data pipelines to migrate data from legacy systems to a modern data backend
- **Data quality, governance and reporting**: Agents that can help you enforce data quality, governance and reporting during development and at run time
## Getting Involved / Giving Feedback / Community
---
## CLI Reference
Source: sloan/reference/cli-reference.mdx
CLI Reference for Sloan
# CLI Reference
The Sloan CLI is a tool we created to facilitate the easy setup of Sloan MCPs. It is not required, and we hope that LLM clients will eventually support all these features natively, so that we can remove the CLI.
It allows you to:
- set up your Sloan MCP with a variety of clients
- manage multiple projects
- manage multiple toolsets
- update multiple MCP configurations at once
## Install CLI
```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) sloan
```
## Commands
### init
Creates a data engineering project with Moose, with Sloan MCP preconfigured.
```bash filename="Terminal" copy
sloan init <name> <template> [--mcp <mcp-host>] [--location <location>] [--no-fail-already-exists]
```
",
required: true,
description: "Name of your application (this will be the ).",
examples: ["e.g. my-app"]
},
{
name: "--mcp ",
description: "Choice of which MCP host to use.",
examples: ["default: cursor-project", "other options: cursor-global", "claude-desktop", "windsurf-global"]
},
{
name: "--location ",
description: "Location of your app or service. The default is the name of the project.",
examples: ["e.g. my-app"]
},
{
name: "--no-fail-already-exists",
description: "By default, the init command fails if `location` exists, to prevent accidental reruns. This flag disables the check."
}
]}
/>
",
required: true,
description: "The template you are basing your application on.",
examples: ["typescript-empty", "typescript", "ads-b"]
}
]}
/>
",
required: true,
description: "The connection string to your ClickHouse database.",
examples: ["e.g. clickhouse://user:pass@host:port/db"]
},
{
name: "--language ",
required: true,
description: "The language of your application.",
examples: ["typescript", "python"]
}
]}
/>
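For example, a typical invocation might look like the following (the project name and template are illustrative):
```bash filename="Terminal" copy
sloan init my-app typescript --mcp cursor-project
```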
### connect
Connects to an existing data source. Currently only `clickhouse` is supported.
```bash filename="Terminal" copy
sloan connect clickhouse --connection-string <connection-string> [--mcp <mcp-host>]
```
",
required: true,
description: "The connection string to your ClickHouse database.",
examples: ["e.g. clickhouse://user:pass@host:port/db"]
},
{
name: "--mcp ",
description: "Choice of which MCP host to use.",
examples: ["default: cursor-project", "other options: cursor-global", "claude-desktop", "windsurf-global"]
}
]}
/>
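For example, to connect an existing ClickHouse database and configure the MCP for Claude Desktop (connection details are placeholders):
```bash filename="Terminal" copy
sloan connect clickhouse --connection-string "clickhouse://user:pass@host:port/db" --mcp claude-desktop
```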
### setup
Takes an existing data engineering project built with Moose and configures the Sloan MCP for it.
```bash filename="Terminal" copy
sloan setup [path] [--mcp <mcp-host>]
```
",
description: "Choice of which MCP host to use. If flag is not provided, the default is `cursor-project`.",
examples: ["default: cursor-project", "other options: claude-desktop", "cursor-global", "cursor-project", "windsurf-global"]
},
{
name: "--",
description: "Setup Sloan MCP to already have access to Moose read tools",
examples: ["--moose-read-tools", "--moose-write-tools", "--remote-clickhouse-tools"]
}
]}
/>
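For example, to configure the Sloan MCP for a Moose project in the current directory with read tools pre-enabled (path and flags are illustrative):
```bash filename="Terminal" copy
sloan setup . --mcp cursor-project --moose-read-tools
```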
### config
Configure Sloan settings
#### config focus
Sloan allows you to configure project-level MCPs (e.g. for Cursor) and global MCPs (e.g. for Claude Desktop). To configure which Sloan project is used with the global MCPs, use `sloan config focus`.
```bash filename="Terminal" copy
sloan config focus
```
#### config keys
Updates all MCP files for projects listed in `~/.sloan/sloan-config.toml` to use an updated API key.
```bash filename="Terminal" copy
sloan config keys
```
#### config tools
Toggles availability of experimental MCP tools. See [Tools Reference](/sloan/reference/tool-reference).
```bash filename="Terminal" copy
sloan config tools
```
Note: if you select `remote-clickhouse`, you will need to add your ClickHouse Cloud / Boreal credentials to `mcp.json`.
---
## Getting Started
Source: sloan/reference/mcp-json-reference.mdx
Getting started guide for Sloan
## Configuring Sloan MCP with MCP.JSON
- **Node**: [version 20+](https://nodejs.org/en/download) (LTS recommended)
- **Anthropic API key**: [Get your API key from Anthropic](https://docs.anthropic.com/en/docs/initial-setup). This is required for using Sloan's generative MCP tools.
- **Moose CLI** and a Moose project: required for "write" tools.
### Where is your MCP.JSON file?
Your MCP.JSON file is located in a different place depending on your client of choice. Below are the default locations of the MCP.JSON file for our supported clients.
* [Claude Desktop](https://modelcontextprotocol.io/quickstart/user): `~/Library/Application Support/Claude/claude_desktop_config.json` (note, you can create this by going to `claude > settings > Developer > Edit Config`)
* [Cursor (global MCP)](https://docs.cursor.com/context/model-context-protocol#configuration-locations): `~/.cursor/settings/mcp.json`
* [Cursor (project MCP)](https://docs.cursor.com/context/model-context-protocol#configuration-locations): `/path/to/your/project/.cursor/mcp.json`
* [Windsurf](https://docs.windsurf.com/windsurf/cascade/mcp#configuring-mcp-tools): `~/.codeium/windsurf/mcp_config.json`
For other clients, look to their documentation for where their MCP.JSON file is located.
### Adding Sloan MCP to your MCP.JSON file
**Configuration Object Naming**: Different MCP clients use different naming conventions for the server configuration object:
- **Cursor, Windsurf**: Use `"mcpServers"`
- **Claude Desktop, VS Code**: Use `"servers"` instead
Make sure to check your specific client's documentation for the correct naming convention.
```json filename="MCP configuration" copy
{
"mcpServers": {
"sloan": {
"args": [
"@514labs/sloan-mcp@latest",
"--moose-read-tools",
"path/to/your/moose/project"
],
"command": "npx",
"env": {
"ANTHROPIC_API_KEY": "",
"MOOSE_PATH": "path/to/your/moose/installation",
"NODE_PATH": "path/to/your/node/installation",
"PYTHON_PATH": "path/to/your/python/installation"
}
}
}
}
```
```json filename="MCP configuration" copy
{
"servers": {
"sloan": {
"args": [
"@514labs/sloan-mcp@latest",
"--moose-read-tools",
"path/to/your/moose/project"
],
"command": "npx",
"env": {
"ANTHROPIC_API_KEY": "",
"MOOSE_PATH": "path/to/your/moose/installation",
"NODE_PATH": "path/to/your/node/installation",
"PYTHON_PATH": "path/to/your/python/installation"
}
}
}
}
```
### Arguments / Environment Variables
### Adding tool sets to your MCP.JSON file
In general, we recommend using read-only toolsets (`moose-read-tools` and `remote-clickhouse`) with chat-type clients (like Claude Desktop).
For information about toolsets, see [tools reference](../sloan/reference/tool-reference).
```json filename="MCP.JSON" copy
"args": [
...
"--moose-read-tools",
]
```
[Moose read tools documentation](../sloan/reference/tool-reference#moose-read-tools)
```json filename="MCP.JSON" copy
"args": [
...
"--moose-read-tools",
"--moose-write-tools",
]
```
[Moose write tools documentation](../sloan/reference/tool-reference#moose-write-tools)
```json filename="MCP.JSON" copy
"args": [
...
"--remote-clickhouse",
...
]
"env": {
...
"BOREAL_CLICKHOUSE_HOST": "...",
"BOREAL_CLICKHOUSE_PORT": "...",
"BOREAL_CLICKHOUSE_USER": "...",
"BOREAL_CLICKHOUSE_PASSWORD": "...",
"BOREAL_CLICKHOUSE_DATABASE": "...",
}
```
[Remote Clickhouse read tools documentation](../sloan/reference/tool-reference#remote-clickhouse)
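Putting the fragments above together, a complete server entry that enables both the Moose read tools and the remote ClickHouse toolset might look like the following sketch (paths and credentials are placeholders, and exact flag names follow the toolset sections above):
```json filename="MCP configuration" copy
{
  "mcpServers": {
    "sloan": {
      "command": "npx",
      "args": [
        "@514labs/sloan-mcp@latest",
        "--moose-read-tools",
        "--remote-clickhouse",
        "path/to/your/moose/project"
      ],
      "env": {
        "ANTHROPIC_API_KEY": "",
        "BOREAL_CLICKHOUSE_HOST": "...",
        "BOREAL_CLICKHOUSE_PORT": "...",
        "BOREAL_CLICKHOUSE_USER": "...",
        "BOREAL_CLICKHOUSE_PASSWORD": "...",
        "BOREAL_CLICKHOUSE_DATABASE": "..."
      }
    }
  }
}
```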
---
## Tool Reference
Source: sloan/reference/tool-reference.mdx
MCP tools and external tools for Sloan
# Tool Reference
The Sloan CLI lets you use various Sloan MCP toolsets, as well as install and configure certain external tools. See the [CLI reference](../cli-reference.mdx) for more information. You can also configure the toolsets available in your Sloan MCP by manually editing your `mcp.json` file. See the [MCP.json reference](../mcp-json-reference.mdx) for more information.
Tools are grouped by their purpose into toolsets.
The fewer tools you enable, the better the agent using the toolset will perform, especially with respect to tool selection.
## Toolsets
## Read Only Moose tools
Tools needed to read the Moose project and its associated infrastructure.
### Activate this toolset
These tools are enabled by default.
To enable them explicitly:
Run the following command:
```bash filename="Terminal" copy
sloan config tools
```
and enable `moose-read-tools`. e.g.:
```txt filename="Terminal" copy
? Select tools to enable (no selection defaults to moose-read-tools):
[x] moose-read-tools - Enable moose read tools for data inspection
[ ] moose-write-tools - Enable moose write tools for full functionality (requires API key, auto-enables read tools)
[ ] remote-clickhouse-tools - Enable Remote Clickhouse integration
```
If you are managing the Sloan MCP with `mcp.json`, you can enable this toolset by adding the following to your `mcp.json`:
```json filename="MCP.json" copy
"args": [
...
"--moose-read-tools",
...
]
```
### When is this toolset useful?
When you need to understand what data exists in your Moose project and how it's structured.
**Sample prompts:**
- *"Tell me about the data in my Moose project"*
- *"Tell me about the data in my analytics table"*
- *"Tell me about the data streams in my Moose project"*
- *"Describe my DLQ topic"*
- *"Tell me about the data in my enrichment data stream"*
**Tools used:** `read_moose_project`, `read_clickhouse_tables`, `read_redpanda_topic`
When you need to verify data flow and check if your data transformations are working correctly. *(Also requires `moose-write-tools`)*
**Sample prompts:**
- *"Did data land in my analytics table?"*
- *"Did the data in my analytics table change?"*
- *"Did my materialized view update?"*
- *"Show me recent data in my user_events stream"*
**Tools used:** `read_clickhouse_tables`, `read_redpanda_topic`, `read_moose_project`
When you need to check the health and status of your local Moose development environment.
**Sample prompts:**
- *"Is my Moose development server running?"*
- *"Is my Moose development server healthy?"*
- *"Did my new workflow kill my local dev server?"*
- *"What's the status of my Moose infrastructure?"*
**Tools used:** `check_moose_status`, `read_moose_project`
### Tools
#### `read_moose_project`
Retrieves an infrastructure map from the local Moose development server. Retrieves a list of all primitives in the project (e.g. Data Models, Workflows, Streaming Functions, Materialized Views, APIs), as well as their associated infrastructure (e.g. ClickHouse tables, Redpanda topics, etc.).
#### `read_clickhouse_tables`
Queries local ClickHouse.
#### `read_redpanda_topic`
Reads from local Redpanda.
#### `check_moose_status`
Checks the status of the Moose project. Useful for debugging.
## Write Moose tools
Tools needed to create and test Moose primitives. These are used to ingest data, transform data and create egress patterns for data in your project.
### Activate this toolset
These tools are enabled by default (except in Claude Desktop, where these tools aren't recommended).
These tools require that the `moose-read-tools` toolset is enabled.
To enable them explicitly:
Run the following command:
```bash filename="Terminal" copy
sloan config tools
```
and enable `moose-write-tools`. e.g.:
```txt filename="Terminal" copy
? Select tools to enable (no selection defaults to moose-read-tools):
[ ] moose-read-tools - Enable moose read tools for data inspection
[x] moose-write-tools - Enable moose write tools for full functionality (requires API key, auto-enables read tools)
[ ] remote-clickhouse-tools - Enable Remote Clickhouse integration
```
If you are managing the Sloan MCP with `mcp.json`, you can enable this toolset by adding the following to your `mcp.json`:
```json filename="MCP.json" copy
"args": [
...
"--moose-write-tools",
...
]
```
### Usage tips
Consider the flow of data through your Moose project when creating new primitives. For example, if you want to create a streaming function, the `write_stream_function` tool will have a much better chance of success if there is a data sample, a source and a destination data model in place. See [Moose Docs](../../moose) for best practices.
Many of the tools have testing corollaries (and for the ones that don't, there are general purpose tools that can allow the agent to test the created primitive, like `read_moose_project`, `read_clickhouse_tables`, `read_redpanda_topic`). Use those testing tools to ensure that the agent is doing what you intend it to do.
Step by step prompting (e.g. "Get sample data from X API", "Create an ingest data model", "Create a workflow to periodically grab data from the API", ...) has a higher success rate than trying to one-shot the entire project creation flow.
### When is this toolset useful?
When you need to build the core components of your data infrastructure.
**Sample prompts:**
- *"Create a new data model for the following API source: [API docs/schema]"*
- *"Create a new workflow to ingest data from this API source every 5 minutes"*
- *"Create a streaming function to enrich data from my user_events stream with user profile data"*
- *"Create a materialized view that will improve the performance of the aircraft API"*
**Tools used:** `write_ingestion_pipeline`, `write_workflow`, `write_stream_function`, `write_materialized_view`
When you want to build a complete data product from ingestion to analytics.
**Sample prompts:**
- *"Create a new Moose project that ingests GitHub star data, enriches it with repository metadata, and creates APIs for analytics dashboards"*
- *"Build a real-time analytics system for aircraft tracking data with location-based queries"*
- *"Create a data pipeline that processes IoT sensor data and provides aggregated insights via REST APIs"*
**Tools used:** `write_spec`, `write_ingestion_pipeline`, `write_workflow`, `write_stream_function`, `write_materialized_view`, `create_egress_api`
When you want to create optimized data products for specific use cases.
**Sample prompts:**
- *"Create a data product for visualizing user engagement metrics with real-time updates"*
- *"Build materialized views that improve query performance for my analytics dashboard"*
- *"Create analytics APIs that support my frontend application's data needs"*
**Tools used:** `write_materialized_view`, `create_egress_api`, `test_egress_api`
When you need to fix data issues or improve data processing reliability.
**Sample prompts:**
- *"Look at my DLQ topic and create a streaming function to fix the data format issues"*
- *"Create a data validation workflow that checks data quality before processing"*
- *"Build error handling for my data ingestion pipeline"*
**Tools used:** `read_redpanda_topic`, `write_stream_function`, `write_workflow`
When you want to improve the speed and efficiency of your data products.
**Sample prompts:**
- *"Create materialized views that will improve the performance of my user analytics APIs"*
- *"Analyze query patterns and optimize my data model for better performance"*
- *"Create pre-computed aggregations for my most common dashboard queries"*
**Tools used:** `write_materialized_view`, `create_egress_api`, `test_egress_api`
### Tools
#### `write_spec`
Generates a plan for an end-to-end Moose project (from ingest through transformation to egress).
Files created:
* *Creates a .md file in the current directory with the plan.*
```txt filename="/project/app/specs/specification.md" copy
# Technical Specification: ADS-B Aircraft Data Ingestion and Visualization
## 1. Overview
This specification outlines the implementation of a system to ...
```
#### `write_and_run_temp_script`
Creates and runs a temporary script, usually either for sampling or for API inspection.
Files created:
* *Creates a temporary script in the /project/.moose/sloan directory, and runs it, usually creating a sample data file in the same directory.*
```js filename="/project/.moose/sloan-scratch/script.js" copy
// Run with Node 18+ (global fetch); fs/promises is used to write the output file
import * as fs from 'node:fs/promises';

const ADSB_API_URL = 'https://api.adsb.lol/v2/mil';
const INGEST_URL = 'http://localhost:4000/ingest/AircraftTransponder';
const FETCH_INTERVAL = 5000; // 5 seconds
const MAX_RUNTIME = 30000; // 30 seconds
async function fetchAndIngestData() {
const startTime = Date.now();
const processedAircraft = [];
while (Date.now() - startTime < MAX_RUNTIME) {
try {
// Fetch data from ADSB API
const response = await fetch(ADSB_API_URL);
const data = await response.json();
if (data && data.ac && Array.isArray(data.ac)) {
// Process each aircraft
for (const aircraft of data.ac) {
// Ingest aircraft data
const ingestResponse = await fetch(INGEST_URL, {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify(aircraft)
});
if (ingestResponse.ok) {
processedAircraft.push({
icao: aircraft.hex,
timestamp: new Date().toISOString(),
status: 'ingested'
});
}
}
}
} catch (error) {
// Let errors crash the script
throw error;
}
// Wait for the next interval if we haven't exceeded MAX_RUNTIME
if (Date.now() - startTime + FETCH_INTERVAL < MAX_RUNTIME) {
await new Promise(resolve => setTimeout(resolve, FETCH_INTERVAL));
} else {
break;
}
}
return {
totalProcessed: processedAircraft.length,
processedAircraft,
startTime: new Date(startTime).toISOString(),
endTime: new Date().toISOString()
};
}
// Run the script and write results to output.json
const result = await fetchAndIngestData();
await fs.writeFile('output.json', JSON.stringify(result, null, 2));
```
```json filename="/project/.moose/sloan-scratch/output.json" copy
{
"ac": [
{
"hex": "ae63c1",
"type": "tisb_icao",
...remaining fields
},
]
}
```
#### `write_workflow`
Creates a Moose Workflow, typically run on a schedule, typically used to ingest data. [Moose documentation](../../moose/building/workflows.mdx).
Files created:
* *Creates a script in the project directory with the workflow, and imports it into `index.ts` or `main.py`.*
Infrastructure created:
* *Creates a Moose managed temporal script, orchestrating the workflow that was created.*
```ts filename="/project/app/scripts/script.ts" copy
// Reconstructed sketch: exact Task/Workflow signatures may vary by moose-lib version
import { Task, Workflow } from "@514labs/moose-lib";

/**
 * Task to fetch and ingest aircraft transponder data
 * (TaskResult type defined elsewhere in the file, omitted here)
 */
const fetchAndIngestAircraftTransponder = new Task<null, TaskResult>("FetchAndIngestAircraftTransponder", {
  run: async () => {
    // Fetch data from external API and ingest into Moose
    // ... see full code above ...
  },
  retries: 3,
  timeout: "1h"
});

/**
 * Workflow to ingest aircraft transponder data
 */
export const ingestAircraftTransponderWorkflow = new Workflow("IngestAircraftTransponder", {
  startingTask: fetchAndIngestAircraftTransponder
});

/**
 * Default export function that returns the workflow instance
 * Required by the Moose runtime (function name is illustrative)
 */
export default function createWorkflow() {
  return ingestAircraftTransponderWorkflow;
}
```
```ts filename="/project/app/index.ts" copy
export { ingestAircraftTransponderWorkflow } from "./scripts/IngestAircraftTransponder/IngestAircraftTransponder";
```
#### `run_workflow`
Runs a Moose workflow. Used as part of the `write_workflow` flow. [Moose documentation](../../moose/building/workflows.mdx).
#### `write_ingestion_pipeline`
Creates a Moose data model, typically used to define the schema of the data in your Moose project. [Moose documentation](../../moose/building/data-models.mdx).
Files created:
* *Creates a data model in the project directory, and imports it into `index.ts` or `main.py`.*
Infrastructure created:
* *Depending on the configuration, creates a ClickHouse table, a Redpanda topic, and/or an ingest API.*
```ts filename="/project/app/datamodels/.ts" copy
// Reconstructed sketch: exact IngestPipeline config keys may vary by moose-lib version
import { IngestPipeline } from "@514labs/moose-lib";

// Define aircraft tracking data interface
export interface AircraftTrackingData {
  hex: string; // Unique aircraft identifier
  flight: string; // Flight number
  // ... other fields
}

// Define ingest pipeline (ClickHouse table + Redpanda stream + ingest API)
export const AircraftTrackingDataPipeline = new IngestPipeline<AircraftTrackingData>("AircraftTrackingData", {
  table: true,
  stream: true,
  ingestApi: true
});
```
```ts filename="/project/app/index.ts" copy
export * from "./datamodels/models";
```
#### `write_stream_function`
Creates a Moose stream processing function. Runs on a per row basis as data is ingested into the stream. [Moose documentation](../../moose/building/streaming-functions.mdx).
Files created:
* *Creates a stream function in the project directory, and imports it into `index.ts` or `main.py`.*
Infrastructure created:
* *Creates a streaming function.*
```ts filename="/project/app/functions/.ts" copy
// Transform raw aircraft data to processed format
function transformAircraft(record: AircraftTrackingData): AircraftTrackingProcessed {
const zorderCoordinate = calculateZOrder(record.lat, record.lon);
const navFlags = parseNavModes(record.nav_modes);
return {
...record,
zorderCoordinate,
...navFlags,
timestamp: new Date(record.timestamp),
};
}
// Connect the data pipeline
AircraftTrackingDataPipeline.stream!.addTransform(
AircraftTrackingProcessedPipeline.stream!,
transformAircraft
);
```
```ts filename="/project/app/index.ts" copy
export * from "./functions/process_aircraft";
```
#### `write_materialized_view`
Creates a Moose materialized view. [Moose documentation](../../moose/building/materialized-views.mdx).
Files created:
* *Creates a materialized view in the project directory, and imports it into `index.ts` or `main.py`.*
Infrastructure created:
* *Creates a materialized view in ClickHouse.*
```ts filename="/project/app/views/.ts" copy
// Reconstructed sketch: imports assume the standard moose-lib exports
import { MaterializedView, Aggregated, sql } from "@514labs/moose-lib";

// Define schema for aggregated aircraft data near San Francisco
interface AircraftTrackingProcessed_NearbySFSchema {
hex: string;
last_seen: string & Aggregated<"max", [Date]>;
count: string & Aggregated<"count", []>;
avg_lat: string & Aggregated<"avg", [number]>;
avg_lon: string & Aggregated<"avg", [number]>;
}
// SQL query to find aircraft within 50 miles of San Francisco
const query = sql`
// SQL Query Here
`;
// Create materialized view for efficient querying
// (reconstructed sketch: exact MaterializedView options may vary by moose-lib version)
export const AircraftTrackingProcessed_NearbySF = new MaterializedView<AircraftTrackingProcessed_NearbySFSchema>({
  selectStatement: query,
  tableName: "AircraftTrackingProcessed_NearbySF",
  materializedViewName: "AircraftTrackingProcessed_NearbySF_MV"
});
```
```ts filename="/project/app/index.ts" copy
export * from './views/AircraftTrackingProcessed_NearbySF';
```
#### `create_egress_api`
Creates an egress API from Moose ClickHouse. Can utilize type safe parameters. [Moose documentation](../../moose/building/egress-apis.mdx).
Files created:
* *Creates an egress API in the project directory, and imports it into `index.ts` or `main.py`.*
Infrastructure created:
* *Creates an egress API.*
```ts filename="/project/app/apis/.ts" copy
// Reconstructed sketch: exact Api constructor signature may vary by moose-lib version
import { Api } from "@514labs/moose-lib";

interface AircraftRadiusParams {
  lat: number; // Center latitude
  lon: number; // Center longitude
  radius: number; // Search radius in miles
}

export const getAircraftWithinRadius = new Api<AircraftRadiusParams, any>(
  "getAircraftWithinRadius",
  async ({ lat, lon, radius }, { client, sql }) => {
// Execute the query using the Haversine formula to calculate distances
const result = await client.query.execute(sql`
// SQL Query Here
`);
return result;
},
{
metadata: {
description: "Returns all aircraft within a given radius (in miles) of a specified latitude and longitude"
}
}
);
```
```ts filename="/project/app/index.ts" copy
export * from './apis/getAircraftWithinRadius';
```
#### `test_egress_api`
Tests the specified APIs. Used as part of the `create_egress_api` flow.
## Experimental: Remote ClickHouse Tools (Alpha)
Tools for reading data in your external ClickHouse database (e.g. those hosted in Boreal or ClickHouse Cloud). These are useful for iterating off a production project, for debugging production issues or for local testing of changes to production data.
Out of an abundance of caution, we suggest using read-only credentials for this toolset ([ClickHouse documentation](https://clickhouse.com/docs/operations/access-rights)). If you want to modify your ClickHouse database, we suggest using Moose to do so.
### Activate this toolset
These tools are **not** enabled by default.
To enable them explicitly:
Run the following command:
```bash filename="Terminal" copy
sloan config tools
```
and enable `remote-clickhouse-tools`. e.g.:
```txt filename="Terminal" copy
? Select tools to enable (no selection defaults to moose-read-tools):
[ ] moose-read-tools - Enable moose read tools for data inspection
[ ] moose-write-tools - Enable moose write tools for full functionality (requires API key, auto-enables read tools)
[x] remote-clickhouse-tools - Enable Remote Clickhouse integration
```
If you are managing the Sloan MCP with an `mcp.json` file, you can enable this toolset by adding the following to your `mcp.json` file:
```json filename="MCP.json" copy
"args": [
...
"--remote-clickhouse-tools",
...
]
"env": {
...
"BOREAL_CLICKHOUSE_HOST": "...",
"BOREAL_CLICKHOUSE_PORT": "...",
"BOREAL_CLICKHOUSE_USER": "...",
"BOREAL_CLICKHOUSE_PASSWORD": "...",
"BOREAL_CLICKHOUSE_DATABASE": "...",
}
```
## Other Experimental Tools
These tools are in research preview, [let us know if you are interested in using them](mailto:sloan@fiveonefour.com).
### Experimental: Context management tools (Research Preview)
Tools for managing context related to your Moose project.
Useful for:
- testing your data infrastructure against policy (e.g. data quality, data security, data governance)
- ensuring metrics definition standardization for ad hoc queries of your data
If you are interested in using these tools, please contact us at [sloan@fiveonefour.com](mailto:sloan@fiveonefour.com).
### Experimental: External tools (Research Preview)
Tools for interacting with external data systems.
Current supported external data systems:
- DuckDB
- Databricks
If you are interested in using these tools, or any other external data tools, please contact us at [sloan@fiveonefour.com](mailto:sloan@fiveonefour.com).
---
## Moose Templates - Example Projects
Source: templates.mdx
Pre-configured Moose applications to jumpstart your data engineering projects with Next.js, Streamlit, TypeScript, and Python
# Templates
Templates are pre-configured Moose applications that help you get started with your data projects. Choose a template based on your preferred:
- Frontend framework (Next.js, Streamlit)
- Language (TypeScript, Python)
- Moose architectural features (streaming, ETL, materialized views, APIs)
To get started with a template, run:
```bash copy
moose init
```
### View List of Templates
Run this command to see a list of available templates (and supported languages):
```bash copy
moose template list
```
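For example, to create a new project from one of the listed templates (project and template names are illustrative):
```bash copy
moose init my-analytics-app <template-name>
```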
## Moose + Next.js
## Moose + Streamlit
## TypeScript
## Python
## Streaming Data Processing
## ETL & Workflows
See the [templates repository](https://github.com/514-labs/moose/tree/main/templates) for source code and additional templates.
---
## Aircraft Transponder (ADS-B) Template
Source: templates/adsb.mdx
Build a real-time aircraft tracking dashboard with Moose, Next.js, and open ADS-B data
# Aircraft Transponder (ADS-B) Template
This template demonstrates how to build a modern full-stack application that ingests and analyzes aircraft transponder data (ADS-B) from military aircraft. It showcases a complete architecture combining a Next.js frontend with a Moose backend for real-time data processing and storage.
Aircraft have transponders, and a group of folks around the world record that transponder data and publish it as open data to [ADSB.lol](https://adsb.lol/). Let's grab that data and create something interesting.
View Source Code →
## Getting Started
### Prerequisites
Before getting started, make sure you have the following installed:
* [Docker Desktop](https://www.docker.com/products/docker-desktop/)
* [NodeJS 20+](https://nodejs.org/en/download)
* [Anthropic API Key](https://docs.anthropic.com/en/api/getting-started)
* [Claude Desktop 24+](https://claude.ai/download)
* [Cursor](https://www.cursor.com/)
### Install Moose and Sloan
To get started, install Moose (open source developer platform for data engineering) and Sloan (AI data engineering product).
```bash filename="terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose,sloan
```
This installation will ask for your Anthropic API key if you want to use the AI features. If you don't have one, you can follow the [setup guide at Anthropic's website](https://docs.anthropic.com/en/api/getting-started).
### Create a new project using the ADS-B template
```bash filename="terminal" copy
moose init aircraft ads-b-frontend
```
This template creates a project "aircraft" with two subdirectories:
```
project/
├── frontend/ # Next.js frontend application
│ ├── src/
│ └── package.json
│
├── moose/ # Moose backend
│ ├── app/
│ │ ├── datamodels/ # Data models for aircraft tracking
│ │ ├── functions/ # Processing functions
│ │ └── scripts/ # Workflow scripts
│ ├── moose.config.toml # Moose configuration
│ └── package.json
```
The frontend is a template Next.js application that the LLM can use to generate a UI. The Moose backend ingests, processes, and stores the data.
### Set up the project
Navigate to the moose subdirectory and install dependencies:
```bash filename="terminal" copy
cd aircraft/moose
npm install
```
### Start the Docker services
Make sure Docker Desktop is running, then start the Moose development server:
```bash filename="terminal" copy
moose dev
```
This will start all necessary local infrastructure including ClickHouse, Redpanda, Temporal, and the Rust ingest servers.
This will also start the ingestion workflow, which retrieves data from the adsb.lol military aircraft tracking API, processes it according to the data model, and ingests it into ClickHouse.
You should see a large amount of data being ingested in the Moose CLI.
```
[POST] Data received at ingest API sink for AircraftTrackingData
Received AircraftTrackingData -> AircraftTrackingProcessed - 1 1 message(s)
POST ingest/AircraftTrackingData
[POST] Data received at ingest API sink for AircraftTrackingData
POST ingest/AircraftTrackingData
Received AircraftTrackingData -> AircraftTrackingProcessed - 1 1 message(s)
[POST] Data received at ingest API sink for AircraftTrackingData
Received AircraftTrackingData -> AircraftTrackingProcessed - 1 1 message(s)
POST ingest/AircraftTrackingData
[DB] 136 row(s) successfully written to DB table (AircraftTrackingProcessed)
[POST] Data received at ingest API sink for AircraftTrackingData
```
### Test Moose deployment and data ingestion
To see what infrastructure has been set up, run `moose ls` or `moose ls --json`.
If you connect to the locally deployed ClickHouse instance, you can query it directly. Here's a query that returns the unique types of aircraft in the dataset:
```sql
SELECT DISTINCT aircraft_type
FROM aircraft_tracking_data;
```
## Explore the Data
Using Sloan's MCP tools, you can explore the ingested aircraft data using AI tools like Claude Desktop or Cursor. We recommend using Claude for exploring your data and ad-hoc analytics, and Cursor for productionizing your results.
### Explore with Claude Desktop
Claude Desktop can help you analyze the data through natural language queries.
#### Initialize the Sloan MCP for your project with the Claude Client
```bash filename="terminal" copy
cd /path/to/your/project/moose
sloan setup --mcp claude-desktop
```
#### Explore your data in Claude
Try asking Claude exploratory questions like:
- "Tell me about the data in my ClickHouse tables"
- "Tell me about the flow of data in Moose project"
- "Create a pie chart of the types of aircraft in the air right now"
- "Create a visualization of aircraft type against altitude"
### Productionize Your Results with Cursor
For a code-forward workflow, Cursor provides a great environment to productionize your queries.
#### Initialize the Sloan MCP for your project with the Cursor Client
```bash filename="terminal" copy
sloan setup --mcp cursor-global
```
This will create a `/.cursor/mcp.json` file with Sloan's MCP configuration.
#### Enable the Sloan Cursor MCP
Enable the MCP in Cursor by going to:
`cursor > settings > cursor settings > MCP` and clicking `enable` and `refresh`.
Try asking Cursor to help you productionize your analysis:
- "Could you create a query that returns all the aircraft types that are currently in the sky?"
- "Could you create an egress API to furnish that data to a frontend?"
## Integration Points
If you created an egress API above, you can build a frontend using the provided Next.js project.
### Install NextJS dependencies
```bash filename="terminal" copy
cd path/to/your/project/frontend
npm i
```
### Run the local development server
```bash filename="terminal" copy
npm run dev
```
Drag the frontend folder to the chat as context, as well as the file containing the generated egress API, and prompt:
- "using the packagages you've been given, generate a frontend showing the data offered by the API."
---
## Live Brainwave Analytics Template
Source: templates/brainmoose.mdx
Build a real-time brainwave monitoring and analytics platform using Moose, EEG devices, and advanced data processing
# Brainwaves Template
## Overview
The Brainwaves template demonstrates how to build a comprehensive brain mapping and movement analytics platform using Moose. It features real-time collection, analysis, and visualization of brainwave data from EEG devices like the Muse Headband, with support for both live device streaming and simulation using datasets.
## Features
- Real-time brainwave data collection and analysis from EEG devices
- Live terminal dashboard with interactive charts and visualizations
- Session-based data logging and tracking
- Movement and relaxation state analysis
- Dual-application architecture (DAS + Brainmoose backend)
- Optional OpenAI-powered insights and analysis
- Support for both live device streaming and CSV simulation
## Architecture
### DAS: Data Acquisition Server
The Data Acquisition Server is a Node.js/TypeScript application that handles:
1. **Real-time Data Ingestion**
- UDP/OSC data collection from Muse devices or simulators
- Session-based CSV logging with unique identifiers
2. **Live Analysis & Visualization**
- Terminal dashboard with real-time charts and tables
- Brainwave band analysis (Alpha, Beta, Delta, Theta, Gamma)
- Movement detection using accelerometer and gyroscope data
3. **Data Forwarding**
- Automatic forwarding to Moose backend for storage and analytics
### Brainmoose: Analytics & API Backend
The Moose-powered backend provides:
- Modular data ingestion and storage pipeline
- Advanced analytics blocks for movement scoring
- RESTful APIs for querying session insights
- Optional OpenAI GPT-4o integration for enhanced analysis
- ClickHouse-based data warehouse for complex queries
## Getting Started
Node.js 20+
Moose CLI
Optional: Muse Headband EEG device
```bash
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
1. Create a new Moose project from the template:
```bash
moose init moose-brainwaves brainwaves
cd moose-brainwaves
```
2. Install dependencies for both applications:
```bash
# Install DAS dependencies
cd apps/das
npm install
# Install Brainmoose dependencies
cd ../brainmoose
npm install
```
3. Start with simulation (no device required):
```bash
# Download sample data
cd ../das
./download.sh
# Start DAS
npm run dev -- --sessionId=MyTestSession
```
4. In another terminal start the Moose analytics backend:
```bash
cd {project_root}/apps/brainmoose
moose dev
```
5. In another terminal run the simulation (no device required):
```bash
cd {project_root}/apps/das
./sim.sh brain_data_coding.csv
```
## API Endpoints
The template exposes key API endpoints for accessing brainwave analytics:
- `/api/sessionInsights`
- Parameters: sessions (comma-separated sessionId|sessionLabel pairs)
- Returns: Movement scores and session analytics for specified sessions
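As a hypothetical example, with the Moose backend running locally (default local port 4000 assumed), you could query the endpoint like this; the session ID and label are illustrative:
```bash
curl "http://localhost:4000/api/sessionInsights?sessions=1735785243|MyTestSession"
```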
## Data Models
### Brain Data Schema
```typescript
interface BrainData {
timestamp: DateTime;
bandOn: boolean;
acc: {
x: number;
y: number;
z: number;
};
gyro: {
x: number;
y: number;
z: number;
};
alpha: number;
beta: number;
delta: number;
theta: number;
gamma: number;
ppm: {
channel1: number;
channel2: number;
channel3: number;
};
sessionId: string;
}
```
## Analytics & Insights
The template includes sophisticated analytics for:
### Movement Analysis
- Accelerometer-based movement scoring
- Gyroscope rotation detection
- Combined movement metrics calculation
### Brainwave Band Analysis
- Alpha wave relaxation indicators
- Beta wave focus measurements
- Delta, Theta, and Gamma band processing
- Real-time state classification
### Session Scoring
Example movement score calculation:
```sql
SELECT
sessionId,
SUM(sqrt((arrayElement(acc, 1)^2) + (arrayElement(acc, 2)^2) + (arrayElement(acc, 3)^2))) AS acc_movement_score,
SUM(sqrt((arrayElement(gyro, 1)^2) + (arrayElement(gyro, 2)^2) + (arrayElement(gyro, 3)^2))) AS gyro_movement_score
FROM Brain_0_0
WHERE sessionId = '1735785243'
GROUP BY sessionId;
```
## Device Support
### Muse Headband Integration
- Direct OSC data streaming support
- Real-time EEG signal processing
- Multi-channel brainwave analysis
### Simulation Support
- CSV data playback for testing
- Sample datasets included
- Configurable replay speeds
## Customization
You can customize the template by:
- Extending brainwave analysis algorithms
- Adding new visualization components to the terminal UI
- Implementing custom movement detection logic
- Integrating additional EEG devices or data sources
- Enhancing the OpenAI analysis prompts and insights
- Creating custom API endpoints for specific analytics needs
## Educational Resources
Listen to these ~15 minute podcasts generated using NotebookLM:
- [Muse Headband Overview](https://downloads.fiveonefour.com/moose/template-data/brainwaves/podcasts/MuseHeadband.mp3)
- [Research Using Consumer EEG Devices](https://downloads.fiveonefour.com/moose/template-data/brainwaves/podcasts/ResearchUsingConsumerEEGDevices.mp3)
---
## Github Trending Topics Template
Source: templates/github.mdx
Build a real-time GitHub trending topics dashboard with Moose and Next.js
# Github Trending Topics
This template demonstrates how to build a real-time data pipeline and dashboard for tracking GitHub trending topics. It showcases a complete full-stack architecture that combines a Next.js frontend with a Moose backend for data ingestion, processing, and API generation.
View Source Code →
## Architecture Overview
The template implements a modern full-stack architecture with the following components:
1. **Frontend Layer (Next.js)**
- Real-time dashboard for trending topics
- Interactive data visualization
- Type-safe API integration with Moose Analytics APIs
- Built with TypeScript and Tailwind CSS
2. **Backend Layer (Moose)**
- GitHub Events data ingestion
- Real-time data processing pipeline
- Type-safe APIs with Moose Analytics APIs
- ClickHouse for analytics storage
3. **Infrastructure Layer**
- ClickHouse for high-performance analytics
- Redpanda for event streaming
- Temporal for workflow orchestration
- Type-safe HTTP API endpoints
```mermaid
graph TD
A[Next.js Dashboard] --> B[Moose Analytics APIs]
B --> C[ClickHouse]
D[GitHub API] --> E[Moose Workflow]
E --> F[Moose Ingest Pipeline]
F --> C
```
## Getting Started
### Create a new project from the template:
To get started with this template, you can run the following command:
```bash filename="terminal" copy
moose init moose-github-dev-trends github-dev-trends
```
### Install dependencies:
```bash filename="terminal" copy
cd moose-github-dev-trends/moose-backend && npm install
```
## Project Structure
The template is organized into two main components:
```
moose-github-dev-trends/
├── dashboard/ # Frontend dashboard
│ ├── app/ # Next.js pages and routes
│ ├── components/ # React components
│ ├── generated-client/ # Auto-generated API client
│ └── lib/ # Utility functions
│
├── moose-backend/ # Backend services
│ ├── app/
│ │ ├── apis/ # Analytics API definitions
│ │ ├── ingest/ # Data ingestion logic
│ │ ├── blocks/ # Reusable processing blocks
│ │ └── scripts/ # Workflow scripts
│ └── moose.config.toml # Moose configuration
```
### Run the application
```bash filename="terminal" copy
moose dev
```
### Watch the polling workflow run
When you run `moose dev`, you'll see the workflow run in the Moose logs.
This workflow will run a script every 60 seconds to poll GitHub for trending topics.
[Learn more about scheduled workflows](/moose/building/workflows#scheduling-workflows).
### (Optional) Configure GitHub API Token
```bash filename="terminal" copy
touch .env
```
```.env filename=".env" copy
GITHUB_TOKEN=
```
Without authentication, you're limited to 60 requests/hour. With a token, this increases to 5,000 requests/hour.
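If you want to confirm the token is being picked up, you can check your remaining quota directly against the GitHub API (a quick sanity check, assuming `GITHUB_TOKEN` is exported in your shell):
```bash filename="terminal" copy
curl -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/rate_limit
```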
## Development Workflow
The visualization combines several modern technologies:
- **Moose** for type-safe API generation
- **Tanstack Query** for data management
- **Recharts** for visualization
- **shadcn/ui** for UI components
- **Tailwind CSS** for styling
The template demonstrates a complete integration between Moose and Next.js:
### Moose Backend
The backend is responsible for data ingestion and API generation:
```bash filename="terminal" copy
cd moose-backend && moose dev
```
Key components:
- **GitHub Event Polling**: Workflow in `app/scripts/` fetches public events and posts them to the Moose Ingest Pipeline.
- **Ingest Pipeline**: Data Model and infrastructure for ingesting GitHub Events in `app/ingest/WatchEvent.ts`
- **Analytics APIs**: Analytics APIs in `app/apis/` expose the data for the frontend
#### Start the Moose Backend
Spin up the Moose dev server:
```bash filename="terminal" copy
moose dev
```
This will automatically start the workflow that polls GitHub for trending topics and ingests the data into ClickHouse.
### Frontend Integration
The frontend automatically integrates with the Moose backend through generated APIs:
#### API Client Generation
Moose generates an OpenAPI spec from your Analytics APIs that we use with [OpenAPI Generator](https://openapi-generator.tech/):
- Generated client located in `dashboard/generated-client`
- Based on OpenAPI schema from `moose-backend/.moose/openapi.yaml`
When running the following command, make sure:
- You're in the `moose-backend` directory
- The dev server is running: `moose dev`
```bash filename="Terminal" copy
cd moose-backend
openapi-generator-cli generate -i .moose/openapi.yaml -g typescript-fetch -o ../dashboard/generated-client
```
Remember to regenerate the client when you:
- Update Analytics API schemas
- Modify Ingest Pipeline definitions
- Change API endpoints or parameters
#### Data Flow & Visualization
The dashboard implements a modern data flow pattern using [Tanstack Query](https://tanstack.com/query/latest/docs/framework/react/react-query-data-fetching) and [Shadcn/UI Charts](https://ui.shadcn.com/docs/components/charts):
```ts filename="lib/moose-client.ts" copy
const apiConfig = new Configuration({
basePath: "http://localhost:4000",
});
const mooseClient = new DefaultApi(apiConfig);
export type TopicTimeseriesRequest = ApiTopicTimeseriesGetRequest;
export type TopicTimeseriesResponse = TopicTimeseriesGet200ResponseInner;
```
```ts filename="components/trending-topics-chart.tsx" copy
// components/trending-topics-chart.tsx
export function TrendingTopicsChart() {
const [interval, setInterval] = useState("hour");
const [limit, setLimit] = useState(10);
const [exclude, setExclude] = useState("");
const { data, isLoading, error } = useQuery({
queryKey: ["topicTimeseries", interval, limit, exclude],
queryFn: async () => {
const result = await mooseClient.ApiTopicTimeseriesGet({
interval,
limit,
exclude: exclude || undefined,
});
return result;
},
});
// Handle loading and error states
if (isLoading) return ;
if (error) return ;
}
```
The dashboard creates an animated bar chart of trending topics over time:
```tsx filename="components/trending-topics-chart.tsx" copy
// Transform the current time slice of data for the chart
const chartData = data[currentTimeIndex].topicStats.map((stat) => ({
eventCount: parseInt(stat.eventCount),
topic: stat.topic,
fill: `var(--color-${stat.topic})`
}));
// Render vertical bar chart with controls
return (
);
```
#### Start the Frontend
```bash filename="terminal" copy
cd dashboard && npm install && npm run dev
```
Visit [http://localhost:3000](http://localhost:3000) to see real-time trending topics.
## Next Steps
Once you have the data flowing, you can:
1. Add custom metrics and visualizations
2. Implement additional GitHub data sources
3. Create new API endpoints for specific analyses
4. Build alerting for trending topic detection
Feel free to modify the data models, processing functions, or create new APIs to suit your specific needs!
## Deployment
Deploying this project involves deploying the Moose backend service and the frontend dashboard separately.
**Prerequisites:**
* A GitHub account and your project code pushed to a GitHub repository.
* A [Boreal](https://boreal.cloud/signup) account for the backend.
* A [Vercel](https://vercel.com/signup) account (or similar platform) for the frontend.
### 1. Push to GitHub
Push your code to a GitHub repository. Configure a remote repository for your Moose project.
```bash filename="terminal" copy
git remote add origin <your-repository-url>
git push -u origin main
```
### 2. Deploying the Moose Backend (Boreal)
* **Create Boreal Project:**
* Log in to your Boreal account and create a new project.
* Connect Boreal to your GitHub account and select the repository containing your project.
* Configure the project settings, setting the **Project Root** to the `moose-backend` directory.
* **Configure Environment Variables:**
* In the Boreal project settings, add the `GITHUB_TOKEN` environment variable with your GitHub Personal Access Token as the value.
* **Deploy:** Boreal should automatically build and deploy your Moose service based on your repository configuration. It will also typically start any polling sources (like the GitHub event poller) defined in your `moose.config.toml`.
* **Note API URL:** Once deployed, Boreal will provide a public URL for your Moose backend API. You will need this for the frontend deployment.
### 3. Deploying the Frontend Dashboard (Vercel)
* **Create Vercel Project:**
* Log in to your Vercel account and create a new project.
* Connect Vercel to your GitHub account and select the repository containing your project.
* Set the **Root Directory** in Vercel to `dashboard` (or wherever your frontend code resides within the repository).
* **Configure Environment Variables:**
* This is crucial: The frontend needs to know where the deployed backend API is located.
* Add an environment variable in Vercel to point to your Boreal API URL. The variable name depends on how the frontend code expects it (e.g., `NEXT_PUBLIC_API_URL`). Check the frontend code (`dashboard/`) for the exact variable name.
```
# Example Vercel Environment Variable
NEXT_PUBLIC_API_URL=https://your-boreal-project-url.boreal.cloud
```
* **Deploy:** Vercel will build and deploy your Next.js frontend.
Once both backend and frontend are deployed and configured correctly, your live GitHub Trends Dashboard should be accessible via the Vercel deployment URL.
---
## Live Heart Rate Monitoring Template
Source: templates/heartrate.mdx
Build a real-time health analytics dashboard with Moose, Streamlit, and Python
# Live Heart Rate Leaderboard Template
## Overview
The Live Heart Rate Leaderboard template demonstrates how to build a real-time health monitoring application using Moose. It features a Streamlit-based dashboard that displays live heart rate data, calculates performance metrics, and maintains a competitive leaderboard.
## Features
- Real-time heart rate monitoring dashboard with interactive graphs
- Live leaderboard tracking multiple users
- Heart rate zone visualization
- Performance metrics calculation (power output and calories burned)
- User-specific data tracking and visualization
## Architecture
### Moose Data Pipeline Backend
The template implements a three-stage data processing pipeline:
1. **Raw Data Ingestion** (`RawAntHRPacket`)
- Captures raw heart rate data from ANT+ devices
- Includes basic device and timestamp information
2. **Data Processing** (`ProcessedAntHRPacket`)
- Transforms raw data into processed format
- Adds calculated metrics and validation
3. **Unified Format** (`UnifiedHRPacket`)
- Standardizes heart rate data for analytics
- Includes user information and derived metrics
### Streamlit Frontend Dashboard
The Streamlit dashboard (`streamlit_app.py`) provides:
- Real-time heart rate visualization
- Performance metrics display
- Interactive user selection
- Live-updating leaderboard
- Heart rate zone indicators
## Getting Started
Python 3.12+
Moose CLI
```bash
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```
1. Create a new Moose project from the template:
```bash
moose init moose-heartrate live-heartrate-leaderboard
cd moose-heartrate
```
2. Create a new virtual environment and install the dependencies:
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
3. Configure your environment variables in `.env`
4. Start the Moose pipeline:
```bash
moose dev
```
5. Launch the Streamlit dashboard:
```bash
streamlit run streamlit_app.py
```
## API Endpoints
The template exposes two main API endpoints:
- `/api/getUserLiveHeartRateStats`
- Parameters: user_name, window_seconds
- Returns: Recent heart rate statistics for a specific user
- `/api/getLeaderboard`
- Parameters: time_window_seconds, limit
- Returns: Ranked list of users based on performance metrics
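As a hypothetical example, with `moose dev` running you can call these endpoints locally (parameter values are illustrative, and the default local port is assumed to be 4000):
```bash
curl "http://localhost:4000/api/getUserLiveHeartRateStats?user_name=alice&window_seconds=60"
curl "http://localhost:4000/api/getLeaderboard?time_window_seconds=300&limit=10"
```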
## Data Models
### UnifiedHRPacket
```python
from moose_lib import Key
from pydantic import BaseModel
from datetime import datetime
class UnifiedHRPacket(BaseModel):
user_id: Key[int]
user_name: str
device_id: int
hr_timestamp_seconds: float
hr_value: float
rr_interval_ms: float
processed_timestamp: datetime
```
## Performance Calculations
The template includes calculations for:
- Heart rate zones (1-5)
- Estimated power output
- Cumulative calories burned
- Average performance metrics
## Customization
You can customize the template by:
- Modifying heart rate zone thresholds
- Adjusting performance calculation formulas
- Extending the data pipeline with additional metrics
- Customizing the dashboard layout and visualizations
---
## Usage Data
Source: usage-data.mdx
Usage Data for Moose
### Information Collected
Please note that Moose may collect information about how you use the service, such as activity data and feature usage.
MooseJS will NEVER collect the data being ingested/processed/consumed with your Moose application. Moose collects aggregated or identifiable data solely to
improve our operations and understand how to provide you with the best experience.