# Moose Documentation – Python

## Included Files

1. moose/apis.mdx
2. moose/apis/admin-api.mdx
3. moose/apis/analytics-api.mdx
4. moose/apis/auth.mdx
5. moose/apis/ingest-api.mdx
6. moose/apis/openapi-sdk.mdx
7. moose/apis/trigger-api.mdx
8. moose/changelog.mdx
9. moose/configuration.mdx
10. moose/data-modeling.mdx
11. moose/deploying.mdx
12. moose/deploying/configuring-moose-for-cloud.mdx
13. moose/deploying/deploying-on-an-offline-server.mdx
14. moose/deploying/deploying-on-ecs.mdx
15. moose/deploying/deploying-on-kubernetes.mdx
16. moose/deploying/deploying-with-docker-compose.mdx
17. moose/deploying/monitoring.mdx
18. moose/deploying/packaging-moose-for-deployment.mdx
19. moose/deploying/preparing-clickhouse-redpanda.mdx
20. moose/getting-started/from-clickhouse.mdx
21. moose/getting-started/quickstart.mdx
22. moose/help/minimum-requirements.mdx
23. moose/help/troubleshooting.mdx
24. moose/in-your-stack.mdx
25. moose/index.mdx
26. moose/llm-docs.mdx
27. moose/local-dev.mdx
28. moose/mcp-dev-server.mdx
29. moose/metrics.mdx
30. moose/migrate.mdx
31. moose/migrate/lifecycle.mdx
32. moose/migrate/migration-types.mdx
33. moose/moose-cli.mdx
34. moose/olap.mdx
35. moose/olap/apply-migrations.mdx
36. moose/olap/db-pull.mdx
37. moose/olap/external-tables.mdx
38. moose/olap/indexes.mdx
39. moose/olap/insert-data.mdx
40. moose/olap/model-materialized-view.mdx
41. moose/olap/model-table.mdx
42. moose/olap/model-view.mdx
43. moose/olap/planned-migrations.mdx
44. moose/olap/read-data.mdx
45. moose/olap/schema-change.mdx
46. moose/olap/schema-optimization.mdx
47. moose/olap/schema-versioning.mdx
48. moose/olap/supported-types.mdx
49. moose/olap/ttl.mdx
50. moose/reference/py-moose-lib.mdx
51. moose/reference/ts-moose-lib.mdx
52. moose/streaming.mdx
53. moose/streaming/connect-cdc.mdx
54. moose/streaming/consumer-functions.mdx
55. moose/streaming/create-stream.mdx
56. moose/streaming/dead-letter-queues.mdx
57. moose/streaming/from-your-code.mdx
58. moose/streaming/schema-registry.mdx
59. moose/streaming/sync-to-table.mdx
60. moose/streaming/transform-functions.mdx
61. moose/workflows.mdx
62. moose/workflows/cancel-workflow.mdx
63. moose/workflows/define-workflow.mdx
64. moose/workflows/retries-and-timeouts.mdx
65. moose/workflows/schedule-workflow.mdx
66. moose/workflows/trigger-workflow.mdx

## Moose APIs

Source: moose/apis.mdx

Create type-safe ingestion and analytics APIs for data access and integration

# Moose APIs

## Overview

The APIs module provides standalone HTTP endpoints for data ingestion and analytics. While these APIs can stand alone, they are designed to be paired with other MooseStack modules like OLAP tables and streams.
## Basic Examples

### Ingestion API

```py filename="IngestApi.py" copy
from moose_lib import IngestApi, IngestConfig, Stream
from pydantic import BaseModel
from datetime import datetime

class UserEvent(BaseModel):
    id: str
    user_id: str
    timestamp: datetime
    event_type: str

# Destination stream for ingested events
event_stream = Stream[UserEvent]("user-events")

# Create a standalone ingestion API
user_events_api = IngestApi[UserEvent]("user-events", IngestConfig(destination=event_stream))

# No export needed - Python modules are automatically discovered
```

### Analytics API

```py filename="AnalyticsApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel

class Params(BaseModel):
    user_id: str
    limit: int

class ResultData(BaseModel):
    id: str
    name: str
    email: str

def query_function(client: MooseClient, params: Params) -> list[ResultData]:
    # Query external service or custom logic using parameter binding
    query = "SELECT * FROM user_data WHERE user_id = {user_id} LIMIT {limit}"
    return client.query.execute(query, {"user_id": params.user_id, "limit": params.limit})

user_data_api = Api[Params, ResultData]("get-data", query_function)

# No export needed - Python modules are automatically discovered
```

---

## apis/admin-api

Source: moose/apis/admin-api.mdx

# Coming Soon

---

## APIs

Source: moose/apis/analytics-api.mdx

APIs for Moose

# APIs

## Overview

APIs are functions that run on your server and are automatically exposed as HTTP `GET` endpoints. They are designed to read data from your OLAP database. Out of the box, these APIs provide:

- Automatic type validation and conversion for your query parameters (sent in the URL) and response body
- Managed database client connection
- Automatic OpenAPI documentation generation

Common use cases include:

- Powering user-facing analytics, dashboards and other front-end components
- Enabling AI tools to interact with your data
- Building custom APIs for your internal tools

### Enabling APIs

Analytics APIs are enabled by default. To explicitly control this feature in your `moose.config.toml`:

```toml filename="moose.config.toml" copy
[features]
apis = true
```

### Basic Usage

`execute` is the recommended way to execute queries.
It provides a thin wrapper around the ClickHouse Python client so that you can safely pass `OlapTable` and `Column` objects to your query without needing to worry about ClickHouse identifiers:

```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel

## Import the source pipeline
from app.path.to.SourcePipeline import SourcePipeline

# Define the query parameters
class QueryParams(BaseModel):
    filter_field: str
    max_results: int

# Define the response body
class ResponseBody(BaseModel):
    id: int
    name: str
    value: float

SourceTable = SourcePipeline.get_table()

# Define the route handler function (parameterized)
def run(client: MooseClient, params: QueryParams) -> list[ResponseBody]:
    query = """
    SELECT id, name, value
    FROM {table}
    WHERE category = {category}
    LIMIT {limit}
    """
    return client.query.execute(query, {"table": SourceTable, "category": params.filter_field, "limit": params.max_results})

# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```

Use `execute_raw` with parameter binding for safe, typed queries:

```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from pydantic import BaseModel

## Import the source pipeline
from app.path.to.SourcePipeline import SourcePipeline

# Define the query parameters
class QueryParams(BaseModel):
    filterField: str
    maxResults: int

# Define the response body
class ResponseBody(BaseModel):
    id: int
    name: str
    value: float

SourceTable = SourcePipeline.get_table()

# Define the route handler function (using execute_raw with typed parameters)
def run(client: MooseClient, params: QueryParams) -> list[ResponseBody]:
    query = """
    SELECT id, name, value
    FROM Source
    WHERE category = {category:String}
    LIMIT {limit:UInt32}
    """
    return client.query.execute_raw(query, {"category": params.filterField, "limit": params.maxResults})

# Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```

```python filename="SourcePipeline.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig, Key
from pydantic import BaseModel

class SourceSchema(BaseModel):
    id: Key[int]
    name: str
    value: float

SourcePipeline = IngestPipeline[SourceSchema]("Source", IngestPipelineConfig(
    ingest_api=False,
    stream=True,
    table=True,
))
```

The `Api` class takes:

- Route name: The URL path to access your API (e.g., `"example_endpoint"`)
- Handler function: Processes requests with typed parameters and returns the result

The generic type parameters specify:

- `QueryParams`: The structure of accepted URL parameters
- `ResponseBody`: The exact shape of your API's response data

You can name these types anything you want. The first type generates validation for query parameters, while the second defines the response structure for OpenAPI documentation.

## Type Validation

Model the query parameters and response body as Pydantic models, and Moose will automatically validate and convert the query parameters sent in the URL as well as the response body.
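For example, assuming the `example_endpoint` API defined above is running locally, a request whose parameters don't match the model is rejected before your handler runs (a sketch; exact error text may differ):

```python filename="ValidationDemo.py" copy
import requests

# Valid request: parameters parse into QueryParams, so the handler runs
ok = requests.get(
    "http://localhost:4000/api/example_endpoint",
    params={"filter_field": "electronics", "max_results": 10},
)
print(ok.status_code)  # 200 with your API's result data

# Invalid request: max_results is not an integer, so validation fails
bad = requests.get(
    "http://localhost:4000/api/example_endpoint",
    params={"filter_field": "electronics", "max_results": "ten"},
)
print(bad.status_code)  # 400 with a body shaped like {"error": "..."}
```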
### Modeling Query Parameters

Define your API's parameters as a Pydantic model:

```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel, Field
from typing import Optional

class QueryParams(BaseModel):
    filterField: str = Field(..., description="The field to filter by")
    maxResults: int = Field(..., description="The maximum number of results to return")
    optionalParam: Optional[str] = Field(None, description="An optional parameter")
```

Moose automatically handles:

- Runtime validation
- Clear error messages for invalid parameters
- OpenAPI documentation generation

Complex nested objects and arrays are not supported. Analytics APIs are `GET` endpoints designed to be simple and lightweight.

### Adding Advanced Type Validation

Moose uses Pydantic for runtime validation. Use Pydantic's `Field` class for more complex validation:

```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    filterField: str = Field(pattern=r"^(id|name|email)$", description="The field to filter by")  ## Only allow valid column names from the UserTable
    maxResults: int = Field(gt=0, description="The maximum number of results to return")  ## Positive integer
```

### Common Validation Options

```python filename="ValidationExamples.py" copy
from datetime import date
from typing import Literal, Optional
from pydantic import BaseModel, Field, EmailStr, UUID4, IPvAnyAddress

class QueryParams(BaseModel):
    # Numeric validations
    id: int = Field(..., gt=0)
    age: int = Field(..., gt=0, lt=120)
    price: float = Field(..., gt=0, lt=1000)
    discount: float = Field(..., gt=0, multiple_of=0.5)

    # String validations
    username: str = Field(..., min_length=3, max_length=20)
    email: EmailStr
    zipCode: str = Field(..., pattern=r"^[0-9]{5}$")
    uuid: UUID4
    ipAddress: IPvAnyAddress

    # Date validations
    startDate: date

    # Enum validation
    status: Literal["active", "pending", "inactive"]

    # Optional parameters
    limit: Optional[int] = Field(None, gt=0, lt=100)
```

For a full list of validation options, see the [Pydantic documentation](https://docs.pydantic.dev/latest/concepts/types/#customizing-validation-with-fields).

### Setting Default Values

You can set default values for parameters by assigning values to each field in your Pydantic model:

```python filename="ExampleQueryParams.py" copy
from pydantic import BaseModel

class QueryParams(BaseModel):
    filterField: str = "example"
    maxResults: int = 10
    optionalParam: str | None = "default"
```

## Implementing Route Handler

API route handlers are regular functions, so you can implement whatever arbitrary logic you want inside them. Most of the time you will use APIs to expose your data to your front-end applications or other tools:

### Connecting to the Database

Moose provides a managed `MooseClient` to your function execution context.
This client provides access to the database and other Moose resources, and handles connection pooling/lifecycle management for you:

```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from app.UserTable import UserTable

def run(client: MooseClient, params: QueryParams):
    # A simple static query; the table object is bound safely for you
    query = """
    SELECT COUNT(*) FROM {table}
    """
    ## You can optionally pass the table object to the query
    return client.query.execute(query, {"table": UserTable})

## Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```

Use `execute_raw` with parameter binding:

```python filename="ExampleApi.py" copy
from moose_lib import Api, MooseClient
from app.UserTable import UserTable

def run(client: MooseClient, params: QueryParams):
    # Using execute_raw for safe queries
    query = """
    SELECT COUNT(*) FROM {table:Identifier}
    """
    ## Must be the name of the table, not the table object
    return client.query.execute_raw(query, {"table": UserTable.name})

## Create the API
example_api = Api[QueryParams, ResponseBody](name="example_endpoint", query_function=run)
```

### Constructing Safe SQL Queries

#### Basic Query Parameter Interpolation

```python filename="SafeQueries.py" copy
from moose_lib import MooseClient
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    min_age: int = Field(ge=0, le=150)
    status: str = Field(pattern=r"^(active|inactive)$")
    limit: int = Field(default=10, ge=1, le=1000)
    search_text: str = Field(pattern=r'^[a-zA-Z0-9\s]*$')

def run(client: MooseClient, params: QueryParams):
    query = """
    SELECT * FROM users
    WHERE age >= {min_age}
    AND status = {status}
    AND name ILIKE {search_text}
    LIMIT {limit}
    """
    return client.query.execute(query, {"min_age": params.min_age, "status": params.status, "search_text": f"%{params.search_text}%", "limit": params.limit})
```

```python filename="SafeQueries.py" copy
from moose_lib import MooseClient
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    min_age: int = Field(ge=0, le=150)
    status: str = Field(pattern=r"^(active|inactive)$")
    limit: int = Field(default=10, ge=1, le=1000)
    search_text: str = Field(pattern=r'^[a-zA-Z0-9\s]*$')

def run(client: MooseClient, params: QueryParams):
    query = """
    SELECT * FROM users
    WHERE age >= {minAge:UInt32}
    AND status = {status:String}
    AND name ILIKE {searchPattern:String}
    LIMIT {limit:UInt32}
    """
    return client.query.execute_raw(query, {
        "minAge": params.min_age,
        "status": params.status,
        "searchPattern": f"%{params.search_text}%",
        "limit": params.limit
    })
```

#### Table and Column References

```python filename="ValidatedQueries.py" copy
from moose_lib import MooseClient
from pydantic import BaseModel, Field
from app.UserTable import UserTable

class QueryParams(BaseModel):
    # Strict validation so only known-safe values reach the query
    column: str = Field(pattern=r"^(id|name|email)$", description="Uses a regex pattern to only allow valid column names")
    search_term: str = Field(
        pattern=r'^[\w\s\'-]{1,50}$',  # Allows letters, numbers, spaces, hyphens, apostrophes; does not allow special characters that could be used in SQL injection
        strip_whitespace=True,
        min_length=1,
        max_length=50
    )
    limit: int = Field(
        default=10,
        ge=1,
        le=100,
        description="Number of results to return"
    )

def run(client: MooseClient, params: QueryParams):
    query = """
    SELECT {column} FROM {table}
    WHERE name ILIKE {search_term}
    LIMIT {limit}
    """
    return client.query.execute(query, {"column": UserTable.cols[params.column], "table": UserTable, "search_term": f"%{params.search_term}%", "limit": params.limit})
```

```python filename="UserTable.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: Key[int]
    name: str
    email: str

UserTable = OlapTable[UserSchema]("users")
```

### Advanced Query Patterns

#### Dynamic Column & Table Selection

```python filename="DynamicColumns.py" copy
from typing import Optional
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field
from app.UserTable import UserTable

class QueryParams(BaseModel):
    colName: str = Field(pattern=r"^(id|name|email)$", description="Uses a regex pattern to only allow valid column names from the UserTable")

class QueryResult(BaseModel):
    id: Optional[int] = None
    name: Optional[str] = None
    email: Optional[str] = None

def run(client: MooseClient, params: QueryParams):
    # Put column and table in the dict for variables
    query = "SELECT {column} FROM {table}"
    return client.query.execute(query, {"column": UserTable.cols[params.colName], "table": UserTable})

## Create the API
bar = Api[QueryParams, QueryResult](name="bar", query_function=run)

## Call the API
## HTTP Request: GET http://localhost:4000/api/bar?colName=id
## EXECUTED QUERY: SELECT id FROM users
```

```python filename="UserTable.py" copy
from moose_lib import OlapTable, Key
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: Key[int]
    name: str
    email: str

UserTable = OlapTable[UserSchema]("users")
```

#### Conditional `WHERE` Clauses

Build `WHERE` clauses based on provided parameters:

```python filename="ConditionalColumns.py" copy
from typing import Optional
from moose_lib import Api, MooseClient
from pydantic import BaseModel, Field

class FilterParams(BaseModel):
    min_age: Optional[int] = None
    status: Optional[str] = Field(None, pattern=r"^(active|inactive)$")
    search_text: Optional[str] = Field(None, pattern=r"^[a-zA-Z0-9\s]+$", description="Alphanumeric search text without special characters to prevent SQL injection")

class QueryResult(BaseModel):
    id: int
    name: str
    email: str

def build_query(client: MooseClient, params: FilterParams) -> list[QueryResult]:
    # Build conditions and bound parameters together
    conditions = []
    parameters = {}
    if params.min_age is not None:
        conditions.append("age >= {min_age}")
        parameters["min_age"] = params.min_age
    if params.status:
        conditions.append("status = {status}")
        parameters["status"] = params.status
    if params.search_text:
        conditions.append("(name ILIKE {search_text} OR email ILIKE {search_text})")
        parameters["search_text"] = f"%{params.search_text}%"
    where_clause = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    query = f"""SELECT * FROM users {where_clause} ORDER BY created_at DESC"""
    return client.query.execute(query, parameters)

## Create the API
bar = Api[FilterParams, QueryResult](name="bar", query_function=build_query)

## Call the API
## HTTP Request: GET http://localhost:4000/api/bar?min_age=20&status=active&search_text=John
## EXECUTED QUERY: SELECT * FROM users WHERE age >= 20 AND status = 'active' AND (name ILIKE '%John%' OR email ILIKE '%John%') ORDER BY created_at DESC
```

### Adding Authentication

Moose supports authentication via JSON web tokens (JWTs). When your client makes a request to your Analytics API, Moose will automatically parse the JWT and pass the **authenticated** payload to your handler function as the `jwt` object:

```python filename="Authentication.py" copy
def run(client: MooseClient, params: QueryParams, jwt: dict):
    # Use parameter binding with JWT data
    query = """SELECT * FROM userReports WHERE user_id = {user_id} LIMIT 5"""
    return client.query.execute(query, {"user_id": jwt["userId"]})
```

Moose validates the JWT signature and ensures the JWT is properly formatted. If the JWT authentication fails, Moose will return a `401 Unauthorized` error.
## Understanding Response Codes

Moose automatically provides standard HTTP responses:

| Status Code | Meaning | Response Body |
|-------------|-------------------------|---------------------------------|
| 200 | Success | Your API's result data |
| 400 | Validation error | `{ "error": "Detailed message"}` |
| 401 | Unauthorized | `{ "error": "Unauthorized"}` |
| 500 | Internal server error | `{ "error": "Internal server error"}` |

## Post-Processing Query Results

After executing your database query, you can transform the data before returning it to the client. This allows you to rename fields, add derived values, and format data for display:

```python filename="PostProcessingExample.py" copy
from datetime import datetime
from moose_lib import Api, MooseClient
from pydantic import BaseModel

class QueryParams(BaseModel):
    category: str
    max_results: int = 10

class ResponseItem(BaseModel):
    itemId: int
    displayName: str
    formattedValue: str
    isHighValue: bool
    date: str

def run(client: MooseClient, params: QueryParams):
    # 1. Fetch raw data using parameter binding
    query = """
    SELECT id, name, value, timestamp
    FROM data_table
    WHERE category = {category}
    LIMIT {limit}
    """
    raw_results = client.query.execute(query, {"category": params.category, "limit": params.max_results})

    # 2. Post-process the results
    processed_results = []
    for row in raw_results:
        processed_results.append(ResponseItem(
            # Transform field names
            itemId=row['id'],
            displayName=row['name'].upper(),
            # Add derived fields
            formattedValue=f"${row['value']:.2f}",
            isHighValue=row['value'] > 1000,
            # Format dates
            date=datetime.fromisoformat(row['timestamp']).date().isoformat()
        ))
    return processed_results

# Create the API
process_data_api = Api[QueryParams, ResponseItem](name="process_data_endpoint", query_function=run)
```

### Best Practices

While post-processing gives you flexibility, remember that database operations are typically more efficient for heavy data manipulation. Reserve post-processing for transformations that are difficult to express in SQL or that involve application-specific logic.

## Client Integration

By default, all API endpoints are automatically integrated with OpenAPI/Swagger documentation. You can integrate your OpenAPI SDK generator of choice to generate client libraries for your APIs. Please refer to the [OpenAPI](/moose/apis/openapi-sdk) page for more information on how to integrate your APIs with OpenAPI.

---

## API Authentication & Security

Source: moose/apis/auth.mdx

Secure your Moose API endpoints with JWT tokens or API keys

# API Authentication & Security

Moose supports two authentication mechanisms for securing your API endpoints:

- **[API Keys](#api-key-authentication)** - Simple, static authentication for internal applications and getting started
- **[JWT (JSON Web Tokens)](#jwt-authentication)** - Token-based authentication for integration with existing identity providers

Choose the method that fits your use case, or use both together with custom configuration.

## Do you want to use API Keys?

API keys are the simplest way to secure your Moose endpoints. They're ideal for:

- Internal applications and microservices
- Getting started quickly with authentication
- Scenarios where you control both client and server

### How API Keys Work

API keys use PBKDF2 HMAC SHA256 hashing for secure storage. You generate a token pair (plain-text and hashed) using the Moose CLI, store the hashed version in environment variables, and send the plain-text version in your request headers.
### Step 1: Generate API Keys

Generate tokens and hashed keys using the Moose CLI:

```bash
moose generate hash-token
```

**Output:**

- **ENV API Keys**: Hashed key for environment variables (use this in your server configuration)
- **Bearer Token**: Plain-text token for client applications (use this in `Authorization` headers)

Use the **hashed key** for environment variables and `moose.config.toml`. Use the **plain-text token** in your `Authorization: Bearer token` headers.

### Step 2: Configure API Keys with Environment Variables

Set environment variables with the **hashed** API keys you generated (note that admin endpoints take the plain-text token instead):

```bash
# For ingest endpoints
export MOOSE_INGEST_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'

# For analytics endpoints
export MOOSE_CONSUMPTION_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'

# For admin endpoints (plain-text token)
export MOOSE_ADMIN_TOKEN='your_plain_text_token'
```

Or set the admin API key in `moose.config.toml`:

```toml filename="moose.config.toml"
[authentication]
admin_api_key = "your_pbkdf2_hmac_sha256_hashed_key"
```

Storing the `admin_api_key` (which is a PBKDF2 HMAC SHA256 hash) in your `moose.config.toml` file is an acceptable practice, even if the file is version-controlled. This is because the actual plain-text Bearer token (the secret) is not stored. The hash is computationally expensive to reverse, ensuring that your secret is not exposed in the codebase.

### Step 3: Make Authenticated Requests

All authenticated requests require the `Authorization` header with the **plain-text token**:

```bash
# Using curl
curl -H "Authorization: Bearer your_plain_text_token_here" \
  https://your-moose-instance.com/ingest/YourDataModel

# Using JavaScript
fetch('https://your-moose-instance.com/api/endpoint', {
  headers: {
    'Authorization': 'Bearer your_plain_text_token_here'
  }
})
```

## Do you want to use JWTs?

JWT authentication integrates with existing identity providers and follows standard token-based authentication patterns. Use JWTs when:

- You have an existing identity provider (Auth0, Okta, etc.)
- You need user-specific authentication and authorization
- You want standard OAuth 2.0 / OpenID Connect flows

### How JWT Works

Moose validates JWT tokens using the RS256 algorithm with your identity provider's public key. You configure the expected issuer and audience, and Moose handles token verification automatically.

### Step 1: Configure JWT Settings

#### Option A: Configure in `moose.config.toml`

```toml filename=moose.config.toml
[jwt]
# Your JWT public key (PEM-formatted RSA public key)
secret = """
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAy...
-----END PUBLIC KEY-----
"""
# Expected JWT issuer
issuer = "https://my-auth-server.com/"
# Expected JWT audience
audience = "my-moose-app"
```

The `secret` field should contain your JWT **public key**, used to verify signatures with the RS256 algorithm.
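For local testing against this configuration, you can mint a matching token yourself. A minimal sketch using PyJWT (an assumption — any RS256-capable JWT library works), signed with the private key that pairs with the configured public key:

```python filename="MintTestToken.py" copy
import datetime
import jwt  # pip install "pyjwt[crypto]" (assumed library)

# Private half of the key pair; Moose only ever sees the public key
with open("private_key.pem") as f:
    private_key = f.read()

token = jwt.encode(
    {
        "iss": "https://my-auth-server.com/",  # must match `issuer` in moose.config.toml
        "aud": "my-moose-app",                 # must match `audience`
        "sub": "user-123",
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
    },
    private_key,
    algorithm="RS256",
)
print(token)  # send as: Authorization: Bearer <token>
```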
#### Option B: Configure with Environment Variables

You can also set these values as environment variables:

```bash filename=".env" copy
MOOSE_JWT_PUBLIC_KEY=your_jwt_public_key # PEM-formatted RSA public key (overrides `secret` in `moose.config.toml`)
MOOSE_JWT_ISSUER=your_jwt_issuer # Expected JWT issuer (overrides `issuer` in `moose.config.toml`)
MOOSE_JWT_AUDIENCE=your_jwt_audience # Expected JWT audience (overrides `audience` in `moose.config.toml`)
```

### Step 2: Make Authenticated Requests

Send requests with the JWT token in the `Authorization` header:

```bash
# Using curl
curl -H "Authorization: Bearer your_jwt_token_here" \
  https://your-moose-instance.com/ingest/YourDataModel

# Using JavaScript
fetch('https://your-moose-instance.com/api/endpoint', {
  headers: {
    'Authorization': 'Bearer your_jwt_token_here'
  }
})
```

## Want to use both? Here are the caveats

You can configure both JWT and API Key authentication simultaneously. When both are configured, Moose's authentication behavior depends on the `enforce_on_all_*` flags.

### Understanding Authentication Priority

#### Default Behavior (No Enforcement)

By default, when both JWT and API Keys are configured, Moose tries JWT validation first, then falls back to API Key validation:

```toml filename="moose.config.toml"
[jwt]
# JWT configuration
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# enforce flags default to false
```

```bash filename=".env"
# API Key configuration
MOOSE_INGEST_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
MOOSE_CONSUMPTION_API_KEY='your_pbkdf2_hmac_sha256_hashed_key'
```

**For Ingest Endpoints (`/ingest/*`)**:

- Attempts JWT validation first (RS256 signature check)
- Falls back to API Key validation (PBKDF2 HMAC SHA256) if JWT fails

**For Analytics Endpoints (`/api/*`)**:

- Same fallback behavior as ingest endpoints

This allows you to use either authentication method for your clients.

#### Enforcing JWT Only

If you want to **only** accept JWT tokens (no API key fallback), set the enforcement flags:

```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# Only accept JWT, no API key fallback
enforce_on_all_ingest_apis = true
enforce_on_all_consumptions_apis = true
```

**Result**: When enforcement is enabled, API Key authentication is disabled even if the environment variables are set. Only valid JWT tokens will be accepted.

### Common Use Cases

#### Use Case 1: Different Auth for Different Endpoints

Configure JWT for user-facing analytics endpoints, API keys for internal ingestion:

```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
enforce_on_all_consumptions_apis = true # JWT only for /api/*
enforce_on_all_ingest_apis = false # Allow fallback for /ingest/*
```

```bash filename=".env"
MOOSE_INGEST_API_KEY='hashed_key_for_internal_services'
```

#### Use Case 2: Migration from API Keys to JWT

Start with both configured, no enforcement. Gradually migrate clients to JWT. Once complete, enable enforcement:

```toml filename="moose.config.toml"
[jwt]
secret = "..."
issuer = "https://my-auth-server.com/"
audience = "my-moose-app"
# Start with both allowed during migration
enforce_on_all_ingest_apis = false
enforce_on_all_consumptions_apis = false

# Later, enable to complete migration
# enforce_on_all_ingest_apis = true
# enforce_on_all_consumptions_apis = true
```

### Admin Endpoints

Admin endpoints use API key authentication exclusively (configured separately from ingest/analytics endpoints).

**Configuration precedence** (highest to lowest):

1. `--token` CLI parameter (plain-text token)
2. `MOOSE_ADMIN_TOKEN` environment variable (plain-text token)
3. `admin_api_key` in `moose.config.toml` (hashed token)

**Example:**

```bash
# Option 1: CLI parameter
moose remote plan --token your_plain_text_token

# Option 2: Environment variable
export MOOSE_ADMIN_TOKEN='your_plain_text_token'
moose remote plan

# Option 3: Config file
# In moose.config.toml:
# [authentication]
# admin_api_key = "your_pbkdf2_hmac_sha256_hashed_key"
```

## Security Best Practices

- **Never commit plain-text tokens to version control** - Always use hashed keys in configuration files
- **Use environment variables for production** - Keep secrets out of your codebase
- **Generate unique tokens for different environments** - Separate development, staging, and production credentials
- **Rotate tokens regularly** - Especially for long-running production deployments
- **Choose the right method for your use case**:
  - Use **API Keys** for internal services and getting started
  - Use **JWT** when integrating with identity providers or when you need user-level auth
- **Store hashed keys safely** - The PBKDF2 HMAC SHA256 hash in `moose.config.toml` is safe to version control, but the plain-text token should only exist in secure environment variables or secret management systems

Never commit plain-text tokens to version control. Use hashed keys in configuration files and environment variables for production.

---

## Ingestion APIs

Source: moose/apis/ingest-api.mdx

Ingestion APIs for Moose

# Ingestion APIs

## Overview

Moose Ingestion APIs are the entry point for getting data into your Moose application. They provide a fast, reliable, and type-safe way to move data from your sources into streams and tables for analytics and processing.

## When to Use Ingestion APIs

Ingestion APIs are most useful when you want to implement a push-based pattern for getting data from your data sources into your streams and tables. Common use cases include:

- Instrumenting external client applications
- Receiving webhooks from third-party services
- Integrating with ETL or data pipeline tools that push data

## Why Use Moose's APIs Over Your Own?

Moose's ingestion APIs are purpose-built for high-throughput data pipelines, offering key advantages over other more general-purpose frameworks:

- **Built-in schema validation:** Ensures only valid data enters your pipeline.
- **Direct connection to streams/tables:** Instantly link HTTP endpoints to Moose data infrastructure to route incoming data to your streams and tables without any glue code.
- **Dead Letter Queue (DLQ) support:** Invalid records are automatically captured for review and recovery.
- **OpenAPI auto-generation:** Instantly generate client SDKs and docs for all endpoints, including example data.
- **Rust-powered performance:** Far higher throughput and lower latency than typical Node.js or Python APIs.

## Validation

Moose validates all incoming data against your Pydantic model.
If a record fails validation, Moose can automatically route it to a Dead Letter Queue (DLQ) for later inspection and recovery.

```python filename="ValidationExample.py" copy
from datetime import datetime
from typing import Optional
from moose_lib import IngestApi, IngestConfig, Stream, DeadLetterQueue
from pydantic import BaseModel

class Properties(BaseModel):
    device: Optional[str]
    version: Optional[int]

class ExampleModel(BaseModel):
    id: str
    userId: str
    timestamp: datetime
    properties: Properties

api = IngestApi[ExampleModel]("your-api-route", IngestConfig(
    destination=Stream[ExampleModel]("your-stream-name"),
    dead_letter_queue=DeadLetterQueue[ExampleModel]("your-dlq-name")
))
```

If your IngestPipeline's schema marks a field as optional but annotates a ClickHouse default, Moose treats:

- API request and Stream message: field is optional (you may omit it)
- ClickHouse table storage: field is required with a `DEFAULT` clause

Behavior: When the API/stream inserts into ClickHouse and the field is missing, ClickHouse sets it to the configured default value. This keeps request payloads simple while avoiding Nullable columns in storage.

Example: `Annotated[int, clickhouse_default("18")]` (or equivalent annotation)

Send a valid event - routed to the destination stream

```python filename="ValidEvent.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
    "id": "event1",
    "userId": "user1",
    "timestamp": "2023-05-10T15:30:00Z"
})
# ✅ Accepted and routed to the destination stream
# API returns 200 and { success: true }
```

Send an invalid event (missing required field) - routed to the DLQ

```python filename="InvalidEventMissingField.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
    "id": "event1",
})
# ❌ Routed to DLQ, because it's missing a required field
# API returns 400 response
```

Send an invalid event (bad date format) - routed to the DLQ

```python filename="InvalidEventBadDate.py" copy
import requests

requests.post("http://localhost:4000/ingest/your-api-route", json={
    "id": "event1",
    "userId": "user1",
    "timestamp": "not-a-date"
})
# ❌ Routed to DLQ, because the timestamp is not a valid date
# API returns 400 response
```

## Creating Ingestion APIs

You can create ingestion APIs in two ways:

- **High-level:** Using the `IngestPipeline` class (recommended for most use cases)
- **Low-level:** Manually configuring the `IngestApi` component for more granular control

### High-level: IngestPipeline (Recommended)

The `IngestPipeline` class provides a convenient way to set up ingestion endpoints, streams, and tables with a single declaration:

```python filename="IngestPipeline.py" copy
from datetime import datetime
from moose_lib import Key, IngestPipeline, IngestPipelineConfig
from pydantic import BaseModel

class ExampleSchema(BaseModel):
    id: Key[str]
    name: str
    value: int
    timestamp: datetime

example_pipeline = IngestPipeline[ExampleSchema](
    name="example-name",
    config=IngestPipelineConfig(
        ingest_api=True,
        stream=True,
        table=True
    )
)
```

### Low-level: Standalone IngestApi

For more granular control, you can manually configure the `IngestApi` component, as in the sketch below. The types of the destination `Stream` and `Table` must match the type of the `IngestApi`.
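A minimal sketch of the low-level wiring (assuming the same `ExampleSchema` as above; the resource names are illustrative):

```python filename="StandaloneIngestApi.py" copy
from datetime import datetime
from moose_lib import IngestApi, IngestConfig, Stream, StreamConfig, OlapTable, Key
from pydantic import BaseModel

class ExampleSchema(BaseModel):
    id: Key[str]
    name: str
    value: int
    timestamp: datetime

# Table, stream, and API all share the same schema type
example_table = OlapTable[ExampleSchema]("example-table")
example_stream = Stream[ExampleSchema]("example-stream", StreamConfig(destination=example_table))
example_api = IngestApi[ExampleSchema]("example-api", IngestConfig(destination=example_stream))
```

## Configuration Reference

Configuration options for both high-level and low-level ingestion APIs are provided below.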
```python filename="IngestPipelineConfig.py" copy class IngestPipelineConfig(BaseModel): table: bool | OlapConfig = True stream: bool | StreamConfig = True ingest_api: bool | IngestConfig = True dead_letter_queue: bool | StreamConfig = True version: Optional[str] = None metadata: Optional[dict] = None life_cycle: Optional[LifeCycle] = None ``` ```python filename="IngestConfig.py" copy @dataclass class IngestConfigWithDestination[T: BaseModel]: destination: Stream[T] dead_letter_queue: Optional[DeadLetterQueue[T]] = None version: Optional[str] = None metadata: Optional[dict] = None ``` --- ## OpenAPI SDK Generation Source: moose/apis/openapi-sdk.mdx Generate type-safe client SDKs from your Moose APIs # OpenAPI SDK Generation Moose automatically generates OpenAPI specifications for all your APIs, enabling you to create type-safe client SDKs in any language. This allows you to integrate your Moose APIs into any application with full type safety and IntelliSense support. ## Overview While `moose dev` is running, Moose emits an OpenAPI spec at `.moose/openapi.yaml` covering: - **Ingestion endpoints** with request/response schemas - **Analytics APIs** with query parameters and response types Every time you make a change to your Moose APIs, the OpenAPI spec is updated automatically. ## Generating Typed SDKs from OpenAPI You can use your preferred generator to create a client from that spec. Below are minimal, tool-agnostic examples you can copy into your project scripts. ### Setup The following example uses `openapi-python-client` to generate the SDK. Follow the setup instructions here: [openapi-python-client on PyPI](https://pypi.org/project/openapi-python-client/). Add a generation script in your repository: ```bash filename="scripts/generate_python_sdk.sh" copy #!/usr/bin/env bash set -euo pipefail openapi-python-client generate --path .moose/openapi.yaml --output ./generated/python --overwrite ``` Then configure Moose to run it after each dev reload: ```toml filename="moose.config.toml" copy [http_server_config] on_reload_complete_script = "bash scripts/generate_python_sdk.sh" ``` This will regenerate the Python client from the live spec on every reload. ### Hooks for automatic SDK generation The `on_reload_complete_script` hook is available in your `moose.config.toml` file. It runs after each dev server reload when code/infra changes have been fully applied. This allows you to keep your SDKs continuously up to date as you make changes to your Moose APIs. Notes: - The script runs in your project root using your `$SHELL` (falls back to `/bin/sh`). - Paths like `.moose/openapi.yaml` and `./generated/...` are relative to the project root. - You can combine multiple generators with `&&` (as shown) or split into a shell script if preferred. These hooks only affect local development (`moose dev`). The reload hook runs after Moose finishes applying your changes, ensuring `.moose/openapi.yaml` is fresh before regeneration. ## Integration Import from the output path your generator writes to (see your chosen example repo). The Moose side is unchanged: the spec lives at `.moose/openapi.yaml` during `moose dev`. 
## Generators

Use any OpenAPI-compatible generator:

### TypeScript projects

- [OpenAPI Generator (typescript-fetch)](https://github.com/OpenAPITools/openapi-generator) — mature, broad options; generates Fetch-based client
- [Kubb](https://github.com/kubb-project/kubb) — generates types + fetch client with simple config
- [Orval](https://orval.dev/) — flexible output (client + schemas), good DX
- [openapi-typescript](https://github.com/openapi-ts/openapi-typescript) — generates types only (pair with your own client)
- [swagger-typescript-api](https://github.com/acacode/swagger-typescript-api) — codegen for TS clients from OpenAPI
- [openapi-typescript-codegen](https://github.com/ferdikoomen/openapi-typescript-codegen) — TS client + models
- [oazapfts](https://github.com/oazapfts/oazapfts) — minimal TS client based on fetch
- [openapi-zod-client](https://github.com/astahmer/openapi-zod-client) — Zod schema-first client generation

### Python projects

- [openapi-python-client](https://pypi.org/project/openapi-python-client/) — modern typed client for OpenAPI 3.0/3.1
- [OpenAPI Generator (python)](https://github.com/OpenAPITools/openapi-generator) — multiple Python generators (python, python-nextgen)

---

## Trigger APIs

Source: moose/apis/trigger-api.mdx

Create APIs that trigger workflows and other processes

# Trigger APIs

## Overview

You can create APIs to initiate workflows, data processing jobs, or other automated processes.

## Basic Usage

```python filename="app/apis/trigger_workflow.py" copy
from moose_lib import MooseClient, Api
from pydantic import BaseModel, Field
from datetime import datetime

class WorkflowParams(BaseModel):
    input_value: str
    priority: str = Field(default="normal")

class WorkflowResponse(BaseModel):
    workflow_id: str
    status: str

def run(client: MooseClient, params: WorkflowParams) -> WorkflowResponse:
    # Trigger the workflow with input parameters
    workflow_execution = client.workflow.execute(
        workflow="data-processing",
        params={
            "input_value": params.input_value,
            "priority": params.priority,
            "triggered_at": datetime.now().isoformat()
        }
    )
    return WorkflowResponse(
        workflow_id=workflow_execution.id,
        status="started"
    )

api = Api[WorkflowParams, WorkflowResponse]("trigger-workflow", run)
```

## Using the Trigger API

Once deployed, you can trigger workflows via HTTP requests:

```bash filename="Terminal" copy
curl "http://localhost:4000/api/trigger-workflow?input_value=process-user-data&priority=high"
```

Response:

```json
{
  "workflow_id": "workflow-12345",
  "status": "started"
}
```

---

## changelog

Source: moose/changelog.mdx

# Moose Changelog

## What is this page?

This changelog tracks all meaningful changes to Moose. Each entry includes the PR link and contributor credit, organized by date (newest first). Use this page to stay informed about new features, fixes, and breaking changes that might affect your projects.

## How to read this changelog:
- **Release Highlights** — Key features, enhancements, or fixes for each release.
- **Added** — New features and capabilities.
- **Changed** — Updates to existing functionality or improvements.
- **Deprecated** — Features that are no longer recommended for use and may be removed in the future.
- **Fixed** — Bug fixes and reliability improvements.
- **Breaking Changes** — Changes that require user action or may break existing usage.
---

## 2025-08-21

- **Analytics API Standardization** — Standardized naming of classes and utilities for analytics APIs.
- **Complete S3Queue Engine Support** — Full implementation of ClickHouse S3Queue engine with comprehensive parameter support, modular architecture, and generic settings framework.
- **S3Queue Engine Configuration** — Added complete support for all ClickHouse S3Queue engine parameters including authentication (`aws_access_key_id`, `aws_secret_access_key`), compression, custom headers, and NOSIGN for public buckets. [PR #2674](https://github.com/514-labs/moosestack/pull/2674)
- **Generic Settings Framework** — Introduced a flexible settings system that allows any engine to use configuration settings, laying groundwork for future engine implementations.
- **Enhanced Documentation** — Added comprehensive documentation for OlapTable S3Queue configuration in both TypeScript and Python SDKs.
- **Improved Architecture** — Moved ClickHouse-specific types from core infrastructure to ClickHouse module for better separation of concerns.
- **Settings Location** — Engine-specific settings are now properly encapsulated within their respective engine configurations (e.g., `s3QueueEngineConfig.settings` for S3Queue).
- **API Consistency** — Unified configuration APIs across TypeScript and Python SDKs for S3Queue engine.
- **Compilation Issues** — Fixed struct patterns to handle new S3Queue parameter structure correctly.
- **Diff Strategy** — Enhanced diff strategy to properly handle S3Queue parameter changes.
- `ConsumptionApi` renamed to `Api`
- `EgressConfig` renamed to `ApiConfig`
- `ConsumptionUtil` renamed to `ApiUtil`
- `ConsumptionHelpers` renamed to `ApiHelpers`

*[#2676](https://github.com/514-labs/moosestack/pull/2676) by [camelCasedAditya](https://github.com/camelCasedAditya)*

---

## 2025-08-20

- **Improved IngestPipeline API Clarity** — The confusing `ingest` parameter has been renamed to `ingestApi` (TypeScript) and `ingest_api` (Python) for better clarity. The old parameter names are still supported with deprecation warnings.
- **IngestPipeline Parameter Renamed** — The `ingest` parameter in IngestPipeline configurations has been renamed for clarity:
  - **TypeScript**: `ingest: true` → `ingestApi: true`
  - **Python**: `ingest=True` → `ingest_api=True`

  The old parameter names continue to work with deprecation warnings to ensure backwards compatibility. *[Current PR]*
- **IngestPipeline `ingest` parameter** — The `ingest` parameter in IngestPipeline configurations is deprecated:
  - **TypeScript**: Use `ingestApi` instead of `ingest`
  - **Python**: Use `ingest_api` instead of `ingest`

  The old parameter will be removed in a future major version. Please update your code to use the new parameter names. *[Current PR]*

No breaking changes — full backwards compatibility maintained.

---

## 2025-06-12

- **Enhanced TypeScript Workflow Types** — Improved type safety for Tasks with optional input/output parameters, supporting `null` types for better flexibility.
- TypeScript workflow Task types now properly support optional input/output with `null` types, enabling more flexible task definitions where a task's input and/or output type is `null`. *[#2442](https://github.com/514-labs/moose/pull/2442) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*

No breaking changes.

---

## 2025-06-10

- **OlapTable Direct Insert API** — New comprehensive insert API with advanced error handling, typia validation, and multiple failure strategies. Enables direct data insertion into ClickHouse tables with production-ready reliability features.
- **Python Workflows V2** — Replaces static file-based routing with explicit `Task` and `Workflow` classes, enabling dynamic task composition and programmatic workflow orchestration. No more reliance on `@task` decorators or file naming conventions.
- OlapTable direct insert API with `insert()` method supporting arrays and Node.js streams. Features comprehensive typia-based validation, three error handling strategies (`fail-fast`, `discard`, `isolate`), configurable error thresholds, memoized ClickHouse connections, and detailed insertion results with failed record tracking. *[#2437](https://github.com/514-labs/moose/pull/2437) by [callicles](https://github.com/callicles)*
- Enhanced typia validation integration for OlapTable and IngestPipeline with `validateRecord()`, `isValidRecord()`, and `assertValidRecord()` methods providing compile-time type safety and runtime validation. *[#2437](https://github.com/514-labs/moose/pull/2437) by [callicles](https://github.com/callicles)*
- Python Workflows V2 with `Task[InputType, OutputType]` and `Workflow` classes for dynamic workflow orchestration. Replaces the legacy `@task` decorator approach with explicit task definitions, enabling flexible task composition, type-safe chaining via `on_complete`, retries, timeouts, and scheduling with cron expressions. *[#2439](https://github.com/514-labs/moose/pull/2439) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*

No breaking changes.

---

## 2025-06-06

- **TypeScript Workflows V2** — Replaces static file-based routing with explicit `Task` and `Workflow` classes, enabling dynamic task composition and programmatic workflow orchestration. No more reliance on file naming conventions for task execution order.
- TypeScript Workflows V2 with `Task` and `Workflow` classes for dynamic workflow orchestration. Replaces the legacy file-based routing approach with explicit task definitions, enabling flexible task composition, type-safe chaining via `onComplete`, configurable retries and timeouts, and flexible scheduling with cron expressions. *[#2421](https://github.com/514-labs/moose/pull/2421) by [DatGuyJonathan](https://github.com/DatGuyJonathan)*

No breaking changes.

---

## 2025-05-27

- **IPv4 and IPv6 Type Support** — Added native support for IP address types in ClickHouse data models, enabling efficient storage and querying of network data.
- IPv4 and IPv6 data types for ClickHouse integration, supporting native IP address storage and operations. *[#2373](https://github.com/514-labs/moose/pull/2373) by [phiSgr](https://github.com/phiSgr)*
- Enhanced type parser to handle IP address types across the Moose ecosystem. *[#2374](https://github.com/514-labs/moose/pull/2374) by [phiSgr](https://github.com/phiSgr)*

No breaking changes.

---

## 2025-05-23

- **TypeScript `DeadLetterQueue` support** — Handle failed streaming function messages with type-safe dead letter queues in TypeScript.
- **Improved Python `DeadLetterModel` API** — Renamed `as_t` to `as_typed` for better clarity.
- TypeScript `DeadLetterQueue` class with type guards and transform methods for handling failed streaming function messages with full type safety. *[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
- Renamed `DeadLetterModel.as_t()` to `DeadLetterModel.as_typed()` in Python for better API clarity and consistency. *[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*
- `DeadLetterModel.as_t()` method renamed to `as_typed()` in Python. Update your code to use the new method name. *[#2356](https://github.com/514-labs/moose/pull/2356) by [phiSgr](https://github.com/phiSgr)*

---

## 2025-05-22

- **Refactored CLI 'peek' command** — Now supports peeking into both tables and streams with unified parameters.
- **Simplified CLI experience** — Removed unused commands and routines for a cleaner interface.
- Updated CLI 'peek' command to use a unified 'name' parameter and new flags (`--table`, `--stream`) to specify resource type. Default is table. Documentation updated to match. *[#2361](https://github.com/514-labs/moose/pull/2361) by [callicles](https://github.com/callicles)*
- Removed unused CLI commands and routines including `Function`, `Block`, `Consumption`, `DataModel`, and `Import`. CLI is now simpler and easier to maintain. *[#2360](https://github.com/514-labs/moose/pull/2360) by [callicles](https://github.com/callicles)*

No breaking changes.

---

## 2025-05-21

- **Infrastructure state sync** — Auto-syncs DB state before changes, handling manual modifications and failed DDL runs.
- **Fixed nested data type support** — Use objects and arrays in your Moose models.
- State reconciliation for infrastructure planning — Moose now checks and updates its in-memory infra map to match the real database state before planning changes. Makes infra planning robust to manual DB changes and failed runs. *[#2341](https://github.com/514-labs/moose/pull/2341) by [callicles](https://github.com/callicles)*
- Handling of nested data structures in Moose models for correct support of complex objects and arrays. *[#2357](https://github.com/514-labs/moose/pull/2357) by [georgevanderson](https://github.com/georgevanderson)*

No breaking changes.

---

## 2025-05-20

- **ClickHouse `Date` type support** — Store and query native date values in your schemas.
- ClickHouse `Date` column support for native date types in Moose schemas and ingestion. *[#2352](https://github.com/514-labs/moose/pull/2352), [#2351](https://github.com/514-labs/moose/pull/2351) by [phiSgr](https://github.com/phiSgr)*

No breaking changes.

---

## 2025-05-19

- **Metadata map propagation** — Metadata is now tracked and available in the infra map for both Python and TypeScript. Improves LLM accuracy and reliability when working with Moose objects.
- Metadata map propagation to infra map for consistent tracking and availability in both Python and TypeScript. *[#2326](https://github.com/514-labs/moose/pull/2326) by [georgevanderson](https://github.com/georgevanderson)*

No breaking changes.

---

## 2025-05-16

- **New `list[str]` support in Python `AggregateFunction`** — Enables more flexible aggregation logic in Materialized Views.
- **Python `DeadLetterQueue[T]` alpha release** — Automatically route exceptions to a dead letter queue in streaming functions.
- AggregateFunction in Python now accepts `list[str]` for more expressive and type-safe aggregations. *[#2321](https://github.com/514-labs/moose/pull/2321) by [phiSgr](https://github.com/phiSgr)*
- Python dead letter queues for handling and retrying failed messages in Python streaming functions. *[#2324](https://github.com/514-labs/moose/pull/2324) by [phiSgr](https://github.com/phiSgr)*

No breaking changes.

---

## 2025-05-15

- **Hotfix** — Casing fix for `JSON` columns in TypeScript.
- TypeScript JSON columns to have consistent casing, avoiding confusion and errors in your code. *[#2320](https://github.com/514-labs/moose/pull/2320) by [phiSgr](https://github.com/phiSgr)*

No breaking changes.

---

## 2025-05-14

- **Introduced TypeScript JSON columns** — Use `Record` for type-safe JSON fields.
- **Ingestion config simplified** — Less config needed for ingestion setup.
- **Python `enum` support improved** — More robust data models.
- TypeScript ClickHouse JSON columns to use `Record` for type-safe JSON fields. *[#2317](https://github.com/514-labs/moose/pull/2317) by [phiSgr](https://github.com/phiSgr)*
- Pydantic mixin for parsing integer enums by name for more robust Python data models. *[#2316](https://github.com/514-labs/moose/pull/2316) by [phiSgr](https://github.com/phiSgr)*
- Better Python enum handling in data models for easier enum usage. *[#2315](https://github.com/514-labs/moose/pull/2315) by [phiSgr](https://github.com/phiSgr)*
- Removed `IngestionFormat` from `IngestApi` config for simpler ingestion setup. *[#2306](https://github.com/514-labs/moose/pull/2306) by [georgevanderson](https://github.com/georgevanderson)*

No breaking changes.

---

## 2025-05-13

- **New `refresh` CLI command** — Quickly reload data and schema changes from changes applied directly to your database outside of Moose.
- **Python: `LowCardinality` type support** — Better performance for categorical data.
- `refresh` command to reload data and schema with a single command. *[#2309](https://github.com/514-labs/moose/pull/2309) by [phiSgr](https://github.com/phiSgr)*
- Python support for `LowCardinality(T)` to improve performance for categorical columns. *[#2313](https://github.com/514-labs/moose/pull/2313) by [phiSgr](https://github.com/phiSgr)*

No breaking changes.

---

## 2025-05-10

- **Dependency-based execution order for Materialized Views** — Reduces migration errors and improves reliability.
- Order changes for materialized views based on dependency to ensure correct execution order for dependent changes. *[#2294](https://github.com/514-labs/moose/pull/2294) by [callicles](https://github.com/callicles)*

No breaking changes.

---

## 2025-05-07

- **Python `datetime64` support** — Enables more precise datetime handling in Python data models.
- **Type mapping in Python `QueryClient`** — Automatically maps ClickHouse query result rows to the correct Pydantic model types.
- Row parsing in QueryClient with type mapping for Python. *[#2299](https://github.com/514-labs/moose/pull/2299) by [phiSgr](https://github.com/phiSgr)*
- `datetime64` parsing and row parsing in QueryClient for more reliable data handling in Python. *[#2299](https://github.com/514-labs/moose/pull/2299) by [phiSgr](https://github.com/phiSgr)*

No breaking changes.

---

## 2025-05-06

- **`uint` type support in TypeScript** — Enables type safety for unsigned integer fields in TypeScript data models.
- uint type support in TypeScript for unsigned integers in Moose models. *[#2295](https://github.com/514-labs/moose/pull/2295) by [phiSgr](https://github.com/phiSgr)*

No breaking changes.

---

## 2025-05-01

- **Explicit dependency tracking for materialized views** — Improves data lineage, migration reliability, and documentation.
- Explicit dependency tracking for materialized views to make migrations and data lineage more robust and easier to understand. *[#2282](https://github.com/514-labs/moose/pull/2282) by [callicles](https://github.com/callicles)*
- Required `selectTables` field in `MaterializedView` config that must specify an array of `OlapTable` objects for the source tables. *[#2282](https://github.com/514-labs/moose/pull/2282) by [callicles](https://github.com/callicles)*

---

## 2025-04-30

- **More flexible `JSON_ARRAY` configuration for `IngestApi`** — Now accepts both arrays and single elements. Default config is now `JSON_ARRAY`.
- **Python rich ClickHouse type support** — Added support for advanced types in Python models:
  - `Decimal`: `clickhouse_decimal(precision, scale)`
  - `datetime` with precision: `clickhouse_datetime64(precision)`
  - `date`: `date`
  - `int` with size annotations: `Annotated[int, 'int8']`, `Annotated[int, 'int32']`, etc.
  - `UUID`: `UUID`

- `JSON_ARRAY` to allow both array and single element ingestion for more flexible data handling. *[#2285](https://github.com/514-labs/moose/pull/2285) by [phiSgr](https://github.com/phiSgr)*
- Python rich ClickHouse type support with:
  - `Decimal`: `clickhouse_decimal(precision, scale)`
  - `datetime` with precision: `clickhouse_datetime64(precision)`
  - `date`: `date`
  - `int` with size annotations: `Annotated[int, 'int8']`, `Annotated[int, 'int32']`, etc.
  - `UUID`: `UUID`

  for more expressive data modeling. *[#2284](https://github.com/514-labs/moose/pull/2284) by [phiSgr](https://github.com/phiSgr)*

---

## Configuration

Source: moose/configuration.mdx
Configuration for Moose

# Project Configuration

The `moose.config.toml` file is the primary way to configure all Moose infrastructure including ClickHouse, Redpanda, Redis, Temporal, and HTTP servers. Do not use docker-compose overrides to modify Moose-managed services. See [Development Mode](/moose/local-dev#extending-docker-infrastructure) for guidelines on when to use docker-compose extensions.

```toml
# Programming language used in the project (`Typescript` or `Python`)
language = "Typescript"

# Map of supported old versions and their locations (Default: {})
# supported_old_versions = { "0.1.0" = "path/to/old/version" }

# Telemetry configuration for usage tracking and metrics
[telemetry]
# Whether telemetry collection is enabled
enabled = true
# Whether to export metrics to external systems
export_metrics = true
# Flag indicating if the user is a Moose developer
is_moose_developer = false

# Redpanda streaming configuration (also aliased as `kafka_config`)
[redpanda_config]
# Broker connection string (e.g., "host:port") (Default: "localhost:19092")
broker = "localhost:19092"
# Confluent Schema Registry URL (optional)
# schema_registry_url = "http://localhost:8081"
# Message timeout in milliseconds (Default: 1000)
message_timeout_ms = 1000
# Default retention period in milliseconds (Default: 30000)
retention_ms = 30000
# Replication factor for topics (Default: 1)
replication_factor = 1
# SASL username for authentication (Default: None)
# sasl_username = "user"
# SASL password for authentication (Default: None)
# sasl_password = "password"
# SASL mechanism (e.g., "PLAIN", "SCRAM-SHA-256") (Default: None)
# sasl_mechanism = "PLAIN"
# Security protocol (e.g., "SASL_SSL", "PLAINTEXT") (Default: None)
# security_protocol = "SASL_SSL"
# Namespace for topic isolation (Default: None)
# namespace = "my_namespace"

# ClickHouse database configuration
[clickhouse_config]
# Database name (Default: "local")
db_name = "local"
# ClickHouse user (Default: "panda")
user = "panda"
# ClickHouse password (Default: "pandapass")
password = "pandapass"
# Whether to use SSL for connection (Default: false)
use_ssl = false
# ClickHouse host (Default: "localhost")
host = "localhost"
# ClickHouse HTTP port (Default: 18123)
host_port = 18123
# ClickHouse native protocol port (Default: 9000)
native_port = 9000
# Optional host path to mount as the ClickHouse data volume (uses Docker volume if None) (Default: None)
# host_data_path = "/path/on/host/clickhouse_data"
# Optional list of additional databases to create on startup (Default: [])
# additional_databases = ["analytics", "staging"]

# HTTP server configuration for local development
[http_server_config]
# Host to bind the webserver to (Default: "localhost")
host = "localhost"
# Port for the main API server (Default: 4000)
port = 4000
# Port for the management server (Default: 5001)
management_port = 5001
# Optional path prefix for all routes (Default: None)
# path_prefix = "api"
# Number of worker processes for consumption API cluster (TypeScript only) (Default: Auto-calculated - 70% of CPU cores)
# Python projects always use 1 worker regardless of this setting
# api_workers = 2

# Redis configuration
[redis_config]
# Redis connection URL (Default: "redis://127.0.0.1:6379")
url = "redis://127.0.0.1:6379"
# Namespace prefix for all Redis keys (Default: "MS")
key_prefix = "MS"

# Git configuration
[git_config]
# Name of the main branch (Default: "main")
main_branch_name = "main"

# Temporal workflow configuration
[temporal_config]
# Temporal database user (Default: "temporal")
db_user = "temporal"
# Temporal database password (Default: "temporal")
db_password = "temporal"
# Temporal database port (Default: 5432)
db_port = 5432
# Temporal server host (Default: "localhost")
temporal_host = "localhost"
# Temporal server port (Default: 7233)
temporal_port = 7233
# Temporal server scheme - "http" or "https" (Default: auto-detect based on host)
# temporal_scheme = "https"
# Temporal server version (Default: "1.22.3")
temporal_version = "1.22.3"
# Temporal admin tools version (Default: "1.22.3")
admin_tools_version = "1.22.3"
# Temporal UI version (Default: "2.21.3")
ui_version = "2.21.3"
# Temporal UI port (Default: 8080)
ui_port = 8080
# Temporal UI CORS origins (Default: "http://localhost:3000")
ui_cors_origins = "http://localhost:3000"
# Temporal dynamic config path (Default: "config/dynamicconfig/development-sql.yaml")
config_path = "config/dynamicconfig/development-sql.yaml"
# PostgreSQL version for Temporal database (Default: "13")
postgresql_version = "13"
# Path to Temporal client certificate (mTLS) (Default: "")
client_cert = ""
# Path to Temporal client key (mTLS) (Default: "")
client_key = ""
# Path to Temporal CA certificate (mTLS) (Default: "")
ca_cert = ""
# API key for Temporal Cloud connection (Default: "")
api_key = ""

# JWT (JSON Web Token) authentication configuration (Optional)
[jwt]
# Enforce JWT on all consumption APIs (Default: false)
enforce_on_all_consumptions_apis = false
# Enforce JWT on all ingestion APIs (Default: false)
enforce_on_all_ingest_apis = false
# Secret key for JWT signing (Required if jwt section is present)
# secret = "your-jwt-secret"
# JWT issuer (Required if jwt section is present)
# issuer = "your-issuer-name"
# JWT audience (Required if jwt section is present)
# audience = "your-audience-name"

# General authentication configuration
[authentication]
# Optional hashed admin API key for auth (Default: None)
# admin_api_key = "hashed_api_key"

# Migration configuration
[migration_config]
# Operations to ignore during migration plan generation and drift detection
# Useful for managing TTL changes outside of Moose or when you don't want
# migration failures due to TTL drift
# ignore_operations = ["ModifyTableTtl", "ModifyColumnTtl"]

# Feature flags
[features]
# Enable the streaming engine (Default: true)
streaming_engine = true
# Enable Temporal workflows (Default: false)
workflows = false
# Enable OLAP database (Default: true)
olap = true
# Enable Analytics APIs server (Default: true)
apis = true
```

---
## Data Modeling

Source: moose/data-modeling.mdx
Data Modeling for Moose

# Data Modeling

## Overview

In Moose, data models are just Pydantic models that become the authoritative source for your infrastructure schemas. Data models are used to define:

- [OLAP Tables and Materialized Views](/moose/olap) (automatically generated DDL)
- [Redpanda/Kafka Streams](/moose/streaming) (schema registry and topic validation)
- [API Contracts](/moose/apis) (request/response validation and OpenAPI specs)
- [Workflow Task Input and Output Types](/moose/workflows) (typed function inputs/outputs)

## Philosophy

### Problem: Analytical Backends are Prone to Schema Drift

Analytical backends are unique in that they typically have to coordinate schemas across multiple systems that each have their own type systems and constraints. Consider a typical pipeline for ingesting events into a ClickHouse table.

```python
# What you're building:
# API endpoint → Kafka topic → ClickHouse table → Analytics API

# Traditional approach: Define schema 4 times

# 1. API validation with Pydantic
class APIEvent(BaseModel):
    user_id: str
    event_type: Literal["click", "view", "purchase"]
    timestamp: datetime

# 2. Kafka schema registration
kafka_schema = {
    "type": "record",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "event_type", "type": "string"},
        {"name": "timestamp", "type": "string"}
    ]
}

# 3. ClickHouse DDL
# CREATE TABLE events (
#   user_id String,
#   event_type LowCardinality(String),
#   timestamp DateTime
# ) ENGINE = MergeTree()

# 4. Analytics API response
class EventsResponse(BaseModel):
    user_id: str
    event_type: str
    timestamp: datetime
```

**The Problem:** When you add a field or change a type, you must update it in multiple places. Miss one, and you get:

- Silent data loss (Kafka → ClickHouse sync fails)
- Runtime errors
- Data quality issues (validation gaps)

### Solution: Model In Code, Reuse Everywhere

With Moose you define your schemas in native language types with optional metadata. This lets you reuse your schemas across multiple systems:

```python filename="app/main.py"
# 1. Define your schema (WHAT your data looks like)
class MyFirstDataModel(BaseModel):
    id: Key[str]
    some_string: Annotated[str, "LowCardinality"]
    some_number: int
    some_date: datetime
    some_json: Any

# This single model can be reused across multiple systems:
my_first_pipeline = IngestPipeline[MyFirstDataModel]("my_first_pipeline", IngestPipelineConfig(
    ingest_api=True,  # POST API endpoint
    stream=True,      # Kafka topic
    table=True        # ClickHouse table
))
```

## How It Works

The key idea is leveraging `Annotated` types to extend base Python types with "metadata" that represents specific optimizations and details on how to map that type in ClickHouse:

```python
from datetime import datetime
from decimal import Decimal
from typing import Annotated

from pydantic import BaseModel

from moose_lib import Key, OlapTable, clickhouse_decimal

class Event(BaseModel):
    # Base type: str
    # ClickHouse: String with primary key
    id: Key[str]

    # Base type: Decimal
    # ClickHouse: Decimal(10,2) for precise money
    amount: clickhouse_decimal(10, 2)

    # Base type: str
    # ClickHouse: LowCardinality(String) for enums
    status: Annotated[str, "LowCardinality"]

    # Base type: datetime
    # ClickHouse: DateTime
    created_at: datetime

events = OlapTable[Event]("events")

# In your application code:
tx = Event(
    id="id_123",
    amount=Decimal("99.99"),   # Regular Decimal in Python
    status="completed",        # Regular string in Python
    created_at=datetime.now()
)

# In ClickHouse:
# CREATE TABLE events (
#   id String,
#   amount Decimal(10,2),
#   status LowCardinality(String),
#   created_at DateTime
# ) ENGINE = MergeTree()
# ORDER BY id
```

**The metadata annotations are compile-time only** - they don't affect your runtime code. Your application works with regular strings and numbers, while Moose uses the metadata to generate optimized infrastructure.

## Building Data Models: From Simple to Complex

Let's walk through how to model data for different infrastructure components and see how types behave across them.

### Simple Data Model Shared Across Infrastructure

A basic data model that works identically across all infrastructure components:

```python filename="app/datamodels/simple_shared.py"
from datetime import datetime

from pydantic import BaseModel

from moose_lib import IngestPipeline, IngestPipelineConfig

class SimpleShared(BaseModel):
    id: str
    name: str
    value: float
    timestamp: datetime

# This SAME model creates all infrastructure
pipeline = IngestPipeline[SimpleShared]("simple_shared", IngestPipelineConfig(
    ingest_api=True,  # Creates: POST /ingest/simple_shared
    stream=True,      # Creates: Kafka topic
    table=True        # Creates: ClickHouse table
))

# The exact same types work everywhere:
# - API validates:    { "id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z" }
# - Kafka stores:     { "id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z" }
# - ClickHouse table: id String, name String, value Float64, timestamp DateTime
```

**Key Point**: One model definition creates consistent schemas across all systems.
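Because these data models are plain Pydantic models, the same validation the ingest API applies is available in ordinary application code. A minimal sketch (payload shape borrowed from the example above; Pydantic v2's `model_validate` is assumed):

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError

class SimpleShared(BaseModel):
    id: str
    name: str
    value: float
    timestamp: datetime

payload = {"id": "123", "name": "test", "value": 42, "timestamp": "2024-01-01T00:00:00Z"}

event = SimpleShared.model_validate(payload)  # same checks the ingest endpoint performs
print(event.timestamp.year)                   # plain datetime at runtime: 2024

try:
    SimpleShared.model_validate({"id": "123"})  # missing fields are rejected
except ValidationError as e:
    print(e.error_count())  # 3 (name, value, timestamp)
```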
### Composite Types Shared Across Infrastructure

Complex types including nested objects, arrays, and enums work seamlessly across all components:

```python filename="app/datamodels/composite_shared.py"
from datetime import datetime
from typing import Any, Dict, List, Literal, Optional

from pydantic import BaseModel

from moose_lib import IngestPipeline, IngestPipelineConfig, Key

class Metadata(BaseModel):
    category: str
    priority: float
    tags: List[str]

class CompositeShared(BaseModel):
    id: Key[str]                                       # Primary key
    status: Literal["active", "pending", "completed"]  # Enum

    # Nested object
    metadata: Metadata

    # Arrays and maps
    values: List[float]
    attributes: Dict[str, Any]

    # Optional field
    description: Optional[str] = None

    created_at: datetime

# Using in IngestPipeline - all types preserved
pipeline = IngestPipeline[CompositeShared]("composite_shared", IngestPipelineConfig(
    ingest_api=True,
    stream=True,
    table=True
))

# How the types map:
# - API validates nested structure and enum values
# - Kafka preserves the exact JSON structure
# - ClickHouse creates:
#   - id String (with PRIMARY KEY)
#   - status Enum8('active', 'pending', 'completed')
#   - metadata.category String, metadata.priority Float64, metadata.tags Array(String)
#   - values Array(Float64)
#   - attributes String (JSON)
#   - description Nullable(String)
#   - created_at DateTime
```

**Key Point**: Complex types including nested objects and arrays work consistently across all infrastructure.

### ClickHouse-Specific Types (Standalone vs IngestPipeline)

ClickHouse type annotations optimize database performance but are **transparent to other infrastructure**:

```python filename="app/datamodels/clickhouse_optimized.py"
from datetime import datetime
from typing import Annotated

from pydantic import BaseModel

from moose_lib import (
    IngestPipeline,
    IngestPipelineConfig,
    Key,
    OlapConfig,
    OlapTable,
    clickhouse_decimal,
)

class Details(BaseModel):
    name: str
    value: float

class ClickHouseOptimized(BaseModel):
    id: Key[str]

    # ClickHouse-specific type annotations
    amount: clickhouse_decimal(10, 2)                    # Decimal(10,2) in ClickHouse
    category: Annotated[str, "LowCardinality"]           # LowCardinality(String) in ClickHouse

    # Optimized nested type
    details: Annotated[Details, "ClickHouseNamedTuple"]  # NamedTuple in ClickHouse

    timestamp: datetime

# SCENARIO 1: Standalone OlapTable - gets all optimizations
table = OlapTable[ClickHouseOptimized]("optimized_table", OlapConfig(
    order_by_fields=["id", "timestamp"]
))
# Creates ClickHouse table with:
# - amount Decimal(10,2)
# - category LowCardinality(String)
# - details Tuple(name String, value Float64)

# SCENARIO 2: IngestPipeline - optimizations ONLY in ClickHouse
pipeline = IngestPipeline[ClickHouseOptimized]("optimized_pipeline", IngestPipelineConfig(
    ingest_api=True,
    stream=True,
    table=True
))
# What happens at each layer:
# 1. API receives/validates: { "amount": "123.45", "category": "electronics", ... }
#    - Sees amount as str, category as str (annotations ignored)
# 2. Kafka stores: { "amount": "123.45", "category": "electronics", ... }
#    - Plain JSON, no ClickHouse types
# 3. ClickHouse table gets optimizations:
#    - amount stored as Decimal(10,2)
#    - category stored as LowCardinality(String)
#    - details stored as NamedTuple
```

**Key Point**: ClickHouse annotations are metadata that ONLY affect the database schema. Your application code and other infrastructure components see regular Python types.
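One way to see that the annotations are transparent to the API layer is to post plain JSON to the pipeline's ingest endpoint. A sketch assuming the local dev server on its default port and the `/ingest/<name>` path convention shown above:

```bash
curl -X POST http://localhost:4000/ingest/optimized_pipeline \
  -H "Content-Type: application/json" \
  -d '{
    "id": "id_123",
    "amount": "123.45",
    "category": "electronics",
    "details": {"name": "widget", "value": 1.5},
    "timestamp": "2024-01-01T00:00:00Z"
  }'
```

The payload carries ordinary JSON strings and numbers; only the resulting ClickHouse columns use the optimized types.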
### API Contracts with Runtime Validators

APIs use runtime validation to ensure query parameters meet your requirements:

```python filename="app/apis/consumption_with_validation.py"
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, Field

from moose_lib import Api, MooseClient

# Query parameters with runtime validation
class SearchParams(BaseModel):
    # Date range validation
    start_date: str = Field(..., pattern="^\\d{4}-\\d{2}-\\d{2}$")  # Must be YYYY-MM-DD
    end_date: str = Field(..., pattern="^\\d{4}-\\d{2}-\\d{2}$")

    # Numeric constraints
    min_value: Optional[float] = Field(None, ge=0)     # Optional, but if provided >= 0
    max_value: Optional[float] = Field(None, le=1000)  # Optional, but if provided <= 1000

    # String validation
    category: Optional[str] = Field(None, min_length=2, max_length=50)

    # Pagination
    page: Optional[int] = Field(None, ge=1)
    limit: Optional[int] = Field(None, ge=1, le=100)

# Response data model
class SearchResult(BaseModel):
    id: str
    name: str
    value: float
    category: str
    timestamp: datetime

# Create validated API endpoint
async def search_handler(params: SearchParams, client: MooseClient) -> List[SearchResult]:
    # Params are already validated when this runs
    # Build a parameterized query safely
    clauses = [
        "timestamp >= {startDate}",
        "timestamp <= {endDate}"
    ]
    params_dict = {
        "startDate": params.start_date,
        "endDate": params.end_date,
        "limit": params.limit or 10,
        "offset": ((params.page or 1) - 1) * (params.limit or 10)
    }

    if params.min_value is not None:
        clauses.append("value >= {minValue}")
        params_dict["minValue"] = params.min_value
    if params.max_value is not None:
        clauses.append("value <= {maxValue}")
        params_dict["maxValue"] = params.max_value
    if params.category is not None:
        clauses.append("category = {category}")
        params_dict["category"] = params.category

    where_clause = " AND ".join(clauses)
    # Double braces keep {limit} and {offset} intact as binding placeholders in the final SQL
    query = f"""
        SELECT * FROM data_table
        WHERE {where_clause}
        LIMIT {{limit}}
        OFFSET {{offset}}
    """
    results = await client.query.execute(query, params=params_dict)
    return [SearchResult(**row) for row in results]

search_api = Api[SearchParams, List[SearchResult]](
    "search",
    handler=search_handler
)

# API Usage Examples:
# ✅ Valid:   GET /api/search?start_date=2024-01-01&end_date=2024-01-31
# ✅ Valid:   GET /api/search?start_date=2024-01-01&end_date=2024-01-31&min_value=100&limit=50
# ❌ Invalid: GET /api/search?start_date=Jan-1-2024 (wrong date format)
# ❌ Invalid: GET /api/search?start_date=2024-01-01&end_date=2024-01-31&limit=200 (exceeds max)
```

**Key Point**: Runtime validators ensure API consumers provide valid data, returning clear error messages for invalid requests before any database queries run.
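To see what the validation layer catches, you can exercise the model directly. A sketch using plain Pydantic (the HTTP layer wraps these errors in an error response; the exact response shape is not shown here):

```python
from pydantic import ValidationError

# Continuing from consumption_with_validation.py above:
# a wrong date format and an out-of-range limit both fail before any SQL runs.
try:
    SearchParams(start_date="Jan-1-2024", end_date="2024-01-31", limit=200)
except ValidationError as e:
    for err in e.errors():
        print(err["loc"], err["type"])
    # ('start_date',) string_pattern_mismatch
    # ('limit',) less_than_equal
```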
## Additional Data Modeling Patterns

### Modeling for Stream Processing

When you need to process data in real-time before it hits the database:

```python filename="app/datamodels/stream_example.py"
import json
from datetime import datetime
from typing import Annotated, Any, Dict

from pydantic import BaseModel

from moose_lib import Key, OlapConfig, OlapTable, Stream, StreamConfig

# Raw data from external source
class RawData(BaseModel):
    id: Key[str]
    timestamp: datetime
    raw_payload: str
    source_type: Annotated[str, "LowCardinality"]

# Processed data after transformation
class ProcessedData(BaseModel):
    id: Key[str]
    timestamp: datetime
    field1: str
    field2: Annotated[str, "LowCardinality"]
    numeric_value: float
    attributes: Dict[str, Any]

# Create streams
raw_stream = Stream[RawData]("raw-stream")

processed_table = OlapTable[ProcessedData]("processed_data", OlapConfig(
    order_by_fields=["id", "timestamp"]
))

processed_stream = Stream[ProcessedData]("processed-stream", StreamConfig(
    destination=processed_table
))

# Transform raw data; the returned record is published to the destination stream
async def process_data(raw: RawData):
    parsed = json.loads(raw.raw_payload)
    return ProcessedData(
        id=raw.id,
        timestamp=raw.timestamp,
        field1=parsed["field_1"],
        field2=parsed["field_2"],
        numeric_value=float(parsed.get("value", 0)),
        attributes=parsed.get("attributes", {})
    )

raw_stream.add_transform(processed_stream, process_data)
```

### Modeling for Workflow Tasks

Define strongly-typed inputs and outputs for async jobs:

```python filename="app/workflows/task_example.py"
from datetime import datetime
from typing import Any, Dict, List, Literal, Optional

from pydantic import BaseModel, Field

from moose_lib import Task, TaskContext

# Input validation with constraints
class TaskOptions(BaseModel):
    include_metadata: bool
    max_items: Optional[int] = Field(None, ge=1, le=100)

class TaskInput(BaseModel):
    id: str = Field(..., pattern="^[0-9a-f-]{36}$")
    items: List[str]
    task_type: Literal["typeA", "typeB", "typeC"]
    options: Optional[TaskOptions] = None

# Structured output
class ResultA(BaseModel):
    category: str
    score: float
    details: Dict[str, Any]

class ResultB(BaseModel):
    values: List[str]
    metrics: List[float]

class ResultC(BaseModel):
    field1: str
    field2: str
    field3: float

class TaskOutput(BaseModel):
    id: str
    processed_at: datetime
    result_a: Optional[ResultA] = None
    result_b: Optional[ResultB] = None
    result_c: Optional[ResultC] = None

# Create workflow task
async def run_task(ctx: TaskContext[TaskInput]) -> TaskOutput:
    # Process data based on task type
    output = TaskOutput(
        id=ctx.input.id,
        processed_at=datetime.now()
    )
    if ctx.input.task_type == "typeA":
        output.result_a = await process_type_a(ctx.input)  # helper defined elsewhere
    return output

example_task = Task[TaskInput, TaskOutput](
    "example-task",
    run_function=run_task,
    retries=3,
    timeout=30  # seconds
)
```

---

## Summary

Source: moose/deploying.mdx
Summary of deploying Moose into production

# Moose Deploy

## Overview

Once you've finished developing your Moose application locally, the next step is to deploy your Moose app into production. You have two options:

- Self-host your Moose application on your own servers
- Use the [Boreal Cloud hosting platform](https://www.fiveonefour.com/boreal) (from the makers of the Moose Stack)

Want this managed in production for you? Check out Boreal Cloud.

## Getting Started With Self-Hosting

Moose makes it easy to package and deploy your applications, whether you're deploying to a server with or without internet access.
The deployment process is designed to be flexible and can accommodate both containerized and non-containerized environments.

### Deployment Options

1. **Kubernetes Deployment**: Deploy your application to Kubernetes clusters (GKE, EKS, AKS, or on-premises)
2. **Standard Server Deployment**: Deploy your application to a server with internet access
3. **Containerized Cloud Deployment**: Deploy to cloud services like AWS ECS or Google Cloud Run
4. **Offline Server Deployment**: Deploy to an environment without internet access

### Key Deployment Steps

There are three main aspects to deploying a Moose application:

1. Setting up your build environment with Python and the Moose CLI
2. Building your application using `moose build`
3. Setting up your deployment environment with the necessary runtime dependencies (Python, Docker) and configuration

## Configuring Your Deployment

Based on our production experience, we recommend the following best practices for deploying Moose applications:

### Health Monitoring

Configure comprehensive health checks to ensure your application remains available:

- Startup probes to handle initialization
- Readiness probes for traffic management
- Liveness probes to detect and recover from deadlocks

### Zero-Downtime Deployments

Implement graceful termination and rolling updates:

- Pre-stop hooks to handle in-flight requests
- Appropriate termination grace periods
- Rolling update strategies that maintain service availability

### Resource Allocation

Properly size your deployments based on workload:

- CPU and memory requests tailored to your application
- Replicas scaled according to traffic patterns
- Horizontal scaling for high availability

### Environment Configuration

For any deployment type, you'll need to configure:

1. Runtime environment variables for logging, telemetry, and application settings
2. External service connections (ClickHouse, Redpanda, Redis)
3. Network settings and security configurations
4. Application-specific configurations

## Detailed Guides

The following pages provide detailed guides for each deployment scenario, including step-by-step instructions for both Python and TypeScript applications and production-ready configuration templates.

---

## Configuring Moose for cloud environments

Source: moose/deploying/configuring-moose-for-cloud.mdx
Configuring Moose for cloud environments

# Configuring Moose for cloud environments

In the [Packaging Moose for deployment](packaging-moose-for-deployment.mdx) page, we looked at how to package your Moose application into Docker containers (using the `moose build --docker` command) and push them to your container repository. Next, we'll connect and configure your container image with remote ClickHouse and Redis hosted services. You can also optionally use Redpanda for event streaming and Temporal for workflow orchestration.

The methods used to accomplish this are generally similar, but the specific details depend on your target cloud infrastructure. So, we'll look at the overarching concepts and provide some common examples.

## Specifying your repository container

Earlier, we created two local containers and pushed them to a Docker repository.

```txt filename="Terminal" copy
>docker images
REPOSITORY                                      TAG      IMAGE ID       CREATED              SIZE
moose-df-deployment-aarch64-unknown-linux-gnu   0.3.175  c50674c7a68a   About a minute ago   155MB
moose-df-deployment-x86_64-unknown-linux-gnu    0.3.175  e5b449d3dea3   About a minute ago   163MB
```

We pushed the containers to the `514labs` Docker Hub account.
So, we have these two containers available for use: ``` 514labs/moose-df-deployment-aarch64-unknown-linux-gnu:0.3.175 514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175 ``` In later examples, we'll use an AMD64 (x86_64) based machine, so we'll stick to using the following container image: `514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175` We'll also examine how the container image name can be used in various cloud providers and scenarios. ## General overview The general approach is to use a cloud provider that supports specifying a container image to launch your application. Examples include the Google Kubernetes Engine (GKE), Amazon's Elastic Kubernetes Service (EKS), and Elastic Container Service (ECS). Each provider also offers a way of configuring container environment variables that your container application will have access to. ## Essential Environment Variables Based on our production deployments, here are the essential environment variables you'll need to configure for your Moose application in cloud environments: ### Logging and Telemetry ``` # Logger configuration MOOSE_LOGGER__LEVEL=Info MOOSE_LOGGER__STDOUT=true MOOSE_LOGGER__FORMAT=Json # Telemetry configuration MOOSE_TELEMETRY__ENABLED=false MOOSE_TELEMETRY__EXPORT_METRICS=true # For debugging RUST_BACKTRACE=1 ``` ### HTTP Server Configuration ``` # HTTP server settings MOOSE_HTTP_SERVER_CONFIG__HOST=0.0.0.0 MOOSE_HTTP_SERVER_CONFIG__PORT=4000 ``` ### External Service Connections For detailed configuration of the external services, refer to the [Preparing ClickHouse and Redpanda](preparing-clickhouse-redpanda.mdx) page. #### ClickHouse ``` MOOSE_CLICKHOUSE_CONFIG__DB_NAME= MOOSE_CLICKHOUSE_CONFIG__USER= MOOSE_CLICKHOUSE_CONFIG__PASSWORD= MOOSE_CLICKHOUSE_CONFIG__HOST= MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443 MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1 MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT=9440 ``` #### Redis Moose requires Redis for caching and message passing: ``` MOOSE_REDIS_CONFIG__URL= MOOSE_REDIS_CONFIG__KEY_PREFIX= ``` #### Redpanda (Optional) If you choose to use Redpanda for event streaming: ``` MOOSE_REDPANDA_CONFIG__BROKER= MOOSE_REDPANDA_CONFIG__NAMESPACE= MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS=10043 MOOSE_REDPANDA_CONFIG__SASL_USERNAME= MOOSE_REDPANDA_CONFIG__SASL_PASSWORD= MOOSE_REDPANDA_CONFIG__SASL_MECHANISM=SCRAM-SHA-256 MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL=SASL_SSL MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR=3 ``` #### Temporal (Optional) If you choose to use Temporal for workflow orchestration: ``` MOOSE_TEMPORAL_CONFIG__CA_CERT=/etc/ssl/certs/ca-certificates.crt MOOSE_TEMPORAL_CONFIG__API_KEY= MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST=.tmprl.cloud ``` ## Securing Sensitive Information When deploying to cloud environments, it's important to handle sensitive information like passwords and API keys securely. Each cloud provider offers mechanisms for this: - **Kubernetes**: Use Secrets to store sensitive data. See our [Kubernetes deployment guide](deploying-on-kubernetes.mdx) for examples. - **Amazon ECS**: Use AWS Secrets Manager or Parameter Store to securely inject environment variables. - **Other platforms**: Use the platform's recommended secrets management approach. Never hardcode sensitive values directly in your deployment configuration files. Please share your feedback about Moose monitoring capabilities through [our GitHub repository](https://github.com/514-labs/moose/issues/new?title=Feedback%20for%20%E2%80%9CMonitoring%E2%80%9D&labels=feedback). 
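As a quick smoke test before wiring these into your cloud provider's configuration UI, you can pass the same variables to a local `docker run` of your image. A sketch (hosts, credentials, and the image tag are placeholders from the examples above):

```bash
docker run -d -p 4000:4000 \
  -e MOOSE_HTTP_SERVER_CONFIG__HOST=0.0.0.0 \
  -e MOOSE_HTTP_SERVER_CONFIG__PORT=4000 \
  -e MOOSE_CLICKHOUSE_CONFIG__HOST=your-clickhouse.cloud.example.com \
  -e MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443 \
  -e MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1 \
  -e MOOSE_CLICKHOUSE_CONFIG__PASSWORD=change_me \
  -e MOOSE_REDIS_CONFIG__URL=redis://user:password@redis.example.com:6379 \
  514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175
```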
---

## Deploying on an offline server

Source: moose/deploying/deploying-on-an-offline-server.mdx
Deploying on an offline server

# Building and Deploying Moose Applications

This guide will walk you through the process of building a Moose application and deploying it to a server that does not have internet access. We'll cover both the build environment setup and the deployment environment requirements.

## Build Environment Setup

### Prerequisites

Before you can build a Moose application, you need to set up your build environment with the following dependencies:

OS:
- Debian 10+
- Ubuntu 18.10+
- Fedora 29+
- CentOS/RHEL 8+
- Amazon Linux 2023+
- Mac OS 13+

Common CLI utilities:
- zip
- curl (optional, for installing the Moose CLI)

Python build environment requirements:
1. Python 3.12 or later (we recommend using pyenv for Python version management)
2. pip

### Setting up the Build Environment

First, install the required system dependencies:

```bash
sudo apt update
sudo apt install build-essential libssl-dev zlib1g-dev libbz2-dev \
  libreadline-dev libsqlite3-dev curl git libncursesw5-dev xz-utils \
  tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
```

Install pyenv and configure your shell:

```bash
curl -fsSL https://pyenv.run | bash
```

Add the following to your `~/.bashrc` or `~/.zshrc`:

```bash
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
```

Install and set Python 3.12:

```bash
pyenv install 3.12
pyenv global 3.12
```

Verify the installation:

```bash
python --version
```

### Installing Moose CLI (Optional)

You can install the Moose CLI using the official installer:

```bash
curl -SfsL https://fiveonefour.com/install.sh | bash -s -- moose
source ~/.bashrc # Or restart your terminal
```

or

```bash
pip install moose-cli
```

## Building Your Application

### 1. Initialize a New Project (Optional)

This step is optional if you already have a Moose project. Create a new Moose project:

```bash
moose init your-project-name py
cd your-project-name
```

### 2. Build the Application

Make sure you have the `zip` utility installed (`sudo apt install zip`) before building your application.

Whether you installed the Moose CLI globally or locally into your project, build the application with:

```bash
moose build
```

The build process will create a deployable package: a zip file in your project directory with a timestamp, for example: `your-project-name-YYYY-MM-DD.zip`

## Deployment Environment Setup

### Prerequisites

The deployment server requires:

1. Python 3.12 or later
2. The unzip utility

### Setting up the Deployment Environment

1. Install the runtime environment: Follow the Python installation steps from the build environment setup section.
2. Install the unzip utility:

```bash
sudo apt install unzip
```

## Deploying Your Application

1. Copy your built application package to the deployment server
2. Extract the application:

```bash
unzip your-project-name-YYYY-MM-DD.zip -d ./app
cd ./app/packager
```

3. Start your application:

```bash
moose prod
```

Ensure all required environment variables and configurations are properly set before starting your application.
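For example, a minimal sketch of supplying connection settings inline before launch (the `MOOSE_*` variable names are the same ones described in the cloud configuration guide; hosts and credentials here are placeholders):

```bash
export MOOSE_CLICKHOUSE_CONFIG__HOST=10.0.0.5        # placeholder host
export MOOSE_CLICKHOUSE_CONFIG__USER=moose_user
export MOOSE_CLICKHOUSE_CONFIG__PASSWORD=change_me
export MOOSE_REDIS_CONFIG__URL=redis://10.0.0.6:6379
moose prod
```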
## Troubleshooting - Verify that Python is properly installed using `python --version` - Check that your application's dependencies are properly listed in `requirements.txt` - If you encounter Python import errors, ensure your `PYTHONPATH` is properly set --- ## Deploying on Amazon ECS Source: moose/deploying/deploying-on-ecs.mdx Deploying on Amazon ECS # Deploying on Amazon ECS Moose can be deployed to Amazon's Elastic Container Service (ECS). ECS offers a managed container orchestrator at a fraction of the complexity of managing a Kubernetes cluster. If you're relatively new to ECS we recommend the following resources: - [Amazon Elastic Container Service (ECS) with a Load Balancer | AWS Tutorial with New ECS Experience](https://www.youtube.com/watch?v=rUgZNXKbsrY) - [Tutorial: Deploy NGINX Containers On ECS Fargate with Load Balancer](https://bhaveshmuleva.hashnode.dev/tutorial-deploy-nginx-containers-on-ecs-fargate-with-load-balancer) - [How to configure target groups ports with listeners and tasks](https://stackoverflow.com/questions/66275574/how-to-configure-target-groups-ports-with-listeners-and-tasks) The first step is deciding whether you'll host your Moose container on Docker Hub or Amazon's Elastic Container Registry (ECR). Amazon ECR is straightforward and is designed to work out of the box with ECS. Using Docker Hub works if your moose container is publicly available; however, if your container is private, you'll need to do a bit more work to provide ECS with your Docker credentials. > See: [Authenticating with Docker Hub for AWS Container Services](https://aws.amazon.com/blogs/containers/authenticating-with-docker-hub-for-aws-container-services/) Here is an overview of the steps required: 1. You'll first need to create or use an existing ECS cluster. 2. Then, you'll need to create an ECS `Task definition.` This is where you'll specify whether you want to use AWS Fargate or AWS EC2 instances. You'll also have options for selecting your OS and Architecture. Specify `Linux/X86-64` or `Linux/ARM-64`. This is important as you'll also need to specify a matching moose container image, such as `moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175` or `moose-df-deployment-aarch64-unknown-linux-gnu:0.3.175` 3. As with all AWS services, if you're using secrets to store credentials, you will need to specify an IAM role with an `AmazonECSTaskExecutionRolePolicy` and `SecretsManagerReadWrite` policy. 4. Under the Container section, specify the name of your moose deployment and provide the container image name you're using. 5. Next, specify the Container Port as 4000. ## Configuring container environment variables While still in the Amazon ECS Task definition section, you'll need to provide the environment variables on which your Moose application depends. Scroll down to the Environment variables section and fill in each of the following variables. ClickHouse and Redis are required components for Moose. Redpanda and Temporal are optional - configure them only if you're using these components in your application. > Note: if you prefer, you can provide the environment variables below via an env file hosted on S3 or using AWS Secrets Manager for sensitive values. 
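The tables below list each variable by component. As a sketch of where these settings land, here is a container-definition fragment from an ECS task definition (the image tag and secret ARN are placeholders; `environment` carries plain values while `secrets` pulls sensitive ones from AWS Secrets Manager):

```json
{
  "name": "moose",
  "image": "514labs/moose-df-deployment-x86_64-unknown-linux-gnu:0.3.175",
  "portMappings": [{ "containerPort": 4000 }],
  "environment": [
    { "name": "MOOSE_HTTP_SERVER_CONFIG__HOST", "value": "0.0.0.0" },
    { "name": "MOOSE_HTTP_SERVER_CONFIG__PORT", "value": "4000" }
  ],
  "secrets": [
    {
      "name": "MOOSE_CLICKHOUSE_CONFIG__PASSWORD",
      "valueFrom": "arn:aws:secretsmanager:region:account:secret:moose/production/credentials"
    }
  ]
}
```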
### Core Configuration

| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_LOGGER__LEVEL | Log level | Info |
| MOOSE_LOGGER__STDOUT | Enable stdout logging | true |
| MOOSE_LOGGER__FORMAT | Log format | Json |
| RUST_BACKTRACE | Enable backtraces for debugging | 1 |

### HTTP Server Configuration

| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_HTTP_SERVER_CONFIG__HOST | Your moose network binding address | 0.0.0.0 |
| MOOSE_HTTP_SERVER_CONFIG__PORT | The network port your moose server is using | 4000 |

### ClickHouse Configuration (Required)

| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_CLICKHOUSE_CONFIG__DB_NAME | The name of your ClickHouse database | moose_production |
| MOOSE_CLICKHOUSE_CONFIG__USER | The database user name | clickhouse_user |
| MOOSE_CLICKHOUSE_CONFIG__PASSWORD | The password to your ClickHouse database | (use AWS Secrets Manager) |
| MOOSE_CLICKHOUSE_CONFIG__HOST | The hostname for your ClickHouse database | your-clickhouse.cloud.example.com |
| MOOSE_CLICKHOUSE_CONFIG__HOST_PORT | The HTTPS port for your ClickHouse database | 8443 |
| MOOSE_CLICKHOUSE_CONFIG__USE_SSL | Whether your database connection requires SSL | 1 |
| MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT | The native port for your ClickHouse database | 9440 |

### Redis Configuration (Required)

| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_REDIS_CONFIG__URL | Redis connection URL | redis://user:password@redis.example.com:6379 |
| MOOSE_REDIS_CONFIG__KEY_PREFIX | Prefix for Redis keys to isolate namespaces | moose_production |

### Redpanda Configuration (Optional)

| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_REDPANDA_CONFIG__BROKER | The hostname for your Redpanda instance | seed-5fbcae97.example.redpanda.com:9092 |
| MOOSE_REDPANDA_CONFIG__NAMESPACE | Namespace for isolation | moose_production |
| MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS | The message timeout delay in milliseconds | 10043 |
| MOOSE_REDPANDA_CONFIG__SASL_USERNAME | Your Redpanda user name | redpanda_user |
| MOOSE_REDPANDA_CONFIG__SASL_PASSWORD | Your Redpanda password | (use AWS Secrets Manager) |
| MOOSE_REDPANDA_CONFIG__SASL_MECHANISM | SASL mechanism | SCRAM-SHA-256 |
| MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL | The Redpanda security protocol | SASL_SSL |
| MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR | Topic replication factor | 3 |

### Temporal Configuration (Optional)

| Key | Description | Example Value |
|-----|-------------|---------------|
| MOOSE_TEMPORAL_CONFIG__CA_CERT | Path to CA certificate | /etc/ssl/certs/ca-certificates.crt |
| MOOSE_TEMPORAL_CONFIG__API_KEY | Temporal Cloud API key | (use AWS Secrets Manager) |
| MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST | Temporal Cloud namespace host | your-namespace.tmprl.cloud |

Consider using a value greater than 1000 ms (1 second) for the Redpanda message timeout delay if you're using a hosted Redpanda cloud service.

Review other options on the Task Creation page and press the `Create` button when ready.

## Using AWS Secrets Manager

For sensitive information like passwords and API keys, we recommend using AWS Secrets Manager. To configure a secret:

1. Go to AWS Secrets Manager and create a new secret
2. Choose "Other type of secret" and add key-value pairs for your secrets
3. Name your secret appropriately (e.g., `moose/production/credentials`)
4.
In your ECS task definition, reference the secret: - For environment variables, select "ValueFrom" and enter the ARN of your secret with the key name - Example: `arn:aws:secretsmanager:region:account:secret:moose/production/credentials:MOOSE_CLICKHOUSE_CONFIG__PASSWORD::` ## Building an ECS Service Once you've completed creating an ECS Task, you're ready to create an ECS Service. An ECS Service is a definition that allows you to specify how your cluster will be managed. Navigate to your cluster's Service page and press the `Create` button to create your new Moose service. The section we're interested in is the `Deployment configuration` section. There, you'll specify the Task Definition you created earlier. You can also specify the name of your service—perhaps something creative like `moose-service`—and the number of tasks to launch. Note at this time, we recommend that you only launch a single instance of Moose in your cluster. We're currently developing for multi-instance concurrent usage. The remaining sections on the create service page allow you to specify networking considerations and whether you'll use a load balancer. You can press the `Create` button to launch an instance of your new ECS Moose service. ## Setting up health checks Your generated Moose containers include a health check endpoint at `/health` that should be configured in your ECS service. We recommend configuring the following health check settings: ### Container-level Health Check In your task definition's container configuration: ``` healthCheck: command: ["CMD-SHELL", "curl -f http://localhost:4000/health || exit 1"] interval: 30 timeout: 5 retries: 3 startPeriod: 60 ``` ### Load Balancer Health Check If you're using an Application Load Balancer: 1. Create a target group for your service 2. Set the health check path to `/health` 3. Configure appropriate health check settings: - Health check protocol: HTTP - Health check port: 4000 - Health check path: /health - Healthy threshold: 2 - Unhealthy threshold: 2 - Timeout: 5 seconds - Interval: 15 seconds - Success codes: 200 These health check configurations ensure that your Moose service is properly monitored and that traffic is only routed to healthy containers. --- ## Deploying on Kubernetes Source: moose/deploying/deploying-on-kubernetes.mdx Deploying on Kubernetes # Deploying on Kubernetes Moose applications can be deployed to Kubernetes clusters, whether it's your own on-prem cluster or through a cloud service like Google's Kubernetes Engine (GKE) or Amazon's Elastic Kubernetes Service (EKS). Note at this time, we recommend that you only launch a single instance of moose in one cluster. We're currently developing for multi-instance concurrent usage. Essentially you'll need to create a moose-deployment YAML file. Here is an example: ```yaml filename="moose-deployment.yaml-fragment" copy apiVersion: apps/v1 kind: Deployment metadata: name: moosedeployment spec: replicas: 1 selector: matchLabels: app: moosedeploy template: metadata: labels: app: moosedeploy spec: containers: - name: moosedeploy image: 514labs/moose-df-deployment-x86_64-unknown-linux-gnu:latest ports: - containerPort: 4000 ``` > Make sure to update the image key above with the location of your repository and image tag. You may also need to configure a load balancer to route external traffic to your moose ingest points. 
```yaml filename="moose-lb-service.yaml" copy apiVersion: v1 kind: Service metadata: name: moose-service spec: selector: app: moosedeploy ports: - protocol: TCP port: 4000 targetPort: 4000 type: LoadBalancer ``` Another approach would be to use a service type of `ClusterIP`: ```yaml filename="moose-service.yaml" copy apiVersion: v1 kind: Service metadata: name: moose-service spec: selector: app: moosedeploy type: ClusterIP ports: - protocol: TCP port: 4000 targetPort: 4000 ``` The approach you decide on will depend on your specific Kubernetes networking requirements. ## Setting up health checks and probes Your generated Moose docker containers feature a health check endpoint at `/health` that can be used by Kubernetes to monitor the health of your application. Based on our production deployment, we recommend configuring the following probes: ```yaml # Startup probe - gives Moose time to initialize before accepting traffic startupProbe: httpGet: path: /health port: 4000 initialDelaySeconds: 60 timeoutSeconds: 3 periodSeconds: 5 failureThreshold: 30 successThreshold: 3 # Readiness probe - determines when the pod is ready to receive traffic readinessProbe: httpGet: path: /health port: 4000 initialDelaySeconds: 5 timeoutSeconds: 3 periodSeconds: 3 failureThreshold: 2 successThreshold: 5 # Liveness probe - restarts the pod if it becomes unresponsive livenessProbe: httpGet: path: /health port: 4000 initialDelaySeconds: 5 timeoutSeconds: 3 periodSeconds: 5 failureThreshold: 5 successThreshold: 1 ``` ## Zero-downtime deployments with lifecycle hooks For production deployments, we recommend configuring a preStop lifecycle hook to ensure graceful pod termination during updates: ```yaml lifecycle: preStop: exec: command: ["/bin/sleep", "60"] ``` This gives the pod time to finish processing in-flight requests before termination. You should also set an appropriate `terminationGracePeriodSeconds` value (we recommend 70 seconds) to work with this hook. ## Resource requirements Based on our production deployments, we recommend the following resource allocation for a standard Moose deployment: ```yaml resources: requests: cpu: "1000m" memory: "8Gi" ``` You can adjust these values based on your application's specific needs and workload. ## Configuring container environment variables Inside your `moose-deployment.yaml` file, you will need to add an `env` section for environment variables. The example below includes actual sample values for clarity. In production deployments, you should use Kubernetes secrets for sensitive information as shown in the second example. Note that both Redpanda and Temporal are optional. If you're not using these components, you can omit their respective configuration sections. ### Example with hardcoded values (for development/testing only): The example below includes actual sample values for clarity. In production deployments, you should use Kubernetes secrets for sensitive information as shown in the second example. Note that both Redpanda and Temporal are optional. If you're not using these components, you can omit their respective configuration sections. 
```yaml filename="moose-deployment-dev.yaml" copy apiVersion: apps/v1 kind: Deployment metadata: name: moosedeployment spec: # For zero-downtime deployments strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 maxUnavailable: 0 replicas: 1 selector: matchLabels: app: moosedeploy template: metadata: labels: app: moosedeploy spec: # For graceful shutdowns terminationGracePeriodSeconds: 70 containers: - name: moosedeploy image: 514labs/moose-df-deployment-x86_64-unknown-linux-gnu:latest ports: - containerPort: 4000 # Lifecycle hook to delay pod shutdown lifecycle: preStop: exec: command: ["/bin/sleep", "60"] # Startup probe startupProbe: httpGet: path: /health port: 4000 initialDelaySeconds: 60 timeoutSeconds: 3 periodSeconds: 5 failureThreshold: 30 successThreshold: 3 # Readiness probe readinessProbe: httpGet: path: /health port: 4000 initialDelaySeconds: 5 timeoutSeconds: 3 periodSeconds: 3 failureThreshold: 2 successThreshold: 5 # Liveness probe livenessProbe: httpGet: path: /health port: 4000 initialDelaySeconds: 5 timeoutSeconds: 3 periodSeconds: 5 failureThreshold: 5 successThreshold: 1 # Resource requirements resources: requests: cpu: "1000m" memory: "8Gi" env: # Logger configuration - name: MOOSE_LOGGER__LEVEL value: "Info" - name: MOOSE_LOGGER__STDOUT value: "true" - name: MOOSE_LOGGER__FORMAT value: "Json" # Telemetry configuration - name: MOOSE_TELEMETRY__ENABLED value: "true" - name: MOOSE_TELEMETRY__EXPORT_METRICS value: "true" # Debugging - name: RUST_BACKTRACE value: "1" # HTTP server configuration - name: MOOSE_HTTP_SERVER_CONFIG__HOST value: "0.0.0.0" - name: MOOSE_HTTP_SERVER_CONFIG__PORT value: "4000" # ClickHouse configuration - name: MOOSE_CLICKHOUSE_CONFIG__DB_NAME value: "moose_production" - name: MOOSE_CLICKHOUSE_CONFIG__USER value: "clickhouse_user" - name: MOOSE_CLICKHOUSE_CONFIG__PASSWORD value: "clickhouse_password_example" - name: MOOSE_CLICKHOUSE_CONFIG__HOST value: "your-clickhouse.cloud.example.com" - name: MOOSE_CLICKHOUSE_CONFIG__HOST_PORT value: "8443" - name: MOOSE_CLICKHOUSE_CONFIG__USE_SSL value: "1" - name: MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT value: "9440" # Redis configuration - name: MOOSE_REDIS_CONFIG__URL value: "redis://redis_user:redis_password_example@redis.example.com:6379" - name: MOOSE_REDIS_CONFIG__KEY_PREFIX value: "moose_production" # Redpanda configuration (Optional) - name: MOOSE_REDPANDA_CONFIG__BROKER value: "seed-5fbcae97.example.redpanda.com:9092" - name: MOOSE_REDPANDA_CONFIG__NAMESPACE value: "moose_production" - name: MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS value: "10043" - name: MOOSE_REDPANDA_CONFIG__SASL_USERNAME value: "redpanda_user" - name: MOOSE_REDPANDA_CONFIG__SASL_PASSWORD value: "redpanda_password_example" - name: MOOSE_REDPANDA_CONFIG__SASL_MECHANISM value: "SCRAM-SHA-256" - name: MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL value: "SASL_SSL" - name: MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR value: "3" # Temporal configuration (Optional) - name: MOOSE_TEMPORAL_CONFIG__CA_CERT value: "/etc/ssl/certs/ca-certificates.crt" - name: MOOSE_TEMPORAL_CONFIG__API_KEY value: "temporal_api_key_example" - name: MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST value: "your-namespace.tmprl.cloud" imagePullSecrets: - name: moose-docker-repo-credentials ``` --- ## deploying/deploying-with-docker-compose Source: moose/deploying/deploying-with-docker-compose.mdx # Deploying with Docker Compose Deploying a Moose application with all its dependencies can be challenging and time-consuming. 
You need to properly configure multiple services, ensure they communicate with each other, and manage their lifecycle. Docker Compose solves this problem by allowing you to deploy your entire stack with a single command. This guide shows you how to set up a production-ready Moose environment on a single server using Docker Compose, with proper security, monitoring, and maintenance practices. This guide describes a single-server deployment. For high availability (HA) deployments, you'll need to: - Deploy services across multiple servers - Configure service replication and redundancy - Set up load balancing - Implement proper failover mechanisms We are also offering an HA managed deployment option for Moose called [Boreal](https://fiveonefour.com/boreal). ## Prerequisites Before you begin, you'll need: - Ubuntu 24 or above (for this guide) - Docker and Docker Compose (minimum version 2.23.1) - Access to a server with at least 8GB RAM and 4 CPU cores The Moose stack consists of: - Your Moose Application - [ClickHouse](https://clickhouse.com) (required) - [Redis](https://redis.io) (required) - [Redpanda](https://redpanda.com) (optional for event streaming) - [Temporal](https://temporal.io) (optional for workflow orchestration) ## Setting Up a Production Server ### Installing Required Software First, install Docker on your Ubuntu server: ```bash # Update the apt package index sudo apt-get update # Install packages to allow apt to use a repository over HTTPS sudo apt-get install -y \ apt-transport-https \ ca-certificates \ curl \ gnupg \ lsb-release # Add Docker's official GPG key curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg # Set up the stable repository echo \ "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \ $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null # Update apt package index again sudo apt-get update # Install Docker Engine sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin ``` Next, install Node.js or Python depending on your Moose application: ```bash # For Node.js applications curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - sudo apt-get install -y nodejs # OR for Python applications sudo apt-get install -y python3.12 python3-pip ``` ### Configure Docker Log Size Limits To prevent Docker logs from filling up your disk space, configure log rotation: ```bash sudo mkdir -p /etc/docker sudo vim /etc/docker/daemon.json ``` Add the following configuration: ```json { "log-driver": "json-file", "log-opts": { "max-size": "100m", "max-file": "3" } } ``` Restart Docker to apply the changes: ```bash sudo systemctl restart docker ``` ### Enable Docker Non-Root Access To run Docker commands without sudo: ```bash # Add your user to the docker group sudo usermod -aG docker $USER # Apply the changes (log out and back in, or run this) newgrp docker ``` ### Setting Up GitHub Actions Runner (Optional) If you want to set up CI/CD automation, you can install a GitHub Actions runner: 1. Navigate to your GitHub repository 2. Go to Settings > Actions > Runners 3. Click "New self-hosted runner" 4. 
Select Linux and follow the instructions shown To configure the runner as a service (to run automatically): ```bash cd actions-runner sudo ./svc.sh install sudo ./svc.sh start ``` ## Setting up a Foo Bar Moose Application (Optional) If you already have a Moose application, you can skip this section. You should copy the moose project to your server and then build the application with the `--docker` flag and get the built image on the server. ### Install Moose CLI ```bash bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose ``` ### Create a new Moose Application Please follow the initialization instructions for your language. ```bash moose init test-ts typescript cd test-ts npm install ``` or ```bash moose init test-py python cd test-py pip install -r requirements.txt ``` ### Build the application on AMD64 ```bash moose build --docker --amd64 ``` ### Build the application on ARM64 ```bash moose build --docker --arm64 ``` ### Confirm the image was built ```bash docker images ``` For more information on packaging Moose for deployment, see the full packaging guide. ## Preparing for Deployment ### Create Environment Configuration First, create a file called `.env` in your project directory to specify component versions: ```bash # Create and open the .env file vim .env ``` Add the following content to the `.env` file: ``` # Version configuration for components POSTGRESQL_VERSION=14.0 TEMPORAL_VERSION=1.22.0 TEMPORAL_UI_VERSION=2.20.0 REDIS_VERSION=7 CLICKHOUSE_VERSION=25.4 REDPANDA_VERSION=v24.3.13 REDPANDA_CONSOLE_VERSION=v3.1.0 ``` Additionally, create a `.env.prod` file for your Moose application-specific secrets and configuration: ```bash # Create and open the .env.prod file vim .env.prod ``` Add your application-specific environment variables: ``` # Application-specific environment variables APP_SECRET=your_app_secret # Add other application variables here ``` ## Deploying with Docker Compose Create a file called `docker-compose.yml` in the same directory: ```bash # Create and open the docker-compose.yml file vim docker-compose.yml ``` Add the following content to the file: ```yaml file=./docker-compose.yml name: moose-stack volumes: # Required volumes clickhouse-0-data: null clickhouse-0-logs: null redis-0: null # Optional volumes redpanda-0: null postgresql-data: null configs: temporal-config: # Using the "content" property to inline the config content: | limit.maxIDLength: - value: 255 constraints: {} system.forceSearchAttributesCacheRefreshOnRead: - value: true # Dev setup only. Please don't turn this on in production. 
constraints: {} services: # REQUIRED SERVICES # ClickHouse - Required analytics database clickhouse-0: container_name: clickhouse-0 restart: always image: clickhouse/clickhouse-server:${CLICKHOUSE_VERSION} volumes: - clickhouse-0-data:/var/lib/clickhouse/ - clickhouse-0-logs:/var/log/clickhouse-server/ environment: # Enable SQL-driven access control and user management CLICKHOUSE_ALLOW_INTROSPECTION_FUNCTIONS: 1 # Default admin credentials CLICKHOUSE_USER: admin CLICKHOUSE_PASSWORD: adminpassword # Disable default user CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1 # Database setup CLICKHOUSE_DB: moose # Uncomment this if you want to access clickhouse from outside the docker network # ports: # - 8123:8123 # - 9000:9000 healthcheck: test: wget --no-verbose --tries=1 --spider http://localhost:8123/ping || exit 1 interval: 30s timeout: 5s retries: 3 start_period: 30s ulimits: nofile: soft: 262144 hard: 262144 networks: - moose-network # Redis - Required for caching and pub/sub redis-0: restart: always image: redis:${REDIS_VERSION} volumes: - redis-0:/data command: redis-server --save 20 1 --loglevel warning healthcheck: test: ["CMD", "redis-cli", "ping"] interval: 10s timeout: 5s retries: 5 networks: - moose-network # OPTIONAL SERVICES # --- BEGIN REDPANDA SERVICES (OPTIONAL) --- # Remove this section if you don't need event streaming redpanda-0: restart: always command: - redpanda - start - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092 # Address the broker advertises to clients that connect to the Kafka API. # Use the internal addresses to connect to the Redpanda brokers' # from inside the same Docker network. # Use the external addresses to connect to the Redpanda brokers' # from outside the Docker network. - --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092 - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082 # Address the broker advertises to clients that connect to the HTTP Proxy. - --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082 - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081 # Redpanda brokers use the RPC API to communicate with each other internally. - --rpc-addr redpanda-0:33145 - --advertise-rpc-addr redpanda-0:33145 # Mode dev-container uses well-known configuration properties for development in containers. - --mode dev-container # Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system. 
      - --smp 1
      - --default-log-level=info
    image: docker.redpanda.com/redpandadata/redpanda:${REDPANDA_VERSION}
    container_name: redpanda-0
    volumes:
      - redpanda-0:/var/lib/redpanda/data
    networks:
      - moose-network
    healthcheck:
      test: ["CMD-SHELL", "rpk cluster health | grep -q 'Healthy:.*true'"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

  # Optional Redpanda Console for visualizing the cluster
  redpanda-console:
    restart: always
    container_name: redpanda-console
    image: docker.redpanda.com/redpandadata/console:${REDPANDA_CONSOLE_VERSION}
    entrypoint: /bin/sh
    command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
    environment:
      CONFIG_FILEPATH: /tmp/config.yml
      CONSOLE_CONFIG_FILE: |
        kafka:
          brokers: ["redpanda-0:9092"]
        # Schema registry config moved outside of kafka section
        schemaRegistry:
          enabled: true
          urls: ["http://redpanda-0:8081"]
        redpanda:
          adminApi:
            enabled: true
            urls: ["http://redpanda-0:9644"]
    ports:
      - 8080:8080
    depends_on:
      - redpanda-0
    healthcheck:
      test: ["CMD", "wget", "--spider", "--quiet", "http://localhost:8080/admin/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    networks:
      - moose-network
  # --- END REDPANDA SERVICES ---

  # --- BEGIN TEMPORAL SERVICES (OPTIONAL) ---
  # Remove this section if you don't need workflow orchestration

  # Temporal PostgreSQL database
  postgresql:
    container_name: temporal-postgresql
    environment:
      POSTGRES_PASSWORD: temporal
      POSTGRES_USER: temporal
    image: postgres:${POSTGRESQL_VERSION}
    restart: always
    volumes:
      - postgresql-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U temporal"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks:
      - moose-network

  # Temporal server
  # For initial setup, use temporalio/auto-setup image
  # For production, switch to temporalio/server after first run
  temporal:
    container_name: temporal
    depends_on:
      postgresql:
        condition: service_healthy
    environment:
      # Database configuration
      - DB=postgres12
      - DB_PORT=5432
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal
      - POSTGRES_SEEDS=postgresql
      # Namespace configuration
      - DEFAULT_NAMESPACE=moose-workflows
      - DEFAULT_NAMESPACE_RETENTION=72h
      # Auto-setup options - set to false after initial setup
      - AUTO_SETUP=true
      - SKIP_SCHEMA_SETUP=false
      # Service configuration - all services by default
      # For high-scale deployments, run these as separate containers
      # - SERVICES=history,matching,frontend,worker
      # Logging and metrics
      - LOG_LEVEL=info
      # Addresses
      - TEMPORAL_ADDRESS=temporal:7233
      - DYNAMIC_CONFIG_FILE_PATH=/etc/temporal/config/dynamicconfig/development-sql.yaml
    # For initial deployment, use the auto-setup image
    image: temporalio/auto-setup:${TEMPORAL_VERSION}
    # For production, after initial setup, switch to server image:
    # image: temporalio/server:${TEMPORAL_VERSION}
    restart: always
    ports:
      - 7233:7233
    # Volume for dynamic configuration - essential for production
    configs:
      - source: temporal-config
        target: /etc/temporal/config/dynamicconfig/development-sql.yaml
        mode: 0444
    networks:
      - moose-network
    healthcheck:
      # Use a shell form so the pipe works (exec-form arrays don't interpret "|")
      test: ["CMD-SHELL", "tctl --address temporal:7233 cluster health | grep -q SERVING"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s

  # Temporal Admin Tools - useful for maintenance and debugging
  temporal-admin-tools:
    container_name: temporal-admin-tools
    depends_on:
      - temporal
    environment:
      - TEMPORAL_ADDRESS=temporal:7233
      - TEMPORAL_CLI_ADDRESS=temporal:7233
    image: temporalio/admin-tools:${TEMPORAL_VERSION}
    restart: "no"
    networks:
      - moose-network
    stdin_open: true
    tty: true

  # Temporal Web UI
  temporal-ui:
    container_name:
temporal-ui depends_on: - temporal environment: - TEMPORAL_ADDRESS=temporal:7233 - TEMPORAL_CORS_ORIGINS=http://localhost:3000 image: temporalio/ui:${TEMPORAL_UI_VERSION} restart: always ports: - 8081:8080 networks: - moose-network healthcheck: test: ["CMD", "wget", "--spider", "--quiet", "http://localhost:8080/health"] interval: 30s timeout: 5s retries: 3 start_period: 10s # --- END TEMPORAL SERVICES --- # Your Moose application moose: image: moose-df-deployment-x86_64-unknown-linux-gnu:latest # Update with your image name depends_on: # Required dependencies - clickhouse-0 - redis-0 # Optional dependencies - remove if not using - redpanda-0 - temporal restart: always environment: # Logging and debugging RUST_BACKTRACE: "1" MOOSE_LOGGER__LEVEL: "Info" MOOSE_LOGGER__STDOUT: "true" # Required services configuration # ClickHouse configuration MOOSE_CLICKHOUSE_CONFIG__DB_NAME: "moose" MOOSE_CLICKHOUSE_CONFIG__USER: "moose" MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "your_moose_password" MOOSE_CLICKHOUSE_CONFIG__HOST: "clickhouse-0" MOOSE_CLICKHOUSE_CONFIG__HOST_PORT: "8123" # Redis configuration MOOSE_REDIS_CONFIG__URL: "redis://redis-0:6379" MOOSE_REDIS_CONFIG__KEY_PREFIX: "moose" # Optional services configuration # Redpanda configuration (remove if not using Redpanda) MOOSE_REDPANDA_CONFIG__BROKER: "redpanda-0:9092" MOOSE_REDPANDA_CONFIG__MESSAGE_TIMEOUT_MS: "1000" MOOSE_REDPANDA_CONFIG__RETENTION_MS: "30000" MOOSE_REDPANDA_CONFIG__NAMESPACE: "moose" # Temporal configuration (remove if not using Temporal) MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST: "temporal:7233" MOOSE_TEMPORAL_CONFIG__NAMESPACE: "moose-workflows" # HTTP Server configuration MOOSE_HTTP_SERVER_CONFIG__HOST: 0.0.0.0 ports: - 4000:4000 env_file: - path: ./.env.prod required: true networks: - moose-network healthcheck: test: ["CMD-SHELL", "curl -s http://localhost:4000/health | grep -q '\"unhealthy\": \\[\\]' && echo 'Healthy'"] interval: 30s timeout: 5s retries: 10 start_period: 60s # Define the network for all services networks: moose-network: driver: bridge ``` At this point, don't start the services yet. First, we need to configure the individual services for production use as described in the following sections. ## Configuring Services for Production ### Configuring ClickHouse Securely (Required) For production ClickHouse deployment, we'll use environment variables to configure users and access control (as recommended in the [official Docker image documentation](https://hub.docker.com/r/clickhouse/clickhouse-server)): 1. First, start the ClickHouse container: ```bash # Start just the ClickHouse container docker compose up -d clickhouse-0 ``` 2. After ClickHouse has started, connect to create additional users: ```bash # Connect to ClickHouse with the admin user docker exec -it clickhouse-0 clickhouse-client --user admin --password adminpassword # Create moose application user CREATE USER moose IDENTIFIED BY 'your_moose_password'; GRANT ALL ON moose.* TO moose; # Create read-only user for BI tools (optional) CREATE USER power_bi IDENTIFIED BY 'your_powerbi_password' SETTINGS PROFILE 'readonly'; GRANT SHOW TABLES, SELECT ON moose.* TO power_bi; ``` 3. To exit the ClickHouse client, type `\q` and press Enter. 4. Update your Moose environment variables to use the new moose user: ```bash vim docker-compose.yml ``` ```yaml MOOSE_CLICKHOUSE_CONFIG__USER: "moose" MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "your_moose_password" ``` 5. 
Remove the leftover admin credentials from the `moose` service in the docker-compose.yml file (the previous step replaced them with the `moose` user):

```yaml
MOOSE_CLICKHOUSE_CONFIG__USER: "admin"
MOOSE_CLICKHOUSE_CONFIG__PASSWORD: "adminpassword"
```

6. For additional security in production, consider using Docker secrets for passwords.

7. Restart the ClickHouse container to apply the changes:

```bash
docker compose restart clickhouse-0
```

8. Verify that the new configuration works by connecting with the newly created user:

```bash
# Connect with the new moose user (the container is named clickhouse-0)
docker exec -it clickhouse-0 clickhouse-client --user moose --password your_moose_password

# Test access by listing tables
SHOW TABLES FROM moose;

# Exit the clickhouse client
\q
```

If you can connect successfully and run commands with the new user, your ClickHouse configuration is working properly.

### Securing Redpanda (Optional)

For production, it's recommended to restrict external access to Redpanda:

1. Modify your Docker Compose file to remove external access:
   - Use only internal network access for production
   - If needed, use a reverse proxy with authentication for external access
2. For this simple deployment, we'll keep Redpanda closed to the external world with no authentication required, as it's only accessible from within the Docker network.

### Configuring Temporal (Optional)

If your Moose application uses Temporal for workflow orchestration, the configuration above includes all necessary services based on the [official Temporal Docker Compose examples](https://github.com/temporalio/docker-compose). If you're not using Temporal, simply remove the Temporal-related services (postgresql, temporal, temporal-ui) and environment variables from the docker-compose.yml file.

#### Temporal Deployment Process: From Setup to Production

Deploying Temporal involves a two-phase process: initial setup followed by production operation. Here are step-by-step instructions for each phase:

##### Phase 1: Initial Setup

1. **Start the PostgreSQL database**:

```bash
docker compose up -d postgresql
```

2. **Wait for PostgreSQL to be healthy** (check the status):

```bash
docker compose ps postgresql
```

Look for `healthy` in the output before proceeding.

3. **Start Temporal with auto-setup**:

```bash
docker compose up -d temporal
```

During this phase, Temporal's auto-setup will:
- Create the necessary PostgreSQL databases
- Initialize the schema tables
- Register the default namespace (moose-workflows)

4. **Verify Temporal server is running**:

```bash
docker compose ps temporal
```

5. **Start the Admin Tools and UI**:

```bash
docker compose up -d temporal-admin-tools temporal-ui
```

6. **Register the namespace manually** (only needed if auto-setup didn't already create it):

```bash
# Register the moose-workflows namespace with a 3-day retention period
docker compose exec temporal-admin-tools tctl namespace register --retention 72h moose-workflows
```

Verify that the namespace was created:

```bash
# List all namespaces
docker compose exec temporal-admin-tools tctl namespace list

# Describe your namespace
docker compose exec temporal-admin-tools tctl namespace describe moose-workflows
```

You should see details about the namespace including its retention policy.

##### Phase 2: Transition to Production

After successful initialization, modify your configuration for production use:

1. **Stop Temporal services**:

```bash
docker compose stop temporal temporal-ui temporal-admin-tools
```

2.
**Edit your docker-compose.yml file** to: - Change image from `temporalio/auto-setup` to `temporalio/server` - Set `SKIP_SCHEMA_SETUP=true` Example change: ```yaml # From: image: temporalio/auto-setup:${TEMPORAL_VERSION} # To: image: temporalio/server:${TEMPORAL_VERSION} # And change: - AUTO_SETUP=true - SKIP_SCHEMA_SETUP=false # To: - AUTO_SETUP=false - SKIP_SCHEMA_SETUP=true ``` 3. **Restart services with production settings**: ```bash docker compose up -d temporal temporal-ui temporal-admin-tools ``` 4. **Verify services are running with new configuration**: ```bash docker compose ps ``` ## Starting and Managing the Service ### Starting the Services Start all services with Docker Compose: ```bash docker compose up -d ``` ### Setting Up Systemd Service for Docker Compose For production, create a systemd service to ensure Docker Compose starts automatically on system boot: 1. Create a systemd service file: ```bash sudo vim /etc/systemd/system/moose-stack.service ``` 2. Add the following configuration (adjust paths as needed): ``` [Unit] Description=Moose Stack Requires=docker.service After=docker.service [Service] Type=oneshot RemainAfterExit=yes WorkingDirectory=/path/to/your/compose/directory ExecStart=/usr/bin/docker compose up -d ExecStop=/usr/bin/docker compose down TimeoutStartSec=0 [Install] WantedBy=multi-user.target ``` 3. Enable and start the service: ```bash sudo systemctl enable moose-stack.service sudo systemctl start moose-stack.service ``` ## Deployment Workflow You get a smooth deployment process with these options: ### Automated Deployment with CI/CD 1. Set up a CI/CD pipeline using GitHub Actions (if runner is configured) 2. When code is pushed to your repository: - The GitHub Actions runner builds your Moose application - Updates the Docker image - Deploys using Docker Compose ### Manual Deployment Alternatively, for manual deployment: 1. Copy the latest version of the code to the machine 2. Run `moose build` 3. Update the Docker image tag in your docker-compose.yml 4. Restart the stack with `docker compose up -d` ## Monitoring and Maintenance No more worrying about unexpected outages or performance issues. Set up proper monitoring: - Set up log monitoring with a tool like [Loki](https://grafana.com/oss/loki/) - Regularly backup your volumes (especially ClickHouse data) - Monitor disk space usage - Set up alerting for service health --- ## Monitoring your Moose App Source: moose/deploying/monitoring.mdx This content has moved to the unified Observability page > This page has moved. See the unified [/moose/metrics](/moose/metrics) page for observability across development and production. --- ## Packaging Moose for deployment Source: moose/deploying/packaging-moose-for-deployment.mdx Packaging Moose for deployment # Packaging Moose for Deployment Once you've developed your Moose application locally, you can package it for deployment to your on-prem or cloud infrastructure. The first step is to navigate (`cd`) to your moose project in your terminal. ```txt filename="Terminal" copy cd my-moose-project ``` The Moose CLI you've used to build your Moose project also has a handy flag that will automate the packaging and building of your project into docker images. 
```txt filename="Terminal" copy moose build --docker ``` After the above command completes you can view your newly created docker files by running the `docker images` command: ```txt filename="Terminal" copy >docker images REPOSITORY TAG IMAGE ID CREATED SIZE moose-df-deployment-aarch64-unknown-linux-gnu latest c50674c7a68a About a minute ago 155MB moose-df-deployment-x86_64-unknown-linux-gnu latest e5b449d3dea3 About a minute ago 163MB ``` > Notice that you get two `moose-df-deployment` containers, one for the `aarch64` (ARM64) architecture and another for the `x86_64` architecture. This is necessary to allow you to choose the version that matches your cloud or on-prem machine architecture. You can then use standard docker commands to push your new project images to your container repository of choice. First tag your local images: ```txt filename="Terminal" copy docker tag moose-df-deployment-aarch64-unknown-linux-gnu:latest {your-repo-user-name}/moose-df-deployment-aarch64-unknown-linux-gnu:latest docker tag moose-df-deployment-x86_64-unknown-linux-gnu:latest {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest ``` Then `push` your files to your container repository. ```txt filename="Terminal" copy docker push {your-repo-user-name}/moose-df-deployment-aarch64-unknown-linux-gnu:latest docker push {your-repo-user-name}/moose-df-deployment-x86_64-unknown-linux-gnu:latest ``` You can also use the following handy shell script to automate the steps above. ```bash filename="push.sh" copy #!/bin/bash version=$2 if [ -z "$1" ] then echo "You must specify the dockerhub repository as an argument. Example: ./push.sh container-repo-name" echo "Note: you can also provide a second argument to supply a specific version tag - otherwise this script will use the same version as the latest moose-cli on Github." exit 1 fi if [ -z "$2" ] then output=$(npx @514labs/moose-cli -V) version=$(echo "$output" | sed -n '2p' | awk '{print $2}') fi echo "Using version: $version" arch="moose-df-deployment-aarch64-unknown-linux-gnu" docker tag $arch:$version $1/$arch:$version docker push $1/$arch:$version arch="moose-df-deployment-x86_64-unknown-linux-gnu" docker tag $arch:$version $1/$arch:$version docker push $1/$arch:$version ``` --- ## Preparing access to ClickHouse, Redis, Temporal and Redpanda Source: moose/deploying/preparing-clickhouse-redpanda.mdx Preparing access to ClickHouse, Redis, Temporal and Redpanda # Preparing access to ClickHouse, Redis, Temporal and Redpanda Your hosted Moose application requires access to hosted ClickHouse and Redis service instances. You can also optionally use Redpanda for event streaming. You can stand up open source versions of these applications within your environments or opt to use cloud-hosted versions available at: - [ClickHouse Cloud](https://clickhouse.com) - [Redis Cloud](https://redis.com) - [Redpanda Cloud](https://redpanda.com) - [Temporal Cloud](https://temporal.io) ## ClickHouse Configuration If you're using `state_config.storage = "clickhouse"` in your config (serverless mode without Redis), your ClickHouse instance must support the **KeeperMap** table engine. This is used for migration state storage and distributed locking. 
✅ **ClickHouse Cloud**: Supported by default ✅ **`moose dev` / `moose prod`**: Already configured in our Docker setup ⚠️ **Self-hosted ClickHouse**: See [ClickHouse KeeperMap documentation](https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map) for setup requirements If you're using Redis for state storage (`state_config.storage = "redis"`), you don't need KeeperMap. For ClickHouse, you'll need the following information: | Parameter | Description | Default Value | |-----------|-------------|---------------| | DB_NAME | Database name to use | Your branch or application ID | | USER | Username for authentication | - | | PASSWORD | Password for authentication | - | | HOST | Hostname or IP address | - | | HOST_PORT | HTTPS port | 8443 | | USE_SSL | Whether to use SSL (1 for true, 0 for false) | 1 | | NATIVE_PORT | Native protocol port | 9440 | These values are used to configure the Moose application's connection to ClickHouse through environment variables following this pattern: ``` MOOSE_CLICKHOUSE_CONFIG__= ``` For example: ``` MOOSE_CLICKHOUSE_CONFIG__DB_NAME=myappdb MOOSE_CLICKHOUSE_CONFIG__HOST=myclickhouse.example.com MOOSE_CLICKHOUSE_CONFIG__USE_SSL=1 MOOSE_CLICKHOUSE_CONFIG__HOST_PORT=8443 MOOSE_CLICKHOUSE_CONFIG__NATIVE_PORT=9440 ``` ## Redis Configuration Moose requires Redis for caching and as a message broker. You'll need the following configuration: | Parameter | Description | |-----------|-------------| | URL | Redis connection URL | | KEY_PREFIX | Prefix for Redis keys to isolate namespaces | These values are configured through: ``` MOOSE_REDIS_CONFIG__URL=redis://username:password@redis.example.com:6379 MOOSE_REDIS_CONFIG__KEY_PREFIX=myapp ``` ## Temporal Configuration (Optional) Temporal is an optional workflow orchestration platform that can be used with Moose. If you choose to use Temporal, you'll need the following configuration: | Parameter | Description | Default Value | |-----------|-------------|---------------| | CA_CERT | Path to CA certificate | /etc/ssl/certs/ca-certificates.crt | | API_KEY | Temporal Cloud API key | - | | TEMPORAL_HOST | Temporal Cloud namespace host | Your namespace + .tmprl.cloud | These values are configured through: ``` MOOSE_TEMPORAL_CONFIG__CA_CERT=/etc/ssl/certs/ca-certificates.crt MOOSE_TEMPORAL_CONFIG__API_KEY=your-temporal-api-key MOOSE_TEMPORAL_CONFIG__TEMPORAL_HOST=your-namespace.tmprl.cloud ``` ## Redpanda Configuration (Optional) Redpanda is an optional component that can be used for event streaming. 
If you choose to use Redpanda, you'll need the following information: | Parameter | Description | Default Value | |-----------|-------------|---------------| | BROKER | Bootstrap server address | - | | NAMESPACE | Namespace for isolation (often same as branch or app ID) | - | | MESSAGE_TIMEOUT_MS | Message timeout in milliseconds | 10043 | | SASL_USERNAME | SASL username for authentication | - | | SASL_PASSWORD | SASL password for authentication | - | | SASL_MECHANISM | SASL mechanism | SCRAM-SHA-256 | | SECURITY_PROTOCOL | Security protocol | SASL_SSL | | REPLICATION_FACTOR | Topic replication factor | 3 | These values are used to configure the Moose application's connection to Redpanda through environment variables following this pattern: ``` MOOSE_REDPANDA_CONFIG__= ``` For example: ``` MOOSE_REDPANDA_CONFIG__BROKER=seed-5fbcae97.example.redpanda.com:9092 MOOSE_REDPANDA_CONFIG__NAMESPACE=myapp MOOSE_REDPANDA_CONFIG__SECURITY_PROTOCOL=SASL_SSL MOOSE_REDPANDA_CONFIG__SASL_MECHANISM=SCRAM-SHA-256 MOOSE_REDPANDA_CONFIG__REPLICATION_FACTOR=3 ``` ## Using Environment Variables in Deployment When deploying your Moose application, you'll need to pass these configurations as environment variables. Refer to the deployment guides for your specific platform (Kubernetes, ECS, etc.) for details on how to securely provide these values to your application. --- ## getting-started/from-clickhouse Source: moose/getting-started/from-clickhouse.mdx # Use Moose with Your Existing ClickHouse
## What This Guide Does This guide sets up a local ClickHouse development environment that mirrors your production database and enables code-first schema management: 1. **Introspect** your remote ClickHouse tables and generate TypeScript/Python data models 2. **Create** a local ClickHouse instance with your exact table schemas 3. **Seed** your local database with production data (optional) 4. **Build** APIs and pipelines on top of your ClickHouse data in code ## How It Works **Local Development:** - Your production ClickHouse remains untouched - You get a local ClickHouse instance that copies your remote table schemas - All development happens locally with hot-reload **Production Deployment:** - When you deploy your code, it connects to your remote ClickHouse - Any new tables, materialized views, or schema changes you create in code are automatically migrated to your target database - Your existing data and tables remain intact ## Prerequisites
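At minimum, you'll need a running Docker daemon and Python 3.12+ (see [Minimum Requirements](/moose/help/minimum-requirements)). A quick sanity check before you start:

```bash filename="Terminal" copy
# Verify Python 3.12+ is installed
python3 --version

# Verify the Docker daemon is running
docker ps
```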
## Step 1: Install Moose

Install the Moose CLI globally to your system:

```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```

After installation, you'll use `moose init` to create a new project that automatically connects to your ClickHouse and generates all the code you need.

## Step 2: Create Your Project

No ClickHouse instance of your own? Use the ClickHouse Playground example at the end of this step to try it out!

```bash filename="Initialize new project" copy
# Option 1: Provide the connection string directly
moose init my-project --from-remote <connection-string> --language python

# Option 2: Run without a connection string for interactive setup
moose init my-project --from-remote --language python
```

**Connection String Format:**

```
https://username:password@host:port/?database=database_name
```

If you don't provide a connection string, Moose will guide you through an interactive setup process where you'll be prompted to enter:

- **Host and port** (e.g., `https://your-service-id.region.clickhouse.cloud:8443`)
- **Username** (usually `default`)
- **Password** (your ClickHouse password)
- **Database name** (optional, defaults to `default`)

This is perfect if you're not sure about your connection details or prefer a guided experience.

Moose will create a complete project structure with:

- **Data models**: Python classes for every table in your ClickHouse
- **Type definitions**: Full type safety for all your data
- **Development environment**: Local ClickHouse instance that mirrors your production schema
- **Build tools**: Everything configured and ready to go

A few things to double-check:

- Make sure you are using the `HTTPS` connection string, not the `HTTP` connection string.
- Make sure the port is correct. For `HTTPS` the default is `8443`.
- The default username is `default`.

If you can't find these values, see the [Troubleshooting](#troubleshooting) section below for where to locate them in the ClickHouse Cloud console.

```bash filename="Initialize new project" copy
# Generate code models from your existing ClickHouse tables
moose init my-project --from-remote https://explorer:@play.clickhouse.com:443/?database=default --language python
```
```bash filename="Install dependencies" copy cd my-project python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt ``` You should see: `Successfully generated X models from ClickHouse tables` ### Explore Your Generated Models Check what Moose created from your tables in the `app/main.py` file:
Your generated table models are imported here so Moose can detect them.
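For illustration only (the actual module and model names depend on your ClickHouse schema), the generated root file looks roughly like this:

```py filename="app/main.py"
# Hypothetical generated modules for tables named "events" and "users";
# your project will list whatever tables Moose introspected
from app.tables.events import events_table
from app.tables.users import users_table
```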
Learn more about this import pattern in the [local development](/moose/local-dev) and [deployment](/moose/deploying) docs.
### If your database includes ClickPipes/PeerDB (CDC) tables As noted above, when you use `moose init --from-remote`, Moose introspects your database. If it detects CDC‑managed tables (e.g., PeerDB/ClickPipes with fields like `_peerdb_synced_at`, `_peerdb_is_deleted`, `_peerdb_version`), it marks those as `EXTERNALLY_MANAGED` and writes them into a dedicated external models file. Your root file is updated to load these models automatically. This separation is a best‑effort by the CLI to keep clearly CDC‑owned tables external. For other tables you don’t want Moose to manage, set the lifecycle to external and move them into the external file. See: - [External Tables](/moose/olap/external-tables) documentation for more information on how external tables work. - [DB Pull](/moose/olap/db-pull) for keeping models in sync with the remote schema.
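As a rough sketch of what an externally managed model looks like (assuming the `OlapConfig` and `LifeCycle` names from the lifecycle docs; check [External Tables](/moose/olap/external-tables) for the exact API), Moose reads the schema but never creates, alters, or drops the table:

```py filename="app/external_models.py"
from datetime import datetime
from pydantic import BaseModel
from moose_lib import OlapTable, OlapConfig, LifeCycle

# Hypothetical CDC-owned table; field names are illustrative
class CdcUser(BaseModel):
    id: str
    updated_at: datetime

# EXTERNALLY_MANAGED: Moose will query this table but never migrate it
cdc_users = OlapTable[CdcUser]("cdc_users", OlapConfig(
    life_cycle=LifeCycle.EXTERNALLY_MANAGED,
))
```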
## Step 3: Start Development

Start your development server. This spins up a local ClickHouse instance that perfectly mirrors your production schema:

```bash filename="Start your dev server" copy
moose dev
```

**What happens when you run `moose dev`:**

- 🏗️ Creates a local ClickHouse instance with your exact table schemas in your project code
- 🔄 Hot-reloads migrations to your local infrastructure as you save code changes
- 🚀 Starts a web server for building APIs

Your production ClickHouse remains completely untouched. This is a separate, local development environment.

```txt
Created docker compose file
⡗ Starting local infrastructure
  Successfully started containers
  Validated clickhousedb-1 docker container
  Validated redpanda-1 docker container
  Successfully validated red panda cluster
  Validated temporal docker container
  Successfully ran local infrastructure
  Node Id: my-analytics-app::b15efaca-0c23-42b2-9b0c-642105f9c437
  Starting development mode
  Watching "/path/to/my-analytics-app/app"
  Started Webserver.

  Next Steps

💻 Run the moose 👉 `ls` 👈 command for a bird's eye view of your application and infrastructure

📥 Send Data to Moose
	Your local development server is running at: http://localhost:4000/ingest
```

Don't see this output? [Check out the troubleshooting section](#troubleshooting)

### Seed Your Local Database (Optional)

Copy real data from your production ClickHouse to your local development environment. This gives you realistic data to work with during development.

**Why seed?** Your local database starts empty. Seeding copies real data so you can:

- Test with realistic data volumes
- Debug with actual production data patterns
- Develop features that work with real data structures

```bash filename="Terminal" copy
moose seed clickhouse --connection-string <connection-string> --limit 100000
```

**Connection String Format:**

The connection string must use ClickHouse native protocol:

```bash
# ClickHouse native protocol (secure connection)
clickhouse://username:password@host:9440/database
```

**Note:** Data transfer uses ClickHouse's native TCP protocol via `remoteSecure()`. The remote server must have the native TCP port accessible. The command automatically handles table mismatches gracefully.

```bash filename="Terminal" copy
moose seed clickhouse --connection-string clickhouse://explorer:@play.clickhouse.com:9440/default --limit 100000
```

```bash filename="Terminal" copy
# You can omit --connection-string by setting an env var
export MOOSE_SEED_CLICKHOUSE_URL='clickhouse://username:password@host:9440/database'

# copy a limited number of rows (batched under the hood)
moose seed clickhouse --limit 100000
```

- `--limit` and `--all` are mutually exclusive
- `--all` can be used to copy the entire table(s); use with caution as it can be very slow and computationally intensive.
- Large copies are automatically batched to avoid remote limits; you'll see per-batch progress.
- If you stop with Ctrl+C, the current batch finishes and the command exits gracefully.

**Expected Output:**

```bash
✓ Database seeding completed
Seeded 'local_db' from 'remote_db'
✓ table1: copied from remote
⚠️ table2: skipped (not found on remote)
✓ table3: copied from remote
```

**Troubleshooting:**

- Tables that don't exist on remote are automatically skipped with warnings
- Use `--table <table_name>` to seed a specific table that exists in both databases
- Check `moose ls table` to see your local tables
## Step 4: Build Your First API Now that you have your data models, let's build something useful! You can create APIs, materialized views, and applications with full type safety. - **REST APIs** that expose your ClickHouse data to frontend applications - **Materialized Views** for faster queries and aggregations - **Streaming pipelines** for real-time data processing - **Full-stack applications** with your ClickHouse data as the backend ### Add APIs Build REST APIs on top of your existing tables to expose your data to your user-facing apps. This is a great way to get started with Moose without changing any of your existing pipelines. Check out the MooseAPI module for more information on building APIs with Moose. ### Build Materialized Views Build materialized views on top of your existing tables to improve query performance. If you have Materialized Views in your ClickHouse, you can use Moose to build new Materialized Views on top of your existing tables, or to migrate your existing Materialized Views to Moose. Check out the MooseOLAP module for more information on building Materialized Views with Moose.
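As a hedged sketch of what that can look like in code (the `MaterializedView` and `MaterializedViewOptions` names follow the MooseOLAP docs; the table and column names here are placeholders), a daily rollup over an existing table might be defined like this:

```py filename="app/views/daily_counts.py"
from pydantic import BaseModel
from moose_lib import MaterializedView, MaterializedViewOptions
from app.tables.events import events_table  # placeholder: one of your generated table models

class DailyCount(BaseModel):
    day: str
    total: int

daily_counts = MaterializedView[DailyCount](MaterializedViewOptions(
    select_statement="""
        SELECT toDate(timestamp) AS day, count(*) AS total
        FROM events
        GROUP BY day
    """,
    select_tables=[events_table],
    table_name="daily_counts",
    materialized_view_name="daily_counts_mv",
))
```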
## Known Limitations Some advanced ClickHouse features may not be fully supported yet. Join the Moose Slack and let us know if you have any issues, feedback, or requests. **What we're working on:** - **Selective table import** (currently imports all tables) - **Default value annotations**
## Troubleshooting ### Error: Failed to connect to ClickHouse This guide shows exactly where to find your host, port, username, and password, and how to construct a valid HTTPS connection string. 1. Log into your [ClickHouse Cloud console](https://clickhouse.cloud/) 2. Open your service details page 3. Click "Connect" in the sidebar
4. Select the `HTTPS` tab and copy the values shown - **Host**: e.g. `your-service-id.region.clickhouse.cloud` - **Port**: usually `8443` - **Username**: usually `default` - **Password**: the password you configured
5. Build your connection string: ```txt https://USERNAME:PASSWORD@HOST:PORT/?database=DATABASE_NAME ``` 6. Example (with placeholders): ```txt https://default:your_password@your-service-id.region.clickhouse.cloud:8443/?database=default ``` 7. Optional: Test with curl ```bash curl --user "USERNAME:PASSWORD" --data-binary "SELECT 1" https://HOST:PORT ```
### Self-hosted or Docker - Check your server config (usually `/etc/clickhouse-server/config.xml`) - `` default: `8123` - `` default: `8443` - Check users in `/etc/clickhouse-server/users.xml` or `users.d/` - For Docker, check environment variables in your compose/run config: - `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, `CLICKHOUSE_DB` Build the HTTPS connection string with your values: ```txt https://USERNAME:PASSWORD@HOST:8443/?database=DB ``` If you only have HTTP enabled, enable HTTPS or use an HTTPS proxy; Moose init expects an HTTPS URL for remote introspection. ### `moose dev` fails to start Double check Docker is running and you do not have any port conflicts. - ClickHouse local runs on port `18123` - Your local webserver runs on port `4000` - Your local management API runs on port `5001` ## What's Next? --- ## 5-Minute Quickstart Source: moose/getting-started/quickstart.mdx Build your first analytical backend with Moose in 5 minutes # 5-Minute Quickstart
## Prerequisites

Check that your prerequisites are installed by running the following commands:

```bash filename="Terminal" copy
python3 --version
```

```bash filename="Terminal" copy
docker ps
```

Already have a ClickHouse database? Skip the tutorial and [add Moose as a layer on top of your existing database](/moose/getting-started/from-clickhouse).
## Step 1: Install Moose (30 seconds)

### Run the installation script

```bash filename="Terminal" copy
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose
```

You should see this message: `Moose vX.X.X installed successfully!` (note that X.X.X is the actual version number)

If you see an error instead, check [Troubleshooting](#need-help) below.

### Reload your shell configuration

**This step is required.** Your current terminal doesn't know about the `moose` command yet. Check your default shell with `echo $SHELL`.

If it shows `/bin/zsh` (the macOS default):

```bash filename="Terminal" copy
source ~/.zshrc
```

If it shows `/bin/bash` or `/usr/bin/bash`:

```bash filename="Terminal" copy
source ~/.bashrc
```

### Verify moose command works

```bash filename="Terminal" copy
moose --version
```

You should see:

```txt
moose X.X.X
```

If the command isn't found, **try these steps in order:**

1. Re-run the correct `source` command for your shell
2. Close this terminal completely and open a new terminal window
3. Run `moose --version` again
4. If still failing, see [Troubleshooting](#need-help)

Do not proceed to Step 2 until `moose --version` works.
## Step 2: Create Your Project (1 minute)

### Initialize your project

```bash filename="Terminal" copy
moose init my-analytics-app python
```

You should see output like:

```txt
✓ Created my-analytics-app
✓ Initialized Python project
```

### Navigate to your project directory

```bash filename="Terminal" copy
cd my-analytics-app
```

A virtual environment isolates your project's dependencies. We recommend creating one for your project.

**Create a virtual environment (Recommended)**

```bash filename="Terminal" copy
python3 -m venv .venv
```

**Activate your virtual environment (Recommended)**

```bash filename="Terminal" copy
source .venv/bin/activate
```

This creates a `.venv` folder and activates it. Your terminal prompt should now look something like this:

```txt
(.venv) username@computer my-analytics-app %
```

### Install dependencies

```bash filename="Terminal" copy
pip install -r requirements.txt
```

**Wait for installation to complete.** You should see successful installation messages ending with:

```txt
Successfully installed [list of packages]
```

You should see `(.venv)` in your prompt and dependencies installed with no errors.

### Start your development environment

```bash filename="Terminal" copy
moose dev
```

Moose is:

- Downloading Docker images for ClickHouse, Redpanda, and Temporal
- Starting containers
- Initializing databases
- Starting the development server

Do not proceed until you see the "Started Webserver" message.

```txt
Created docker compose file
⡗ Starting local infrastructure
  Successfully started containers
  Validated clickhousedb-1 docker container
  Validated redpanda-1 docker container
  Successfully validated red panda cluster
  Validated temporal docker container
  Successfully ran local infrastructure
  Node Id: my-analytics-app::b15efaca-0c23-42b2-9b0c-642105f9c437
  Starting development mode
  Watching "/path/to/my-analytics-app/app"
  Started Webserver.  👈 WAIT FOR THIS

  Next Steps

💻 Run the moose 👉 `ls` 👈 command for a bird's eye view of your application and infrastructure

📥 Send Data to Moose
	Your local development server is running at: http://localhost:4000/ingest
```

Keep this terminal running. This is your Moose development server. You'll open a new terminal for the next step.

## Step 3: Understand Your Project (1 minute)

Your project includes a complete example pipeline:

**Important:** While your pipeline objects are defined in the child folders, they **must be imported** into the root `main.py` file for the Moose CLI to discover and use them.

```python filename="app/main.py"
from app.ingest.models import *        # Data models & pipelines
from app.ingest.transform import *     # Transformation logic
from app.apis.bar import *             # API endpoints
from app.views.bar_aggregated import * # Materialized views
from app.workflows.generator import *  # Background workflows
```

## Step 4: Test Your Pipeline (2 minutes)

**Keep your `moose dev` terminal running.** You need a second terminal for the next commands.
**macOS Terminal:**
- Press `Cmd+N` for a new window, or
- Right-click the Terminal icon in the dock → New Window

**VSCode:**
- Click the `+` button in the terminal panel, or
- Press `` Ctrl+Shift+` `` (backtick)

**Linux Terminal:**
- Press `Ctrl+Shift+N`, or
- Use your terminal's File → New Window menu

### Navigate to your project in the new terminal

In your **new terminal window** (not the one running `moose dev`):

```bash filename="Terminal 2 (New Window)" copy
cd my-analytics-app
```

If not automatically activated, activate the virtual environment:

```bash filename="Terminal 2 (New Window)" copy
source .venv/bin/activate
```

### Run the data generator workflow

Your project comes with a pre-built [Workflow](../workflows) called `generator` that acts as a **data simulator**:

```bash filename="Terminal 2 (New Window)" copy
moose workflow run generator
```

You should see:

```txt
Workflow 'generator' triggered successfully
```

The `generator` workflow:

- Generates 1000 fake records with realistic data (using the Faker library)
- Sends each record to your ingestion API via HTTP POST
- Runs as a background task managed by Temporal
- Helps you test your entire pipeline without needing real data

You can see the code in the `/workflows/generator.py` file.

### Watch for data processing logs

**Switch to your first terminal** (where `moose dev` is running). You should see new logs streaming:

```txt
POST ingest/Foo
[POST] Data received at ingest API sink for Foo
Received Foo_0_0 -> Bar_0_0 1 message(s)
[DB] 17 row(s) successfully written to DB table (Bar)
```

These logs show your pipeline working: Workflow generates data → Ingestion API receives it → Data transforms → Writes to ClickHouse

**If you don't see logs after 30 seconds:**
- Verify `moose dev` is still running in Terminal 1
- Check Terminal 2 for error messages from the workflow command
- Run `docker ps` to verify containers are running

The workflow runs in the background, powered by [Temporal](https://temporal.io). You can see workflow status at `http://localhost:8080`.

Peek at the raw rows in your local database:

```bash filename="Terminal" copy
moose peek Bar --limit 5 # This queries your ClickHouse database to show raw data; useful for debugging / verification
```

You should see output like:

```txt
┌─primaryKey─────────────────────────┬─utcTimestamp────────┬─hasText─┬─textLength─┐
│ 123e4567-e89b-12d3-a456-426614174000 │ 2024-01-15 10:30:00 │ 1 │ 42 │
│ 987fcdeb-51a2-43d1-b789-123456789abc │ 2024-01-15 10:31:00 │ 0 │ 0 │
└────────────────────────────────────┴─────────────────────┴─────────┴────────────┘
```

If you see 0 rows, wait a few seconds for the workflow to process data, then try again.

### Query your data

Your application has a pre-built [API](../apis) that reads from your database. The API runs on `localhost:4000`.

**In Terminal 2**, call the API with `curl`:

```bash filename="Terminal 2 (New Window)" copy
curl "http://localhost:4000/api/bar"
```

You should see JSON data with analytics results, like:

```json
[
  {
    "dayOfMonth": 15,
    "totalRows": 67,
    "rowsWithText": 34,
    "maxTextLength": 142,
    "totalTextLength": 2847
  },
  {
    "dayOfMonth": 14,
    "totalRows": 43,
    "rowsWithText": 21,
    "maxTextLength": 98,
    "totalTextLength": 1923
  }
]
```

Your complete data pipeline is working!
**Try query parameters:**

```bash filename="Terminal 2 - Add filters and limits" copy
curl "http://localhost:4000/api/bar?limit=5&orderBy=totalRows"
```

**Port reference:**

- **Port 4000**: Your Moose application webserver (all APIs are running on this port)
- **Port 8080**: Temporal UI dashboard (workflow management)
- **Port 18123**: ClickHouse HTTP interface (direct database access)

**If the workflow command doesn't work:**
- Make sure you're in the project directory (`cd my-analytics-app`)
- Verify `moose dev` is still running in your first terminal
- Check that Docker containers are running: `docker ps`

**If curl returns an error:**
- Verify the URL is `http://localhost:4000` (not 8080)
- Make sure the workflow has had time to generate data (wait 30-60 seconds)
- Check your `moose dev` terminal for error messages

**If you get HTML instead of JSON:**
- You might be hitting the wrong port - use 4000, not 8080
- Port 8080 serves the Temporal UI (workflow dashboard), not your API

**If `moose peek Bar` shows 0 rows:**
- Wait for the workflow to complete (it processes 1000 records)
- Check the workflow is running: look for "Ingested X records..." messages
- Verify no errors in your `moose dev` terminal logs

**If you see connection refused:**
- Restart `moose dev` and wait for "Started Webserver" message
- Check if another process is using port 4000: `lsof -i :4000`

**Explore your APIs with OpenAPI:**

1. Install the [OpenAPI (Swagger) Viewer extension](https://marketplace.cursorapi.com/items?itemName=42Crunch.vscode-openapi) in your IDE
2. Open `.moose/openapi.yaml` in your IDE
3. Click the "Preview" icon to launch the interactive API explorer
4. Test the `POST /ingest/Foo` and `GET /api/bar` endpoints

## Step 5: Hot Reload Schema Changes (1 minute)

1. Open `app/ingest/models.py`
2. Add a new field to your data model:

```python filename="app/ingest/models.py" {17} copy
from moose_lib import Key, StringToEnumMixin
from typing import Optional
from enum import IntEnum, auto
from datetime import datetime
from pydantic import BaseModel

class Baz(StringToEnumMixin, IntEnum):
    QUX = auto()
    QUUX = auto()

class Bar(BaseModel):
    primary_key: Key[str]
    utc_timestamp: datetime
    baz: Baz
    has_text: bool
    text_length: int
    new_field: Optional[str] = None # New field
```

3. Save the file and watch your terminal

**Switch to Terminal 1** (where `moose dev` is running). You should see Moose automatically update your infrastructure:

```txt
⠋ Processing Infrastructure changes from file watcher
~ Table Bar:
  Column changes:
    + new_field: String
```

You should see the column change logged. Your API, database schema, and streaming topic all updated automatically!

**Try it yourself:** Add another field with a different data type and watch the infrastructure update in real-time.

## Recap

You've built a complete analytical backend with an ingest API, a streaming pipeline, a ClickHouse table, and an analytics API, all defined in code.

## Need Help?
**Docker not running:**

```bash filename="Terminal" copy
# macOS
open -a Docker

# Linux
sudo systemctl start docker

# Verify Docker is running
docker ps
```

**Docker out of space:**

```bash filename="Terminal" copy
docker system prune -a
```

**Python version too old:**

```bash filename="Terminal" copy
# Check version
python3 --version

# Install Python 3.12+ with pyenv
curl https://pyenv.run | bash
pyenv install 3.12
pyenv local 3.12
```

**Port 4000 already in use:**

```bash filename="Terminal" copy
# Find what's using port 4000
lsof -i :4000

# Kill the process (replace PID)
kill -9 <PID>

# Or use a different port
moose dev --port 4001
```

**Permission denied:**

```bash filename="Terminal" copy
# Fix Docker permissions (Linux)
sudo usermod -aG docker $USER
newgrp docker

# Fix file permissions
chmod +x ~/.moose/bin/moose
```

**Still stuck?** Join our [Slack community](https://join.slack.com/t/moose-community/shared_invite/zt-2fjh5n3wz-cnOmM9Xe9DYAgQrNu8xKxg) or [open an issue](https://github.com/514-labs/moose/issues).

---

## Minimum Requirements

Source: moose/help/minimum-requirements.mdx
Minimum Requirements for Moose

## Development Setup

The development setup has higher requirements because Moose runs locally along with all its dependencies (Redpanda, ClickHouse, Temporal, Redis).

- **CPU:** 12 cores
- **Memory:** 18GB
- **Disk:** >500GB SSD
- **OS:**
  - Windows with Linux subsystem (Ubuntu preferred)
  - Linux (Debian 10+, Ubuntu 18.10+, Fedora 29+, CentOS/RHEL 8+)
  - Mac OS 13+
- **Runtime:** Python 3.12+ or Node.js 20+, Docker 24.0.0+, and Docker Compose 2.23.1+

## Production Setup

The production setup has lower requirements, as external components (Redpanda, ClickHouse, Redis, and Temporal) are assumed to be deployed separately.

- **CPU:** 1vCPU
- **Memory:** 6GB
- **Disk:** >30GB SSD
- **OS:**
  - Windows with Linux subsystem (Ubuntu preferred)
  - Linux (Debian 10+, Ubuntu 18.10+, Fedora 29+, CentOS/RHEL 8+)
  - Mac OS 13+
- **Runtime:** Python 3.12+ or Node.js 20+

---

## Troubleshooting

Source: moose/help/troubleshooting.mdx
Troubleshooting for Moose

# Troubleshooting

Common issues and their solutions when working with Moose.

## Development Environment

### Issue: `moose dev` fails to start

**Possible causes and solutions:**

1. **Port conflicts**
   - Check if ports 4000-4002 are already in use
   - Solution: Kill the conflicting processes or configure different ports

```bash
# Find processes using ports
lsof -i :4000-4002

# Kill process by PID
kill <PID>
```

2. **Missing dependencies**
   - Solution: Ensure all dependencies are installed

```bash
pip install .
```

3.
**Database connectivity**
   - Solution: Verify database credentials in `.moose/config.toml`

```toml filename=".moose/config.toml" copy
[clickhouse_config]
db_name = "local"
user = "panda"
password = "pandapass"
use_ssl = false
host = "localhost"
host_port = 18123
native_port = 9000
```

## Stream Processing

### Issue: High processing latency

1. **Insufficient parallelism**
   - Solution: Increase stream parallelism

```python
from moose_lib import Stream, StreamConfig

# Data is your Pydantic record model
stream = Stream[Data]("high_volume", StreamConfig(parallelism=8))
```

### Issue: Data transformations not working

1. **Transform function errors**
   - Solution: Debug transformation logic

```python
# Add logging to transform
def transform(record: Data) -> Data:
    print(f"Processing record: {record.id}")
    try:
        # Your transformation logic
        return transformed_record
    except Exception as e:
        print(f"Transform error: {e}")
        return None # Skip record on error
```

## Database Issues

### Issue: Slow queries

1. **Missing or improper indexes**
   - Solution: Check the `order_by_fields` configuration

```python
from moose_lib import OlapTable, OlapConfig

table = OlapTable[Data]("slow_table", OlapConfig(
    order_by_fields=["frequently_queried_field", "timestamp"],
))
```

2. **Large result sets**
   - Solution: Add limits and pagination

```python
# In query API
results = client.query.execute(
    # not an f-string, the values are provided in the dict
    """
    SELECT * FROM large_table
    WHERE category = {category}
    LIMIT {limit}
    """,
    {"category": "example", "limit": 100}
)
```

## Deployment Issues

### Issue: Deployment fails

1. **Configuration errors**
   - Solution: Check deployment configuration

```bash
# Validate configuration
moose validate --config
```

2. **Resource limitations**
   - Solution: Increase resource allocation

```yaml
# In kubernetes manifest
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "1000m"
```

3. **Permission issues**
   - Solution: Verify service account permissions

```bash
# Check permissions
moose auth check
```

### Issue: Migration stuck with "Migration already in progress"

**Cause:** A previous migration was interrupted without releasing its lock.

**Solution:**

1. **Wait 5 minutes** - locks expire automatically
2. **Or manually clear the lock:**

```sql
DELETE FROM _MOOSE_STATE WHERE key = 'migration_lock';
```

3. **Verify it worked:**

```sql
SELECT * FROM _MOOSE_STATE WHERE key = 'migration_lock';
-- Should return no rows
```

The `_MOOSE_STATE` table uses ClickHouse's KeeperMap engine for distributed locking, ensuring only one migration runs at a time across multiple deployments.

## Getting Help

If you can't resolve an issue:

1. Ask for help on the [Moose community Slack channel](https://join.slack.com/t/moose-community/shared_invite/zt-2fjh5n3wz-cnOmM9Xe9DYAgQrNu8xKxg)
2. Search existing [GitHub issues](https://github.com/514-labs/moose/issues)
3. Open a new issue with:
   - Moose version (`moose --version`)
   - Error messages and logs
   - Steps to reproduce
   - Expected vs. actual behavior

---

## in-your-stack

Source: moose/in-your-stack.mdx

# Moose In Your Dev Stack

Moose handles the analytical layer of your application stack. The [Area Code](https://github.com/514-labs/area-code) repository contains two working implementations that show how to integrate Moose with existing applications.

## User Facing Analytics (UFA)

UFA shows how to add a dedicated analytics microservice to an existing application without impacting your primary database. View the open source repository to see the full implementation and clone it on your own machine.

### Data Flow

1.
Application writes to Supabase (transactional backend) 2. Supabase Realtime streams changes to Analytical Backend and Retrieval Backend 3. Moose ingest pipeline syncs change events from Redpanda into ClickHouse 4. Frontend queries analytics APIs for dashboards ### Architecture Components The UFA template demonstrates a microservices architecture with specialized components for different data access patterns:
The user interface for dashboards and application interactions Technologies: [Vite](https://vite.dev), [React](https://react.dev), [TanStack Query](https://tanstack.com/query), [TanStack Router](https://tanstack.com/router), [Tailwind CSS](https://tailwindcss.com) Handles CRUD operations and maintains application state Technologies: [Supabase](https://supabase.com), [Fastify](https://fastify.dev), [Drizzle ORM](https://orm.drizzle.team/) Fast text search and complex queries across large datasets Technologies: [Elasticsearch](https://www.elastic.co/) + [Fastify](https://fastify.dev) High-performance analytical queries and aggregations Technologies: [ClickHouse](https://clickhouse.com/) + [Moose OLAP](/moose/olap), [Redpanda](https://redpanda.com/) + [Moose Streaming](/moose/streaming), [Moose APIs](/moose/apis) Keep data synchronized between transactional, retrieval, and analytics systems Technologies: [Supabase Realtime](https://supabase.com/docs/guides/realtime), [Temporal](https://temporal.io/) + [Moose Workflows](/moose/workflows)
## Operational Data Warehouse (ODW) ODW shows how to build a centralized data platform that ingests from multiple sources for business intelligence and reporting. View the open source repository to see the full implementation and clone it on your own machine. ### Data Flow 1. Sources send data to Moose ingestion endpoints 2. Streaming functions validate and transform data 3. Data lands in ClickHouse tables 4. BI tools query via generated APIs or direct SQL ### Architecture Components
Handles incoming data from push-based sources (webhooks, application logs) with validation and transformation Technologies: [Moose APIs](/moose/apis), [Redpanda](https://redpanda.com/) + [Moose Streaming](/moose/streaming) Connects to your existing databases, object storage, or third-party APIs Technologies: [Temporal](https://temporal.io/) + [Moose Workflows](/moose/workflows) Centralized analytical database for raw and transformed data Technologies: [ClickHouse](https://clickhouse.com/) + [Moose OLAP](/moose/olap) Query interface for business intelligence and reporting Technologies: [Streamlit](https://streamlit.io/) dashboards, [Moose APIs](/moose/apis), [ClickHouse Connect](https://clickhouse.com/docs/en/interfaces/http/connect)
--- ## Overview Source: moose/index.mdx Modular toolkit for building real-time analytical backends # MooseStack
Type-safe, code-first tooling for building real-time analytical backends--OLAP Databases, Data Streaming, ETL Workflows, Query APIs, and more. ## Get Started ```bash filename="Install Moose" copy bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose ```
## Everything as Code

Declare all infrastructure (e.g. ClickHouse tables, Redpanda streams, APIs, etc.) and pipelines in pure TypeScript or Python. Your code auto-wires everything together, so no integration boilerplate needed.

```ts filename="Complete Analytical Backend in 1 TS file" copy
interface DataModel {
  primaryKey: Key<string>;
  name: string;
}

// Create a ClickHouse table
const clickhouseTable = new OlapTable<DataModel>("TableName");

// Create a Redpanda streaming topic
const redpandaTopic = new Stream<DataModel>("TopicName", {
  destination: clickhouseTable,
});

// Create an ingest API endpoint
const ingestApi = new IngestApi<DataModel>("post-api-route", {
  destination: redpandaTopic,
});

// Create an analytics API endpoint
interface QueryParams {
  limit?: number;
}

const analyticsApi = new Api<QueryParams, DataModel[]>(
  "get-api-route",
  async ({ limit = 10 }: QueryParams, { client, sql }) => {
    const result = await client.query.execute(sql`SELECT * FROM ${clickhouseTable} LIMIT ${limit}`);
    return await result.json();
  }
);
```

```python filename="Complete Analytical Backend in 1 Python file" copy
from moose_lib import Key, OlapTable, Stream, StreamConfig, IngestApi, IngestConfig, Api
from pydantic import BaseModel

class DataModel(BaseModel):
    primary_key: Key[str]
    name: str

# Create a ClickHouse table
clickhouse_table = OlapTable[DataModel]("TableName")

# Create a Redpanda streaming topic
redpanda_topic = Stream[DataModel]("TopicName", StreamConfig(
    destination=clickhouse_table,
))

# Create an ingest API endpoint
ingest_api = IngestApi[DataModel]("post-api-route", IngestConfig(
    destination=redpanda_topic,
))

# Create an analytics API endpoint
class QueryParams(BaseModel):
    limit: int = 10

def handler(client, params: QueryParams):
    return client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", {
        "table": clickhouse_table.name,
        "limit": params.limit,
    })

analytics_api = Api[QueryParams, DataModel]("get-api-route", query_function=handler)
```
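Assuming the route names above and Moose's default local port of 4000, you could exercise this backend with plain HTTP once `moose dev` is running:

```bash filename="Terminal" copy
# Ingest one record (ingest routes live under /ingest/<name>)
curl -X POST "http://localhost:4000/ingest/post-api-route" \
  -H "Content-Type: application/json" \
  -d '{"primary_key": "123", "name": "example"}'

# Read it back through the analytics API (served under /api/<name>)
curl "http://localhost:4000/api/get-api-route?limit=5"
```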
## Core Concepts

```ts
interface Event {
  id: Key<string>;
  name: string;
  createdAt: Date;
}

interface AggregatedEvent {
  count: number;
  name: string;
}
```

```bash
# Start local dev server
moose dev

⡏ Starting local infrastructure
  Successfully started containers
  Validated clickhousedb-1 docker container
  Validated redpanda-1 docker container
  Successfully validated red panda cluster
  Validated temporal docker container
  Successfully ran local infrastructure
```

## Modules

```ts
const table = new OlapTable<Event>("events");

const mv = new MaterializedView<AggregatedEvent>({
  selectStatement: sql`
    SELECT count(*) as count, name
    FROM ${table}
    GROUP BY name
  `,
  selectTables: [table],
  tableName: "aggregated_events",
  materializedViewName: "aggregated_events_mv"
});
```

```ts
const stream = new Stream<Event>("events", {
  destination: table,
});

stream.addConsumer((event) => {
  console.log(event);
});
```

```ts
const etl = new Workflow("my_etl", {
  startingTask: startEtl,
  schedule: "@every 1h",
  retries: 3,
});
```

```ts
const postEvent = new IngestApi<Event>("post-event", {
  destination: stream,
});

const getEvents = new Api("get-events", {
  async handler({ limit = 10 }, { client, sql }) {
    // query database and return results
    return await client.query.execute(sql`
      SELECT * FROM events
      LIMIT ${limit}
    `);
  }
});
```

Each module is independent and can be used on its own. You can start with one capability and incrementally adopt more over time.

## Tooling

```bash
# Build for production
moose build
```

```bash
# Create a plan
moose plan

# Example plan output:
~ Table events with column changes: [
    Added(
      Column {
        name: "status",
        data_type: String,
        required: true,
        unique: false,
        primary_key: false,
        default: None
      })] and order by changes: OrderByChange { before: [], after: [] }
```

## Technology Partners

- [ClickHouse](https://clickhouse.com/) (Online Analytical Processing (OLAP) Database)
- [Redpanda](https://redpanda.com/) (Streaming)
- [Temporal](https://temporal.io/) (Workflow Orchestration)
- [Redis](https://redis.io/) (Internal State Management)

Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).

---

## LLM-Optimized Documentation

Source: moose/llm-docs.mdx
Language-scoped documentation feeds for AI assistants

# LLM-Optimized Documentation

Moose now publishes lightweight documentation bundles so AI assistants can reason about your project without scraping the entire site. Each docs page includes **LLM View** links for TypeScript and Python, and the CLI exposes HTTP endpoints that deliver pre-compiled reference text.

## Quick links

- TypeScript bundle: `/llm-ts.txt`
- Python bundle: `/llm-py.txt`
- Scoped bundle: append `?path=relative/docs/section` to either endpoint to fetch a specific subsection

You can open these URLs in a browser, pipe them into tooling, or share them with agents such as Claude, Cursor, and Windsurf.

```bash filename="Terminal"
# Fetch the TypeScript bundle for the OLAP docs from the hosted site
curl "https://docs.fiveonefour.com/llm-ts.txt?path=moose/olap/model-table"
```

For project-specific knowledge, combine these static bundles with live context from the [MCP server](/moose/mcp-dev-server).

---

## Development Mode

Source: moose/local-dev.mdx
Local development environment with hot reload and automatic infrastructure management

# Setting Up Your Development Environment

Development mode (`moose dev`) provides a full-featured local environment optimized for rapid iteration and debugging. It automatically manages Docker containers, provides hot reload capabilities, and includes enhanced debugging features.
## Getting Started ```bash # Start development environment moose dev # View your running infrastructure moose ls ``` ## Container Management Development mode automatically manages Docker containers for your infrastructure: - **ClickHouse** (when `olap` feature is enabled) - **Redpanda** (when `streaming_engine` feature is enabled) - **Temporal** (when `workflows` feature is enabled) - **Analytics APIs Server** (when `apis` feature is enabled) - **Redis** (always enabled) - **MCP Server** (always enabled) - Enables AI-assisted development. [Learn more](/moose/mcp-dev-server) ### Container Configuration Control which containers start with feature flags: ```toml copy # moose.config.toml [features] olap = true # Enables ClickHouse streaming_engine = true # Enables Redpanda workflows = false # Controls Temporal startup apis = true # Enables Analytics APIs server ``` ### Extending Docker Infrastructure You can extend Moose's Docker Compose configuration with custom services by creating a `docker-compose.dev.override.yaml` file in your project root. This allows you to add additional infrastructure (databases, monitoring tools, etc.) that runs alongside your Moose development environment. **Do not use docker-compose.dev.override.yaml to modify Moose-managed services** (ClickHouse, Redpanda, Redis, Temporal). The Docker Compose merge behavior makes it difficult to override existing configuration correctly, often leading to conflicts. Instead, use `moose.config.toml` to configure Moose infrastructure. See [Configuration](/moose/configuration) for all available options including database connections, ports, volumes, and service-specific settings. Use the override file **only for adding new services** that complement your Moose environment (e.g., PostgreSQL for application data, monitoring tools). **How it works:** When you run `moose dev`, Moose automatically detects and merges your override file with the generated Docker Compose configuration. The files are merged using Docker Compose's [standard merge behavior](https://docs.docker.com/compose/how-tos/multiple-compose-files/merge/). **Example: Adding PostgreSQL for Application Data** Create a `docker-compose.dev.override.yaml` file in your project root: ```yaml copy filename="docker-compose.dev.override.yaml" services: postgres: image: postgres:16 environment: POSTGRES_USER: myapp POSTGRES_PASSWORD: mypassword POSTGRES_DB: myapp_db ports: - "5432:5432" volumes: - postgres-data:/var/lib/postgresql/data volumes: postgres-data: ``` Now when you run `moose dev`, PostgreSQL will start alongside your other infrastructure. You'll see a message confirming the override file is being used: ``` [moose] Using docker-compose.dev.override.yaml for custom infrastructure ``` **Recommended Use Cases:** - **Add databases**: PostgreSQL, MySQL, MongoDB for application data - **Add monitoring**: Grafana, Prometheus for metrics visualization - **Add custom services**: Additional message queues, caching layers, or development tools **Not Recommended:** - Modifying Moose-managed services (ClickHouse, Redpanda, Redis, Temporal) - Overriding ports, volumes, or environment variables for Moose infrastructure - Attempting to change database credentials or connection settings For any Moose infrastructure configuration, use `moose.config.toml` instead. See [Configuration](/moose/configuration). 
**Example: Adding Grafana for Monitoring**

```yaml copy filename="docker-compose.dev.override.yaml"
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:
```

When merging files, Docker Compose follows these rules:

- **Services**: Merged by name with values from the override file taking precedence
- **Environment variables**: Appended (both files' values are used)
- **Volumes**: Appended
- **Ports**: Appended (use `!override` tag to replace instead of merge)

See [Docker's merge documentation](https://docs.docker.com/reference/compose-file/merge/) for complete details.

The override file is only used in development mode (`moose dev`). For production deployments, configure your infrastructure separately using your deployment platform's tools.

## Hot Reloading Development

The development runtime includes a file watcher that provides near-instantaneous feedback when you save code changes.

### Watched Files

The file watcher recursively monitors your entire `app/` directory structure and only rebuilds the components that actually changed. Only the root file in your `app/` directory (`main.py`) is executed when changes are detected, so for your tables/streams/apis/workflows to be detected, you must import them in that root file. If you change a file in your `app` directory and it is a dependency of your root file, those changes WILL be detected.

### Quick Example

**❌ Doesn't work - not imported in the root file:**

```py file="app/tables/users.py"
from moose_lib import OlapTable
from schemas.user import UserSchema

users_table = OlapTable[UserSchema]("Users")  # Moose can't see this - not imported in main.py
```

**✅ Works - imported in the main file:**

```py file="app/tables/users.py" {4}
from moose_lib import OlapTable
from schemas.user import UserSchema

users_table = OlapTable[UserSchema]("Users")  # No export statement needed - just import it in main.py
```

```py file="app/main.py"
from tables.users import users_table  # Moose sees this
```

Now because we imported the table in the main file, Moose will detect the change and rebuild the table.

**✅ Works - Change dependency:**

```py file="app/schemas/user.py" {7}
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: str
    name: str
    email: str
    age: int  # Adding this triggers migration
```

*Moose detects this because `UserSchema` is imported in the root file via the dependency chain.*

Learn more about how Moose handles [migrations](/moose/migrate).

## Script Execution Hooks

You can configure your dev server to run your own shell commands automatically during development. Use these hooks to keep generated artifacts in sync (e.g., refreshing external models, regenerating OpenAPI SDKs).

### Available hooks

- `on_first_start_script`: runs once when the dev server first starts in this process
- `on_reload_complete_script`: runs after each dev server reload when code/infra changes have been fully applied

Configure these in `moose.config.toml` under the `http_server_config` section:

```toml copy
# moose.config.toml
[http_server_config]
# One-time on first start
on_first_start_script = "echo 'dev started'"

# After every code/infra reload completes
on_reload_complete_script = "echo 'reload complete'"
```

Notes:

- Scripts run from your project root using your `$SHELL` (falls back to `/bin/sh`).
- Use `&&` to chain multiple commands or point to a custom script, as shown in the sketch below.
- Prefer passing credentials via environment variables or your secret manager.
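As a minimal sketch of the chaining note above (the `scripts/refresh_models.py` path is a hypothetical placeholder for a script in your own project):

```toml copy
# moose.config.toml
[http_server_config]
# Hypothetical example: run a project script, then log completion
on_reload_complete_script = "python scripts/refresh_models.py && echo 'artifacts refreshed'"
```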
### Use case: keep external models in sync (DB Pull) Refresh `EXTERNALLY_MANAGED` table models from a remote ClickHouse on dev start so your local code matches the live schema. ```bash filename="Terminal" copy export REMOTE_CLICKHOUSE_URL="https://username:password@host:8443/?database=default" ``` ```toml copy # moose.config.toml [http_server_config] on_first_start_script = "moose db pull --connection-string $REMOTE_CLICKHOUSE_URL" ``` See the full guide: [/moose/olap/db-pull](/moose/olap/db-pull) ### Use case: regenerate OpenAPI SDKs on reload Automatically regenerate client SDKs after Moose finishes applying code/infra changes so `.moose/openapi.yaml` is fresh. ```toml copy # moose.config.toml [http_server_config] on_first_start_script = "command -v openapi-generator-cli >/dev/null 2>&1 || npm i -g @openapitools/openapi-generator-cli" on_reload_complete_script = "openapi-generator-cli generate -i .moose/openapi.yaml -g typescript-fetch -o ./generated/ts" ``` More examples: [/moose/apis/openapi-sdk](/moose/apis/openapi-sdk) ## Local Infrastructure ### Port Allocation Development mode uses the following default ports: - **4000**: Main API server - **5001**: Management API (health checks, metrics, admin, OpenAPI docs) ### Service URLs Access your development services at: ```bash # Main application http://localhost:4000 # Management interface curl http://localhost:5001/metrics # OpenAPI documentation http://localhost:5001/openapi.yaml ``` ### Container Networking All containers run in an isolated Docker network with automatic service discovery: - Containers communicate using service names - Port mapping only for external access - Automatic DNS resolution between services ### MCP Server for AI-Assisted Development Development mode includes a built-in Model Context Protocol (MCP) server that lets AI assistants interact with your local infrastructure through natural language. **What you can do:** - Query your ClickHouse database with natural language - Inspect streaming topics and messages - Search and filter development logs - Explore your infrastructure map **Quick setup:** The MCP server runs automatically at `http://localhost:4000/mcp`. For Claude Code, just run: ```bash copy claude mcp add --transport http moose-dev http://localhost:4000/mcp ``` For other AI clients (Windsurf, VS Code, Cursor, Claude Desktop), see the [full setup guide](/moose/mcp-dev-server). **Example prompts:** - *"What errors are in the logs?"* - *"What tables exist in my project?"* - *"Show me the schema of all tables"* - *"Sample 5 messages from the Foo stream"* See the complete guide for all available tools, detailed configuration for each AI client, and example workflows. ## Troubleshooting ### Common Issues **Container Startup Failures** ```bash # Check Docker is running docker info # View container logs moose logs ``` **Port Conflicts** ```bash # Check what's using your ports lsof -i :4000 lsof -i :5001 # Use custom ports export MOOSE_HTTP_PORT=4040 export MOOSE_MANAGEMENT_PORT=5010 moose dev ``` --- ## MCP Server Source: moose/mcp-dev-server.mdx Built-in Model Context Protocol server for AI-assisted development # MCP Server for AI-Assisted Development The Moose development server includes a built-in Model Context Protocol (MCP) server that enables AI agents and IDEs to interact directly with your local development infrastructure. This allows you to use natural language to query data, inspect logs, explore infrastructure, and debug your Moose project. ## What is MCP? 
[Model Context Protocol (MCP)](https://modelcontextprotocol.io) is an open protocol that standardizes how AI assistants communicate with development tools and services. Moose's MCP server exposes your local development environment—including ClickHouse, Redpanda, logs, and infrastructure state—through a set of tools that AI agents can use.

## Quick Start

The MCP server runs automatically when you start development mode:

```bash
moose dev
```

The MCP server is available at: `http://localhost:4000/mcp`

The MCP server is enabled by default. To disable it, use `moose dev --mcp=false`.

## Configure Your AI Client

Connect your AI assistant to the Moose MCP server. Most clients now support native HTTP transport for easier setup.

**Claude Code**

**Setup**: Use the Claude Code CLI (easiest method)

```bash copy
claude mcp add --transport http moose-dev http://localhost:4000/mcp
```

That's it! Claude Code will automatically connect to your Moose dev server.

**Scope**: This command adds the MCP server to Claude Code's project configuration, making it available to your project when using Claude Code. Other AI clients (Cursor, Windsurf, etc.) require separate configuration - see the sections below.

Make sure `moose dev` is running before adding the server. The CLI will verify the connection.

**Alternative**: Manual configuration at `~/.claude/config.json`

```json filename="config.json" copy
{
  "mcpServers": {
    "moose-dev": {
      "transport": "http",
      "url": "http://localhost:4000/mcp"
    }
  }
}
```

**Windsurf**

**Location**: `~/.codeium/windsurf/mcp_config.json`

Windsurf supports native Streamable HTTP transport:

```json filename="mcp_config.json" copy
{
  "mcpServers": {
    "moose-dev": {
      "serverUrl": "http://localhost:4000/mcp"
    }
  }
}
```

**VS Code / Cline**

**Prerequisites**:
- VS Code 1.102+ (built-in MCP support)
- Or install the [Cline extension](https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev)

**Option 1: Native HTTP Support (VS Code 1.102+)**

Add to `.vscode/settings.json` or User Settings:

```json filename=".vscode/settings.json" copy
{
  "mcp.servers": {
    "moose-dev": {
      "transport": "http",
      "url": "http://localhost:4000/mcp"
    }
  }
}
```

**Option 2: Cline Extension**

Configure in Cline's MCP settings:

```json copy
{
  "moose-dev": {
    "transport": "sse",
    "url": "http://localhost:4000/mcp"
  }
}
```

**Cursor**

**Location**: `.cursor/mcp.json` (project-level) or `~/.cursor/settings/mcp.json` (global)

Cursor currently uses stdio transport. Use `mcp-remote` to bridge to HTTP servers:

```json filename=".cursor/mcp.json" copy
{
  "mcpServers": {
    "moose-dev": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "http://localhost:4000/mcp"
      ]
    }
  }
}
```

**Claude Desktop**

**Location**: `~/Library/Application Support/Claude/claude_desktop_config.json`

Access via: Claude > Settings > Developer > Edit Config

```json filename="claude_desktop_config.json" copy
{
  "mcpServers": {
    "moose-dev": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "http://localhost:4000/mcp"
      ]
    }
  }
}
```

The `-y` flag automatically installs `mcp-remote` if not already installed.

Make sure `moose dev` is running before using the MCP tools. The AI client will connect to `http://localhost:4000/mcp`.

## Available Tools

The Moose MCP server provides five tools for interacting with your local development environment:

### `get_logs`

Retrieve and filter Moose development server logs for debugging and monitoring.
**What you can ask for:** - Filter by log level (ERROR, WARN, INFO, DEBUG, TRACE) - Limit the number of log lines returned - Search for specific text patterns in logs **Example prompts:** *"Show me the last 10 ERROR logs"* ``` Showing 10 most recent log entries from /Users/user/.moose/2025-10-10-cli.log Filters applied: - Level: ERROR [2025-10-10T17:44:42Z ERROR] Foo -> Bar (worker 1): Unsupported SASL mechanism: undefined [2025-10-10T17:44:43Z ERROR] FooDeadLetterQueue (consumer) (worker 1): Unsupported SASL mechanism [2025-10-10T17:51:48Z ERROR] server error on API server (port 4000): connection closed ... ``` *"What WARN level logs do I have?"* ``` Showing 6 most recent log entries Filters applied: - Level: WARN [2025-10-10T16:45:04Z WARN] HTTP client not configured - missing API_KEY [2025-10-10T16:50:05Z WARN] HTTP client not configured - missing API_KEY ... ``` **Tip**: Combine filters for better results. For example: "Show me ERROR logs with 'ClickHouse' in them" combines level filtering with search. **Use cases:** - Debugging application errors - Monitoring infrastructure health - Tracking data processing issues - Finding specific events or patterns --- ### `get_infra_map` Retrieve and explore the infrastructure map showing all components in your Moose project. **What you can ask for:** - List specific component types (tables, topics, API endpoints, workflows, etc.) - Get a complete overview of all infrastructure - Search for components by name - See detailed configuration or just a summary **Example prompts:** *"What tables exist in my project?"* ``` # Moose Infrastructure Map (Summary) ## Tables (28) - MergeTreeTest - ReplacingMergeTreeVersion - Bar - BasicTypes - UserEvents_1_0 - UserEvents_2_0 - FooDeadLetter - BarAggregated - FooWorkflow ... ``` *"Give me an overview of my Moose infrastructure"* ``` # Moose Infrastructure Map (Summary) ## Topics (11) - Bar, BasicTypes, Foo, FooDeadLetterQueue, SimpleArrays... ## API Endpoints (11) - INGRESS_Foo (INGRESS -> topic: Foo) - INGRESS_BasicTypes (INGRESS -> topic: BasicTypes) - EGRESS_bar (EGRESS (4 params)) ... ## Tables (28) - MixedComplexTypes, Bar, UserEvents_1_0... ## Topic-to-Table Sync Processes (10) - Bar_Bar, BasicTypes_BasicTypes... ## Function Processes (3) - Foo__Bar_Foo_Bar, Foo_Foo... ``` *"Find all components with 'User' in the name"* ``` ## Tables (2) - UserEvents_1_0 - UserEvents_2_0 ``` **Tip**: Search is case-sensitive by default. Use capital letters to match your component names, or ask the AI to search case-insensitively. **Use cases:** - Understanding project structure - Discovering available components - Debugging infrastructure issues - Documenting your data pipeline --- ### `query_olap` Execute read-only SQL queries against your local ClickHouse database. **What you can ask for:** - Query table data with filters, sorting, and aggregations - Inspect table schemas and column information - Count rows and calculate statistics - List all tables in your database - Results in table or JSON format **Example prompts:** *"What columns are in the UserEvents_1_0 table?"* ``` Query executed successfully. Rows returned: 4 | name | type | default_type | default_expression | comment | ... |-----------|-------------------|--------------|-------------------|---------| | userId | String | | | | | eventType | String | | | | | timestamp | Float64 | | | | | metadata | Nullable(String) | | | | ``` *"List all tables and their engines"* ``` Query executed successfully. 
Rows returned: 29 | name | engine | |-----------------------------|------------------------------| | Bar | MergeTree | | BasicTypes | MergeTree | | UserEvents_1_0 | MergeTree | | UserEvents_2_0 | ReplacingMergeTree | | ReplicatedMergeTreeTest | ReplicatedMergeTree | | BarAggregated_MV | MaterializedView | ... ``` *"Count the number of rows in Bar"* ``` Query executed successfully. Rows returned: 1 | total_rows | |------------| | 0 | ``` **Tip**: Ask the AI to discover table names first using "What tables exist in my project?" before querying them. Table names are case-sensitive in ClickHouse. **Use cases:** - Exploring data during development - Validating data transformations - Checking table schemas - Debugging SQL queries - Analyzing data patterns **Safety:** Only read-only operations are permitted (SELECT, SHOW, DESCRIBE, EXPLAIN). Write operations (INSERT, UPDATE, DELETE) and DDL statements (CREATE, ALTER, DROP) are blocked. --- ### `get_stream_sample` Sample recent messages from Kafka/Redpanda streaming topics. **What you can ask for:** - View recent messages from any stream/topic - Specify how many messages to sample - Get results in JSON or pretty-printed format - Inspect message structure and content **Example prompts:** *"Sample 5 messages from the Bar topic"* ```json { "stream_name": "Bar", "message_count": 5, "partition_count": 1, "messages": [ { "primaryKey": "e90c93be-d28b-47d6-b783-5725655c044f", "utcTimestamp": "+057480-11-24T20:39:59.000Z", "hasText": true, "textLength": 107 }, { "primaryKey": "b974f830-f28a-4a95-b61c-f65bfc607795", "utcTimestamp": "+057370-11-04T17:11:51.000Z", "hasText": true, "textLength": 166 }, ... ] } ``` *"What data is flowing through the BasicTypes stream?"* (pretty format) ```markdown # Stream Sample: BasicTypes Retrieved 3 message(s) from 1 partition(s) ## Message 1 { "id": "bt-001", "timestamp": "2024-10-09T12:00:00Z", "stringField": "hello world", "numberField": 42, "booleanField": true } ## Message 2 { "id": "bt-002", "timestamp": "2024-10-09T12:05:00Z", "stringField": "test", "numberField": 100, "booleanField": false } ... ``` **Tip**: Use "List all streaming topics" first to discover available streams in your project. **Use cases:** - Debugging data flow issues - Validating streaming transformations - Inspecting message formats - Troubleshooting pipeline failures --- ### `get_source` Look up the file path where a Moose component (table, stream, API, workflow, etc.) is defined. The tool searches the infrastructure map and returns the relative path plus snippet metadata. **What you can ask for:** - "Where is the `UserActivity` table defined?" - "Open the TypeScript file for the `orderEventPipeline` stream." - "Show me the FastAPI app behind the `analytics_api` web route." **Example prompt and response:** ``` You: "Find the source for the OrderItem stream" AI: Component: stream/OrderItem (TypeScript) File: src/ingest/order-items.ts Snippet: ); ``` **Use cases:** - Jump from CLI output or infra map entries to the actual code - Let AI assistants open the correct file before editing - Generate documentation references that include file paths The tool only reads paths within your workspace and never executes code. Pair it with `get_infra_map` for quick “find-then-open” workflows. --- ## Example Workflows ### Debugging a Data Pipeline ``` You: "My users table is empty, can you help me debug?" AI: Let me check the infrastructure and data flow. 
[Uses get_infra_map to see the pipeline] [Uses get_logs to check for errors] [Uses query_olap to verify table is empty] [Uses get_stream_sample to check if data is in the source stream] AI: I found the issue - there are errors in the logs showing the streaming function is failing. The stream has data, but it's not reaching the table. ``` ### Exploring a New Project ``` You: "Can you give me an overview of this Moose project?" AI: [Uses get_infra_map with format=summary] AI: This project has 3 tables (users, events, aggregated_metrics), 2 streaming topics, 4 API endpoints, and 1 workflow. The data flow appears to be: raw events → transformation → aggregated metrics. ``` ### Validating Data Transformations ``` You: "Did my streaming function correctly transform the user data?" AI: [Uses get_stream_sample on source stream] [Uses query_olap on destination table] AI: Yes, the transformation is working. I compared the source stream messages with the destination table records, and the enrichment fields are being added correctly. ``` ## Advanced Configuration ### Custom Port If you're running Moose on a non-default port, update the MCP configuration: ```bash export MOOSE_HTTP_PORT=8080 moose dev ``` Then update your MCP client configuration to use port 8080 instead of 4000. ### Disabling the MCP Server To run development mode without the MCP server: ```bash moose dev --mcp=false ``` ### Production Considerations The MCP server is designed for local development only. It provides direct access to your infrastructure and should **never** be exposed in production environments. The MCP server: - Runs only in development mode (`moose dev`) - Does not run in production mode (`moose prod`) - Provides read-only access to sensitive infrastructure - Should not be exposed over networks or proxied externally ## LLM-Optimized Documentation Feeds Before handing control to an AI assistant, prime it with a compact doc bundle so it understands Moose primitives and terminology. We publish TypeScript and Python versions at `/llm-ts.txt` and `/llm-py.txt`, with optional `?path=` filters for specific sections. See [LLM-optimized docs](/moose/llm-docs) for instructions on embedding these feeds into Claude, Cursor, Windsurf, or MCP clients alongside the live tools described above. ## Troubleshooting ### MCP Tools Not Appearing 1. Verify `moose dev` is running: `curl http://localhost:4000/mcp` 2. Check your AI client's MCP configuration is correct 3. Restart your AI client after updating configuration 4. 
Check the Moose logs for MCP-related errors: `moose logs --filter mcp` ### Connection Errors If your AI client can't connect to the MCP server: ```bash # Check if the dev server is running curl http://localhost:4000/health # Check MCP endpoint specifically curl -X POST http://localhost:4000/mcp \ -H "Content-Type: application/json" \ -d '{"jsonrpc":"2.0","id":1,"method":"initialize"}' ``` ### Empty Results If tools return no data: - Verify your dev server has been running long enough to generate data - Check that infrastructure has been created: `moose ls` - Try ingesting test data: `moose peek ` ## Related Documentation - [Local Development](/moose/local-dev) - Development mode overview - [Moose CLI Reference](/moose/moose-cli) - CLI commands and flags - [Model Context Protocol](https://modelcontextprotocol.io) - MCP specification --- ## Observability Source: moose/metrics.mdx Unified observability for Moose across development and production—metrics console, health checks, Prometheus, OpenTelemetry, logging, and error tracking # Observability This page consolidates Moose observability for both local development and production environments. ## Local Development ### Metrics Console Moose provides a console to view live metrics from your Moose application. To launch the console, run: ```bash filename="Terminal" copy moose metrics ``` Use the arrow keys to move up and down rows in the endpoint table and press Enter to view more details about that endpoint. #### Endpoint Metrics Aggregated metrics for all endpoints: | Metric | Description | | :-------------------- | :---------------------------------------------------------------------------------- | | `AVERAGE LATENCY` | Average time in milliseconds it takes for a request to be processed by the endpoint | | `TOTAL # OF REQUESTS` | Total number of requests made to the endpoint | | `REQUESTS PER SECOND` | Average number of requests made per second to the endpoint | | `DATA IN` | Average number of bytes of data sent to all `/ingest` endpoints per second | | `DATA OUT` | Average number of bytes of data sent to all `/api` endpoints per second | Individual endpoint metrics: | Metric | Description | | :---------------------------- | :---------------------------------------------------------------------------------- | | `LATENCY` | Average time in milliseconds it takes for a request to be processed by the endpoint | | `# OF REQUESTS RECEIVED` | Total number of requests made to the endpoint | | `# OF MESSAGES SENT TO KAFKA` | Total number of messages sent to the Kafka topic | #### Stream → Table Sync Metrics | Metric | Description | | :---------- | :-------------------------------------------------------------------------------------------------- | | `MSG READ` | Total number of messages sent from `/ingest` API endpoint to the Kafka topic | | `LAG` | The number of messages that have been sent to the consumer but not yet received | | `MSG/SEC` | Average number of messages sent from `/ingest` API endpoint to the Kafka topic per second | | `BYTES/SEC` | Average number of bytes of data received by the ClickHouse consumer from the Kafka topic per second | #### Streaming Transformation Metrics For each streaming transformation: | Metric | Description | | :------------ | :---------------------------------------------------------------------------- | | `MSG IN` | Total number of messages passed into the streaming function | | `MSG IN/SEC` | Average number of messages passed into the streaming function per second | | `MSG OUT` | Total number of messages 
returned by the streaming function | | `MSG OUT/SEC` | Average number of messages returned by the streaming function per second | | `BYTES/SEC` | Average number of bytes of data returned by the streaming function per second | --- ## Production ### Health Monitoring Moose applications expose a health check endpoint at `/health` that returns a 200 OK response when the application is operational. This endpoint is used by container orchestration systems like Kubernetes to determine the health of your application. In production environments, we recommend configuring three types of probes: 1. Startup Probe: Gives Moose time to initialize before receiving traffic 2. Readiness Probe: Determines when the application is ready to receive traffic 3. Liveness Probe: Detects when the application is in a deadlocked state and needs to be restarted Learn more about how to configure health checks in your Kubernetes deployment. ### Prometheus Metrics Moose applications expose metrics in Prometheus format at the `/metrics` endpoint. These metrics include: - HTTP request latency histograms for each endpoint - Request counts and error rates - System metrics for the Moose process Example metrics output: ``` # HELP latency Latency of HTTP requests. # TYPE latency histogram latency_sum{method="POST",path="ingest/UserActivity"} 0.025 latency_count{method="POST",path="ingest/UserActivity"} 2 latency_bucket{le="0.001",method="POST",path="ingest/UserActivity"} 0 latency_bucket{le="0.01",method="POST",path="ingest/UserActivity"} 0 latency_bucket{le="0.02",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="0.05",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="0.1",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="0.25",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="0.5",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="1.0",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="5.0",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="10.0",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="30.0",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="60.0",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="120.0",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="240.0",method="POST",path="ingest/UserActivity"} 1 latency_bucket{le="+Inf",method="POST",path="ingest/UserActivity"} 1 ``` You can scrape these metrics using a Prometheus server or any compatible monitoring system. ### OpenTelemetry Integration In production deployments, Moose can export telemetry data using OpenTelemetry. Enable via environment variables: ``` MOOSE_TELEMETRY__ENABLED=true MOOSE_TELEMETRY__EXPORT_METRICS=true ``` When running in Kubernetes with an OpenTelemetry operator, you can configure automatic sidecar injection by adding annotations to your deployment: ```yaml metadata: annotations: "sidecar.opentelemetry.io/inject": "true" ``` ### Logging Configure structured logging via environment variables: ``` MOOSE_LOGGER__LEVEL=Info MOOSE_LOGGER__STDOUT=true MOOSE_LOGGER__FORMAT=Json ``` The JSON format is ideal for log aggregation systems (ELK Stack, Graylog, Loki, or cloud logging solutions). ### Production Monitoring Stack Recommended components: 1. Metrics Collection: Prometheus or cloud-native monitoring services 2. Log Aggregation: ELK Stack, Loki, or cloud logging solutions 3. Distributed Tracing: Jaeger or other OpenTelemetry-compatible backends 4. 
Alerting: Alertmanager or cloud provider alerting

### Error Tracking

Integrate with systems like Sentry via environment variables:

```
SENTRY_DSN=https://your-sentry-dsn
RUST_BACKTRACE=1
```

Want this managed in production for you? Check out Boreal Cloud (from the makers of the Moose Stack).

## Feedback

Join our Slack community to share feedback and get help with Moose.

---

## Migrations & Planning

Source: moose/migrate.mdx
How Moose handles infrastructure migrations and planning

# Moose Migrate

Moose's migration system works like version control for your infrastructure. It automatically detects changes in your code and applies them to your data infrastructure with confidence.

Moose tracks changes across:

- OLAP Tables and Materialized Views
- Streaming Topics
- API Endpoints
- Workflows

## How It Works

Moose collects all objects defined in your main file (main.py) and automatically generates infrastructure operations to match your code:

```python file="app/main.py"
from pydantic import BaseModel
from moose_lib import OlapTable, Stream

class UserSchema(BaseModel):
    id: str
    name: str
    email: str

users_table = OlapTable[UserSchema]("Users")
user_events = Stream[UserSchema]("Users")
```

When you add these objects, Moose automatically creates:

- A ClickHouse table named `Users` with the `UserSchema`
- A Redpanda topic named `Users` with the `UserSchema`

## Development Workflow

When running your code in development mode, Moose will automatically hot-reload migrations to your local infrastructure as you save code changes.

### Quick Start

Start your development environment:

```bash filename="Terminal" copy
moose dev
```

This automatically:

1. Recursively watches your `/app` directory for code changes
2. Parses objects defined in your main file
3. Compares the new objects with the current infrastructure state Moose stores internally
4. Generates and applies migrations in real-time based on the differences
5. Provides immediate feedback on any errors or warnings
6. Updates the internal state of your infrastructure to reflect the new state

### Example: Adding a New Table

```python file="app/main.py" {6} copy
# Before
users_table = OlapTable[UserSchema]("Users")

# After (add analytics table)
users_table = OlapTable[UserSchema]("Users")
analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```

**What happens:**

- Moose detects the new `analytics_table` object
- Compares: "No Analytics table exists"
- Generates migration: "Create Analytics table"
- Applies migration automatically
- Updates internal state

In your terminal, you will see a log that shows the new table being created:

```bash
⠋ Processing Infrastructure changes from file watcher
+ Table: Analytics Version None - id: String, number: Int64, status: String - - deduplicate: false
```

### Example: Schema Changes

```python file="app/main.py" {8} copy
from moose_lib import Key

# After (add age field)
class UserSchema(BaseModel):
    id: Key[str]
    name: str
    email: str
    age: int  # New field
```

**What happens:**

- Moose detects the new `age` field
- Generates migration: "Add age column to Users table"
- Applies migration
- Existing rows get NULL/default values

## Production Workflow

Moose supports two deployment patterns: **Moose Server** and **Serverless**.

### Moose Server Deployments

For deployments with a running Moose server, preview changes before applying:

```bash filename="Terminal" copy
moose plan --url https://your-production-instance --token
```

Remote planning requires authentication:

1. Generate a token: `moose generate hash-token`
2.
Configure your server: ```toml filename="moose.config.toml" copy [authentication] admin_api_key = "your-hashed-token" ``` 3. Use the token with `--token` flag **Deployment Flow:** 1. **Develop locally** with `moose dev` 2. **Test changes** in local environment 3. **Plan against production**: `moose plan --url --token ` 4. **Review changes** carefully 5. **Deploy** - Moose applies migrations automatically on startup ### Serverless Deployments For serverless deployments (no Moose server), use the ClickHouse connection directly: ```bash filename="Terminal" copy # Step 1: Generate migration files moose generate migration --clickhouse-url --save # Step 2: Preview changes in PR moose plan --clickhouse-url clickhouse://user:pass@host:port/database # Step 3: Execute migration after merge moose migrate --clickhouse-url ``` **Deployment Flow:** 1. **Develop locally** with `moose dev` 2. **Generate migration plan**: `moose generate migration --clickhouse-url --save` 3. **Create PR** with `plan.yaml`, `remote_state.json`, `local_infra_map.json` 4. **PR validation**: Run `moose plan --clickhouse-url ` in CI to preview changes 5. **Review** migration files and plan output 6. **Merge PR** 7. **Execute migration**: Run `moose migrate --clickhouse-url ` in CI/CD Requires `state_config.storage = "clickhouse"` in `moose.config.toml`: ```toml filename="moose.config.toml" copy [state_config] storage = "clickhouse" [features] olap = true data_models_v2 = true ``` Your ClickHouse instance needs the KeeperMap engine for state storage and migration locking. ✅ **ClickHouse Cloud**: Works out of the box ✅ **`moose dev` or `moose prod`**: Already configured ⚠️ **Self-hosted ClickHouse**: See [ClickHouse KeeperMap documentation](https://clickhouse.com/docs/en/engines/table-engines/special/keeper-map) for setup requirements ### Understanding Plan Output Moose shows exactly what will change: ```bash + Table: Analytics Version None - id: String, number: Int64, status: String - - deduplicate: false + Table: Users Version None - id: String, name: String, email: String - - deduplicate: false ``` ## Migration Types | Change Type | Infrastructure Impact | Data Impact | |-------------|----------------------|-------------| | **Add new object** | New table/stream/API created | No impact | | **Remove object** | Table/stream/API dropped | All data lost | | **Add field** | New column created | Existing rows get NULL/default | | **Remove field** | Column dropped | Data permanently lost | | **Change type** | Column altered | Data converted if compatible | ## Viewing Infrastructure State ### Via CLI ```bash # Check current infrastructure objects moose ls # View migration logs moose logs ``` ### Via Direct Connection Connect to your local infrastructure using details from `moose.config.toml`: ```toml file="moose.config.toml" [features] olap = true # ClickHouse for analytics streaming_engine = true # Redpanda for streaming workflows = false # Temporal for workflows [clickhouse_config] host = "localhost" host_port = 18123 native_port = 9000 db_name = "local" user = "panda" password = "pandapass" [redpanda_config] broker = "localhost:19092" message_timeout_ms = 1000 retention_ms = 30000 replication_factor = 1 ``` ## Best Practices ### Development - Use `moose dev` for all local development - Monitor plan outputs for warnings - Test schema changes with sample data ### Production - Always use remote planning before deployments - Review changes carefully in production plans - Maintain proper authentication - Test migrations in staging first 
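To make the production best practices above concrete, here is a hedged sketch of the serverless CI flow described earlier. The `CLICKHOUSE_URL` variable and the CI wiring are assumptions; the `moose plan` and `moose migrate` invocations come straight from the steps above:

```bash filename="Terminal" copy
# Hypothetical CI steps; CLICKHOUSE_URL is assumed to come from your CI secret store

# On pull requests: preview the migration plan without applying it
moose plan --clickhouse-url "$CLICKHOUSE_URL"

# On merge: execute the reviewed migration plan
moose migrate --clickhouse-url "$CLICKHOUSE_URL"
```

Splitting the two commands across PR validation and post-merge stages keeps destructive changes behind code review, mirroring the deployment flow outlined above.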
### Managing TTL Outside Moose

If you're managing ClickHouse TTL settings through other tools or want to avoid migration failures from TTL drift, you can configure Moose to ignore TTL changes:

```toml filename="moose.config.toml" copy
[migration_config]
ignore_operations = ["ModifyTableTtl", "ModifyColumnTtl"]
```

This tells Moose to:

- Skip generating TTL change operations in migration plans
- Ignore TTL differences during drift detection

You'll still get migrations for all other schema changes (adding tables, modifying columns, etc.), but TTL changes won't block your deployments.

## Troubleshooting

### Authentication Errors

- Verify your authentication token
- Generate a new token: `moose generate hash-token`
- Check server configuration in `moose.config.toml`

### Migration Issues

- Check `moose logs` for detailed error messages
- Verify object definitions in your main file
- Ensure all required fields are properly typed
- **Stuck migration lock**: If you see "Migration already in progress" but no migration is running, wait 5 minutes for automatic expiry or manually clear it:

```sql
DELETE FROM _MOOSE_STATE WHERE key = 'migration_lock';
```

---

## LifeCycle Management

Source: moose/migrate/lifecycle.mdx
Control how Moose manages database and streaming resources when your code changes

# LifeCycle Management

## Overview

The `LifeCycle` enum controls how Moose manages the lifecycle of database/streaming resources when your code changes. This feature gives you fine-grained control over whether Moose automatically updates your database schema or leaves it under external/manual control.

## LifeCycle Modes

### `FULLY_MANAGED` (Default)

This is the default behavior where Moose has complete control over your database resources. When you change your data models, Moose will automatically:

- Add new columns or tables
- Remove columns or tables that no longer exist in your code
- Modify existing column types and constraints

This mode can perform destructive operations. Data may be lost if you remove fields from your data models, or if you perform operations that require a drop-and-recreate to take effect, such as changing `order_by_fields`.

```py filename="FullyManagedExample.py" copy
from moose_lib import OlapTable, OlapConfig, LifeCycle
from pydantic import BaseModel

class UserData(BaseModel):
    id: str
    name: str
    email: str

# Default behavior - fully managed
user_table = OlapTable[UserData]("users")

# Explicit fully managed configuration (equivalent to the default above)
explicit_table = OlapTable[UserData]("users", OlapConfig(
    order_by_fields=["id"],
    life_cycle=LifeCycle.FULLY_MANAGED
))
```

### `DELETION_PROTECTED`

This mode allows Moose to automatically add new database structures but prevents it from removing existing ones. Perfect for production environments where you want to evolve your schema safely without risking data loss.
**What Moose will do:**

- Add new columns and tables
- Modify column types (if compatible)
- Update non-destructive configurations

**What Moose won't do:**

- Drop columns or tables
- Perform destructive schema changes

```py filename="DeletionProtectedExample.py" copy
from moose_lib import IngestPipeline, IngestPipelineConfig, OlapConfig, StreamConfig, LifeCycle, ClickHouseEngines
from pydantic import BaseModel
from datetime import datetime

class ProductEvent(BaseModel):
    id: str
    product_id: str
    timestamp: datetime
    action: str

product_analytics = IngestPipeline[ProductEvent]("product_analytics", IngestPipelineConfig(
    table=OlapConfig(
        order_by_fields=["timestamp", "product_id"],
        engine=ClickHouseEngines.ReplacingMergeTree,
    ),
    stream=StreamConfig(
        parallelism=4,
    ),
    ingest_api=True,
    # automatically applied to the table and stream
    life_cycle=LifeCycle.DELETION_PROTECTED
))
```

### `EXTERNALLY_MANAGED`

This mode tells Moose to take a completely hands-off approach to your resources. You become responsible for creating and managing the database schema. This is useful when:

- You have existing database tables managed by another team
- You're integrating with another system (e.g. PeerDB)
- You have strict database change management processes

With externally managed resources, you must ensure your database schema matches your data models exactly, or you may encounter runtime errors.

```py filename="ExternallyManagedExample.py" copy
from moose_lib import Stream, OlapTable, OlapConfig, StreamConfig, LifeCycle, Key
from pydantic import BaseModel
from datetime import datetime

class ExternalUserData(BaseModel):
    user_id: Key[str]
    full_name: str
    email_address: str
    created_at: datetime

# Connect to existing database table
legacy_user_table = OlapTable[ExternalUserData]("legacy_users", OlapConfig(
    life_cycle=LifeCycle.EXTERNALLY_MANAGED
))

# Connect to existing Kafka topic
legacy_stream = Stream[ExternalUserData]("legacy_user_stream", StreamConfig(
    life_cycle=LifeCycle.EXTERNALLY_MANAGED,
    destination=legacy_user_table
))
```

---

## Migration Examples & Advanced Development

Source: moose/migrate/migration-types.mdx
Detailed migration examples and advanced development topics

# Migration Types

This guide provides detailed examples of different migration types. For the complete workflow overview, see [Migrations & Planning](/moose/migrate).

## Adding New Infrastructure Components

Keep in mind that only the modules that you have enabled in your `moose.config.toml` will be included in your migrations.

```toml file="moose.config.toml"
[features]
olap = true
streaming_engine = true
workflows = true
```

### New OLAP Table or Materialized View

```python file="app/main.py"
from pydantic import BaseModel
from datetime import datetime
from moose_lib import OlapTable

class AnalyticsSchema(BaseModel):
    id: str
    event_type: str
    timestamp: datetime
    user_id: str
    value: float

analytics_table = OlapTable[AnalyticsSchema]("Analytics")
```

**Migration Result:** Creates ClickHouse table `Analytics` with all fields from `AnalyticsSchema`

If you have not enabled the `olap` feature flag, you will not be able to create new OLAP tables.

```toml file="moose.config.toml"
[features]
olap = true
```

Check out the OLAP migrations guide to learn more about the different migration modes.
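Since this subsection's heading also covers materialized views, here is a hedged Python sketch that builds one on top of the `Analytics` table defined above. It assumes the `MaterializedView` and `MaterializedViewOptions` helpers from `moose_lib`, and the target table and view names are illustrative — check your installed `moose_lib` version for the exact parameter names:

```python file="app/main.py"
from pydantic import BaseModel
from datetime import datetime
from moose_lib import MaterializedView, MaterializedViewOptions

class DailyEventCount(BaseModel):
    day: datetime
    event_type: str
    count: int

# Hedged sketch: aggregates Analytics rows into a daily rollup table
daily_event_counts = MaterializedView[DailyEventCount](MaterializedViewOptions(
    select_statement="""
        SELECT toStartOfDay(timestamp) AS day, event_type, count(*) AS count
        FROM Analytics
        GROUP BY day, event_type
    """,
    select_tables=[analytics_table],            # source table defined above
    table_name="AnalyticsDaily",                # illustrative target table name
    materialized_view_name="AnalyticsDaily_MV", # illustrative ClickHouse MV name
))
```

Under these assumptions, a plan for this change would create the `AnalyticsDaily` target table plus the ClickHouse materialized view that keeps it up to date.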
### New Stream ```python file="app/main.py" user_events = Stream[UserSchema]("UserEvents") system_events = Stream[SystemEventSchema]("SystemEvents") ``` **Migration Result:** Creates Redpanda topics `UserEvents` and `SystemEvents` If you have not enabled the `streaming_engine` feature flag, you will not be able to create new streaming topics. ```toml file="moose.config.toml" [features] streaming_engine = true ``` ## Schema Modifications ### Adding Fields ```python file="app/main.py" # Before class UserSchema(BaseModel): id: str name: str email: str # After class UserSchema(BaseModel): id: str name: str email: str age: int created_at: datetime is_active: bool ``` **Migration Result:** Adds `age`, `created_at`, and `is_active` columns to existing table ### Removing Fields ```python file="app/main.py" # Before class UserSchema(BaseModel): id: str name: str email: str age: int deprecated_field: str # Will be removed # After class UserSchema(BaseModel): id: str name: str email: str age: int ``` **Migration Result:** Drops `deprecated_field` column (data permanently lost) ### Type Changes ```python file="app/main.py" # Before class UserSchema(BaseModel): id: str name: str email: str score: float # Will change to str # After class UserSchema(BaseModel): id: str name: str email: str score: str # Changed from float ``` **Migration Result:** Alters `score` column type (data converted if compatible) ## Removing Infrastructure ```python file="app/main.py" # Before users_table = OlapTable[UserSchema]("Users") analytics_table = OlapTable[AnalyticsSchema]("Analytics") deprecated_table = OlapTable[DeprecatedSchema]("Deprecated") # After (remove deprecated table) users_table = OlapTable[UserSchema]("Users") analytics_table = OlapTable[AnalyticsSchema]("Analytics") ``` **Migration Result:** Drops `Deprecated` table (all data lost) ## Working with Local Infrastructure There are two main ways to inspect your local infrastructure to see how your migrations are applied: ### Using the CLI Run `moose ls` to see the current state of your infrastructure: ```bash # Verify object definitions moose ls ``` ### Connecting to your local infrastructure You can also connect directly to your local infrastructure to see the state of your infrastructure. All credentials for your local infrastructure are located in your project config file (`moose.config.toml`). #### Connecting to ClickHouse ```bash # Using clickhouse-client clickhouse-client --host localhost --port 18123 --user panda --password pandapass --database local # Using connection string clickhouse-client "clickhouse://panda:pandapass@localhost:18123/local" ``` #### Connecting to Redpanda ```bash # Using kafka-console-consumer kafka-console-consumer --bootstrap-server localhost:19092 --topic UserEvents --from-beginning # Using kafka-console-producer kafka-console-producer --bootstrap-server localhost:19092 --topic UserEvents ``` #### Viewing Temporal Workflows Navigate to `http://localhost:8080` to view the Temporal UI and see registered workflows. ## Gotchas: Your dev server must be running to connect to your local infrastructure. 
```bash moose dev ``` Only the modules that you have enabled in your `moose.config.toml` will be included in your migrations: ```toml file="moose.config.toml" [features] olap = true # Required for OLAP Tables and Materialized Views streaming_engine = true # Required for Streams workflows = true # Required for Workflows and Tasks ``` --- ## Moose CLI Reference Source: moose/moose-cli.mdx Moose CLI Reference # Moose CLI Reference ## Installation ```bash filename="Terminal" copy bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose ``` ## Core Commands ### Init Initializes a new Moose project. ```bash moose init --template