In-Database Transformations
Viewing typescript
switch to python
Overview
Materialized views allow you to precompute and store the results of complex queries. This can be useful for improving query performance, reducing the load on your database, and providing a more efficient way to access data.
How it works:
Model your target query result set
Write the schema of your materialized view result set as a data model
Define your transformation logic
Write a SELECT query to compute the result set
Moose generates the infrastructure
Moose turns your query into a materialized view and a physical table to store the results
Basic Example
import { MaterializedView, sql } from "@514labs/moose-lib";
import { sourceTable } from "path/to/SourceTable";
// 1. Define the schema of the materialized view result set
interface TargetSchema {
id: string;
average_rating: number;
num_reviews: number;
}
// 2. Write a SELECT query to compute the result set
const exampleMaterializedView = new MaterializedView<TargetSchema>({
selectStatement: sql` SELECT * FROM ${sourceTable}`,
tableName: "my_table",
materializedViewName: "example_materialized_view",
});
MooseTip:
Use the sql
function to interpolate tables and columns from your project
from moose_lib import MaterializedView, sql
# 1. Define the schema of the materialized view result set
class TargetSchema(BaseModel):
id: str
average_rating: float
num_reviews: int
# 2. Write a SELECT query to compute the result set
example_materialized_view = MaterializedView[TargetSchema](
select_statement=f"SELECT * FROM source_table_name",
table_name="my_table",
materialized_view_name="example_materialized_view",
)
Key benefits:
Automatic refresh on new data
Supports incremental aggregations
Type-safe schema definitions
Configuration Options
interface MaterializedViewOptions<T> {
selectStatement: sql; // sql query to compute the materialized view
tableName: string; // name of the table where materialized view result set will be stored
materializedViewName: string; // name of the materialized view in the database
engine?: ClickHouseEngines; // default is MergeTree
orderByFields?: (keyof T & string)[]; // default is the primary key of the source table
}
from moose_lib import MaterializedView, sql
from source_table import source_table
class MaterializedViewOptions(BaseModel):
select_statement: str
table_name: str
materialized_view_name: str
engine: ClickHouseEngines = ClickHouseEngines.MergeTree
order_by_fields: List[str] = []
Working with Aggregations
Materialized views support aggregations using the Aggregated
type in your schema definition and the AggregatingMergeTree
engine. This is crucial for efficiently maintaining pre-aggregated data in real-time systems.
Why Use ClickHouse Aggregate Functions
ClickHouse’s aggregate functions offer several advantages when used in materialized views:
- Incremental computation: ClickHouse can update aggregations incrementally when new data arrives, rather than recomputing the entire result set.
- Memory efficiency: Aggregate states are stored in a compact format optimized for the specific function.
- High performance: Native implementation of complex statistical functions that scale with your data volume.
- Type safety: The
Aggregated
type in Moose ensures correct mapping between your code and ClickHouse aggregation types.
When using the AggregatingMergeTree
engine, ClickHouse maintains the state of your aggregations between updates, making it much more efficient than re-aggregating from raw data each time.
Common Aggregate Functions
In TypeScript, the Aggregated
type is used to define fields in your schema that involve aggregation functions. This type ensures that the data is correctly aggregated and mapped to the appropriate ClickHouse function.
Aggregated<AggregationFunction extends string, ArgTypes extends any[]>
AggregationFunction
: A string that specifies the aggregation function to be applied. Common functions includesum
,avg
,count
,uniq
,min
, andmax
.ArgTypes
: An optional array of types that define the parameters the aggregation function will operate on. This ensures type safety and helps in mapping the function to the correct data types.
In Python, the Aggregated
type is used to define fields in your schema that involve aggregation functions. This type ensures that the data is correctly aggregated and mapped to the appropriate ClickHouse function.
from moose_lib import Aggregated
Aggregated[U, AggregateFunction(...)]
U
: Represents the type of the aggregated result. For example, float for sums and averages, int for counts.AggregateFunction(...)
: Specifies the aggregation operation to be performed. It includes:agg_func
: A string indicating the aggregation function, such as “sum”, “avg”, “count”, “uniq”, “min”, or “max”.param_types
: An optional list that defines the types of parameters the aggregation function will operate on. This ensures type safety and clarity in the schema definition.
MooseTip:
View the ClickHouse documentation for more information on the aggregate functions:
Aggregation Example
import { MaterializedView, ClickHouseEngines, Aggregated, sql } from "@514labs/moose-lib";
import { sourceTable } from "./SourceTable";
interface TargetTableSchema {
id: string;
average_rating: number & Aggregated<"avg", [number]>;
num_reviews: number & Aggregated<"sum", [number]>;
}
const query = sql`
SELECT
id,
avgState(${sourceTable.columns.rating}) as average_rating,
countState(${sourceTable.columns.id}) as num_reviews
FROM ${sourceTable}
GROUP BY ${sourceTable.columns.id}
`;
const exampleMaterializedView = new MaterializedView<TargetTableSchema>({
selectStatement: query,
tableName: "my_table",
materializedViewName: "example_materialized_view",
engine: ClickHouseEngines.AggregatingMergeTree,
orderByFields: ["id"],
});
from moose_lib import MaterializedView, ClickHouseEngines, Aggregated, AggregateFunction, MaterializedViewOptions
from app.source_table import source_table
class TargetTableSchema(BaseModel):
id: str
average_rating: float & Aggregated[float, AggregateFunction(agg_func="avg", param_types=[float])]
num_reviews: int & Aggregated[int, AggregateFunction(agg_func="sum", param_types=[int])]
query = f"""
SELECT
id,
avgState(rating) as average_rating,
countState(id) as num_reviews
FROM {source_table.name}
GROUP BY id
"""
example_materialized_view = MaterializedView[TargetTableSchema](MaterializedViewOptions(
select_statement=query,
table_name="my_table",
materialized_view_name="example_materialized_view",
engine=ClickHouseEngines.AggregatingMergeTree,
order_by_fields=["id"],
))
Note
When using the AggregatingMergeTree
engine, you need to use the State
suffix at the end of your aggregate function in your SQL query:
avgState(rating) as average_rating,
countState(id) as num_reviews
When querying the materialized view, you need to use the Merge
suffix at the end of your aggregate function:
SELECT avgMerge(average_rating), countMerge(num_reviews) FROM my_table
Best Practices for Aggregations
When working with aggregated materialized views:
- Choose the right engine: Always use
AggregatingMergeTree
for views with aggregate functions - Match schema to query: Ensure your TargetSchema fields match the aggregate functions in your query
- Consider data volume: Aggregations are most valuable for large datasets where full scans are expensive
- Group wisely: Be thoughtful about your GROUP BY clauses as they affect both storage and query patterns
- Combine with filtering: Pre-filter data in your select statement to improve performance
When these practices are followed, materialized views with aggregations can provide dramatic performance improvements for analytical queries while maintaining near real-time data freshness.
Querying Materialized Views
You can query materialized views just like any other table in Moose. When instantiating a MaterializedView
, Moose will create an OlapTable
for the materialized view result set. You can access the OlapTable
from the MaterializedView
object:
import { sql } from "@514labs/moose-lib";
// Query the materialized view
const query = sql`SELECT * FROM ${exampleMaterializedView.targetTable}`;
// Retrieve the materialized view result set as an OlapTable
const table: OlapTable<TargetTableSchema> = exampleMaterializedView.targetTable;
from moose_lib import OlapTable
## Querying the materialized view
query = f"SELECT * FROM {example_materialized_view.target_table}"
## Or
table: OlapTable[TargetTableSchema] = example_materialized_view.target_table
How It Works
When you create a materialized view in Moose:
- A new table is automatically created in your database using the provided schema
- The
selectStatement
is used to compute the initial results - The results are stored in the new table
- In ClickHouse, materialized views function as triggers on the source table
- Whenever new data is inserted into the source table, the materialized view automatically updates
- The update uses the
selectStatement
select_statement
to compute the latest results from the source data
This ensures your precomputed views stay in sync with your underlying data without manual intervention.
MooseTip:
🚀 Help Shape the Future of Moose
We’re planning to improve the SQL query interpolation developer experience in Python when Python template strings are released (see discussion).
We’d love your feedback and ideas!
Join our community on Slack to discuss, share your use cases, and help us build the best experience for you.