Welcome to Aurora
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) aurora,moose
What is Aurora?
Aurora is a set of tools that make your chat, copilot, or BI tool fluent in data engineering. Use the CLI to set up MCP servers with the tools you need in the clients you use. Create new data engineering projects with Moose-managed, ClickHouse-based infrastructure, or use these agents with your existing data infrastructure.
Quickstart Guides
Core features
Aurora offers a suite of MCP tools and agents for data engineering workflows: standing up infrastructure, building and testing pipelines, and managing deployments.
CLI based deployment
- Three steps in the CLI to chat with ClickHouse (sketched below)
- Five minutes to build a full-stack OLAP project
- Template-based new projects for building your own infrastructure
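A minimal sketch of that three-step flow. The install command is the real one from the top of this page; the aurora setup invocation is illustrative only, so check the CLI reference for the exact subcommand and flags:

# 1. Install the Aurora (and optionally Moose) CLIs
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) aurora,moose

# 2. Hypothetical syntax: register Aurora's MCP server, with your ClickHouse
#    credentials, in the chat client or IDE you use
aurora setup --connection clickhouse --client claude-desktop

# 3. Open your client and start chatting with your tables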
Client agnostic
- Data engineering in your IDE
- Data analytics and analytics engineering in your chat client
- BI in your BI tool of choice
Infrastructure agnostic, opinionated stack available
- Opinionated OLAP deployments with Moose: optimized ClickHouse development and deployment (see the sketch after this list)
- Direct integration with your existing architecture: DuckDB, Snowflake, Databricks
- Integration with your enterprise: metadata, CI/CD, logging, and more
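For the opinionated path, a minimal sketch of standing up a Moose-managed ClickHouse project locally, assuming the Moose CLI's init and dev commands and an arbitrary project name (verify the exact arguments against the Moose docs):

# Scaffold a new Moose project (TypeScript template; name is arbitrary)
moose init my-olap-app typescript
cd my-olap-app && npm install

# Start the local dev stack: Moose provisions and manages ClickHouse for you,
# reloading your data models as you edit them
moose dev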
Context-aware data engineering agents
- Full-stack context: code, logs, data, docs
- Self-improving feedback loops
- Embedded metadata for continuity
Enterprise ready by default
- Each agent follows a context gathering → implementation → testing → documentation workflow
- Easily configurable governance defaults: SDLC, data quality, reporting, privacy, and security practices
- Learns your policies with minimal context, enforces them automatically
Why Aurora exists
LLMs don't have the tools or context they need to be good data engineers, or to make good data engineering cyborgs.
LLM tools are hard to set up, and usually small in scope
It takes me minutes to set up an MCP server, and I have to hold my breath every time I want to use it.
Shallow context makes bad data
If I'm just looking at the data, or just the code, or just the logs, or just the docs, I'm not going to make good decisions. I need all of that context.
I don't want to start from scratch to use AI
I've already got an entire data infrastructure, why should I start from scratch to use AI?
Brittle data engineering agents make for annoying coworkers
I want my agents to be easy to prompt, to be able to test their work, and to be able to fit within my SDLC.
The DIY Approach
How would I prompt my LLM to create a new egress API on top of ClickHouse? Roughly like this (steps 3, 5, 7, and 9 are sketched in the shell session after the list):
1. Use an LLM to create a SQL query representing the data I want to expose.
2. Iterate on that until I'm happy with the query and ready to create the egress API.
3. Write a SQL query to get the DDL for that database.
4. Open my IDE to wherever my egress API code lives, and feed its LLM the DDL as context.
5. Manually add sample data as further context.
6. Manually add code examples as further context.
7. Replicate my ClickHouse database locally.
8. Prompt chat to create the egress API.
9. Manually test the egress API.
10. Deploy the egress API.
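Done by hand, steps 3, 5, 7, and 9 look something like this, assuming a source table mydb.events and an egress API served locally on port 8000 (both are placeholders):

# Step 3: pull the DDL to paste into the IDE's chat as context
clickhouse client --query "SHOW CREATE TABLE mydb.events"

# Step 5: pull a few sample rows as further context
clickhouse client --query "SELECT * FROM mydb.events LIMIT 5 FORMAT JSONEachRow"

# Step 7: replicate ClickHouse locally to develop against
docker run -d --name ch-local -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server

# Step 9: manually test the generated egress API
curl "http://localhost:8000/api/events?limit=10"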
The Aurora Approach
How would I use Aurora to create a new egress API on top of ClickHouse? Four steps (sketched below):
1. Install Aurora and Moose.
2. Create a project from my data in ClickHouse.
3. Prompt Aurora to create the egress API from the given business requirements. The necessary context is gathered automatically, and Aurora uses it to build and test the API.
4. Deploy.
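As a sketch of the same flow: the install command is the real one from the top of this page, while the aurora subcommand and flag shown here are placeholders for whatever the CLI reference specifies:

# 1. Install Aurora and Moose
bash -i <(curl -fsSL https://fiveonefour.com/install.sh) aurora,moose

# 2. Hypothetical syntax: scaffold a Moose project from the tables already in ClickHouse
aurora init my-egress-project --from-clickhouse

# 3. In your chat client or IDE, prompt Aurora with the business requirements;
#    it gathers DDL, sample data, and code examples itself, then builds and tests the API

# 4. Deploy the resulting Moose project as you would any other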
What jobs can you do with Aurora?
Ad hoc analytics
Give your LLM client a way to chat with your data in ClickHouse, Databricks, Snowflake, or wherever it lives
Analytics Engineering
Agents that can build new data products, materialized views and egress methods
Data Engineering
Have agents build and test end-to-end data pipelines
Data Wrangling
Agents that interact with your data systems (DuckDB, Databricks, Snowflake, and more) to create scripts, clean data, and prepare it for use
Data Migration
Automated creation of data pipelines to migrate data from legacy systems to a modern data backend
Data quality, governance and reporting
Agents that can help you enforce data quality, governance, and reporting during development and at runtime