
The medallion architecture organizes a data platform into three layers, Raw, Refined, and Curated, that take source data and refine it step by step into trusted, business-ready tables. It is the most common blueprint for the modern data lakehouse, and it is the foundation OptimaFlo is built on. This guide explains each layer in plain English, then shows how to stand one up without hiring a data team.
The medallion architecture is a data design pattern popularized by Databricks. It splits data processing into three progressively cleaner layers. You may know these layers by their traditional names, bronze, silver, and gold; we use clearer terms: Raw, Refined, and Curated. Each stage adds structure, quality, and business meaning:
The point is simple. You never transform data in one big, fragile step. You move it through three small, reliable ones, and every stage stays queryable and auditable.
Raw is your system of record. Data lands here exactly as the source sent it, with zero transforms and full history. Nothing is thrown away. If a downstream rule turns out to be wrong, you can always reprocess from Raw.
Because Raw keeps everything, the table format matters. OptimaFlo stores every layer in an open lakehouse table format, which gives Raw ACID transactions, schema evolution, and time travel. You can query the table as it looked last Tuesday, not just today.
Refined is where source data becomes reliable. This layer validates types, removes duplicates, fixes nulls, and joins sources into clean, conformed tables. It is the layer most analysts actually build on.
Refined is also where SQL lives. In a traditional stack this is the dbt layer, written and maintained by data engineers. The transforms are version-controlled and reviewed, because a bad Refined query quietly poisons everything downstream.
Curated is the layer your business sees. It holds aggregated metrics, star schemas, and the tables that power dashboards and reports. Curated tables are usually updated incrementally, so a daily metric refreshes in seconds instead of rebuilding from scratch.
Here is how the three layers compare:
| Layer | Holds | Transforms | Typical consumer |
|---|---|---|---|
| Raw | Source data, full history | None | Reprocessing, audits |
| Refined | Cleaned, validated, joined | Type, dedupe, join | Analysts, data scientists |
| Curated | Metrics, star schema | Aggregate, model | Dashboards, executives |
The pattern is popular because it solves real problems at once:
The architecture is elegant. Building it is not. A typical stack stitches together six to ten tools: ingestion (Fivetran), a warehouse (Snowflake), transforms (dbt), orchestration (Airflow), a BI tool (Tableau), and data observability (Monte Carlo). Each one is a contract, a bill, and a thing that breaks at 2 a.m.
It also takes people. Someone has to write the Refined and Curated SQL, wire up the orchestration, and own the schemas. For a data-rich team without a dedicated data engineering function, that is the wall most projects hit.
OptimaFlo is one platform that runs the full Raw, Refined, Curated flow inside your own cloud. It is built to remove the stack and the staffing problem at the same time.
The result: you can go from your first data source connection to a live, Curated-backed dashboard in a single afternoon. See the getting started guide for the five-minute version, or the architecture overview for the full picture.
A Refined model is just SQL: take the Raw table, cast the types, drop the bad rows, and write a clean table. OptimaFlo generates SQL like this, and you approve it before it runs:
SELECT
order_id,
customer_id,
CAST(order_total AS DECIMAL(12, 2)) AS order_total,
DATE_TRUNC('day', ordered_at) AS order_day
FROM raw.orders
WHERE order_total IS NOT NULL;That clean refined.orders table then feeds a Curated metric, like daily revenue by customer segment, which lands in your dashboard.
What are the Raw, Refined, and Curated layers? Raw stores source data with full history and no transforms. Refined stores cleaned, validated, and joined data. Curated stores business-ready metrics and star-schema tables for dashboards.
Is the medallion architecture only for Databricks? No. Databricks popularized the name, but the pattern is tool-agnostic. Any lakehouse built on an open table format can implement the Raw, Refined, and Curated layers.
Do I need Spark to use it? No. Spark shines on very large data, but for most workloads a lighter engine is faster and cheaper. OptimaFlo matches the compute to your data automatically, so you never have to choose.
How is it different from a traditional ELT data warehouse? ELT usually loads source data and transforms it inside one warehouse. The medallion architecture formalizes that into three explicit, queryable layers on an open lakehouse, which adds reproducibility, time travel, and clearer separation of concerns.
Can I build a medallion architecture without a data team? Yes. OptimaFlo generates the Refined and Curated SQL with an LLM, runs every layer on an open lakehouse in your own cloud, and replaces the six-to-ten-tool stack with one platform, so you can stand up the full flow without dedicated data engineers.
Ready to build your first Raw-to-Curated pipeline? Start building with OptimaFlo or read the getting started guide.
Enhancing data owners with a team of AI agents. From raw data to dashboards, all in your own cloud.
© 2026 OptimaFlo. All rights reserved.
We use cookies to enhance your browsing experience, serve personalized content, and analyze our traffic. By clicking "Accept All", you consent to our use of cookies. You can customize your preferences or learn more in our Cookie Policy and Privacy Policy.