Getting Started

Your First Pipeline in 5 Minutes

Connect a data source, build a medallion pipeline, and schedule it, all from the browser. No infrastructure to manage, no YAML to write.

Quick Start

Create your organization and workspace through the onboarding wizard

Connect a Source

Link GCS, BigQuery, or a REST API to your workspace

Build a Pipeline

Use the visual canvas or let the Manager generate one for you

Run & Schedule

Execute your pipeline and set an Airflow schedule

No setup required

Try It Live: 5-Minute Walkthrough

Want to see the full flow before connecting your own data? Use our public sample API. The same example is exercised by our automated smoke test, so the happy path is verified on every release.

A. Add the sample data source

In your workspace, open Data Sources and add a REST API source with these settings:

Source type: REST API

Base URL: https://dogfood-ecommerce.fly.dev

Endpoint: /orders

Method: GET

Auth type: API key

API key header: X-API-Key

API key: dogfood-key-2026

Data path: orders

Pagination: none
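To sanity-check these settings outside the UI, the sketch below mirrors them as a plain Python object and derives the URL and auth header that a GET request would carry. The names RestSourceConfig and request_parts are illustrative assumptions, not OptimaFlo identifiers; only the values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestSourceConfig:
    """Hypothetical mirror of the sample data source settings listed above."""

    base_url: str = "https://dogfood-ecommerce.fly.dev"
    endpoint: str = "/orders"
    method: str = "GET"
    api_key_header: str = "X-API-Key"
    api_key: str = "dogfood-key-2026"
    data_path: str = "orders"


def request_parts(config: RestSourceConfig) -> tuple[str, dict[str, str]]:
    """Return the full URL and the auth headers implied by the config."""
    # Join base URL and endpoint with exactly one slash between them.
    url = config.base_url.rstrip("/") + "/" + config.endpoint.lstrip("/")
    # API-key auth: the key travels in the configured header, spelled exactly.
    headers = {config.api_key_header: config.api_key}
    return url, headers


url, headers = request_parts(RestSourceConfig())
print(url)
print(headers)
```

Any HTTP client can send a GET to that URL with those headers to confirm the key is accepted before you save the source.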

B. Build a three-node pipeline

Open the canvas and add three nodes in a line, source on the left, two transforms on the right:

Source (Ingestion)

Dogfood Orders

Linked to the data source you just added. Lands in bronze.dogfood_orders.

Transform (Cleaning)

Clean Orders

Output table: silver.dogfood_orders_cleaned

SELECT * FROM bronze.dogfood_orders

Transform (Aggregation)

Orders by Product

Output table: gold.dogfood_orders_by_product

SELECT
  product_id,
  COUNT(*) AS order_count,
  SUM(total_amount) AS total_revenue
FROM silver.dogfood_orders_cleaned
GROUP BY product_id

Connect them with edges so data flows from source to silver to gold. The platform stores the SQL on each node and validates it at execution time.
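To see what the two transform queries produce before running them on the platform, you can try them locally. The sketch below is only a stand-in: it uses in-memory SQLite databases named bronze, silver, and gold so the table names from the walkthrough resolve unchanged, whereas the platform itself writes Iceberg tables. The three sample rows and the order_id column are invented for illustration.

```python
import sqlite3

# Local stand-in only: the platform writes Iceberg tables, not SQLite.
conn = sqlite3.connect(":memory:")
for layer in ("bronze", "silver", "gold"):
    # One attached in-memory database per layer, so schema-qualified
    # names such as bronze.dogfood_orders work as written.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

# Invented sample rows; order_id and all values are assumptions.
conn.execute(
    "CREATE TABLE bronze.dogfood_orders "
    "(order_id INTEGER, product_id TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO bronze.dogfood_orders VALUES (?, ?, ?)",
    [(1, "p1", 10.0), (2, "p1", 5.5), (3, "p2", 20.0)],
)

# Cleaning node: the pass-through query from the walkthrough.
conn.execute(
    "CREATE TABLE silver.dogfood_orders_cleaned AS "
    "SELECT * FROM bronze.dogfood_orders"
)

# Aggregation node: the per-product rollup from the walkthrough.
conn.execute(
    """
    CREATE TABLE gold.dogfood_orders_by_product AS
    SELECT
      product_id,
      COUNT(*) AS order_count,
      SUM(total_amount) AS total_revenue
    FROM silver.dogfood_orders_cleaned
    GROUP BY product_id
    """
)

gold_rows = conn.execute(
    "SELECT product_id, order_count, total_revenue "
    "FROM gold.dogfood_orders_by_product ORDER BY product_id"
).fetchall()
print(gold_rows)  # [('p1', 2, 15.5), ('p2', 1, 20.0)]
```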

C. Execute and verify

Click Run Now on the monitoring page. All three layers run as one execution. You should see:

The status header card flips to Running now, then Last run succeeded within roughly 30 seconds.
The execution history table lists one entry with row counts for each layer.
Open the Catalog. You will see three new tables under bronze, silver, and gold.
Click the gold table and open Snapshots to see the Iceberg snapshot timeline. Each run creates a new snapshot.

The dogfood API is deterministic: the same date always returns the same orders. If you re-run the pipeline back-to-back, you will see the row count match exactly across runs. That makes it useful for testing schedule changes and Silver/Gold SQL edits without worrying about source drift.

Sign Up & Set Up Your Workspace

The onboarding wizard walks you through creating your organization and workspace in four steps. Everything is collected up front and submitted together at the end.

Organization: name your organization, set a URL slug, and select your industry and team size

Workspace: create one or more workspaces (up to 5 during onboarding). Separate by environment, team, or project.

BYOC: configure your Bring Your Own Cloud deployment on GCP. Select your infrastructure tier and region, then the platform provisions everything. AWS and Azure support is coming soon.

Each workspace gets its own Apache Iceberg catalog, so data from different workspaces is fully isolated — even within the same organization.

Connect a Data Source

OptimaFlo currently supports Google Cloud Storage, BigQuery, and REST API connectors, with more on the way. The Ingestion Engineer guides you through the entire process conversationally; tell it what you want to connect and it handles authentication, browsing, file selection, validation, and schema inference.

Available Today

Google Cloud Storage
BigQuery
REST API (any endpoint)

Coming Soon

Amazon S3
Redshift
Snowflake
PostgreSQL
MySQL
GraphQL

Open Data Sources in the sidebar and click Add Source. This opens the Ingestion Engineer

Tell the agent what you want to connect (e.g. "Connect my GCS bucket gs://company-data") — it authenticates via OAuth for cloud sources or asks for credentials for databases

The agent browses your buckets, folders, or tables and lets you select specific files or datasets to ingest

It validates the connection, infers your schema, and creates the data source record — ready for use in a pipeline

For GCS and BigQuery, authentication happens via a Google OAuth popup — no service account keys to manage. The platform handles token refresh automatically.

No data of your own? Use the public sample API at https://dogfood-ecommerce.fly.dev with API key dogfood-key-2026. See the Try It Live walkthrough above for the full configuration.

Build Your Pipeline

Three ways to build: drag-and-drop on the visual canvas, let the Data Engineer create a pipeline from a prompt, or let the Manager handle the entire workflow end-to-end.

Visual Canvas

Drag nodes onto the canvas and connect them. Configure SQL for each transform node. Preview results before saving.

Add an Ingestion node linked to your data source
Add Cleaning and Aggregation nodes for transformations
Connect them with edges to define data flow
Write SQL or let the Analytics Engineer generate it

Data Engineer

Describe your transformations and the Data Engineer builds the full canvas: Source, Ingestion, Cleaning, and Aggregation nodes with SQL already written.

Open the Data Engineer panel in the canvas sidebar
Describe what you want to transform
Review the generated nodes and SQL
Apply to canvas with one click

Manager

End-to-end orchestration. The Manager connects sources, generates SQL, builds the canvas, and deploys the pipeline, all from one conversation.

"Connect my GCS bucket gs://sales-data, clean the CSVs, deduplicate on order_id, and create a monthly revenue summary."

Run and Schedule

Execute your pipeline manually or set a schedule. OptimaFlo generates an Apache Airflow DAG behind the scenes; you never touch Airflow config directly.

Click Execute to run the pipeline immediately. The platform auto-saves before executing

Monitor progress in the execution panel and watch each layer complete from Ingestion through Aggregation

Open Settings to set a schedule (hourly, daily, weekly, monthly, quarterly, or yearly). The platform converts your selection to an Airflow cron expression

Use Backfills to re-process historical date ranges when you change transformation logic

Backfills run sequentially by default to avoid Iceberg write conflicts. You can increase parallelism for independent tables.
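For a sense of what the schedule options in the steps above look like as cron, the sketch below maps each one to the expression behind Airflow's built-in preset of the same name (@hourly, @daily, and so on). Whether OptimaFlo emits exactly these expressions, or offsets the minute and hour, is not stated here, so treat the mapping and the schedule_to_cron helper as assumptions.

```python
# Cron expressions behind Airflow's built-in schedule presets.
# Assumption: the platform's own conversion may differ in minute/hour offsets.
AIRFLOW_PRESET_CRON = {
    "hourly": "0 * * * *",       # top of every hour
    "daily": "0 0 * * *",        # midnight every day
    "weekly": "0 0 * * 0",       # midnight on Sunday
    "monthly": "0 0 1 * *",      # midnight on the 1st of each month
    "quarterly": "0 0 1 */3 *",  # midnight on the 1st of every third month
    "yearly": "0 0 1 1 *",       # midnight on January 1st
}


def schedule_to_cron(selection: str) -> str:
    """Translate a schedule option into a five-field cron expression."""
    key = selection.strip().lower()
    if key not in AIRFLOW_PRESET_CRON:
        raise ValueError(f"Unsupported schedule: {selection!r}")
    return AIRFLOW_PRESET_CRON[key]


print(schedule_to_cron("Daily"))      # 0 0 * * *
print(schedule_to_cron("quarterly"))  # 0 0 1 */3 *
```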

Core Concepts: Ingestion, Cleaning, Aggregation

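The medallion architecture is the backbone of every OptimaFlo pipeline. Its three stages map to the table layers used in the walkthrough: Ingestion writes bronze tables, Cleaning writes silver, and Aggregation writes gold.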

Ingestion

Raw Data

Your source data lands here untouched. Every record is preserved in Apache Iceberg tables with full history, ACID transactions, and time-travel.

Zero transformations — exact copy of the source
Schema detected automatically on ingestion
Full history retention for compliance and replay
Partitioned for query performance

Cleaning

Cleaned Data

Cleaned, deduplicated, and type-cast. The Analytics Engineer generates transformations from plain English, then you review and approve before anything runs.

LLM-generated SQL from natural language
Preview results before committing
Deduplication, null handling, and type casting
Validated and schema-enforced before execution
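As a rough illustration of what a Cleaning transform can look like, the sketch below applies type casting, null handling, and deduplication in one query and runs it against in-memory SQLite. The raw_orders table, the updated_at column, the 'unknown' default, and the sample rows are all invented; SQL generated by the Analytics Engineer may be shaped differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Invented raw rows: everything arrives as text, with one duplicate order
# and one missing product_id.
conn.execute(
    "CREATE TABLE raw_orders "
    "(order_id TEXT, product_id TEXT, total_amount TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        ("1", "p1", "10.50", "2026-01-01"),
        ("1", "p1", "12.00", "2026-01-02"),  # later version of order 1
        ("2", None, "7.25", "2026-01-01"),   # missing product_id
    ],
)

CLEANING_SQL = """
WITH ranked AS (
  SELECT
    CAST(order_id AS INTEGER) AS order_id,           -- type casting
    COALESCE(product_id, 'unknown') AS product_id,   -- null handling
    CAST(total_amount AS REAL) AS total_amount,      -- type casting
    ROW_NUMBER() OVER (
      PARTITION BY order_id ORDER BY updated_at DESC
    ) AS rn                                          -- newest row per order first
  FROM raw_orders
)
SELECT order_id, product_id, total_amount
FROM ranked
WHERE rn = 1            -- deduplication: keep one row per order_id
ORDER BY order_id
"""

cleaned = conn.execute(CLEANING_SQL).fetchall()
print(cleaned)  # [(1, 'p1', 12.0), (2, 'unknown', 7.25)]
```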

Aggregation

Business Metrics

Aggregated, business-ready metrics and star schemas. Feed dashboards, exports, and analyst queries from a single source of truth.

Aggregations and business KPIs
Star schema for analytics
Incremental updates to minimize compute
Direct connection to BI dashboards

If You Get Stuck

The handful of issues that catch new users most often.

The pipeline ran, but Silver or Gold has 0 rows

Open the table in Catalog and check the Bronze table first. If Bronze is also empty, the data source is not returning rows; verify the API URL, auth header, and data_path in Data Sources. If Bronze has rows but Silver does not, the Silver SQL is filtering everything out: open the Silver node, click Preview, and inspect the result.
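The checks above amount to a short decision procedure, sketched below. The function name and return labels are hypothetical, and the Gold branch is an extrapolation of the same logic rather than something stated in this guide.

```python
def diagnose_empty_layers(bronze_rows: int, silver_rows: int, gold_rows: int) -> str:
    """Point at the first layer that lost all rows, working from the source outward."""
    if bronze_rows == 0:
        # Nothing landed: check the API URL, auth header, and data_path.
        return "source"
    if silver_rows == 0:
        # Bronze has data, so the Silver SQL is filtering everything out.
        return "silver_sql"
    if gold_rows == 0:
        # Assumed by analogy: Silver has data, so inspect the Gold SQL.
        return "gold_sql"
    return "ok"


print(diagnose_empty_layers(0, 0, 0))     # source
print(diagnose_empty_layers(120, 0, 0))   # silver_sql
print(diagnose_empty_layers(120, 118, 4)) # ok
```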

REST API source returns 401 or 403

The auth header name is case-sensitive on some servers. For the dogfood API, the header must be X-API-Key exactly. Double-check the api_key_header field in the data source config. If you are using a Bearer token, set auth_type to bearer instead of api_key.

Schema looks empty or wrong

REST APIs that wrap the array under a key need data_path set to that key. The dogfood orders endpoint wraps its array as {"orders": [...]}, so data_path must be orders. If you see a single column called data containing JSON, set flatten_json to true.
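The sketch below shows one plausible reading of what data_path and flatten_json do to a response. The helper names, the underscore separator for nested keys, and the sample payload are assumptions made for illustration; the platform's actual column naming is not documented here.

```python
from typing import Any


def extract_records(payload: Any, data_path: str) -> list:
    """Pull the record array out of a response, unwrapping data_path if set."""
    records = payload[data_path] if data_path else payload
    if not isinstance(records, list):
        raise ValueError(f"Expected a list at data_path={data_path!r}")
    return records


def flatten_record(record: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Turn nested objects into top-level columns such as customer_id."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so arbitrarily deep objects end up as flat columns.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Invented payload shaped like a wrapped response.
payload = {"orders": [{"order_id": 1, "customer": {"id": 7, "country": "DE"}}]}
rows = [flatten_record(r) for r in extract_records(payload, "orders")]
print(rows)  # [{'order_id': 1, 'customer_id': 7, 'customer_country': 'DE'}]
```

With data_path left empty against the same payload, extract_records raises because the top level is an object rather than an array, which is the situation behind an empty or wrong-looking schema.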

Snapshots tab shows one entry but the pipeline ran multiple times

The snapshot timeline groups by calendar date. The (N snapshots) badge next to each date header shows the actual count for that day. Multiple hourly runs on the same day all nest under one date header, with one card per run underneath.
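The grouping described above can be pictured with the sketch below, which buckets run timestamps by calendar date and counts them. The function name and the sample timestamps are invented, and the sketch uses naive datetimes; which time zone the UI uses to decide the calendar date is not stated here.

```python
from collections import Counter
from datetime import datetime


def snapshots_per_day(timestamps: list[datetime]) -> dict[str, int]:
    """Count snapshots under each calendar date, oldest date first."""
    counts = Counter(ts.date().isoformat() for ts in timestamps)
    return dict(sorted(counts.items()))


# Invented example: three hourly runs on one day, one run the next day.
runs = [
    datetime(2026, 1, 5, 9, 0),
    datetime(2026, 1, 5, 10, 0),
    datetime(2026, 1, 5, 11, 0),
    datetime(2026, 1, 6, 9, 0),
]
per_day = snapshots_per_day(runs)
print(per_day)  # {'2026-01-05': 3, '2026-01-06': 1}
```

Four runs show up as two date headers, which is why the timeline can look shorter than the run history.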

Run Now button is disabled

The button is disabled until the live execution status loads from the backend. Wait a second or two and it will enable. If it stays disabled, refresh the page and check the browser console for an authentication error.

Failed run with no obvious cause

Open the execution history table on the monitoring page. Failed rows expand to show the full error_message with the stack trace. If the error references an AppError code, look it up in the error taxonomy for the suggested fix.

What's Next

Deepen your knowledge with these guides.

Platform Architecture

Understand the 8-layer architecture and engine selection

AI Agents

Use the Manager to build pipelines from a single prompt

Pipeline Builder

Deep dive into the visual canvas and node types

BYOC Deployment

Deploy in your own GCP project for full data residency
