Getting Started

Your First Pipeline in 5 Minutes

Connect a data source, build a medallion pipeline, and schedule it all from the browser. No infrastructure to manage, no YAML to write.

Quick Start

1. Sign Up & Set Up: Create your organization and workspace through the onboarding wizard
2. Connect a Source: Link GCS, BigQuery, or a REST API to your workspace
3. Build a Pipeline: Use the visual canvas or let the Manager generate one for you
4. Run & Schedule: Execute your pipeline and set an Airflow schedule
No setup required

Try It Live: 5-Minute Walkthrough

Want to see the full flow before connecting your own data? Use our public sample API. The same example is exercised by our automated smoke test, so the happy path is verified on every release.

A. Add the sample data source

In your workspace, open Data Sources and add a REST API source with these settings:

Source type: REST API
Base URL: https://dogfood-ecommerce.fly.dev
Endpoint: /orders
Method: GET
Auth type: API key
API key header: X-API-Key
API key: dogfood-key-2026
Data path: orders
Pagination: none
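
To see what these settings amount to on the wire, here is a minimal Python sketch of the equivalent HTTP request. The `source_config` dictionary, its key names, and the `build_request` and `fetch_orders` helpers are hypothetical and for illustration only. They are not OptimaFlo's internal representation. Only the values come from the settings above.

```python
import json
import urllib.request

# Hypothetical config dict mirroring the settings above. The key names are
# illustrative, not the platform's actual schema.
source_config = {
    "base_url": "https://dogfood-ecommerce.fly.dev",
    "endpoint": "/orders",
    "method": "GET",
    "api_key_header": "X-API-Key",
    "api_key": "dogfood-key-2026",
    "data_path": "orders",
}


def build_request(config: dict) -> urllib.request.Request:
    """Build the HTTP request described by the data source settings."""
    url = config["base_url"].rstrip("/") + config["endpoint"]
    return urllib.request.Request(
        url,
        headers={config["api_key_header"]: config["api_key"]},
        method=config["method"],
    )


def fetch_orders(config: dict) -> list:
    """Send the request and return the records found under data_path."""
    with urllib.request.urlopen(build_request(config), timeout=30) as resp:
        payload = json.load(resp)
    return payload[config["data_path"]]
```

Calling `fetch_orders(source_config)` performs a live request, so it is left out of the snippet itself.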

B. Build a three-node pipeline

Open the canvas and add three nodes in a line: the source on the left and two transforms to its right.

Source (Ingestion)
Dogfood Orders
Linked to the data source you just added. Lands in bronze.dogfood_orders.
Transform (Cleaning)
Clean Orders
Output table: silver.dogfood_orders_cleaned
SELECT * FROM bronze.dogfood_orders
Transform (Aggregation)
Orders by Product
Output table: gold.dogfood_orders_by_product
SELECT
  product_id,
  COUNT(*) AS order_count,
  SUM(total_amount) AS total_revenue
FROM silver.dogfood_orders_cleaned
GROUP BY product_id

Connect them with edges so data flows from the source to silver to gold. The platform stores the SQL on each node and validates it at execution time.
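
If you want to try the two transforms before building the canvas, the sketch below runs the same SQL locally with Python's built-in `sqlite3` module. This is only an approximation: OptimaFlo runs on Iceberg tables, not SQLite, and the sample rows and the `order_id` column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach one in-memory database per layer so the schema-qualified names
# (bronze.*, silver.*, gold.*) resolve exactly as written in the node SQL.
for layer in ("bronze", "silver", "gold"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

conn.execute(
    "CREATE TABLE bronze.dogfood_orders "
    "(order_id TEXT, product_id TEXT, total_amount REAL)"
)
# Made-up rows for illustration only; the real API's fields may differ.
conn.executemany(
    "INSERT INTO bronze.dogfood_orders VALUES (?, ?, ?)",
    [("o1", "p1", 10.0), ("o2", "p1", 20.0), ("o3", "p2", 5.0)],
)

# Clean Orders node
conn.execute(
    "CREATE TABLE silver.dogfood_orders_cleaned AS "
    "SELECT * FROM bronze.dogfood_orders"
)

# Orders by Product node
conn.execute(
    """
    CREATE TABLE gold.dogfood_orders_by_product AS
    SELECT
      product_id,
      COUNT(*) AS order_count,
      SUM(total_amount) AS total_revenue
    FROM silver.dogfood_orders_cleaned
    GROUP BY product_id
    """
)

gold_rows = conn.execute(
    "SELECT * FROM gold.dogfood_orders_by_product ORDER BY product_id"
).fetchall()
print(gold_rows)
```

With the three sample rows, the gold table holds one row per product: two orders for `p1` and one for `p2`.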

C. Execute and verify

Click Run Now on the monitoring page. All three layers run as one execution. You should see:

  • The status header card flips to Running now, then Last run succeeded within roughly 30 seconds.
  • The execution history table lists one entry with row counts for each layer.
  • The Catalog shows three new tables, one each under bronze, silver, and gold.
  • The gold table's Snapshots tab shows the Iceberg snapshot timeline. Each run creates a new snapshot.

The dogfood API is deterministic: the same date always returns the same orders. If you re-run the pipeline back-to-back, you will see the row count match exactly across runs. That makes it useful for testing schedule changes and Silver/Gold SQL edits without worrying about source drift.
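
Matching row counts are a quick check. If you export the rows from two runs and want to confirm the content is identical as well, one possible approach is an order-independent fingerprint. The `fingerprint` helper and the sample records below are hypothetical, not a platform feature.

```python
import hashlib
import json


def fingerprint(records: list) -> str:
    """Return an order-independent SHA-256 digest of a list of JSON records."""
    # Serialize each record with sorted keys, then sort the serialized rows,
    # so neither key order nor row order changes the digest.
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


# Made-up records standing in for two back-to-back runs.
run_1 = [
    {"order_id": "o1", "total_amount": 10.0},
    {"order_id": "o2", "total_amount": 20.0},
]
run_2 = [
    {"total_amount": 20.0, "order_id": "o2"},
    {"order_id": "o1", "total_amount": 10.0},
]

same_source = len(run_1) == len(run_2) and fingerprint(run_1) == fingerprint(run_2)
```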

Step 1: Sign Up & Set Up Your Workspace

The onboarding wizard walks you through creating your organization and workspace in four steps. Everything is collected up front and submitted together at the end.

1. Sign in at optimaflo.io/sign-in with Google, Microsoft, Amazon, or email.
2. Organization: name your organization, set a URL slug, and select your industry and team size.
3. Workspace: create one or more workspaces (up to 5 during onboarding). Separate by environment, team, or project.
4. BYOC: configure your Bring Your Own Cloud deployment on GCP. Select your infrastructure tier and region, then the platform provisions everything. AWS and Azure support is coming soon.

Each workspace gets its own Apache Iceberg catalog, so data from different workspaces is fully isolated — even within the same organization.

Step 2: Connect a Data Source

OptimaFlo currently supports Google Cloud Storage, BigQuery, and REST API connectors, with more on the way. The Ingestion Engineer guides you through the entire process conversationally; tell it what you want to connect and it handles authentication, browsing, file selection, validation, and schema inference.

Available Today

  • Google Cloud Storage
  • BigQuery
  • REST API (any endpoint)

Coming Soon

  • Amazon S3
  • Redshift
  • Snowflake
  • PostgreSQL
  • MySQL
  • GraphQL

1. Open Data Sources in the sidebar and click Add Source. This opens the Ingestion Engineer.
2. Tell the agent what you want to connect (e.g. "Connect my GCS bucket gs://company-data"). It authenticates via OAuth for cloud sources or asks for credentials for databases.
3. The agent browses your buckets, folders, or tables and lets you select specific files or datasets to ingest.
4. It validates the connection, infers your schema, and creates the data source record, ready for use in a pipeline.

For GCS and BigQuery, authentication happens via a Google OAuth popup — no service account keys to manage. The platform handles token refresh automatically.

No data of your own? Use the public sample API at https://dogfood-ecommerce.fly.dev with API key dogfood-key-2026. See the Try It Live walkthrough above for the full configuration.

Step 3: Build Your Pipeline

Three ways to build: drag-and-drop on the visual canvas, let the Data Engineer create a pipeline from a prompt, or let the Manager handle the entire workflow end-to-end.

Visual Canvas
Drag nodes onto the canvas and connect them. Configure SQL for each transform node. Preview results before saving.
  • Add an Ingestion node linked to your data source
  • Add Cleaning and Aggregation nodes for transformations
  • Connect them with edges to define data flow
  • Write SQL or let the Analytics Engineer generate it
Data Engineer
Describe your transformations and the Data Engineer builds the full canvas: Source, Ingestion, Cleaning, and Aggregation nodes with SQL already written.
  • Open the Data Engineer panel in the canvas sidebar
  • Describe what you want to transform
  • Review the generated nodes and SQL
  • Apply to canvas with one click
Manager
End-to-end orchestration. The Manager connects sources, generates SQL, builds the canvas, and deploys the pipeline, all from one conversation.

"Connect my GCS bucket gs://sales-data, clean the CSVs, deduplicate on order_id, and create a monthly revenue summary."

Step 4: Run and Schedule

Execute your pipeline manually or set a schedule. OptimaFlo generates an Apache Airflow DAG behind the scenes; you never touch Airflow config directly.

1. Click Execute to run the pipeline immediately. The platform auto-saves before executing.
2. Monitor progress in the execution panel and watch each layer complete, from Ingestion through Aggregation.
3. Open Settings to set a schedule (hourly, daily, weekly, monthly, quarterly, or yearly). The platform converts your selection to an Airflow cron expression.
4. Use Backfills to re-process historical date ranges when you change transformation logic.
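
As a rough picture of what the schedule conversion in step 3 could look like, the sketch below maps each option to a standard five-field cron expression. The expressions match Airflow's documented presets (`@hourly`, `@daily`, and so on). Whether OptimaFlo emits exactly these strings is an assumption, and `SCHEDULE_TO_CRON` and `to_cron` are hypothetical names.

```python
# Standard five-field cron expressions (minute hour day-of-month month
# day-of-week). These mirror Airflow's cron presets; treating them as
# OptimaFlo's exact output is an assumption.
SCHEDULE_TO_CRON = {
    "hourly": "0 * * * *",       # top of every hour
    "daily": "0 0 * * *",        # midnight every day
    "weekly": "0 0 * * 0",       # midnight on Sunday
    "monthly": "0 0 1 * *",      # midnight on the 1st of each month
    "quarterly": "0 0 1 */3 *",  # midnight on the 1st of every third month
    "yearly": "0 0 1 1 *",       # midnight on January 1st
}


def to_cron(schedule: str) -> str:
    """Translate a schedule option into its cron expression."""
    key = schedule.strip().lower()
    if key not in SCHEDULE_TO_CRON:
        raise ValueError(f"unsupported schedule: {schedule!r}")
    return SCHEDULE_TO_CRON[key]
```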

Backfills run sequentially by default to avoid Iceberg write conflicts. You can increase parallelism for independent tables.
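
To illustrate why sequential execution avoids write conflicts, here is a minimal sketch of a backfill loop that processes one date at a time. The daily granularity and the `daily_windows` and `run_backfill` helpers are assumptions for illustration. The platform's actual backfill unit may differ.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def daily_windows(start: date, end: date) -> Iterator[date]:
    """Yield each calendar date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def run_backfill(start: date, end: date, run_for_date: Callable[[date], None]) -> int:
    """Run one date at a time so no two runs write to a table concurrently."""
    count = 0
    for day in daily_windows(start, end):
        run_for_date(day)  # returns only when this date's run has finished
        count += 1
    return count


# Stand-in for a real pipeline run: record which dates were processed.
processed = []
total = run_backfill(date(2026, 1, 30), date(2026, 2, 2), processed.append)
```

Because each call returns before the next begins, the processed dates come out in calendar order.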

Core Concepts: Ingestion, Cleaning, Aggregation

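The medallion architecture is the backbone of every OptimaFlo pipeline. Its three layers map to the node types: Ingestion writes bronze tables, Cleaning writes silver tables, and Aggregation writes gold tables.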

Ingestion

Raw Data

Your source data lands here untouched. Every record is preserved in Apache Iceberg tables with full history, ACID transactions, and time-travel.

  • Zero transformations — exact copy of the source
  • Schema detected automatically on ingestion
  • Full history retention for compliance and replay
  • Partitioned for query performance

Cleaning

Cleaned Data

Cleaned, deduplicated, and type-cast. The Analytics Engineer generates transformations from plain English, then you review and approve before anything runs.

  • LLM-generated SQL from natural language
  • Preview results before committing
  • Deduplication, null handling, and type casting
  • Validated and schema-enforced before execution
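
As an example of the kind of SQL a Cleaning node might hold, the sketch below deduplicates on order_id, fills a missing value, and casts a text amount to a number. It runs on Python's built-in `sqlite3` (window functions need SQLite 3.25 or later) rather than on the platform. The `bronze_orders` table, the `ingested_at` column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bronze_orders "
    "(order_id TEXT, product_id TEXT, total_amount TEXT, ingested_at TEXT)"
)
# Made-up raw rows showing the three problems the query handles.
conn.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?, ?)",
    [
        ("o1", "p1", "10.50", "2026-01-01T00:00:00"),
        ("o1", "p1", "12.00", "2026-01-02T00:00:00"),  # later duplicate of o1
        ("o2", None, "7.25", "2026-01-01T00:00:00"),   # missing product_id
        (None, "p3", "3.00", "2026-01-01T00:00:00"),   # missing key, dropped
    ],
)

CLEANING_SQL = """
SELECT order_id,
       COALESCE(product_id, 'unknown') AS product_id,  -- null handling
       CAST(total_amount AS REAL) AS total_amount      -- type casting
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY order_id ORDER BY ingested_at DESC
           ) AS rn                                     -- newest row gets rn = 1
    FROM bronze_orders
    WHERE order_id IS NOT NULL
)
WHERE rn = 1                                           -- deduplication
ORDER BY order_id
"""

cleaned = conn.execute(CLEANING_SQL).fetchall()
```

Keeping the most recently ingested row per order_id is one possible deduplication rule. Pick whichever rule matches your source.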

Aggregation

Business Metrics

Aggregated, business-ready metrics and star schemas. Feed dashboards, exports, and analyst queries from a single source of truth.

  • Aggregations and business KPIs
  • Star schema for analytics
  • Incremental updates to minimize compute
  • Direct connection to BI dashboards

If You Get Stuck

The handful of issues that catch new users most often.

The pipeline ran, but Silver or Gold has 0 rows

Open Catalog and check the Bronze table first. If Bronze is also empty, the data source is not returning rows: verify the API URL, auth header, and data_path in Data Sources. If Bronze has rows but Silver does not, the Silver SQL is filtering everything out: open the Silver node, click Preview, and inspect the result.
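
The same triage can be written as a small decision function over the row counts shown in the execution history. `diagnose_empty_layers` is a hypothetical helper, and the Gold branch extends the advice above by analogy.

```python
def diagnose_empty_layers(bronze_rows: int, silver_rows: int, gold_rows: int) -> str:
    """Point at the first layer, from the source outward, that has no rows."""
    if bronze_rows == 0:
        return "source: verify the API URL, auth header, and data_path"
    if silver_rows == 0:
        return "silver: preview the Silver node SQL; it filters out every row"
    if gold_rows == 0:
        # Not covered in the text above; assumed to mirror the Silver case.
        return "gold: preview the Gold node SQL"
    return "ok"
```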

REST API source returns 401 or 403

The auth header name is case-sensitive on some servers. For the dogfood API, the header must be X-API-Key exactly. Double-check the api_key_header field in the data source config. If you are using a Bearer token, set auth_type to bearer instead of api_key.
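
The difference between the two auth types comes down to which HTTP header is sent. A minimal sketch, assuming the `auth_type` values `api_key` and `bearer` named above. The `auth_headers` function itself is hypothetical.

```python
def auth_headers(auth_type: str, secret: str, api_key_header: str = "X-API-Key") -> dict:
    """Return the HTTP header a source would send for the given auth type."""
    if auth_type == "api_key":
        # The key goes in a custom header; the name must match what the
        # server expects, including case on some servers.
        return {api_key_header: secret}
    if auth_type == "bearer":
        # Bearer tokens always use the standard Authorization header.
        return {"Authorization": f"Bearer {secret}"}
    raise ValueError(f"unsupported auth_type: {auth_type!r}")
```

For the dogfood API, `auth_headers("api_key", "dogfood-key-2026")` yields the `X-API-Key` header described above.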

Schema looks empty or wrong

REST APIs that wrap the array under a key need data_path set to that key. The dogfood orders endpoint wraps its response as {"orders": [...]}, so data_path must be orders. If you see one column called data containing JSON, set flatten_json to true.
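
To make the two settings concrete, here is a sketch of what data_path extraction and JSON flattening could do to a response. The `extract_records` and `flatten` helpers, the dot-separated column names, and the sample payload are assumptions. The platform's actual flattening rules may differ.

```python
from typing import Optional


def extract_records(payload, data_path: Optional[str]) -> list:
    """Return the record array, unwrapping it from data_path when set."""
    return payload[data_path] if data_path else payload


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested objects into top-level columns with dotted names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Made-up response shaped like the wrapped array described above.
payload = {"orders": [{"order_id": "o1", "customer": {"id": "c1", "country": "DE"}}]}

records = extract_records(payload, "orders")
columns = [flatten(r) for r in records]
```

Without the data_path, the whole payload would be treated as a single record, which is why the inferred schema looks wrong.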

Snapshots tab shows one entry but the pipeline ran multiple times

The snapshot timeline groups by calendar date. The (N snapshots) badge next to each date header shows the actual count for that day. Multiple hourly runs on the same day all nest under one date header, with one card per run underneath.
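
The grouping works roughly like the sketch below, which buckets snapshot timestamps by calendar date and counts them. The `group_by_date` helper and the timestamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def group_by_date(timestamps: list) -> dict:
    """Bucket snapshot timestamps under their calendar date (ISO format)."""
    groups = defaultdict(list)
    for ts in timestamps:
        groups[ts.date().isoformat()].append(ts)
    return dict(groups)


# Three hourly runs on one day, then one run the next day.
snapshots = [
    datetime(2026, 3, 1, 9),
    datetime(2026, 3, 1, 10),
    datetime(2026, 3, 1, 11),
    datetime(2026, 3, 2, 9),
]

grouped = group_by_date(snapshots)
counts = {day: len(items) for day, items in grouped.items()}
```

Here the timeline would show two date headers, with three snapshot cards under the first and one under the second.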

Run Now button is disabled

The button is disabled until the live execution status loads from the backend. Wait a second or two and it will enable. If it stays disabled, refresh the page and check the browser console for an authentication error.

Failed run with no obvious cause

Open the execution history table on the monitoring page. Failed rows expand to show the full error_message with the stack trace. If the error references an AppError code, look it up in the error taxonomy for the suggested fix.

Ready to build?

Create your first pipeline in minutes.

Connect a source, transform with SQL, and schedule with Airflow — all from the browser.

Enhancing data owners with a team of AI agents. From raw data to dashboards, all in your own cloud.

© 2026 OptimaFlo. All rights reserved.
