How I built Clinikally's entire analytics infrastructure from scratch using DLT, BigQuery, dbt, and Metabase to serve cross-functional teams.
When you're scaling an e-commerce platform serving 100K+ daily users, gut feelings don't cut it. We needed data—clean, reliable, accessible data—for everyone from finance to marketing to make informed decisions.
The challenge? We had data scattered across Shopify, Google Analytics, Google Ads, and various other tools, but no way to connect the dots. No single source of truth. No automated reporting. Just manual exports and endless spreadsheets.
I built our entire analytics infrastructure from the ground up. Here's what that looked like:
Rather than writing custom API connectors from scratch, I used DLT (Data Load Tool) to build a robust extraction layer.
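Here's a minimal sketch of what one of those pipelines can look like. The shop URL, API version, and field names are illustrative rather than our production code; the point is the shape: an incremental dlt resource loading straight into BigQuery.

```python
import os
import requests
import dlt

# Illustrative Shopify Admin API base URL (not our real store).
SHOP_URL = "https://your-store.myshopify.com/admin/api/2024-01"


@dlt.resource(name="orders", write_disposition="merge", primary_key="id")
def shopify_orders(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z"),
):
    """Pull orders updated since the last successful load (incremental)."""
    params = {
        "status": "any",
        "limit": 250,
        "updated_at_min": updated_at.last_value,  # dlt tracks this between runs
    }
    headers = {"X-Shopify-Access-Token": os.environ["SHOPIFY_ACCESS_TOKEN"]}
    # Simplified for illustration: real code would follow Shopify's
    # Link-header pagination and back off on 429 responses.
    response = requests.get(f"{SHOP_URL}/orders.json", params=params, headers=headers, timeout=30)
    response.raise_for_status()
    yield response.json()["orders"]


pipeline = dlt.pipeline(
    pipeline_name="shopify",
    destination="bigquery",      # credentials come from dlt's secrets/config
    dataset_name="raw_shopify",  # raw landing dataset in the warehouse
)

if __name__ == "__main__":
    load_info = pipeline.run(shopify_orders())
    print(load_info)
```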
DLT handled the heavy lifting—incremental loads, schema evolution, API rate limiting—while I focused on what mattered: getting the right data flowing.
All raw data landed directly in BigQuery, our data warehouse.
Raw data is messy. dbt helped me transform it into something useful.
I structured the transformation layer following analytics engineering best practices:
Staging models → Clean, standardized raw data
Intermediate models → Business logic layer
Mart models → Final, business-ready datasets
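To make those layers concrete, here's roughly what a staging model looks like. The source and column names are illustrative (and assume the raw dataset is declared in a dbt sources file), not our actual schema:

```sql
-- models/staging/stg_shopify__orders.sql
-- Staging layer: rename, cast, and lightly clean the raw orders loaded by DLT.
-- Source and column names are illustrative.
with source as (

    select * from {{ source('raw_shopify', 'orders') }}

),

renamed as (

    select
        id                            as order_id,
        cast(created_at as timestamp) as ordered_at,
        lower(financial_status)       as payment_status,
        cast(total_price as numeric)  as order_total,
        customer__id                  as customer_id  -- nested field flattened at load time
    from source

)

select * from renamed
```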
The beauty of dbt? Everything is version-controlled, tested, and documented. When marketing asks "how is CAC calculated?", I can point them to the exact SQL transformation with inline documentation.
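A CAC mart, for example, can be as simple as the sketch below. The upstream model and column names are hypothetical; what matters is that the definition lives in one documented, version-controlled place.

```sql
-- models/marts/marketing/fct_cac_monthly.sql
-- CAC = total ad spend / new customers acquired, by month.
-- Upstream models (int_ad_spend_daily, int_customer_first_orders) are hypothetical.
with spend as (

    select
        date_trunc(spend_date, month) as month,
        sum(ad_spend)                 as total_spend
    from {{ ref('int_ad_spend_daily') }}
    group by 1

),

new_customers as (

    select
        date_trunc(date(first_order_at), month) as month,
        count(distinct customer_id)             as customers_acquired
    from {{ ref('int_customer_first_orders') }}
    group by 1

)

select
    spend.month,
    spend.total_spend,
    new_customers.customers_acquired,
    safe_divide(spend.total_spend, new_customers.customers_acquired) as cac
from spend
join new_customers using (month)
```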
With clean data in BigQuery, I deployed Metabase as our BI layer.
What I built was, above all, self-serve: instead of analysts being bottlenecks, teams could answer their own questions.
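For illustration, a saved question in Metabase can be nothing more than a short query against the mart layer; the table and column names here are hypothetical, not our actual schema.

```sql
-- A self-serve Metabase question: monthly orders and revenue,
-- straight from a business-ready mart (table name hypothetical).
select
    date_trunc(date(ordered_at), month) as month,
    count(*)                            as orders,
    sum(order_total)                    as revenue
from analytics.fct_orders
group by 1
order by 1
```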
Before: manual exports, endless spreadsheets, and analysts fielding every ad-hoc question.
After: automated pipelines into a single source of truth, with teams answering their own questions in Metabase.
Why BigQuery over Redshift/Snowflake? Google's ecosystem integration. We were already using Google Analytics and Ads, so native connectors made sense.
Why dbt over Dataform/Airflow transformations? Developer experience. dbt's testing framework, documentation, and lineage graphs are unmatched.
Why DLT? It's built for the modern data stack. Schema evolution, incremental loading, and a Python-first approach meant I could move fast.
Start with business needs, not tech: I didn't build models nobody used. Every table had a clear stakeholder and use case.
Document everything: Six months later, you won't remember why you filtered out that one weird edge case. Your dbt models should read like a story.
Version control is non-negotiable: All transformations in git. All migrations tracked. No cowboy SQL in production.
Incremental beats perfect: Shipped v1 with just Shopify data. Added GA and Ads data later. Got value fast.
I'm currently exploring what comes next for the stack, but that's a post for another day.
Stack Summary: DLT (extraction) → BigQuery (warehouse) → dbt (transformation) → Metabase (BI).
Want to know more about any specific part of this stack? Drop me a message.