- TechOps Examples
- Posts
- How One BigQuery Query Costs Shopify $1,000,000 a Month?
How One BigQuery Query Costs Shopify $1,000,000 a Month?
Hey — It's Govardhana MK 👋
I always find Thursdays to be that perfect balance between the hustle of the week and the anticipation of the weekend – hope you too!
Enjoy 1 use case, 2 trends/updates, and 3 free resources/tutorials.
Use Case
How One BigQuery Query Costs Shopify $1,000,000 a Month?
Shopify - 2022:
Calvin, a Shopify software engineer, and his team were shocked to discover they were already charged $1.2 million for a data pipeline.
Act 1: The Back Drop
His team built a data pipeline for a new marketing tool, rolled out in an early release to a select group of Shopify merchants. Merchant data was sent to Kafka and processed with Apache Flink for calculations.
For someone new to Kafka...
Kafka is a distributed streaming platform for real-time data pipelines and streaming applications. It uses a publish-subscribe model where producers send data to topics—containers for records that consumers can read.
Topics can be split into partitions, distributed across brokers. Brokers handle data storage, replication, and partitioning for high availability and fault tolerance.
Using Kafka between data sources and Apache Flink gave the team advantages with distributed, real-time data.
1. It decouples producers from consumers, allowing them to work independently.
2. It manages buffering and backpressure, preventing Flink from being overwhelmed.
3. It allows for data retention and replay in case of failures.
Apache Flink processes data streams, runs calculations, and stores the state in RocksDB, a key-value store optimized for fast access. This stateful approach tracks processed data.
The setup handled a small group of merchants, but they were already ingesting over 1 billion rows. With Shopify's global merchant base, the pipeline couldn't scale beyond the initial release.
To improve efficiency, they decided to offload part of the Flink pipeline to an external SQL-based data warehouse.
Flink would submit queries, and results would be written to Cloud Storage, removing data ingestion from the pipeline and boosting throughput for the general release.
When choosing a data warehouse, they had three requirements:
1. It should automatically load data daily.
2. It should handle 60 requests per minute.
3. It should export results to Cloud Storage.
They chose Google Cloud BigQuery, designed for large-scale data processing, capable of querying petabytes of data in seconds.
For context, 1 PB = 1,024 TB or 1,048,576 GB.
Act 2: The $1 Million Mistake
Shopify already had an internal tool Calvin's team used to load their billion rows into BigQuery.
After the load, their first query scanned 75 GB of data, which was a shock.
Remember, they needed to handle 60 requests/min.
So, the math comes to:
60 RPM X 60 mins/hr X 24 hrs/day X 30 days/month = 2,592,000 queries per month
Multiply that by 75 GB, and you get:
2,592,000 X 75 GB = 194,400 TB per month
With BigQuery’s on-demand pricing at that time:
Total Cost = $949,218.75 (~$1M per month)
The exact query is unknown, but it likely involved selecting everything from a table based on conditions like timestamp / geography.
If the data in BigQuery isn't sorted by those conditions, it scans massive amounts of data. They knew clustering their tables was necessary.
Clustering sorts data by columns, usually those in the WHERE condition.
For example, clustering by timestamp and geography helps BigQuery scan only relevant data, reducing the amount scanned significantly.
More on clustered tables in BigQuery HERE.
After clustering, the same query scanned just 108.3 MB of data.
Let’s re-calculate the cost:
60 RPM X 60 mins/hr X 24 hrs/day X 30 days/month = 2,592,000 queries per month
2,592,000 X 0.1 GB = 259,200 GB per month
Total Cost = $1,370.67 per month
Saved = $947,848.08 per month
A big win!
But they didn’t stop there.
They found more ways to optimize costs:
1. Avoid SELECT *
– Only select necessary columns.
2. Partitioned tables – Reduces scanning based on query patterns.
More on partitioned tables in BigQuery HERE.
3. Avoid running queries to explore or preview data—it costs money! Use BigQuery's free preview option instead.
Partitioning and clustering aren't magic, but fundamentals that can have a huge impact, as seen here.
Shopify reduced a nearly $1 million/month pipeline to just $1,370.
Hopefully, you can apply these lessons at your job.
ELEKS' intelligent automation: Unmatched efficiency for market leaders
ELEKS' intelligent automation transforms your business with custom data-driven tools. We streamline processes, boost productivity, and cut costs by automating complex tasks. Our tailored approach unlocks growth opportunities, freeing your team to focus on high-value tasks.
Tool Of The Day
Trends & Updates
Resources & Tutorials
P.S. Want help with technical consultation? If you're a business, you can book time with me here
P.P.S. Mind rating today's email ? |
Enjoy the newsletter? Please forward to a friend. It only takes 25 seconds. Writing this one took hours.
With that, it's time to wrap up. Got to get ready for tomorrow!