How Canva Processed Billions of Events with OLAP Migration
TechOps Examples
Hey — It's Govardhana MK 👋
Along with a use case deep dive, we identify the top news, tools, videos, and articles in the TechOps industry.
IN TODAY'S EDITION
🧠 Use Case
How Canva Processed Billions of Events with OLAP Migration
🚀 Top News
AWS Lambda now supports CloudWatch Logs Live Tail and Metrics Insights in the console
📽️ Videos
Git & GitHub Tutorial - Visualized Git Course for Beginner & Professionals in 2024
AWS Networking - What is a NAT Gateway?
📚️ Resources
Cloud Logging Tips and Tricks
🛠️ TOOL OF THE DAY
cloud_enum - Multi-cloud OSINT tool. Enumerate public resources in AWS, Azure, and Google Cloud.
🧠 USE CASE
How Canva Processed Billions of Events with OLAP Migration
When Canva first built this architecture, it used MySQL and separated major components into worker services, storing multiple layers of reusable intermediary output.
Ref: Canva
The deduplication worker scanned and matched event types, updating records before aggregating results into counters. While this setup worked initially, Canva encountered three key challenges:
Processing scalability
Incident handling complexity
Rapid storage consumption
Processing scalability:
The deduplication scan used a single-threaded process with a pointer tracking the latest record, making it easy to verify fixes but not scalable.
Each record needed a database round trip, resulting in O(N) queries.
Batching helped but didn’t fully solve the issue. Multi-threading added complexity, and errors delayed the entire pipeline.
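The round-trip problem can be made concrete with a small sketch. This is not Canva's actual code; it uses a fake in-memory "database" that counts round trips to show why a one-record-at-a-time scan costs O(N) queries and why batching only divides, rather than removes, that cost.

```python
# Hypothetical sketch (not Canva's code): per-record deduplication scans
# cost one database round trip per event; batching reduces the query
# count by the batch size but keeps the sequential, pointer-driven shape.

class FakeDB:
    """Stands in for MySQL; counts round trips instead of doing real I/O."""
    def __init__(self, events):
        self.events = events
        self.round_trips = 0

    def fetch(self, offset, limit=1):
        self.round_trips += 1  # every call models one network round trip
        return self.events[offset:offset + limit]

def dedup_one_by_one(db, total):
    seen, pointer = set(), 0
    while pointer < total:
        for event in db.fetch(pointer, limit=1):  # one query per record -> O(N)
            seen.add(event["event_id"])
        pointer += 1
    return seen

def dedup_batched(db, total, batch_size=100):
    seen, pointer = set(), 0
    while pointer < total:
        for event in db.fetch(pointer, limit=batch_size):  # ~N/batch_size queries
            seen.add(event["event_id"])
        pointer += batch_size
    return seen

events = [{"event_id": i % 800} for i in range(1000)]  # 200 duplicates

db1 = FakeDB(events)
dedup_one_by_one(db1, len(events))
db2 = FakeDB(events)
dedup_batched(db2, len(events))
print(db1.round_trips, db2.round_trips)  # 1000 vs 10
```

Both variants produce the same deduplicated set; the batched version simply amortizes the round trips, which matches the article's point that batching helped without solving the underlying scalability problem.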
Incident handling complexity:
Incident handling was complex, requiring manual fixes in the database. The key types of incidents were:
Overcounting: New usage types mistakenly included, needing pipeline pauses, data removal, and table corrections.
Undercounting: Missing event types required retrieving backup data, causing delays due to processing limits.
Misclassification: Usage events categorized incorrectly, needing code fixes and full recalculation of deduplication and aggregation data.
Processing delays: Sequential scans or unexpected data slowed down the pipeline, delaying aggregation.
Rapid storage consumption:
MySQL on RDS couldn’t scale horizontally, forcing Canva to double the instance size every 8-10 months as storage was rapidly consumed.
As the database grew to several terabytes, maintenance became complex and risked downtime for critical features, and performing regular upgrades without downtime added further complexity. A database split with sweepers helped clean out old data but wasn’t sustainable long-term.
Migrate data to DynamoDB
Lessons learned drove pipeline changes, moving raw usage events to DynamoDB to ease storage pressure.
However, the full migration was halted: it improved storage scalability but didn’t solve the processing scalability issues caused by database round trips.
Simplify using OLAP and ELT
Canva switched to end-to-end calculations, processing entire months of data using Snowflake for large-scale analysis.
Usage data was extracted via a data replication pipeline, transformed with scheduled SQL jobs (using DBT), and aggregated with queries like:
select
    day_id,
    template_brand,
    sum(usage_count) as usage_count
from
group by
    day_id,
    template_brand
Key Steps Involved:
Extracted JSON data into optimized SQL tables.
Deduplicated usage events.
Aggregated totals using GROUP BY queries.
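The three steps above can be sketched end to end with SQLite standing in for Snowflake. The table and column names here are assumptions for illustration, not Canva's schema; the shape of the queries (JSON extraction, `distinct` deduplication, `GROUP BY` aggregation) is what matters.

```python
# Illustrative sketch of the three ELT steps using SQLite in place of
# Snowflake; table and column names are assumptions, not Canva's schema.
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Extract: raw JSON events landed by the replication pipeline.
conn.execute("create table raw_events (payload text)")
raw = [
    {"event_id": "e1", "day_id": "2024-01-01", "template_brand": "brand_a"},
    {"event_id": "e1", "day_id": "2024-01-01", "template_brand": "brand_a"},  # duplicate
    {"event_id": "e2", "day_id": "2024-01-01", "template_brand": "brand_b"},
]
conn.executemany("insert into raw_events values (?)",
                 [(json.dumps(e),) for e in raw])

# Extract JSON fields into an optimized, typed table.
conn.execute("""
    create table usage_events as
    select
        json_extract(payload, '$.event_id')       as event_id,
        json_extract(payload, '$.day_id')         as day_id,
        json_extract(payload, '$.template_brand') as template_brand
    from raw_events
""")

# 2. Deduplicate usage events by their identifying fields.
conn.execute("""
    create table deduped_events as
    select distinct event_id, day_id, template_brand from usage_events
""")

# 3. Aggregate totals with a GROUP BY, mirroring the query above.
rows = conn.execute("""
    select day_id, template_brand, count(*) as usage_count
    from deduped_events
    group by day_id, template_brand
""").fetchall()
print(sorted(rows))  # [('2024-01-01', 'brand_a', 1), ('2024-01-01', 'brand_b', 1)]
```

Running the whole month's data through set-based SQL like this replaces the per-record round trips of the old worker with a handful of bulk queries, which is the core of the ELT simplification.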
These changes improved performance and scalability and reduced operational complexity; Canva now tracks billions of content usages monthly.
New Architecture
Canva’s core tracking functionality is now built as a counting pipeline, divided into three stages:
Data collection: Usage events are gathered from various sources, validated, and filtered.
Deduplication: Duplicate events are removed, and classification rules are applied to track distinct usages.
Aggregation: The total deduplicated usages are calculated and grouped by dimensions like design template or brand.
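The three stages above can be sketched as plain functions. The event fields, validation rules, and grouping dimensions are assumptions for illustration; Canva's production pipeline runs these stages at a much larger scale.

```python
# Minimal sketch of the three-stage counting pipeline: collect/validate,
# deduplicate, aggregate. Field names and rules are hypothetical.

VALID_TYPES = {"design_view", "template_use"}  # hypothetical usage types

def collect(raw_events):
    """Stage 1: validate and filter events gathered from various sources."""
    return [e for e in raw_events
            if e.get("type") in VALID_TYPES and "event_id" in e]

def deduplicate(events):
    """Stage 2: drop duplicate events, keeping one record per event_id."""
    seen, unique = set(), []
    for e in events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            unique.append(e)
    return unique

def aggregate(events):
    """Stage 3: total distinct usages, grouped by (template, brand)."""
    counts = {}
    for e in events:
        key = (e["template"], e["brand"])
        counts[key] = counts.get(key, 0) + 1
    return counts

raw = [
    {"event_id": "a", "type": "template_use", "template": "t1", "brand": "b1"},
    {"event_id": "a", "type": "template_use", "template": "t1", "brand": "b1"},  # duplicate
    {"event_id": "b", "type": "design_view",  "template": "t1", "brand": "b1"},
    {"event_id": "c", "type": "unknown",      "template": "t2", "brand": "b2"},  # filtered out
]
print(aggregate(deduplicate(collect(raw))))  # {('t1', 'b1'): 2}
```

Keeping the stages separate mirrors the article's design: each stage can be fixed or re-run independently when an overcounting, undercounting, or misclassification incident occurs.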
Hope this was an insightful use case for your learning!
Learn AI in 5 Minutes a Day
AI Tool Report is one of the fastest-growing and most respected newsletters in the world, with over 550,000 readers from companies like OpenAI, Nvidia, Meta, Microsoft, and more.
Our research team spends hundreds of hours a week summarizing the latest news, and finding you the best opportunities to save time and earn more using AI.