How Canva Processed Billions of Events with OLAP Migration

TechOps Examples

Hey — It's Govardhana MK 👋

Along with a use case deep dive, we identify the top news, tools, videos, and articles in the TechOps industry.

IN TODAY'S EDITION

🧠 Use Case

  • How Canva Processed Billions of Events with OLAP Migration

🚀 Top News

📽️ Videos

📚️ Resources

🛠️ TOOL OF THE DAY

cloud_enum - Multi-cloud OSINT tool. Enumerate public resources in AWS, Azure, and Google Cloud.

🧠 USE CASE

How Canva Processed Billions of Events with OLAP Migration

When Canva first built this architecture, it used MySQL and separated major components into worker services, storing multiple layers of reusable intermediate output.

Ref: Canva

The deduplication worker scanned and matched event types, updating records before aggregating results into counters. While this setup worked initially, Canva encountered three key challenges:

  • Processing scalability

  • Incident handling complexity

  • Rapid storage consumption

Processing scalability:

The deduplication scan used a single-threaded process with a pointer tracking the latest record, making it easy to verify fixes but not scalable.

Each record needed a database round trip, resulting in O(N) queries.

Batching helped but didn’t fully solve the issue. Multi-threading added complexity, and errors delayed the entire pipeline.

Even incrementing the total counters required additional database round trips.
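The single-threaded scan described above can be sketched as follows. The event shape, the in-memory list standing in for the MySQL table, and the pointer handling are illustrative assumptions, not Canva's actual code:

```python
# Minimal sketch of a single-threaded deduplication scan with a pointer.
# Each record fetch stands in for one database round trip, so the cost
# of a full scan is O(N) queries.

def dedup_scan(events, pointer=0):
    """Scan events from `pointer`, dropping duplicates by event id."""
    seen = set()
    counters = {}  # usage counts per event type
    for i in range(pointer, len(events)):  # one record per iteration -> O(N)
        event = events[i]                  # stands in for a per-record query
        pointer = i + 1                    # advance the scan pointer
        if event["id"] in seen:            # duplicate delivery -> skip
            continue
        seen.add(event["id"])
        counters[event["type"]] = counters.get(event["type"], 0) + 1
    return counters, pointer

events = [
    {"id": "e1", "type": "template_use"},
    {"id": "e1", "type": "template_use"},  # duplicate delivery
    {"id": "e2", "type": "template_use"},
    {"id": "e3", "type": "brand_use"},
]
counts, ptr = dedup_scan(events)
# counts == {"template_use": 2, "brand_use": 1}; ptr == 4
```

The pointer makes a fix easy to verify (re-run from a known offset), but the loop is inherently serial, which is exactly the scalability limit described above.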

Incident handling complexity:

Incident handling was complex, requiring manual fixes in the database. The key types of incidents were:

  • Overcounting: New usage types mistakenly included, needing pipeline pauses, data removal, and table corrections.

  • Undercounting: Missing event types required retrieving backup data, causing delays due to processing limits.

  • Misclassification: Usage events categorized incorrectly, needing code fixes and full recalculation of deduplication and aggregation data.

  • Processing delays: Sequential scans or unexpected data slowed down the pipeline, delaying aggregation.

Rapid storage consumption:

MySQL on RDS couldn’t scale horizontally, so the instance size had to be doubled every 8-10 months as storage was consumed rapidly.

As the database grew to several TBs, maintenance became complex, with downtime risks for critical features.

Regular upgrades without downtime added complexity. A database split with sweepers helped clean old data but wasn’t sustainable long-term.
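The sweeper idea above amounts to purging intermediary rows older than a retention window. A minimal sketch, where the table is modeled as a list of dicts and the 90-day window is an assumed parameter, not Canva's actual value:

```python
# Hypothetical "sweeper" that drops rows older than a retention cutoff.
from datetime import date, timedelta

def sweep(rows, today, retention_days=90):
    """Return only rows newer than the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in rows if r["day"] >= cutoff]

rows = [
    {"day": date(2024, 1, 1), "usage_count": 5},   # old, swept away
    {"day": date(2024, 6, 1), "usage_count": 7},   # recent, kept
]
kept = sweep(rows, today=date(2024, 6, 10))
# only the June row survives the 90-day window
```

This buys back storage, but each sweep is itself a scan over the table, which is why it wasn't sustainable long-term.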

Migrate data to DynamoDB

Lessons learned drove pipeline changes, moving raw usage events to DynamoDB to ease storage pressure.

However, the full migration was halted: it improved storage scalability but didn’t solve the processing-scalability problems caused by per-record database round trips.
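Moving raw events to a key-value store like DynamoDB shifts deduplication toward idempotent, key-based writes (in a real implementation, a `put_item` with a `ConditionExpression` such as `attribute_not_exists(event_id)`). An in-memory sketch of that idea, with assumed attribute names:

```python
# In-memory stand-in for a DynamoDB table keyed by event id, showing why
# key-based storage eases dedup: a conditional put makes writes idempotent.
# Attribute names (event_id, payload) are illustrative assumptions.

class EventTable:
    def __init__(self):
        self._items = {}  # partition key -> item

    def put_if_absent(self, event_id, payload):
        """Mimics put_item with attribute_not_exists(event_id)."""
        if event_id in self._items:
            return False  # duplicate delivery, write rejected
        self._items[event_id] = payload
        return True

table = EventTable()
first = table.put_if_absent("e1", {"type": "template_use"})   # stored
second = table.put_if_absent("e1", {"type": "template_use"})  # rejected dup
```

This removes the scan-and-match step for storage, but aggregating counts still required reading the data back, hence the round-trip problem remained.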

Simplify using OLAP and ELT

Canva switched to end-to-end calculations, processing entire months of data using Snowflake for large-scale analysis.

Usage data was extracted via a data replication pipeline, transformed with scheduled SQL jobs (using DBT), and aggregated with queries like:

```sql
select
    day_id,
    template_brand,
    sum(usage_count) as usage_count
from
group by
    day_id,
    template_brand
```

Key Steps Involved: 

  • Extracted JSON data into optimized SQL tables.

  • Deduplicated usage events.

  • Aggregated totals using GROUP BY queries.
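The three steps above amount to a compact ELT pass. A sketch over in-memory JSON records, with field names assumed for illustration:

```python
import json
from collections import Counter

# Raw JSON usage events as they might land from the replication pipeline
# (field names are illustrative assumptions).
raw = '''[
  {"event_id": "e1", "day_id": "2024-06-01", "template_brand": "acme", "usage_count": 1},
  {"event_id": "e1", "day_id": "2024-06-01", "template_brand": "acme", "usage_count": 1},
  {"event_id": "e2", "day_id": "2024-06-01", "template_brand": "acme", "usage_count": 1}
]'''

# 1. Extract: parse JSON into rows (Snowflake would land these in a table).
rows = json.loads(raw)

# 2. Deduplicate: keep one row per event_id.
deduped = list({r["event_id"]: r for r in rows}.values())

# 3. Aggregate: GROUP BY (day_id, template_brand), SUM(usage_count).
totals = Counter()
for r in deduped:
    totals[(r["day_id"], r["template_brand"])] += r["usage_count"]

# totals == {("2024-06-01", "acme"): 2}
```

The key difference from the old pipeline: each step operates on whole datasets in one pass rather than issuing a query per record.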

These changes improved performance and scalability and reduced operational complexity; Canva now tracks billions of content usages monthly.

New Architecture

Canva’s core tracking functionality is now built as a counting pipeline, divided into three stages:

  • Data collection: Usage events are gathered from various sources, validated, and filtered.

  • Deduplication: Duplicate events are removed, and classification rules are applied to track distinct usages.

  • Aggregation: The total deduplicated usages are calculated and grouped by dimensions like design template or brand.
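The three stages can be chained as one small pipeline sketch. The validation rule and the grouping dimension are assumptions for illustration, not Canva's actual logic:

```python
# Toy end-to-end counting pipeline: collect -> deduplicate -> aggregate.
from collections import Counter

VALID_TYPES = {"template_use", "brand_use"}  # assumed classification rules

def collect(events):
    """Gather, validate, and filter raw usage events."""
    return [e for e in events if e.get("id") and e.get("type") in VALID_TYPES]

def deduplicate(events):
    """Keep one event per id to track distinct usages."""
    return list({e["id"]: e for e in events}.values())

def aggregate(events, dimension="type"):
    """Count distinct usages grouped by a dimension (e.g. template or brand)."""
    return Counter(e[dimension] for e in events)

events = [
    {"id": "e1", "type": "template_use"},
    {"id": "e1", "type": "template_use"},   # duplicate delivery
    {"id": "e2", "type": "brand_use"},
    {"id": None, "type": "template_use"},   # fails validation
]
totals = aggregate(deduplicate(collect(events)))
# totals == {"template_use": 1, "brand_use": 1}
```

Each stage has a single responsibility, which is what makes incidents easier to isolate than in the old scan-and-update worker.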

Hope this was an insightful use case for your learning!

Learn AI in 5 Minutes a Day

AI Tool Report is one of the fastest-growing and most respected newsletters in the world, with over 550,000 readers from companies like OpenAI, Nvidia, Meta, Microsoft, and more.

Our research team spends hundreds of hours a week summarizing the latest news, and finding you the best opportunities to save time and earn more using AI.

Looking to promote your company, product, service, or event to 16,000+ TechOps Professionals? Let's work together.