
Becoming an Ozone Committer

Author
李緒成 | Peter Lee
OSS Contributor | Distributed System | Storage

First of all…
#

Ozone is written in Java. My own Java résumé? Basically just Prof. Hsin-Chieh Lee’s class at NCKU, plus a tiny end-of-term project—a little stress-testing tool—built with legends Zhe-You Liu and Henry Chang: https://github.com/NCKU-CSIE-Union/Japybara-CLI.

I’m telling you this because if I can jump into Ozone with that background, so can you! 🚀

Self-intro
#

Hi, I’m Peter Lee (李緒成), a third-year CS undergrad at NCKU.

Obsessed with open source, distributed systems, and storage tech—probably a childhood trauma from using potato-grade PCs. My dream is to duct-tape a bunch of crummy machines into one awesome super-computer.

Fun fact: before contributing, I had never touched Ozone in real life. I’d only spun up MinIO in Docker for kicks, and at my internship we use GCS. Ozone’s S3 compatibility? Zero connection. 🤷‍♂️

Why Ozone?
#

The spark
#

In early 2025 I stumbled across Ozone, skimmed the architecture blog posts, went “hey this is cool,” and boom: cloned the repo, read the docs, started squashing Jira tickets. No ten-year master plan, just vibes.

Why it’s cool
#

  1. Protocols galore—HTTP, S3, HDFS. Plays nice with everything.
  2. Fixes HDFS’s scaling pain and loves small files.
  3. Containers give SCM centralized placement control without killing performance—arguably higher availability than Ceph/MinIO.
  4. Big names running Ozone: Tencent, LYC, Shopee, Preferred Networks. (Search GitHub—you’ll see more secret admirers.)
  5. Jesse (Ozone PMC) says Shopee’s cluster holds 4 billion keys—10× a maxed-out HDFS NameNode.
  6. Didi’s engineers claim tens of billions of keys and hundreds of PB. Blog post coming soon.
  7. Apache 2.0 license. Totally free. That alone is unbeatable.

threads-post
https://www.threads.com/@jc.techtalk/post/DFqWz69SarA

What I got out of it
#

Tech chops
#

  • Compare architectures of Ceph, MinIO, DeepSeek 3FS, etc.—see each one’s trade-offs.
  • Ozone feels micro-service-ish: each component with clear boundaries. Eye-opening design study.
  • Lots of RocksDB deep dives—compaction, iterators, checkpoints, key I/O bottlenecks.

Perspective
#

  • Collaborate with engineers way above my pay grade. Their knowledge just floods in like a typhoon.
  • Real users show up in Jira/Slack long before a shiny blog post appears on LinkedIn.

Communication
#

  • Most discussion happens on design docs, PRs, and Jira: async, across different time zones. Say things once and make them crystal clear. (I’m still learning. If you have resources, hit me up!)

Some numbers
#

PRs & Reviews
#

github-contribution-peterxcli

Third-party stats
#

  1. OSS Rank

    oss rank
    I’m #11… plus another me at #60 because the site duped my account. 🧬
    oss rank peterxcli at rk60

  2. OSS Insight: I often swing into the top 2 for activity. Forgot to screenshot the peak, though.

    oss insight

None of this matters—just dopamine for self-care.

I got the Apache Ozone Committer badge!
#

Huge thanks to Jesse for shouting it out on his socials—professors reposted it, Taiwanese tech morale soared, and I had literally just binge-watched all five seasons of Breaking Bad the night before. Perfect timing.

fb-post-1
https://www.facebook.com/share/p/1ECtED9AJC/
fb-post-2
https://www.facebook.com/share/p/12Jyf5iAJ4K/
fb-post-3
https://www.facebook.com/share/p/1FGNLBitAw/
fb-post-4
https://www.facebook.com/share/p/12Jyf5iAJ4K/


“源來適你” (OpenSource4You) community
#

Quoting Zhe-You Liu:

OpenSource4You is a Taiwan-based non-profit community for real-world OSS contributions. Mentors guide projects like Apache Airflow, Kafka, YuniKorn, etc. It’s Chinese-friendly, ask away!

  • Project list with mentors
  • Deep dive article: “Kafka Community Spotlight: TAIWAN 🇹🇼” on Stanislav’s Big Data Stream.

Recently there was a “Committers under 30” meetup at Dcard’s 14F public area!

committer under 30
https://www.facebook.com/share/p/1Bphui46dK/

My Apache Ozone journey
#

Early days → “mid-game”
#

1. Refactoring tests
#

  • Include AWS request ID in S3G audit logs #7725
  • Add tests for SnapshotChainRepair #7741
  • Create endpoint builders for S3G tests #7753

2. Pagination for listMultipartUploads in S3G & OM
#

Main PR: #7817

Follow-ups that popped up while I was knee-deep in the code:

| Follow-up | What it fixes / improves |
| --- | --- |
| Sort multipart uploads on response (#7929) | Switched to UUID v7 (time-based) for upload IDs so OM no longer has to sort in memory; also matches the S3 spec’s “time-based ordering” requirement. |
| Duplicated key scanning on multipartInfo table (#7937) | |
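
To see why a time-based ID removes the need for an in-memory sort, here is a minimal sketch of building a UUID v7 by hand. It follows the RFC 9562 bit layout (48-bit millisecond timestamp, version 7, variant bits, random tail). The class and method names are hypothetical and this is not the code from #7929; it only illustrates the ordering property.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.UUID;

/** Hypothetical helper for illustration; not Ozone's actual implementation. */
final class UuidV7Sketch {
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Builds a UUID v7 from a Unix timestamp in milliseconds. */
  static UUID uuidV7(long epochMillis) {
    byte[] random = new byte[10];
    RANDOM.nextBytes(random);
    // High 64 bits: 48-bit timestamp | 4-bit version (7) | 12 random bits.
    long msb = (epochMillis & 0xFFFFFFFFFFFFL) << 16;
    msb |= 0x7000L;
    msb |= ((random[0] & 0x0FL) << 8) | (random[1] & 0xFFL);
    // Low 64 bits: 2-bit variant (binary 10) | 62 random bits.
    long lsb = ByteBuffer.wrap(random, 2, 8).getLong();
    lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
    return new UUID(msb, lsb);
  }

  public static void main(String[] args) {
    UUID earlier = uuidV7(1_700_000_000_000L);
    UUID later = uuidV7(1_700_000_000_001L);
    // The timestamp sits in the leading hex digits, so the string form
    // of an earlier ID sorts before the string form of a later one.
    System.out.println(earlier.toString().compareTo(later.toString()) < 0); // prints true
  }
}
```

Because the timestamp leads the ID, a store that keeps keys in byte order (as RocksDB does) hands back uploads in creation order for free. Two IDs minted in the same millisecond still land in random relative order in this sketch.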

3. ReplicationManager turbo-charge #7997
#

ReplicationManager (inside SCM) tracks container replicas across DataNodes. The patch lets it detect DataNode state changes much faster, cutting reaction latency from minutes to seconds.

4. SCM Safemode refactors
#

| Area | PR | What changed |
| --- | --- | --- |
| EC vs. Ratis containers | #7951 | Split the one-size-fits-all rule into two rules, because EC and Ratis have different replica-count logic. |
| DataNode safemode | #7998 | Instead of keeping its own copy of node status, the rule now asks NodeManager directly, giving a single source of truth and fewer bugs. |
| Pipeline availability check property | #8095 | Deleted an obsolete hdds.scm.safemode.pipeline-availability.check flag. |

5. CI speed-ups (with Attila)
#

Maven no longer compiles every module for flaky-test checks—only the modules that matter.

PRs:
  • Script to find modules by test classes: https://github.com/apache/ozone/pull/8062
  • Detect test class module in flaky-test-check: https://github.com/apache/ozone/pull/8162
  • Limit flaky-test-check to a sub-module: https://github.com/apache/ozone/pull/8194

ci-flaky-test-optimization
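
The core trick, mapping a test class to the one module that owns it, can be sketched like this. This is an illustration under my own assumptions, not the script from the PRs: it takes the path of a test source file and walks up until it hits a directory with a pom.xml. The class name, method name, and directory layout in the demo are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical lookup for illustration; not the script used in Ozone's CI. */
final class TestModuleFinder {
  /** Returns the nearest ancestor directory of the file that contains a pom.xml. */
  static Optional<Path> moduleOf(Path testClassFile) {
    Path dir = testClassFile.toAbsolutePath().normalize().getParent();
    while (dir != null) {
      if (Files.isRegularFile(dir.resolve("pom.xml"))) {
        return Optional.of(dir);
      }
      dir = dir.getParent();
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway two-level Maven layout to demonstrate the lookup.
    Path root = Files.createTempDirectory("repo");
    Files.createFile(root.resolve("pom.xml"));
    Path module = Files.createDirectories(root.resolve("some-parent/some-module"));
    Files.createFile(module.resolve("pom.xml"));
    Path testDir = Files.createDirectories(module.resolve("src/test/java/org/example"));
    Path test = Files.createFile(testDir.resolve("TestFoo.java"));

    Path found = moduleOf(test).orElseThrow();
    // The innermost module wins, not the repository root.
    System.out.println(root.relativize(found));
  }
}
```

With the module in hand, a CI job can pass it to Maven's `-pl` option so only that module (and whatever it needs) gets built, instead of the whole tree.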

Recent & current work
#

1. DataNode improvements
#

DataNode is the Ozone component that physically stores every container’s files.
Because it deals with disk space directly, even tiny mis-calculations can snowball into outages—especially when multiple threads are creating or importing containers at the same time.

Import container?
That’s the act of pulling a fully-formed container from another DataNode (usually during recovery or data-rebalancing) and “installing” it locally. It happens in parallel with regular container creation, which means two different code paths might race to grab the same disk space.

If ten threads each try to put a 10 GB container onto a volume that only has 11 GB free, and they all check “hey, space looks fine!” before actually reserving it, you end up “allocating” 100 GB into thin air. Oops.

| Topic | PR(s) | TL;DR |
| --- | --- | --- |
| Disk-space accounting | #8086 | Treats volumeFreeSpaceToSpare as genuinely reserved space when computing usage. |
| Atomic volume selection & space reservation | #8090 & #8360 | VolumeChoosingPolicy is now synchronized: choose a volume and reserve its space in one atomic step, preventing over-allocation in highly parallel container creates/imports. Bonus: thread-local RNG + atomic counters actually boosted perf 2–3× (see two images below). |

atomic-choose-volume-chart
atomic-choose-volume-chart
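
The ten-threads-versus-11-GB scenario above can be replayed in a few lines. This is a minimal sketch, not Ozone's code: the `Volume` class, its fields, and `chooseAndReserve` are hypothetical stand-ins. The only idea borrowed from the PRs is that the free-space check and the reservation happen under one lock.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of "choose and reserve in one atomic step". */
final class VolumeReservationSketch {
  /** Stand-in for a DataNode volume; not Ozone's real class. */
  static final class Volume {
    private final long capacityBytes;
    private long committedBytes;

    Volume(long capacityBytes) {
      this.capacityBytes = capacityBytes;
    }

    long available() {
      return capacityBytes - committedBytes;
    }

    void commit(long bytes) {
      committedBytes += bytes;
    }
  }

  private final List<Volume> volumes;

  VolumeReservationSketch(List<Volume> volumes) {
    this.volumes = volumes;
  }

  /** Check and reserve under one lock, so two callers can never claim the same free space. */
  synchronized Optional<Volume> chooseAndReserve(long bytes) {
    for (Volume v : volumes) {
      if (v.available() >= bytes) {
        v.commit(bytes);
        return Optional.of(v);
      }
    }
    return Optional.empty();
  }

  /** Fires `threads` concurrent reservations at one volume and counts how many succeed. */
  static int simulate(long freeBytes, long containerBytes, int threads)
      throws InterruptedException {
    VolumeReservationSketch policy =
        new VolumeReservationSketch(List.of(new Volume(freeBytes)));
    AtomicInteger granted = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.submit(() -> {
        try {
          start.await(); // line everyone up so they race for real
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        if (policy.chooseAndReserve(containerBytes).isPresent()) {
          granted.incrementAndGet();
        }
      });
    }
    start.countDown();
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return granted.get();
  }

  public static void main(String[] args) throws InterruptedException {
    long gb = 1L << 30;
    // Ten threads, 10 GB each, 11 GB free: exactly one reservation fits.
    System.out.println(simulate(11 * gb, 10 * gb, 10)); // prints 1
  }
}
```

If you split `chooseAndReserve` into a separate "check" call and "commit" call without a shared lock, several threads can pass the check before any of them commits, which is exactly the thin-air allocation described above.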

2. Snapshot work
#

  • Limit the number of snapshots per bucket #8157

    Ozone snapshots piggy-back on RocksDB checkpoints (hard-links). ext4 caps hard-links at ~65 000, so OM now enforces a snapshot limit at the application layer. Looked trivial, turned into a 70-comment marathon thanks to OM’s double-buffer concurrency corner cases.

3. OM RocksDB compaction
#

  • Aggressive DB Compaction with Minimal Degradation #8178

    Problem: tombstones pile up; iterator scans die a slow death.
    Idea: slice the key space by volume, bucket, and folder prefixes, keep per-range tombstone stats, and compact small ranges whose tombstone ratio blows past a threshold. That way you whittle tombstones away in the background without nuking the whole table during peak traffic.

    Related community efforts worth noting:

    | PR | Angle |
    | --- | --- |
    | Auto-compact big tables at intervals #8260 | Periodic compaction. |
    | Online repair command: manual compaction #7957 | Admin tool to compact a table asynchronously. |
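
The range-selection idea from #8178 boils down to a filter over per-range statistics. Here is a rough sketch under my own assumptions: the `RangeStats` fields, the threshold values, and the example key prefixes are all made up for illustration, and the real design tracks and updates these stats in ways this toy does not.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of picking key ranges to compact by tombstone ratio. */
final class TombstoneCompactionSketch {
  /** Per-range statistics; field names are illustrative only. */
  static final class RangeStats {
    final String startKey;
    final String endKey;
    final long entries;    // live keys plus tombstones
    final long tombstones;

    RangeStats(String startKey, String endKey, long entries, long tombstones) {
      this.startKey = startKey;
      this.endKey = endKey;
      this.entries = entries;
      this.tombstones = tombstones;
    }

    double tombstoneRatio() {
      return entries == 0 ? 0.0 : (double) tombstones / entries;
    }
  }

  /** Keeps ranges that are small enough to compact cheaply and dirty enough to be worth it. */
  static List<RangeStats> pickRanges(List<RangeStats> ranges, double minRatio, long maxEntries) {
    List<RangeStats> picked = new ArrayList<>();
    for (RangeStats r : ranges) {
      if (r.entries <= maxEntries && r.tombstoneRatio() > minRatio) {
        picked.add(r);
      }
    }
    return picked;
  }

  public static void main(String[] args) {
    List<RangeStats> ranges = List.of(
        new RangeStats("/vol1/bucket1/a/", "/vol1/bucket1/b/", 1_000, 700),
        new RangeStats("/vol1/bucket1/b/", "/vol1/bucket1/c/", 1_000, 50),
        new RangeStats("/vol1/bucket2/", "/vol1/bucket3/", 5_000_000, 4_000_000));
    // Only the first range is both small and mostly tombstones.
    for (RangeStats r : pickRanges(ranges, 0.5, 100_000)) {
      System.out.println("compact [" + r.startKey + ", " + r.endKey + ")");
    }
  }
}
```

Each picked range would then be handed to RocksDB's range compaction. The huge third range is skipped on purpose in this sketch: compacting it in one go is the "nuke the whole table" situation the design tries to avoid, so it would need to be sliced finer first.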

4. Miscellaneous
#

  • EventExecutorMetrics instead of hacky reflection-based tweaks #8371

    Those ugly InaccessibleObjectException warnings in test logs? Gone. Re-implemented the metric renaming trick safely—log output is squeaky clean now.

    event-queue-metrics

So I’m a Committer… now what?
#

  • Reality check: Committer ≠ omniscient. I probably grok 15 % of the codebase. Haven’t even mastered the DataNode read/write path—like a chef who can’t turn on the stove.
  • More reviews & design discussions—container reconciliation, snapshot scaling, S3 lifecycle… I need to keep up.
  • Big-impact features someday. Still a long grind ahead.

How you can start contributing
#

  1. Fork Ozone, clone it, and run mvn install.
    (Confession: this step alone took me two days… newcomers these days blaze through it in hours. You’ve got this!)

  2. Read the docs.

  3. Pick an issue from the newbie dashboards or task lists.

    Dashboards

    Foundation projects (good first picks)

    Advanced projects (when you’re ready for a deeper dive)

Task management
#

I bounce between Microsoft To Do and Obsidian Kanban, but find whatever clicks for you.
Pro tip from Zhe-You: How to manage tasks


TL;DR / Quick start
#

Too long; didn’t read? Just hop into the Apache Ozone Slack channel. Take that first step—dig deeper whenever the mood strikes. Everyone’s super nice. 💪
(If you haven’t joined the OpenSource4You Slack channel, you can use this invite to join.)


Shout-outs
#

  1. Chia-Ping Tsai — founded OpenSource4You, plugged me into Ozone and a ton of awesome people, mentored my early Kafka dabbling, and—most crucially—kept my motivation bar full with endless emotional support. The whole channel crew chats, debates, and drops advice whenever I’m stuck—huge boost on every front!

  2. Wei-Chiu Chuang — spun up and runs the Ozone Slack channel, spends ridiculous amounts of time answering my newbie questions, and nominated me for Committer. Chung-En Lee hosts the weekly calls. Both of them flood the channel with pure gold—couldn’t keep up without their brain dumps.

  3. Attila (project gatekeeper extraordinaire), Semmi Chen (APAC community call host), Ethan (NA community call host), plus Jesse, Cheng-En, Ivan, Swami—and a gazillion other PMC/Committers—review my patches, spot hidden land-mines, and patiently answer even my silliest questions. Couldn’t ask for a kinder gauntlet.

  4. Prof. Kun-Da Chuang — invited Wei-Chiu to give a talk at NCKU on literally my second day of contributing (talk about timing!) and nudged my thesis toward Ozone. Massive guidance on research, networking, everything. Fun fact: after that talk I cornered Jesse with a truckload of rookie questions. 😆

  5. Roomies Eric, Jason, Owen: your relentless grind stops me from turning into a slacker. I used to day-dream you guys would found a startup and hire me for life, but hey, don’t put all the eggs in one basket, so I’d better hustle before you dump me. 😭
    Next Committer badge is yours, Owen!

  6. Dcard internship — bullet-proof CI/CD, a strict code-review culture, full test coverage, stellar teammates and leads, and a beefy codebase. All of that shaved weeks off my OSS ramp-up time. Immense gratitude!


Resources
#

Apache Ozone
#

OpenSource4You
#
