This article records my experience contributing to Iggy, an open source project under The Apache Software Foundation, while also sharing some of my personal insights.
About Apache
The Apache Software Foundation is a U.S. 501(c)(3) non-profit committed to software for the public good under the “Community over Code” ethos. Backed by its community and sponsors, it manages hundreds1 of influential open source projects that underpin core infrastructure in the cloud, big data, and web industries, boasting an ecosystem valued at over $30 billion USD.2
About Iggy
Apache Iggy, an incubating open source message streaming project at The ASF, is a Rust-built, ultra-low-latency persistent platform that processes millions of messages per second to power high-efficiency real-time data pipelines and cloud-native streaming workloads.
When Editing
The syntax has been corrected with the help of an LLM. This article was finished after the code had been committed.
This brings me to the situation I had in early November 2025.
First, while I already knew how to use proxies in the 5th grade and “could write”6 Java code in the 6th grade, I made virtually no contributions to the open source community from 2021 to 2024. The reason is quite straightforward: I was obsessed with Minecraft during the 2021 academic year, immersed in Genshin Impact (Asia server) in 2022, and took up Arknights in 2023. These three distractions squandered most of the time I could have spent tinkering with distros and programming languages.7
Despite my early start in tech, I lack extensive experience in multiple areas: not just development experience, but also collaboration experience. This deficiency is already evident in my use8 of Git, and more shortcomings quickly surface once I engage in actual development. They are not limited to not knowing how to use Docker or distributed compilers, but extend to basic coding standards. Thus, my needs became clear:
A small or mid-scale open source project, preferably led by an organization or enterprise, to correct my various bad habits;
A project in its early or mid stage of development with active maintainers, so I could learn effective communication;
A project with a steady pace of progress;
A project in a language I prefer to use.
This led me to Apache. After reading their charter and mission, finding their GitHub account, and conducting searches, I discovered that Apache Iggy was the only project that fully met all the above requirements.
The Apache Software Foundation encourages people to contribute. Volunteers who are unsure where to start can take a look at the project’s good first issues. That’s how I began too: I picked the oldest unclaimed issue. The reason was simple. As a Chinese high school student, I had less than 20 hours available per weekend, and my skills were rusty, so I was worried the work would drag on for a long time.
Contributor Ladder
First, you are a contributor, then a committer, and finally a PMC member or higher. You may notice there is also the PPMC, a member of a Podling Project Management Committee, which belongs to the Apache Incubator.
GitHub is Not Necessary
Apache projects are NOT necessarily hosted on GitHub. Many long-established Apache projects, like the Apache HTTP Server Project, still use Apache SVN as their primary version control system and Bugzilla for issue tracking, rather than GitHub’s Issues and Pull Requests mechanism.
I figured an issue that had been around for a long time was probably important but not urgent for the maintainers, which would spare them the mess of my slow progress.
By checking who merged the pull requests, you can easily tell who has write access. I tried sending a request to @hubcio, and he agreed promptly. Later, @spetz also joined the code review. @hubcio and @spetz are highly efficient and responsible collaborators; both are Members and PPMC members.
Communication Methods
Email communication is also recommended at times, and note that it may be PUBLIC. You can receive real-time messages from the mailing lists by sending a subscription confirmation. For instance:
Just like any normal contribution, first read through the Code of Conduct and Contributing Guidelines, then fork and sync. Besides, Iggy also provides a CLI.
Fish Session

```fish
git clone git@github.com:Svecco/iggy.git
cd ./iggy/ && ssh-add ~/.ssh/id_ed25519
git checkout -b 46-size-logs upstream/master
cargo install iggy-cli && iggy --help
```
Nearly all medium or large modern Rust projects adopt the workspace mechanism. I’m not sure why so many Rust learning materials don’t cover it as a key topic, including the copy of Programming Rust, 2nd Edition I have right now.
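For readers who haven’t met the mechanism: a workspace is a root Cargo.toml that ties several crates together so they share one Cargo.lock and one target/ directory. A minimal sketch (the member names are made up, not Iggy’s actual layout):

```toml
# Root Cargo.toml of a hypothetical workspace (not Iggy's real layout).
[workspace]
members = ["server", "cli", "sdk"]  # each member is a crate in its own subdirectory
resolver = "2"

# Shared dependency versions can live here and be inherited by members
# via `tokio = { workspace = true }` in each member's Cargo.toml.
[workspace.dependencies]
tokio = { version = "1", features = ["full"] }
```

One `cargo build --workspace` at the root then compiles every member against the same locked dependency set.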
You’d Better Prepare Some Time for Cargo Builds

```console
╭─ $ [svecco] ~/A/iggy git!(46-size-logs)
╰─ > cargo clean
     Removed 65558 files, 52.1GiB total
╭─ $ [svecco] ~/A/iggy git!(46-size-logs)
╰─ > time sh -c 'cargo build --all-features --all-targets --workspace --release 2>&1 | tail -1'
    Finished `release` profile [optimized] target(s) in 3m 18s
```
Iggy’s core business data model and interaction logic. It defines the nested hierarchy from the top-level namespace Stream down to the atomic data unit Message, and standardizes the core send/poll operations between producers and consumers.
Iggy’s layered technical architecture. It maps the complete end-to-end data path from multi-language SDK clients through the transport layer and core processing engine to the persistent storage layer, along with the officially supported transport protocol options.
Iggy’s two core end-to-end workflows: message production and consumption. It details the full lifecycle of a message, covering client authentication, routing and persistent append for production, as well as subscription, batch fetching and offset commit for consumption.
As for the issue itself: server.toml holds the default configuration file, defaults.rs handles the parsing of default values, and validators.rs is responsible for validating data integrity, preventing issues like rotation times approaching zero10.
In the integration phase, as the name suggests, we perform integration testing, which exercises features fully and independently. Relying solely on unit tests is insufficient for complex projects where the whole system must operate together.
Logically, the implementation is straightforward: divide max_total_size by max_file_size to get the upper limit on file count, then read the rotation_check_interval and retention cycle configured by the user. After that, push file entries into a Vec<(fs::DirEntry, SystemTime, Duration, u64)>, work out which entries are due for deletion according to these schedules, and finally remove them in a batch.
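The pass above can be sketched roughly like this. It is a toy model under stated assumptions: the real code works on fs::DirEntry and SystemTime, while here plain (name, size, age) tuples stand in, and all names are mine rather than Iggy’s:

```rust
use std::time::Duration;

/// Toy sketch of size-based retention (not Iggy's actual implementation):
/// given (file name, size in bytes, age) entries, pick the oldest files
/// for deletion until the remaining total fits under `max_total_size`.
/// A `max_total_size` of 0 means "unlimited", matching the config comment.
fn files_to_delete(files: &[(String, u64, Duration)], max_total_size: u64) -> Vec<String> {
    if max_total_size == 0 {
        return Vec::new(); // 0 is documented as "unlimited"
    }
    let mut sorted: Vec<_> = files.to_vec();
    // Oldest first, so the oldest logs are deleted first.
    sorted.sort_by(|a, b| b.2.cmp(&a.2));
    let mut total: u64 = files.iter().map(|f| f.1).sum();
    let mut doomed = Vec::new();
    for (name, size, _) in sorted {
        if total <= max_total_size {
            break;
        }
        total -= size;
        doomed.push(name);
    }
    doomed
}

fn main() {
    let files = vec![
        ("a.log".to_string(), 3, Duration::from_secs(300)), // oldest
        ("b.log".to_string(), 3, Duration::from_secs(200)),
        ("c.log".to_string(), 3, Duration::from_secs(100)), // newest
    ];
    // Total is 9; with a cap of 4 the two oldest files must go.
    assert_eq!(files_to_delete(&files, 4), vec!["a.log", "b.log"]);
    // 0 disables the size limit entirely.
    assert_eq!(files_to_delete(&files, 0), Vec::<String>::new());
    println!("ok");
}
```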
```toml
# Maximum total size of all log files. When this size is reached,
# the oldest log files will be deleted first. Set it to 0 to allow
# an unlimited number of archived logs. This does not disable time
# based log rotation or per-log-file size limits.
max_total_size = "4 GB"

# Time interval for checking log rotation status. Avoid less than 1s.
rotation_check_interval = "1 h"

# Time to retain log files before deletion. Avoid less than 1s, too.
retention = "7 days"
```
The above are the configuration-level changes, and you can also see from here what has been implemented. For more specific details, you may as well check the git log by commit hash to find the other additions and deletions. Since the issues people encounter are diverse, and log rotation is not highly technical, going into detail here would sacrifice general applicability. So I won’t elaborate on the implementation; instead, I’ll pick out the things I found interesting later on.
```sh
CPU_ALLOCATION="<more than your PHYSICAL cores>" RUST_BACKTRACE=1 cargo run --bin server
```
Couldn’t Load Data From Disk?
That’s because the process attempted to read an incompatible system metadata file under the ./local_data/ directory. Just remove the ./local_data/ directory and try again.
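A minimal reproduction of the fix described above (the mkdir only simulates a stale directory so the snippet is self-contained):

```shell
# Simulate a stale ./local_data/ left over from an older server version.
mkdir -p ./local_data
# The fix from the text: wipe the incompatible metadata, then restart the server.
rm -rf ./local_data
```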
Regarding the Number of Cores
On my device (a single CPU with 32 logical cores, distributed mode off), testing shows that cpu_allocation = "19" is the limit. Triggering this might not require a machine with many cores, but it does require SMT.11
Generally speaking, this happens because the requested memory allocation is too large: the allocator is overwhelmed and the process is simply killed (tears included). You can dig around and sense that something is incorrect, without knowing exactly what:
That was very strange to me. There are no external restrictions; on the contrary, resources are sufficient, yet it still fails to start. So is this an internal server issue?
A Flood of These Panics

```
2026-02-13T10:08:28.268923Z ERROR main iggy_server: Server shutting down due to shard failure. (shutdown took 31 ms)
Error: ShardFailure { message: "Shard 22 panicked: called `Result::unwrap()` on an `Err` value: Os { code: 12, kind: OutOfMemory, message: \"Cannot allocate memory\" }" }
```
When Editing the Essay: free -h | head -n 2

```
               total        used        free      shared  buff/cache   available
Mem:            46Gi        20Gi       772Mi       199Mi        24Gi        26Gi
```
Since the error occurs in a shard, let’s look into the code that implements the shard functionality and check how shards are allocated.
This allocation code seems harmless enough at first glance, yet something about it is off. The cores are allocated via available_parallelism(); let’s consult its documentation and see what we can learn.
The purpose of this API is to provide an easy and portable way to query the default amount of parallelism the program should use. Among other things it does not expose information on NUMA regions, does not account for differences in (co)processor capabilities or current system load, and will not modify the program’s global state in order to more accurately query the amount of available parallelism.
“NUMA regions”? What is “NUMA”?
In early computer systems, all CPUs accessed memory through a single bus, an architecture known as SMP13. All CPUs were equal, with no master-slave relationship. As the number of processors increased, the system bus became a critical bottleneck, leading to significant latency in communication between processors and memory.
From the Hardware Architecture
In the NUMA architecture, CPUs are divided into multiple NUMA nodes. Each node has its own independent memory space and PCIe bus subsystem. Communication between CPUs is achieved via the QPI bus.14
The speed at which a CPU accesses memory varies by node: access to the local node is fastest, while access to remote nodes is slowest. In other words, memory access speed depends on the distance to the node: the greater the distance, the slower the access. This is why it is called NUMA (Non-Uniform Memory Access). The memory access distance is referred to as the node distance.
This architecture effectively solves the performance issues caused by large-scale CPU expansion under the SMP model.15

Back to our code: it directly maps abstract CPU allocation rules to a set of logical CPU core IDs, where cpu_allocation = "all" generates a set whose size equals the number of logical cores reported by the system. In this scenario, 32 shards are created instead of 16, which triggers two cascading issues. First, multiple shards are bound to the SMT threads of the same physical core, resulting in resource contention. Second, the 32 shards, combined with the pre-allocated memory pool, generate a massive amount of instantaneous memory requests. This exceeds the kernel’s heuristic memory overcommit threshold (vm.overcommit_memory=0), thereby triggering error code 12 and crashing the shards.16
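To make the first half of that concrete: available_parallelism() counts logical CPUs, so a hedged sizing rule (my own assumption for illustration, not Iggy’s actual fix) would halve the count when SMT may double it, and cap the shard total so the memory pool pre-allocation stays bounded:

```rust
use std::thread::available_parallelism;

/// Hypothetical sizing rule (an assumption, not Iggy's actual code):
/// halve the logical-CPU count to discount SMT siblings, and cap the
/// total so pre-allocated memory demand stays bounded. Never zero.
fn shard_count(logical_cpus: usize, cap: usize) -> usize {
    (logical_cpus / 2).clamp(1, cap)
}

fn main() {
    // available_parallelism() reports *logical* CPUs: on a 16-core CPU
    // with SMT enabled this is typically 32, so "one shard per reported
    // unit" doubles the intended shard count and the instantaneous
    // memory requests from each shard's pre-allocated pool.
    let logical = available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("logical CPUs: {logical}, shards: {}", shard_count(logical, 16));

    assert_eq!(shard_count(32, 16), 16); // SMT machine: halved, then capped
    assert_eq!(shard_count(8, 16), 4);   // small machine: just halved
    assert_eq!(shard_count(1, 16), 1);   // never zero shards
}
```

The real fix, as the next section shows, instead asks the hardware topology directly via NUMA-aware bindings.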
Now, we can examine the source code to see why the documentation states this.
```toml
# This defines the maximum, total memory allocated for the memory pool.
# Note: This number has to be multiplication of 4096 (default linux page size).
# Minimum size is 512 MiB due to internal implementation details.
size = "4 GiB"
```
Reducing the memory pool size (e.g., from 4 GiB to 1 GiB) lowers the initial pre-allocation peak and decreases the total instantaneous memory demand. This value may fall within the kernel’s “repayable” memory range, allowing the server to start occasionally.
However, this only alleviates the symptoms and does not address the root cause. Physical cores and SMT threads are not distinguished, nor is there a limit on the maximum number of shards. Ultimately, this amplifies memory pressure to the point of triggering an OOM error on SMT-enabled multi-core CPUs.
When I First Encountered This
Of course, I didn’t know any of this at the beginning of the pull request, nor did I realize it was a NUMA issue. I was fixated entirely on the memory side and was just about to file an issue when, after fetching, the problem was suddenly resolved. After some searching, let’s take a look at the commit that fixed it.
addressing #2387, feat(server): NUMA awareness (#2412), committed by @tungtose
The issue was handled by @tungtose , and the image above was the flow he drew. You can find the original issue #2387 here, filed by @hubcio (Hubert Gruszecki) , and the original pull request here #2412. Merged by @spetz (Piotr Gankiewicz) on Dec 15, 2025.
```rust
                    tracing::error!("Failed to bind memory {:?}", err);
                    ServerError::BindingFailed
                })?;

                info!("Memory bound to NUMA node {node_id}");
            }
        }

        Ok(())
    }
}
```
Although I don’t fully understand it,17 it’s much better than the indiscriminate thread allocation based on the simple CPU set from available_parallelism(). Even more delightful: as a newcomer, I could finally write code happily.
```
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.33s
```
Let’s look at the source code. Ah, hwloc.pc lives in the dev output of the nixpkgs package, which has multiple outputs… so I can’t rely on it being installed correctly by just adding the package name.
nixpkgs/pkgs/by-name/hw/hwloc/package.nix:74-80

```nix
outputs = [ "out" "lib" "dev" "doc" "man" ];
# "out" was the default output. Below was my installation:
```
Alright, I’m really not sure. I’ll take a closer look before --force-ing it next time. If I don’t have enough time, I’ll just test it later, and learn to write my own test scenarios. Don’t always ask others to test features; it’s not good practice and usually wastes the maintainers’ time.
Now, let’s see what silly things this guy @Svecco did.
1/8 core/server/src/log/logger.rs: please import rather than spelling out the full path each time.
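What the reviewer meant, in miniature (illustrative only, not Iggy’s actual code):

```rust
// Import once at the top of the module...
use std::time::Duration;

fn main() {
    // ...instead of writing `std::time::Duration::from_secs(3)` in full
    // at every call site, which is noisy and hides the actual logic.
    let poll = Duration::from_secs(3);
    let retry = Duration::from_millis(250);
    assert!(poll > retry);
    println!("ok");
}
```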
I don’t know if staying up late frequently is one of the reasons for these outrageous problems. But for now, it just seems like an excuse for not being good enough.
Btw, according to Programming Rust, when a macro call spans multiple lines, remove the trailing comma.21
```
4db971b ababa | 65f1d68 feat(api): impl data cache for github api pulls
8387e24 OvO   | ed70a3e feat(fuwari): many personalizations
1287507 luyj  | 3df9f6f init(fuwari): by svecco on sve.moe
```
It’s true that NO ONE has required me to use commit messages like this for my own tiny git repos, but I just can’t help it. I must have been domesticated by Apache.
General Message Format that Applies Conventional Commits
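For reference, Conventional Commits messages follow a `type(scope): subject` shape; a few made-up examples in that style:

```
feat(server): add size-based log retention
fix(cli): reject zero rotation interval
docs(readme): clarify local_data cleanup
```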
According to the public pricing on GitHub, I initially thought ASF’s CI was paid for by them,22 because the project is large, there are many compilations, and the community is very active.
About Clippy
A toolchain is uniquely identified by its version and hash together; different hashes are regarded as different toolchains.
However, initially, due to system environment issues, the Clippy check could not be run. Since the issue could only be reproduced on NixOS… and not on Fedora or Gentoo, let’s work from the Clippy source code as a replacement.
```rust
    if current.commit_hash == artifact.commit_hash && !current.commit_hash.is_empty() {
        return VersionCompatibility::ExactMatch;
    }

    if current.version == artifact.version {
        return VersionCompatibility::VersionMatch;
    }

    let current_parts: Vec<&str> = current.version.split('.').collect();
    let artifact_parts: Vec<&str> = artifact.version.split('.').collect();
    if current_parts.len() >= 2 && artifact_parts.len() >= 2 {
        if current_parts[0] == artifact_parts[0] && current_parts[1] == artifact_parts[1] {
            return VersionCompatibility::MinorVersionMatch;
        }
    }

    VersionCompatibility::Different
}
```
The Clippy Output Beforehand Looked Something Like This

```
Detected a version mismatch:
The Rust compiler (rustc) version 1.92.0 (3df9f6f) used during
the build process does not match the rustc 1.92.0 (ed70a3e) used
to build the compiled artifact.

Aborted with 1 error. Clippy check failed.
```
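With the versions from that mismatch — the same 1.92.0 but different commit hashes — the comparison logic lands on a version match rather than an exact match. A self-contained sketch of the same idea (the type and function names here are my assumptions, not Clippy’s or Iggy’s actual code):

```rust
/// Assumed names for illustration only.
#[derive(Debug, PartialEq)]
enum VersionCompatibility {
    ExactMatch,
    VersionMatch,
    MinorVersionMatch,
    Different,
}

/// Each argument is a (version, commit_hash) pair.
fn compare(current: (&str, &str), artifact: (&str, &str)) -> VersionCompatibility {
    let (cur_ver, cur_hash) = current;
    let (art_ver, art_hash) = artifact;
    // Identical non-empty commit hashes mean the exact same toolchain.
    if cur_hash == art_hash && !cur_hash.is_empty() {
        return VersionCompatibility::ExactMatch;
    }
    // Same version string, different hashes: a weaker match.
    if cur_ver == art_ver {
        return VersionCompatibility::VersionMatch;
    }
    // Fall back to comparing major.minor components.
    let c: Vec<&str> = cur_ver.split('.').collect();
    let a: Vec<&str> = art_ver.split('.').collect();
    if c.len() >= 2 && a.len() >= 2 && c[0] == a[0] && c[1] == a[1] {
        return VersionCompatibility::MinorVersionMatch;
    }
    VersionCompatibility::Different
}

fn main() {
    // The situation from the Clippy error: same 1.92.0, different hashes.
    assert_eq!(
        compare(("1.92.0", "3df9f6f"), ("1.92.0", "ed70a3e")),
        VersionCompatibility::VersionMatch
    );
    println!("ok");
}
```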
Obviously, I didn’t really want to mess with it; just force it first and see. So this happened often:

Then I felt that something wasn’t right, so I checked the GitHub Actions manual and got this:
| Operating system | Billing SKU | Per-minute rate (USD) |
| --- | --- | --- |
| Linux 1-core (x64) | actions_linux_slim | $0.002 |
| Linux 2-core (x64) | actions_linux | $0.006 |
| Linux 2-core (arm64) | actions_linux_arm | $0.005 |
| Windows 2-core (x64) | actions_windows | $0.010 |
| Windows 2-core (arm64) | actions_windows_arm | $0.010 |
| macOS 3-core or 4-core (M1 or Intel) | actions_macos | $0.062 |
.github/workflows/*

```
Permissions Size User   Date Modified Name
.rw-r--r--@ 9.4k svecco 13 Feb 21:47  _build_python_wheels.yml
.rw-r--r--@  11k svecco 13 Feb 21:47  _build_rust_artifacts.yml
.rw-r--r--@  13k svecco 13 Feb 21:47  _common.yml
.rw-r--r--@  17k svecco 13 Feb 21:47  _detect.yml
.rw-r--r--@ 7.8k svecco 13 Feb 21:47  _publish_rust_crates.yml
.rw-r--r--@ 4.6k svecco 13 Feb 21:47  _test.yml
.rw-r--r--@ 4.2k svecco 13 Feb 21:47  _test_bdd.yml
.rw-r--r--@ 6.6k svecco 13 Feb 21:47  _test_examples.yml
.rw-r--r--@  10k svecco 13 Feb 21:47  post-merge.yml
.rw-r--r--@  17k svecco 13 Feb 21:47  pre-merge.yml
.rw-r--r--@  52k svecco 13 Feb 21:47  publish.yml
.rw-r--r--@ 2.0k svecco 13 Feb 21:47  stale-prs.yml
```
Oh my, the cost would be quite high under heavy workloads. Although only two runs were triggered by my pushes, there were many others running too. Anyway, it seemed better to mention it to @spetz. I had a fever for a few days at the beginning of 2026 and couldn’t work; I hoped nothing serious would happen, or I’d just have to watch helplessly.
Fortunately, nothing did. Although it’s true23 that Microsoft is a sponsor of The ASF,24 what matters more is this line from @spetz after I asked:
Thank you for all the changes, and dont worry about CI.
My local computing power is just sitting idle anyway, so I may as well put it to use.
Local CI/CD
Since a powerful computer cluster can be shared, running CI/CD on the cloud is generally the recommended choice. As for doing this locally, it seems rather controversial, as it competes for computing power with subsequent development work.
Act is an open source project that provides a way to run CI/CD locally, written in Go.
Container Option
Act uses Docker to run CI/CD workflows by default, offering three image sizes: micro, medium, and large. In most cases, medium is sufficient. The large version can fully replicate the GitHub CI environment, comes pre-installed with all the toolchains included in the GitHub Actions runner, and can take up nearly 70 GB after decompression.
```sh
doas docker images catthehacker/ubuntu:full-latest
```
Even with everything installed, workflows may still fail to run, because GitHub Actions runs on fully virtualized machines, while act is based on Docker containers. The former runs with the OOM killer disabled and no seccomp/AppArmor security restrictions, which can be more convenient.25
However, pulling the full image directly doesn’t work either. GitHub maintains its own official images, and the full version here does not include the Rust toolchain, which causes failures along the lines of “cannot find the Rust toolchain”.
After Some Probing, I Inferred These Attributes

```sh
# For convenience, Docker will be used directly for demonstration below.
doas docker pull catthehacker/ubuntu:rust-latest  # Fetch this
```

```yaml
- target: x86_64-unknown-linux-gnu  # As the arch above, linux/amd64
  runner: ubuntu-latest
```
Judging from the logs, everything points to one thing:
Some of the failed tracebacks

```
failed to start test harness: Health check failed for iggy-connectors at http://127.0.0.1:35235 after 1000 retries
failed to start test harness: Health check failed for iggy-connectors at http://127.0.0.1:40443 after 1000 retries
failed to start test harness: Health check failed for iggy-connectors at http://127.0.0.1:39753 after 1000 retries
Received an invalid HTTP response when ingesting messages for index: test_topic. Status code: 404 Not Found, reason: { "message": "index test_topic not found" }
search: InvalidState { message: "Expected 13 documents but got 0 after 100 poll attempts" }
```
```console
$ doas docker network ls
doas (svecco@orion) password:
NETWORK ID     NAME                                                      DRIVER    SCOPE
8462db97170e   bridge                                                    bridge    local
31ea411fde48   host                                                      host      local
5cefc05182e2   iggy-quickwit-sink-4bd7b25b-36fb-4b04-97f1-ce11755d4322   bridge    local
```

```sh
docker container prune -f && docker network prune -f  # Later I did.
```
Huh? DinD? We may need the --net=host option.
About Docker-in-Docker
Enables Docker-in-Docker container nesting capability, delivering a clean, isolated solution for containerized CI/CD pipelines, automated image building, environment replication and other container-native workflows, while avoiding dependency conflicts with the host environment.26
About quickwit-sink
quickwit-sink is a core delivery component of the cloud-native containerized logging stack, which streamlines unified log forwarding, indexing and persistence to Quickwit, with out-of-the-box compatibility and lighter configuration overhead for containerized environments.27
You May Have to Edit the Configs Under .github/ to Disable Some Tests
For Features Like CodeCoverage, You May Need a Token for Authentication
act runs third-party Actions from the GitHub Marketplace (e.g., codecov-action). codecov-action requires a CODECOV_TOKEN (generated from the Codecov platform) to upload coverage reports. A GitHub Personal Access Token (PAT) is optionally needed for GitHub API authentication to reduce the risk of rate limiting; it is not mandatory for basic functionality. You can generate a GitHub PAT here: GitHub Personal Access Tokens. You can pass secrets to act using --secret-file <path/to/token/file>, e.g., add CODECOV_TOKEN=xxx or GITHUB_TOKEN=xxx.
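A sketch of that secrets handoff (the file name .secrets is my own choice, the values are placeholders, and the act invocation is commented out since it needs Docker to actually run):

```shell
# Write the tokens into a local env-style file (any path works).
cat > .secrets <<'EOF'
CODECOV_TOKEN=xxx
GITHUB_TOKEN=xxx
EOF

# Then hand the file to act when running a workflow locally, e.g.:
# act pull_request --secret-file .secrets
```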
This is my first-ever Rust PR, and also my first contribution merged into a serious project nearing 4k stars, rather than a package repo or other casual repositories. As a student, I initially feared the long development and review cycle would be a hassle, but I’m extremely grateful to @hubcio and @spetz for their unparalleled patience in guiding and reviewing my work; they helped me get up to speed with the workflow of a production-ready backend project. I’ve also learned many other things, e.g. Podman, and gained much experience I’d never get in school.
At my worst during this period, I even forgot the name of the PCIe interface. I didn’t know how to use tmux either. I truly admire the middle school students who can code proficiently and contribute actively to various projects, something I couldn’t do as a high school student. It was not until late 2023 to early 2025 that I gradually got back on track with Ubuntu. ↩
When I had created multiple commits locally but the remote branch had absorbed some of them via a squash merge, I would create a new branch and then cherry-pick the desired commits one by one… Obviously, rebase --onto works as well. ↩
Initially, I aimed to avoid division by zero errors by outright blocking any operation that set the value to zero. Later, I noticed that IggyByteSize actually defines a specific meaning for the zero value, so I revised the logic to treat it as “unlimited” to align with the behavior stated in comments throughout the rest of the codebase. ↩
Crate hwloc: Rust bindings for the hwloc C library, which provides a portable abstraction of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores, and simultaneous multithreading. ↩
hwlocality Developers, hwlocality: Rust binding for hwloc, [Online]. (Designed for hardware topology detection, NUMA support, and thread binding optimization; secure, easy to use & actively maintained). [Accessed: Feb. 14, 2026]. ↩
CGPM, SI, Conférence Générale des Poids et Mesures. [Online]. [Accessed: Feb. 14, 2026]. ↩