During the monitoring of a Ruby on Rails project, I noticed that the Sidekiq process was consuming more memory every day. At this rate, our VM would run out of memory in a week! So, I started investigating.

The ever-growing memory usage for a long-running process is a typical symptom of a memory leak. Since Ruby lifts the burden of memory management from developers, I couldn’t imagine how this issue could occur. I began researching how Ruby manages memory.

I found a great talk, Building a Compacting GC for MRI by Aaron Patterson. Aaron explores Ruby’s memory management, focusing on Copy on Write, building a compacting GC, and memory inspection tools. After listening to this talk, I knew more about how Ruby uses memory, but it was still unclear what could cause the issue with our Sidekiq process.
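Some of the things the talk covers can be poked at directly from an MRI console. Below is a minimal sketch of the inspection side, assuming MRI 2.7+ (where GC.compact, an outcome of that compacting-GC work, is available) and the objspace standard library:

```ruby
require "objspace"

# Count live objects on the heap, grouped by internal type.
counts = ObjectSpace.count_objects
puts "live strings: #{counts[:T_STRING]}"

# Measure how many bytes a single object occupies (objspace stdlib).
puts "memsize: #{ObjectSpace.memsize_of("a" * 1000)} bytes"

# Compact the heap (Ruby 2.7+); returns per-type statistics
# about which objects were considered and moved.
stats = GC.compact
puts stats.class # => Hash
```

None of this is needed for day-to-day work, but it makes the talk's ideas tangible: the heap is a real, inspectable structure, not a black box.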

Searching for more talks on Ruby memory management, I found The Hitchhiker’s Guide to Ruby GC by Eric Weinstein. This talk gave a really good overview and presented GC configuration options that we can tweak to achieve better memory usage. It also described the Mark and Sweep mechanism, and explained how tracking object generations helps the garbage collector, building on the observation that “most objects die young”. But the most important insight for me was:
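The generational bookkeeping is visible from Ruby itself through GC.stat. Here is a small sketch (MRI only; the specific GC.stat keys are implementation details and may change between versions) that triggers a minor GC and reads the generational counters:

```ruby
before = GC.stat

# Allocate a burst of short-lived objects -- the kind that "die young".
100_000.times { Object.new }

# Request a minor GC: full_mark: false marks only the young generation.
GC.start(full_mark: false)

after = GC.stat
puts "minor GC runs: #{after[:minor_gc_count] - before[:minor_gc_count]}"
puts "major GC runs: #{after[:major_gc_count] - before[:major_gc_count]}"
puts "old objects:   #{after[:old_objects]}"
```

A minor GC only scans the young generation, which is why it is so much cheaper than a full (major) run: the long-lived objects that survived earlier collections are mostly skipped.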

When there’s no more free memory, Ruby marks all active objects, then removes (sweeps) the inactive ones.

This means that Ruby won’t run the GC procedure as long as there is free memory left. The reason is that a GC run is really costly: it stops the whole process while it runs and can take multiple seconds to complete.

I wanted to double-check this information. AppSignal’s blog had a thorough post about Practical Garbage Collection Tuning in Ruby, which confirmed my suspicion.

Contrary to a common assumption that GC runs happen at fixed intervals, GC runs are triggered when Ruby starts running out of memory space. Minor GC happens when Ruby runs out of free_slots.
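This allocation-driven triggering is easy to observe with GC.stat on MRI. A small sketch (the exact stat keys are MRI-specific):

```ruby
GC.start # start from a freshly collected heap
runs_before = GC.stat[:count]

# Allocate a pile of short-lived strings. No timer is involved:
# Ruby itself triggers GC runs as the free heap slots are exhausted.
200_000.times { "x" * 10 }

runs_after = GC.stat[:count]
puts "GC runs triggered purely by allocation: #{runs_after - runs_before}"
```

No sleep, no schedule: the only thing driving the extra GC runs above is the allocation pressure.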

This post described exactly how my previous assumption about Ruby’s GC was wrong. Our application runs in containers managed by Docker Compose. By default, Docker Compose doesn’t limit the containers’ memory, so each of them could potentially use up all the free memory on the host machine. That is what happened here, since Ruby’s garbage collector doesn’t run periodically!

The fix was easy. I just set some reasonable memory limits for the service.

# docker-compose.yml
version: "3"
services:
  sidekiq:
    build: .
    restart: always
    command: "bundle exec sidekiq"
    deploy:
      # Note: the classic docker-compose v1 CLI ignores deploy limits
      # unless run with --compatibility; Compose v2 and swarm honor them.
      resources:
        limits:
          memory: 150M

Follow-ups

While investigating the issue, I heard a lot of interesting ideas that I want to follow up on sometime.

JRuby

Ruby uses Matz’s Ruby Interpreter, or Ruby MRI (also called CRuby), by default. The two talks I mentioned were focused on this interpreter. JRuby was frequently mentioned as an alternative, which could provide better memory usage and superior performance at the expense of some setup time and compatibility.

Puma vs Unicorn

Aaron Patterson’s talk showed a use case for a compacting GC with their Unicorn web server setup. This made me curious about the differences between the Unicorn and Puma web servers. It turns out the difference lies in how they manage concurrency. Unicorn forks the main process to handle requests concurrently, as described on Wikipedia. Puma can also run multiple processes (in clustered mode), but within each process it handles requests on separate threads, as stated in its documentation. I also checked out some benchmarks, which showed Puma outperforming Unicorn in many situations.
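The two concurrency models can be sketched in a few lines of Ruby. This is a toy illustration, not actual Unicorn or Puma code: a fork-per-worker loop next to a thread-per-request loop.

```ruby
jobs = %w[a b c]

# Unicorn-style: fork a child process per worker. Each child starts
# with a copy-on-write view of the parent's memory.
pids = jobs.map do |job|
  fork do
    job.upcase # work happens in the child's own address space
    exit!(0)   # skip at_exit handlers inherited from the parent
  end
end
pids.each { |pid| Process.wait(pid) }

# Puma-style: a single process handling each job on its own thread,
# so all workers share one heap.
results = Queue.new
threads = jobs.map { |job| Thread.new { results << job.upcase } }
threads.each(&:join)

collected = []
collected << results.pop until results.empty?
puts collected.sort.join(", ") # => A, B, C
```

The memory trade-off follows directly from the sketch: forked workers pay per-process overhead but are isolated from each other, while threads share one heap, which is cheaper but requires thread-safe code.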