Dawn of the Dead Ends: Fixing a Memory Leak in Apache Kafka

Heroku Blog - Tue, 08/23/2016 - 15:41

At Heroku, we're always working towards increased operational stability with the services we offer. As we recently launched the beta of Apache Kafka on Heroku, we've been running a number of clusters on behalf of our beta customers.

Over the course of the beta, we have thoroughly exercised Kafka through a wide range of cases, which is an important part of bringing a fast-moving open-source project to market as a managed service. This breadth of exposure led us to discover a memory leak in Kafka, have a bit of an adventure debugging it, and then contribute a patch to the Apache Kafka community to fix it.

Issue Discovery

For the most part, we've seen very few issues running Kafka in production. The system itself is very stable and performs well even under massive load. However, we identified a scenario where processes in a long-running cluster would no longer fully recover from a restart. The process would seemingly restart successfully but always crash several seconds later. This was troubling: a significant part of safely managing and operating a distributed system like Kafka is the ability to restart individual members of a cluster. We started looking into this as a high-priority problem.

Fortunately, an internal team had a staging cluster where we could reproduce the issue to investigate further without impacting a production app. We quickly figured out a remediation that we could apply in case we hit this issue on a production cluster before we addressed the bug, and then got to debugging the issue at hand.

A Tale of Failure to Debug

I'm going to discuss the process of debugging this failure, including all the dead ends we encountered. Many debugging stories jump straight to "we found this really difficult bug", leaving out the gory details and the dead ends chased. That paints a false picture of fixing gnarly bugs -- in reality, nearly all of them involve days of dead ends and misunderstandings.

Given that we were able to reproduce the issue consistently, we began our investigation by simply starting the process and looking at the resources on the server using top and similar diagnosis tools. It was immediately clear this was a memory issue. The process would boot, allocate memory at a rate of around 400MB/s without freeing any, and then crash after exhausting all the memory on the server.

We confirmed this by looking at stdout logs of the process. They contained this wonderful snippet:

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 131072 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit

We now had two indications that there was a memory issue of some kind! This is progress - we can now start looking at memory allocation, and mostly ignore all other issues. Typically, JVM memory is allocated on the heap, and Java has a bunch of tooling to understand what's on the heap. We broke open the logs emitted by the JVM's garbage collector and found a mystery inside. The GC logs were almost empty, indicating the program wasn't under heap pressure at all. Furthermore, what little information we did have indicated that this broker only ever had 100 megabytes of on-heap memory used. This didn't make sense given our previous evidence from top and stdout.

JVM Memory: A Recap

The JVM mostly allocates memory in two ways:

  • On-heap
  • Off-heap

On-heap memory represents the majority of memory allocation in most JVM programs, and the garbage collector manages allocation and deallocation of on-heap memory. Some programs do use notable amounts of "off-heap" or native memory, where the application controls allocation and deallocation directly. Kafka shouldn't typically be using a lot of off-heap memory, but our next theory was that it must be doing exactly that. Clearly, that was the only alternative, right? It couldn't be on-heap memory, or we'd have seen more information in the GC logs.
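To make the distinction concrete, here is a minimal Java sketch of the two kinds of allocation (this is our illustration, not Kafka code):

```java
import java.nio.ByteBuffer;

public class MemoryKinds {

    // On-heap: managed by the garbage collector, visible in GC logs
    // and heap dumps.
    static byte[] onHeap() {
        return new byte[1024];
    }

    // Off-heap: a direct ByteBuffer's storage lives outside the Java
    // heap. The GC only tracks the small wrapper object, not the
    // native buffer it points at.
    static ByteBuffer offHeap() {
        return ByteBuffer.allocateDirect(1024);
    }

    public static void main(String[] args) {
        System.out.println("on-heap array length: " + onHeap().length);
        System.out.println("off-heap buffer isDirect: " + offHeap().isDirect());
    }
}
```

Note that even "off-heap" here means memory the JVM knows about; memory allocated directly by a native library is invisible to both categories, which becomes important later.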

To test this theory, we wrote a small script to query the JMX metric java.lang:type=Memory, which tells you about how much on- and off-heap memory the JVM thinks it is using. We ran this script in a tight loop while starting the broker, and saw, to our frustration, nothing useful:

heap_memory=100 MB offheap_memory=63.2 MB
heap_memory=101 MB offheap_memory=63.2 MB
heap_memory=103 MB offheap_memory=63.2 MB

Neither on-heap nor off-heap memory was being used! But what else can even allocate memory? At this point, we reached further into our debugging toolkit.
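The script's query can be sketched in-process with MemoryMXBean, the bean behind java.lang:type=Memory (our actual script polled the broker remotely over JMX; this simplified version reads the current JVM, and the output format merely mimics the lines above):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryProbe {

    // Returns a line like "heap_memory=100 MB offheap_memory=63 MB".
    // Note: getNonHeapMemoryUsage() covers JVM-managed non-heap pools
    // (metaspace, code cache); it does NOT see native allocations made
    // by C libraries, which is exactly why our leak was invisible here.
    static String sample() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return String.format("heap_memory=%d MB offheap_memory=%d MB",
                heap.getUsed() >> 20, nonHeap.getUsed() >> 20);
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```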

Tracing: A Magical Debugging Tool

Tracing is a very effective debugging tool, often employed in this kind of situation. Are the logs not telling you what you need to know? Time to dump out relevant events from a tracing tool and start looking through them for ideas.

In this particular case, we used sysdig, an especially powerful tool for debugging issues on a single server. Sysdig allows you to capture system calls, much like the more venerable strace. A syscall is the mechanism by which a userland process communicates with the kernel. Seeing as most kinds of resource usage involve talking to the kernel, looking at syscalls is a very effective way to diagnose this kind of issue.

Sysdig is best used in a "capture, then analyze" mode, much like tcpdump. This lets you write all the syscalls emitted by a process to a file, and then take your time analyzing them.

The command:

sudo sysdig proc.name=java -w ~/sysdig.scap

will capture to a file all syscalls emitted by the process named java. We ran this command, then booted our broker and watched it crash.

Now that we have a file full of syscalls, what do we look at? The capture file in this case was 434MB, which you can't "just read through". Sysdig provides a suite of analysis tools for examining the events emitted by the process. Since we're interested in memory allocation, we care mostly about the mmap and munmap syscalls.

The issue we're debugging is that somewhere, Kafka is allocating memory and never freeing it. Remember, this isn't on-heap memory or off-heap memory, so something is doing native memory allocation.

Firing up sysdig, we see that this program does indeed allocate a lot of memory using mmap syscalls. Analysis using bash scripting reveals that 9GB of memory was allocated via mmap during this run -- more memory than was available on the server, which seemed to point in the right direction. When memory is allocated with mmap, the caller eventually has to call munmap to release it back to the operating system. Never releasing memory back to the operating system is the definition of a memory leak, and it will cause the process to crash once the leak has consumed all available memory.

A quick but complex sysdig query reveals this to be the case:

$ sudo sysdig -r ~/sysdig.scap 'evt.type=mmap' -p '%proc.vmsize %evt.dir %evt.type' | grep 'length=' | wc -l
$ sudo sysdig -r ~/sysdig.scap 'evt.type=munmap' -p '%proc.vmsize %evt.dir %evt.type' | grep 'length=' | wc -l

A lot is going on in this query -- sysdig is an incredibly powerful tool. This specific query lets us see the memory usage of the process at the moment each syscall was traced (that's what %proc.vmsize does in the format argument). Here we are just counting events, but we also examined them for patterns, and that memory usage output was invaluable there.

At this point, we were stumped. The process was allocating a lot of memory but never freeing it, and the mmap calls didn't follow any particular pattern we could determine. At this stage in debugging, it's often time to take a break and let your unconscious mind think up some possibilities. Recognize that you are human: getting rest helps with debugging just as much as staring at a screen.

Why Is All the RAM Gone?

A short while later, we had a hunch to follow: it's something to do with the cluster being configured to use GZIP to compress data.

To provide context, Kafka can use compression to dramatically reduce data volumes transferred, both over the network and to disk. This particular cluster had GZIP compression enabled. Java exposes GZIP compression in the JVM classes GZIPInputStream and GZIPOutputStream. Those classes are backed entirely by a C library, zlib, which does its own memory allocation.

A quick Google search for "JVM GZIP memory leak" turned up an article describing almost exactly what we were seeing: a usage pattern with GZIP that causes the JVM to run out of memory very easily. You open a GZIPInputStream or GZIPOutputStream and never close it when finished. This also explains why the JVM didn't show this memory as either on-heap or off-heap -- it can't know what memory the underlying C library uses.

We broke open the Kafka source code, and found a point where it was opening a GZIPInputStream to decompress some data, but never closing it. Upon a restart, Kafka has to recover recently written messages, which involves decompressing them and checking them for validity. Most code in Kafka that does decompression reads the whole stream of compressed messages then closes the decompressor. This validity check works differently. It opens the compressed messages, reads the first message and checks its offset. It then short-circuits, failing to ever close the native buffers! This leads to our memory leak upon restart. As it so happens, this code path is never hit during normal operation. It only happens on restarts, which fits our evidence as well.
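The leaky shape and its fix can be sketched in plain Java (the real Kafka code is Scala and structured differently; this is only an illustration of the open-without-close pattern):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipClose {

    // Compress a payload so we have something to decompress.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Leaky shape: read the first byte, return early, never close.
    // zlib's native buffers stay allocated until finalization, which
    // may never run if the heap itself is under no pressure.
    static int peekLeaky(byte[] compressed) {
        try {
            GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(compressed));
            return in.read(); // short-circuit: stream never closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fixed shape: try-with-resources closes the stream (and frees the
    // native buffers) even on the short-circuit path.
    static int peekClosed(byte[] compressed) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("hello".getBytes(
                java.nio.charset.StandardCharsets.UTF_8));
        // Both variants return the first decompressed byte; only the
        // second releases the decompressor's native memory.
        System.out.println(peekClosed(compressed) == 'h');
    }
}
```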

We confirmed this memory issue by reproducing the bug in a development version of Kafka and then failing to reproduce after applying a patch that closes the GZIPInputStream when short-circuiting.

This is often how debugging difficult bugs goes: hours or days of no progress staring at code, application logs, and trace output, until some hunch points you in the right direction and is trivial to confirm. Trying different tools and different ways of looking at the problem helps you get to that hunch.

Giving Back

Having found and diagnosed the issue, we filed an upstream ticket with the Kafka project and started working on a patch. After some back-and-forth review with the Kafka committers, we had a patch in trunk, which is included in the brand new release of Kafka. It's interesting to note how small the resulting patch was: this was a major bug that meant Kafka couldn't boot in certain situations, yet the bug and the fix were both very simple. Often a tiny change in code is responsible for a huge change in behavior.

That release is now the default version for Apache Kafka on Heroku. For those of you in the Apache Kafka on Heroku beta, you can provision a new cluster that doesn't have this issue. Likewise, we have tooling in place to upgrade existing clusters.

For our beta customers, the command

heroku kafka:upgrade heroku-kafka --version 0.10

where heroku-kafka is the name of your Heroku Kafka cluster, will perform an in-place upgrade to the latest 0.10 release.

If you aren't in the beta, you can request access here.

