Why does the GVL still exist?


Hi there! I'm Jesse, and you're getting this email because you bought one of my books or signed up for my email list. I'm sharing my knowledge here to give you a peek under the hood of tools you're already using. If you don't want to hear from me, there's an 'unsubscribe' link at the bottom of this email. Enjoy!

The last email I sent about thread-safety really seems to have struck a chord. I got a whole bunch of follow-up responses from people wanting to know more. Today I'll answer some of the common questions I received.

Missed the last email? Click here to read it.

In the last email I talked about a couple of key points:
  1. Ruby's core collection classes (Array and Hash) are not thread-safe.
  2. These classes appear to be thread-safe in MRI, the de facto Ruby implementation, because of its Global VM Lock (GVL).
  3. These classes are shown not to be thread-safe when using a Ruby implementation with no GVL, such as JRuby or Rubinius.
These key points raised a bunch of interesting questions from readers. Here are some exact quotes from the email responses I got last week.
"Isn't MRI superior then, since it handles most cases automatically thread-safely?"
"Why does GVL exist in the first place?"
"[The GVL] seems like a bad internal design which should have been removed years ago."
These are all excellent points. I'll answer each in turn.
Isn't MRI superior then, since it handles most cases automatically thread-safely?

That absolutely depends on your definition of 'superior'. I'm not being facetious here. In some cases, the GVL is a good thing.

The MRI team actually says that the GVL is a feature, because it attempts to make your code automatically thread-safe. However, it does this by preventing your code from truly running in parallel. With MRI, even on a 4-core CPU, only one thread will be executing Ruby code at any given time.

But don't forget the caveat! In MRI, when one thread is blocked on IO, it releases the GVL so that another thread can run in the meantime. So in a multi-threaded system where you have a lot of blocking IO, MRI will perform similarly to JRuby when it comes to concurrency. (I'm ignoring other JRuby optimizations here, like its more efficient GC.)
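
Here's a minimal sketch of that caveat. It uses sleep as a stand-in for a blocking IO call (sleep also lets other threads run while it waits), and the helper name timed_run is made up for this illustration.

```ruby
require 'benchmark'

# Hypothetical helper: start `count` threads that each block for `delay`
# seconds, then return the total wall-clock time in seconds.
def timed_run(count, delay)
  Benchmark.realtime do
    threads = count.times.map do
      Thread.new { sleep(delay) } # stand-in for a blocking IO call
    end
    threads.each(&:join)
  end
end

elapsed = timed_run(4, 0.2)
puts "4 threads, 0.2s of blocking each: #{elapsed.round(2)}s total"
```

If the threads had to wait in line while blocked, this would take around 0.8 seconds. On MRI you should see something much closer to 0.2 seconds, because the waiting overlaps.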

If you're reading this, chances are that you work on Rails apps. In many cases, the bottlenecks in our Rails apps have to do with blocking IO. Between communicating with the upstream client, communicating with cache servers, databases, external services, etc. there's lots of IO going on. In many cases, the performance profile between MRI and JRuby will be similar for these situations.

Obviously I can't make a statement like that without saying that you shouldn't just take my word for it. If you're thinking of changing an app to use a different Ruby implementation, you need to measure the real differences for your application to make an informed decision.

All that being said, the other implementations (JRuby and Rubinius) do not have this global lock to contend with. So your code really will run in parallel. If you have a 4-core CPU you can have 4 threads working simultaneously. If you organize your code properly, this could give you up to a 4x speedup for work that makes heavy use of the CPU when using one of these alternative implementations.
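
If you want to measure this yourself, here's a rough sketch you can run under different implementations. The crunch method and the workload size are assumptions I picked for illustration; they're a toy stand-in for real CPU-heavy work, not a proper benchmark.

```ruby
require 'benchmark'

# Hypothetical CPU-bound task: a sum of squares, with no IO involved.
def crunch(n)
  (1..n).reduce(0) { |sum, i| sum + i * i }
end

work = 500_000 # arbitrary workload size
jobs = 4       # one job per core on a 4-core machine

# Run the jobs one after another on the main thread.
serial_results = nil
serial_time = Benchmark.realtime do
  serial_results = jobs.times.map { crunch(work) }
end

# Run the same jobs with one thread each; Thread#value joins the thread
# and returns the result of its block.
threaded_results = nil
threaded_time = Benchmark.realtime do
  threaded_results = jobs.times.map { Thread.new { crunch(work) } }.map(&:value)
end

puts "engine:   #{RUBY_ENGINE}"
puts "serial:   #{serial_time.round(3)}s"
puts "threaded: #{threaded_time.round(3)}s"
```

On MRI I'd expect the two timings to be roughly the same, since the GVL keeps the threads from crunching numbers simultaneously. On an implementation without a GVL, the threaded run has the chance to finish faster. Your numbers will vary, which is exactly why you should measure.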

All of this hinges on one important question: does the GVL actually guarantee that your code will be thread-safe? Unfortunately, no. It makes its best effort and does well in many cases, but not all cases. I'll cover this in more depth in the next email.
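
As a small taste of why the answer is no, here's a sketch of a lost update. The sleep between the read and the write is something I added to force a thread switch at the worst possible moment; in real code the switch can happen there on its own, just less predictably. The variable names are made up for this example.

```ruby
# Unsafe: read, pause, write. Each thread may write back a stale value.
unsafe_counter = 0
5.times.map do
  Thread.new do
    current = unsafe_counter      # read
    sleep 0.05                    # another thread gets to run here
    unsafe_counter = current + 1  # write, possibly clobbering other updates
  end
end.each(&:join)

# Safe: a Mutex turns the read-modify-write into one critical section,
# so only one thread at a time can be between the read and the write.
safe_counter = 0
lock = Mutex.new
5.times.map do
  Thread.new do
    lock.synchronize do
      current = safe_counter
      sleep 0.05
      safe_counter = current + 1
    end
  end
end.each(&:join)

puts "without a lock: #{unsafe_counter}"
puts "with a lock:    #{safe_counter}"
```

The locked version always ends at 5. The unlocked version will typically end up lower, GVL or not, because the GVL only stops threads from running at the same instant. It does nothing to stop them from interleaving between your read and your write.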

Ultimately, GVL or no GVL is a tradeoff. You have to decide if you want to give up the thread-safety 'guarantees' that the GVL offers in favour of real parallelism and the challenges that go along with it.

Why does GVL exist in the first place? It seems like a bad internal design which should have been removed years ago.

These two points are two facets of the same question, and have the same answer, so I lumped them together.

From the beginning, MRI has always supported C extensions and had a culture of C hackers around it. Many of the C extensions that were being created wanted to mess with Ruby internals directly via its C API. Not to mention that some of the C libraries being wrapped were not thread-safe themselves.

Introducing a GVL made these problems go away. MRI no longer needed to worry about C extensions corrupting the internals.

So the GVL was introduced to make it easy to write C extensions for C code that wasn't necessarily thread-safe. On top of that, we didn't have multi-core CPUs at the time that Ruby began. The requirements for the language were different.

That gives us some idea why the GVL was put in place to begin with, but now that multi-core CPUs are a reality, why don't they remove it?

For one thing, the MRI codebase that we use today began in 1993. MRI was a pet project for a long time and still doesn't have much test coverage, or QA process of any kind. Given this, making a major architectural change like removing the GVL is a boatload of work that's almost guaranteed to introduce subtle bugs. This is not where the MRI team wants to put their efforts.

Another technical reason is that it would effectively break backwards compatibility with existing C extensions. Existing C extensions implicitly assume that they don't need to worry about thread-safety. Changing this guarantee could have a surprising effect on existing libraries.

Matz brought up one interesting anecdote last year at RubyConf. When asked if he was going to remove the GVL from MRI, he said that the MRI team had removed it in an experimental branch, replacing it with finer-grained locks to keep the internals consistent. He said that this made MRI slower for single-threaded code in exchange for being truly parallel. He didn't think this tradeoff was worth making the change. His decision may be debated, but the MRI team should get extra points for trying it out :)

Many people give MRI a hard time for its GVL. It's something people love to hate. If you ask the MRI team, the GVL is there to make your data less likely to be corrupted by thread-safety issues. A growing number of people in the community think that this responsibility should be pushed down to developers to deal with. Indeed, that's the situation on JRuby and Rubinius: the responsibility for data safety is fully pushed down to the developer.

I hope this clears up some of the mysteries around MRI and the GVL without raising any new ones. If you have unanswered questions, reply to this email and I'll help you out.

You'll hear from me again next week with more code. Since the built-in Array isn't thread-safe, I'll show you how you might write your own thread-safe collection. Stay tuned.

Until then,

PS - The web site for my book about multi-threading in Ruby is now up. There's a sign up form there, but you don't need to sign up again. Since you're already on my list, I'll make sure you stay in the loop. However, I'd love some help to spread the word. If you have a friend or co-worker who would like to learn this stuff, please point them to the web site.

Copyright © 2013 Jesse Storimer, All rights reserved.