Dec. 29th, 2011

mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark
I'm going to be doing a code push shortly as soon as I'm happy with the positioning of my ducks. (I don't need them to all have the same Y coordinates, but they should at least be near each other.)

This is a smaller push than most, there isn't a ton of stuff that's changed. There are a few major changes to how the importer works though, which is my area of most concern and where I'll be spending a lot of time watching.

As always, please comment and let me know if there are any issues. Thank you!

Update: Code pushed. There was some temporary slowness, our load balancer (Perlbal) got into a weird state where it was using 100% CPU. I restarted it and things returned to normal.

Update #2: There was an issue affecting status messages for community imports. This should be fixed now.
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark
Dreamwidth was having issues loading for ~15 minutes. This is now resolved.

The summary of the issue is that when you connect to Dreamwidth, you are actually connecting to a bit of software called Perlbal. This software handles routing your request to one of our web servers (we have a bunch) and it also does some other nifty stuff.

The main problem with it is that it's single-threaded. That means that, on the machines we have that have eight or more CPU cores (most modern stuff!), it can only ever run on one of those. This leaves the machine very underutilized -- i.e., it's mostly idle!

This was never a problem for us because even just one core was enough to handle all of the Dreamwidth traffic. At some point we split it up so that static traffic (images, CSS, etc) goes to a second Perlbal instance, but most of the main web traffic still goes through that primary instance.

Today, we finally hit the threshold where Perlbal was taking 100% of the one core it was on and couldn't go any faster. This caused it to queue up requests -- making the site feel really slow. The backend has plenty of capacity, it's just that the frontend wasn't able to go fast enough to handle the traffic.

The fix was to put a much faster load balancer in front and use it to balance traffic to two different Perlbal instances. Now we have a bit of software called Pound that runs in front. We have always been using Pound, but it was only serving SSL requests. Now it is also serving unencrypted HTTP traffic and is then passing that traffic on to two Perlbal instances. In short, it's a load balancer for our load balancers!

This lets us scale more since Pound is an order of magnitude more efficient than Perlbal. By the time we reach the limits of scalability on Pound, we'll have to legitimately move to bigger hardware. (And actually by the time we get there, we will probably be collocating! Exciting!)
Page generated Oct. 21st, 2017 10:08 am
Powered by Dreamwidth Studios