Denise (denise) wrote in dw_maintenance, 2011-07-28 03:56 pm
site slowness: it's complicated!
The site slowness for the past day or so has been due to a bug somewhere in our code that's causing our webserver processes to run out of memory too quickly and lock up the machine.
alierak has been staying on top of things and tweaking the webserver settings to keep things running and to make sure that the settings we're using have the best chance of not running into the "run out of memory, lock up machine" problem. Unfortunately, this means that -- in order to minimize the chance that the site is down entirely -- we've had to seriously lower the number of webserver processes that are running at any time and lower the amount of time before they restart by themselves (and free up the locked-up memory). This means that there are fewer webserver processes available to accept your requests and serve you pages from the site.
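To make that tradeoff concrete, here is a minimal sketch with entirely hypothetical numbers (these are not the site's actual settings, and the function and parameter names are made up for illustration). It assumes each webserver process starts at some base memory size and leaks a fixed amount per request until it restarts, so the worst case is every process sitting just below its restart limit at once.

```python
def peak_memory_mb(workers, base_mb, leak_mb_per_request, max_requests_per_worker):
    """Worst-case memory use if every worker is about to restart at the same time.

    All parameters are hypothetical; this is a toy model, not a real server config.
    """
    per_worker = base_mb + leak_mb_per_request * max_requests_per_worker
    return workers * per_worker


# Hypothetical "before" settings: many workers, each living a long time.
before = peak_memory_mb(
    workers=40, base_mb=100, leak_mb_per_request=0.5, max_requests_per_worker=1000
)

# Hypothetical "after" settings: fewer workers, each restarting much sooner.
after = peak_memory_mb(
    workers=15, base_mb=100, leak_mb_per_request=0.5, max_requests_per_worker=200
)

print(f"before tuning: {before:.0f} MB worst case")
print(f"after tuning:  {after:.0f} MB worst case")
```

In this toy model, cutting the number of processes and the requests each one serves before restarting lowers the memory ceiling sharply, but it also leaves fewer processes free to answer requests at any moment, which is the slowness described above.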
Basically, at this point it's a case of "down because of the problem or slow because of the steps we're taking to fix the problem"!
Since it's obvious at this point that webserver tweaks alone aren't going to cut it, we're doing two things to get the site back to its usual zippy self:
a) Trying to find the root cause of the bug that's making our webserver processes freak out. Memory leaks are really hard to find and debug, which is why it's taking so long. We have a few ideas on how to find what's causing it, and fu is concentrating on that end.
b) Seeing what we can do to get more resources into the webserver pool so that even though the webservers are running out of memory quickly and we have to resource-starve them in order to keep them from checking out entirely, we'll still be able to get pages to load quickly without the delay we're experiencing right now. There's an easy way and a hard way for this, too. (And hopefully, the easy way will help enough that we won't have to get to the hard way.)
(This sort of thing always happens when mark is literally unreachable -- he's on vacation for two weeks in remotest Alaska, with no cell phone reception -- but I wanted to specifically give a massive thank you to alierak, our backup sysadmin, who is doing wonders with the problem.)