Denise ([staff profile] denise) wrote in [site community profile] dw_maintenance, 2011-07-28 03:56 pm

site slowness: it's complicated!

The site slowness for the past day or so has been due to a bug somewhere in our code that's causing our webserver processes to run out of memory too quickly and lock up the machine.

[personal profile] alierak has been staying on top of things and tweaking the webserver settings to keep things running and to make sure that the settings we're using have the best chance of not running into the "run out of memory, lock up machine" problem. Unfortunately, this means that -- in order to minimize the chance that the site is down entirely -- we've had to seriously lower the number of webserver processes that are running at any time and lower the amount of time before they restart by themselves (and free up the locked-up memory). This means that there are fewer webserver processes available to accept your requests and serve you pages from the site.
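
A rough illustration of that trade-off, using made-up numbers rather than Dreamwidth's actual settings: if each webserver process leaks a little memory per request, then its worst-case size depends on how many requests it serves before restarting, and that in turn limits how many processes fit on the machine. The function name and every figure below are hypothetical.

```python
def max_workers(total_ram_mb, base_mb, leak_mb_per_request, max_requests_per_worker):
    """Largest number of worker processes whose worst-case memory fits in RAM.

    All parameters are illustrative assumptions, not real site settings.
    """
    # Worst case: a worker has leaked on every request it served before restarting.
    peak_mb = base_mb + leak_mb_per_request * max_requests_per_worker
    return total_ram_mb // peak_mb


# Hypothetical machine: 8000 MB usable, 100 MB per fresh worker, 2 MB leaked per request.
print(max_workers(8000, 100, 2, 200))  # workers restart after 200 requests
print(max_workers(8000, 100, 2, 50))   # workers restart after 50 requests
```

In this toy model, restarting workers sooner keeps each one smaller, so more of them fit in memory. Restarts have a cost of their own, though, and a real leak is rarely this predictable, which is why the process count has to be kept conservatively low as well.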

Basically, at this point it's a case of "down because of the problem or slow because of the steps we're taking to fix the problem"!

Since it's obvious at this point that webserver tweaks alone aren't going to cut it, we're doing two things to get the site back to its usual zippy self:

a) Trying to find the root cause of the bug that's making our webserver processes freak out. Memory leaks are really hard to find and debug, which is why it's taking so long. We have a few ideas on how to find what's causing it, and [personal profile] fu is concentrating on that end.

b) Seeing what we can do to get more resources into the webserver pool. That way, even though the webservers are running out of memory quickly and we have to resource-starve them to keep them from checking out entirely, pages will still load quickly, without the delay we're experiencing right now. There's an easy way and a hard way to do this, too. (And hopefully, the easy way will help enough that we won't have to get to the hard way.)

(This sort of thing always happens when [staff profile] mark is literally unreachable -- he's on vacation for two weeks in remotest Alaska, with no cell phone reception -- but I wanted to specifically give a massive thank you to [personal profile] alierak, our backup sysadmin, who is doing wonders with the problem.)

Re: DW is love right now.

[personal profile] azurelunatic 2011-07-30 04:56 am (UTC)
Some terminology nuance -- 'bug' tends to be a (known or unknown) unintentionally created flaw that causes a problem, either when something's wrong by itself, or two or more things that would be OK on their own come together and create a problem -- like, there's nothing inherently bad about a toddler, and nothing inherently bad about an unlidded jar of peanut butter, but put the two together and there's a problem, especially when the cat then walks by.

Then there's a whole class of jargon for security problems -- there are bugs that are also 'vulnerabilities', weak points that could be attacked by someone. Sometimes the vulnerabilities will be exploited, which was what I think you were hoping wasn't happening.

*brews some more sympathetic coffee, finds some Diet Coke with lime for [staff profile] denise*

Re: DW is love right now.

[personal profile] wytchcroft 2011-07-30 06:02 pm (UTC)
thank you - and i have to say that i'm always impressed by how helpful, straightforward and unpatronising the replies here are.
keep on keepin' on. :)