Mark Smith ([staff profile] mark) wrote in [site community profile] dw_maintenance, 2013-01-13 02:45 pm

Site outage over

Hi all,

The site outage is over. My apologies for the downtime.

One of our databases ran out of disk space and went offline, which caused the site to stop responding. We failed over to the backup database, and everything is now back up and running.

Everything should be working. Please let us know if you see any trouble.

We will need to schedule a maintenance window soon to clean up the full database and rebuild the cluster so that we have a working pair again. Stay tuned to this account for announcements about that.



Sometime last year we realized that our master database pair was running out of disk space, so during another downtime we were already taking, we cleaned up the slave database and brought it down to around 40% disk usage, well within our comfort zone.

At the time, we couldn't clean up the master database without taking the site down again or extending the downtime even further, so we decided to wait. (Also, it's generally good practice to stagger maintenance on the two machines of a pair: that way, if you do something bad and don't notice it right away, the problem has time to surface before you repeat it on the other machine.)

Anyway, the plan was to take another downtime later, switch the databases over, and then clean up the second machine. That didn't happen, though, and today that database finally ran out of disk space.

[personal profile] astro_noms 2013-01-13 10:53 pm (UTC)
Thanks for the awesome response time! <3

[personal profile] kuwdora 2013-01-13 10:54 pm (UTC)
♥ !

[personal profile] senmut 2013-01-13 10:56 pm (UTC)
Dang but you all rock for that response.

[personal profile] ilyena_sylph 2013-01-13 10:57 pm (UTC)
+1.

I was driving so I didn't even see it.

I got home and there were announcements that DW was down, that you were working on it, and then within seconds of my getting home it was all better!

<3333333333333

[personal profile] jumble 2013-01-13 10:58 pm (UTC)
Seriously, props on not just the response time but the communication. It is bar none. This is a big part of why I love DW.

[personal profile] dil 2013-01-13 10:59 pm (UTC)
Monitoring free space would be a good idea :)

Anyhow thanks for the perfect response time!

[personal profile] zing_och 2013-01-13 11:00 pm (UTC)
Wow, that was fast! Thank you for letting us know.

[personal profile] lawless523 2013-01-13 11:00 pm (UTC)
I just tried posting a comment on someone else's journal and wasn't able to.

[personal profile] hellkitty 2013-01-13 11:00 pm (UTC)
You guys are amazing. Things happen but you were on it immediately *and* communicated with us fast and honestly.

[personal profile] strega_lyth 2013-01-13 11:01 pm (UTC)
Thanks!

[personal profile] pretty_panther 2013-01-13 11:02 pm (UTC)
Dayum you guys were quick! Really appreciate your work and keeping us posted!

[personal profile] solmedes 2013-01-13 11:04 pm (UTC)
Seconding what they all said. That was remarkably quick.

I'm glad it was only disk space, and not something more serious.

[personal profile] darjeeling 2013-01-13 11:04 pm (UTC)
Not only was the hiccup barely long enough to even register as downtime, you swarmed all over fixing it like a Zergling rush AND we even got technical follow up on what caused it.

You guys are too awesome for words. ♥

[personal profile] dragondancer5150 2013-01-13 11:06 pm (UTC)
SECONDING

[staff profile] denise 2013-01-13 11:09 pm (UTC)
When you say "not able to", what happened?

[personal profile] alierak 2013-01-13 11:10 pm (UTC)
Yeah. We also realized we've been graphing but not alerting on disk space :/
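Graphing shows the trend, but only an alert makes someone look before the disk is full. Below is a minimal sketch of such a check in Python, assuming it runs periodically (for example from cron) on each database host. The thresholds, the function names, and the data directory in the usage note are illustrative assumptions, not Dreamwidth's actual monitoring setup; the exit codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import shutil
from typing import Tuple

# Hypothetical thresholds: alert well before the disk is actually full,
# so there is time to clean up or schedule maintenance.
WARN_FRACTION = 0.80
CRIT_FRACTION = 0.90

# Exit codes following the Nagios plugin convention.
EXIT_CODES = {"ok": 0, "warning": 1, "critical": 2}


def classify_usage(used: int, total: int,
                   warn: float = WARN_FRACTION,
                   crit: float = CRIT_FRACTION) -> str:
    """Classify disk usage as 'ok', 'warning', or 'critical'."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = used / total
    if fraction >= crit:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"


def check_path(path: str) -> Tuple[int, str]:
    """Check the filesystem holding `path`; return (exit_code, message)."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    level = classify_usage(usage.used, usage.total)
    pct = 100 * usage.used / usage.total
    message = f"{level.upper()}: {path} is {pct:.1f}% full ({usage.free} bytes free)"
    return EXIT_CODES[level], message
```

A monitoring agent could call `check_path("/var/lib/mysql")` (a hypothetical data directory) and page someone on any non-zero code. With a warning threshold of 80%, a slave cleaned down to around 40% would stay quiet, while a master creeping toward full would raise an alert long before it went offline.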

[personal profile] mn 2013-01-13 11:11 pm (UTC)
Thank you very much indeed for fixing this so quickly!

[personal profile] darth_eldritch 2013-01-13 11:12 pm (UTC)
Thank you!

I knew everything was going to be fine soon!

[personal profile] lawless523 2013-01-13 11:14 pm (UTC)
I got a message from the web browser (Chrome) that the DW web address in question couldn't be accessed. This happened even though, using the same browser, I'd been able to access my reading list and to post an entry that I'd originally tried to post while the site was down.

[staff profile] denise 2013-01-13 11:18 pm (UTC)
Hm. That sounds like it could be a few different things. Could you start by restarting Chrome and trying again? If that doesn't work, let me know which entry you're trying to reply to and I'll poke at it a bit more.

[personal profile] jelazakazone 2013-01-13 11:21 pm (UTC)
Me too! The transparency is wonderful.

[personal profile] justhuman 2013-01-13 11:34 pm (UTC)
Thanks for the quick response and even more for the update. We love that you folks talk to us.

[personal profile] dil 2013-01-13 11:40 pm (UTC)
JFYI: During the downtime today I got the "connection reset while page was loading" error message in Firefox several times when I tried to access my Reading Page (/read).
It was not the "unable to connect" message that is displayed when the server does not respond.

After about a minute the server started responding, but it returned 404 errors until the problem was finally fixed.
Edited 2013-01-13 23:41 (UTC)

[staff profile] denise 2013-01-13 11:42 pm (UTC)
Yeah, I'm pretty sure different browsers respond to the type of downtime we were having differently. But it should all be fixed now.

[personal profile] darth_eldritch 2013-01-13 11:47 pm (UTC)
I can't get to my journal page, except through links to specific entries in my inbox or through tags.

I get the maintenance page when I click on my username, or on any link that goes directly to my journal, such as one from my favorites bar.
