Mark Smith (mark) wrote in dw_maintenance,
@ 2013-01-13 02:45 pm UTC
The site outage is over. My apologies for the downtime.
One of our databases filled up its disk and went offline, and this caused the site to stop responding. We failed over to the backup database and everything is now back up and running.
Everything should be working. Please let us know if you see any trouble.
We will need to schedule a maintenance window soon to clean up the full database and rebuild the cluster so we have a working pair again. Watch this account for announcements about that.
Some time last year we realized that our master database pair was filling up its disk, so as part of another downtime we were taking, we cleaned up the slave database and brought it down to around 40% disk usage -- well within comfort.
At the time, we couldn't clean up the master database without taking the site down again or extending the downtime even more, so we decided to wait. (Also, it's generally good to separate your maintenance on paired machines -- that way, if you do something bad and don't notice it, the problem has time to surface before you touch the other machine.)
Anyway, the idea was that later we would take another downtime, switch the databases, and then clean up the second machine. That never happened, though, and as a result the master database finally ran out of disk space today.