Today we started getting some alerts on the database, so I'm going to do some maintenance to verify the health and wellness of the machine and get it back to a non-alerting status.
I should be able to do this without any downtime, but just in case, you might want to save a copy of any long entries or comments you're working on in your favorite text editor.
Once I've got things sorted out, I'll update this with more details for the technically curious.
Update [4:50PM PST]: sb-db06 (the slave) has been rebooted and is recovering. I'm running system updates on it, since the problem looks like a kernel bug (it struck both databases at the same time). Next: master failover, then recovery of the other database.
Update [5:05PM PST]: I'm doing what we call a "master failover" now. This means I'm shifting all traffic from the database that was active (sb-db05) to the spare database (sb-db06). I have to shut off "extra" services like imports, feeds, and searches while this happens.
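For the technically curious, the failover steps described above can be sketched roughly as follows. This is only a toy model in Python, not the actual tooling: the `Cluster` class, the `master_failover` function, and the assumption that the extra services are switched back on once traffic has moved are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of an active/spare database pair."""
    active: str
    spare: str
    # Assumed representation: service name -> whether it is currently running.
    extra_services: dict = field(
        default_factory=lambda: {"imports": True, "feeds": True, "searches": True}
    )
    log: list = field(default_factory=list)


def master_failover(cluster: Cluster) -> Cluster:
    """Shift traffic from the active database to the spare one."""
    # 1. Shut off the "extra" services while the switch happens.
    for name in cluster.extra_services:
        cluster.extra_services[name] = False
        cluster.log.append(f"paused {name}")

    # 2. Swap roles: the spare takes the traffic, the old master becomes the spare.
    cluster.active, cluster.spare = cluster.spare, cluster.active
    cluster.log.append(f"traffic now on {cluster.active}")

    # 3. Assumption: the extra services come back once the switch is done.
    for name in cluster.extra_services:
        cluster.extra_services[name] = True
        cluster.log.append(f"resumed {name}")

    return cluster


cluster = master_failover(Cluster(active="sb-db05", spare="sb-db06"))
print(cluster.active, cluster.spare)
```

After the call, the old master sits in the spare slot, which is what makes it safe to do maintenance on it.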
Update [5:30PM PST]: Well, that was unexpectedly bumpy. Sorry for that. There should be no further bumps, since we're now on the spare database and I can do maintenance on the original master.
Update [6:20PM PST]: If you had userpics not loading, they should be back to normal.