EDIT: And we're done! We're watching for issues now, but if you spot anything, sing out.
This push contains some sweeping backend changes, so you either won't notice anything at all, or things will be Very Broken. :) (We're pretty sure things won't be Very Broken, since things have been working out fine in testing, but there's always the chance of things getting screwy when the new changes get widespread adoption.) We'll have everyone on hand to make sure problems get dealt with quickly.
Hi all --
Some have noticed that re-importing to get new comments hasn't been working for a while. This has been fixed; it was an operational issue (the importer cache wasn't being cleaned).
Anyway, if you have been having trouble getting recent comments to import onto DW, things should be working now. Please give it a shot.
Edited: Also, if you are still having importer troubles, please open a new request and let us know here:
Thanks and sorry for the trouble!
* In some rare cases, trying to complete a payment will result in an error message. It seems to be even odds whether the failure happens before or after your card has been charged, but either way, the items won't be applied to your account. If this happens to you, the error message asks you to open a support request in the Account Payments category: please do so! I'll be able to check whether your card has been charged, and if so, make sure you get the items you paid for.
(EDIT: There was another point here about a different problem that had cropped up since last night, but further investigation turned up that it was only a variant of the above, and only a single payment was affected by it. So, false alarm there!)
I'm really sorry about the hassle, folks! We've been trying to work out what's causing this to happen, but no luck so far. If you ever have questions about whether or not your payment has gone through, just open a support request to ask, and I'll get to it as soon as possible. (This weekend there might be some delay, since my sister's getting married! But usually it's a pretty quick turnaround.)
1) Random "403 Forbidden" errors when loading pages, mostly in journals but sometimes on site pages.
2) Shop carts paid for by credit card where you get an error when trying to check out, then the cart is set to "waiting for payment" status.
3) When crossposting to LiveJournal, the crosspost not going through and error messages in your inbox that read "Failed to connect to http://www.livejournal.com/interface/xm
All three of these errors are only happening occasionally.
The first two are problems on our end that we're working to track down -- we've added some extra debugging code that should help us pinpoint the cause, and we'll get it fixed as quickly as possible after that. (Things that only happen occasionally are very hard to diagnose and fix, since you can't always know for sure what the cause is, and you can't usually verify that the fixes you're putting into place actually fix the problem.)
The crossposting problem is, unfortunately, on LJ's end and not on ours: that error means that LJ was unreachable at the time the crosspost attempt was sent. If you get the error, your crosspost attempt will retry up to five times, at increasingly longer intervals, before sending you a final failure notice in your inbox. Once that happens, you can edit the post and re-check the crosspost box, then save the post, to start it trying again. Whether or not it succeeds depends entirely on whether we can reach LJ at the time it tries. (It doesn't matter whether you can load LJ on your own computer when that happens -- the crosspost attempt is sent from our servers, so our servers have to be able to reach LJ, not your computer.)
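For the curious, the retry behavior described above can be sketched roughly like this. It is only an illustration, not our actual crossposting code: the function name, the 60-second first delay, and the doubling factor are all assumptions made up for the example. The only details taken from this post are "up to five retries" and "increasingly longer intervals".

```python
import time
from typing import Callable


def crosspost_with_retries(
    send: Callable[[], bool],
    max_retries: int = 5,        # "up to five times", per the post
    first_delay: float = 60.0,   # assumed value, for illustration only
    factor: float = 2.0,         # assumed growth factor, for illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try send() once, then retry with growing waits between attempts.

    Returns True as soon as send() succeeds, or False once every retry
    has failed (the point at which a final failure notice would go out).
    """
    if send():
        return True
    delay = first_delay
    for _ in range(max_retries):
        sleep(delay)      # wait before the next attempt
        if send():
            return True
        delay *= factor   # each wait is longer than the one before
    return False
```

Passing in `sleep` makes it possible to try the function without actually waiting: hand it a function that just records the delays, and you can see the schedule it would follow.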
If you run into problems with a payment, open a support request in the Account Payments category, and I'll get things fixed up for you as soon as I possibly can. (There's only one of me, though, so it probably won't be an immediate response, unless you happen to catch me while I'm sitting right in front of the computer!)
If you run into either of the other two problems, you can let us know by leaving a comment here, just so we can get a rough sense of how often the problems are coming up. We might not be able to tell you anything more than what's in this post, though!
If you're having an issue that isn't one of these three, open a support request, describing the issue as thoroughly as possible, and somebody will help you troubleshoot it.
I will be doing a small code push tonight in about an hour or two. It hasn't been long since our last push so this one isn't particularly large, hence the short notice.
I'll post on Twitter when we start. There should be no downtime for this.
The code push and downtime will be happening in one hour. Remember, the site will be down for ~10 minutes while I do a database failover to move us completely on to the new hardware.
I will post again when the push happens, and as always, you can check out our dreamwidth account on Twitter to keep up with the code push!
I'd like to do a brief site maintenance and code push in about 32 hours. The scheduled time:
- 2100 PDT Thursday
- 0400 UTC Friday
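If you want to check the conversion for yourself, here is a small sketch. The calendar date is a placeholder picked only because it falls on a Thursday (it is not the date of this maintenance), and PDT is modeled as a fixed UTC-7 offset.

```python
from datetime import datetime, timedelta, timezone

# PDT is seven hours behind UTC.
PDT = timezone(timedelta(hours=-7), "PDT")

# Hypothetical date, chosen only because it is a Thursday.
local_start = datetime(2012, 5, 3, 21, 0, tzinfo=PDT)
utc_start = local_start.astimezone(timezone.utc)

# Print both times in the same "HHMM ZONE Weekday" shape used above.
print(local_start.strftime("%H%M %Z %A"))
print(utc_start.strftime("%H%M %Z %A"))
```

Adding seven hours to 2100 pushes the time past midnight, which is why the UTC time lands on Friday.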
This code push will involve a period of downtime. I'm estimating it will take about 10 minutes. During that window, I will be switching us from our old sb-db01 database master to our new sb-db05 cluster. I have to take the site offline for this since it's what we call a master failover, and our system isn't designed to do that without downtime.
After this maintenance, we will be completely on new hardware -- and all of the trusty hardware we've had for the past three years since moving to ServerBeach will be completely retired.
As always, there will be another post here as well as on our Twitter account when the time comes. Feel free to shoot me any questions or comments; I'll be watching this post!
PS HELLO PLURKERS.
EDIT: This problem has been fixed now!
I did some database maintenance today -- moving our workers around! -- and this caused a glitch in the replication between our old databases and the new ones, so the new ones weren't getting all the updated data.
What this means to you: if you saw problems trying to update your access list or subscription filters, or with community invitations, or viewing support requests, that was caused by the glitch in replication. I'm really sorry for the inconvenience.
This particular issue won't recur, since it was caused by a very specific circumstance related to moving the workers around. Since I'm done moving them, the problem won't happen again.
As part of our new hardware project, I'm going to be failing us over to our new load balancers. This will involve a brief downtime for the site while everything fails over, but it should be less than 60 seconds.
Thanks for your patience, and sorry for the interruption!
I have cleared out the pending queue of payments, so we shouldn't have charged for anything in the past 24 hours, and that should mean there are no doubled (or more) payments. If you do find that you were charged more than once, please let us know and we'll take care of it!
Sorry for the trouble!
If you wind up getting multiple charges when it comes back up (for instance, if you re-submitted the form, thinking that your internet connection was to blame) you can open a support request (in the Account Payments category) after the payment is processed and I'll issue a refund to your card for the extra charges.
We're really sorry about the downtime!
We don't consider this one "high risk", so (*knocks wood*) it should be pretty uneventful.
The new database machines I ordered are now installed and spinning up. They're in the beginning phases of their life, which means I've moved a few test accounts (a few communities and some other random people) and will be watching how they behave over the next day or two to make sure that everything is happy.
The new database cluster has been christened Epsilon Eridani and will soon be the home for all of our users.
You shouldn't expect to see any changes yet, but take this post as forewarning that sometime soon (I'll post again) I will start moving accounts in earnest. You can expect brief bouts of "read-only mode" when this happens, so if you see that starting to pop up around the site in the next few days -- that's why!
(For some California local definition of 'morning'!)
About 30 minutes ago one of our databases (sb-db03) locked up and stopped serving traffic. This was an active database, so the site quickly stopped when it could no longer serve requests. Alas.
I have failed us over to a backup database and now everything should be working again.
I'm not sure yet what happened to db03, but am currently investigating and will update this post if I come up with a root cause for the problem. Edit: It's back up and doesn't have any visible problems. Disks are fine, data's intact, etc. The graphs and logs show nothing. We'll have to keep an eye on it and see if it manifests further issues.
Sorry for the trouble. Please let me know if you still see any problems!