mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark

Hi all --

Some have noticed that re-importing to get new comments hasn't been working for a while. This has been fixed; it was an operational issue (the importer cache wasn't being cleaned).

Anyway, if you have been having trouble getting recent comments to import onto DW, things should be working now. Please give it a shot.

Edited: Also, if you are still having trouble with the importer, please open a new support request here:

http://www.dreamwidth.org/support/submit

Thanks and sorry for the trouble!

denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)
[staff profile] denise
We're still having trouble tracking down a minor bug with the payment system:

* In some rare cases, trying to complete a payment will result in an error message. It seems to be even odds whether the failure happens before or after your card has been charged, but either way, the items won't be applied to your account. If this happens to you, the error message asks you to open a support request in the Account Payments category: please do so! I'll be able to check whether your card has been charged, and if so, make sure you get the items you paid for.

(EDIT: There was another point here about a different problem that had cropped up since last night, but further investigation turned up that it was only a variant of the above, and only a single payment was affected by it. So, false alarm there!)

I'm really sorry about the hassle, folks! We've been trying to work out what's causing this to happen, but no luck so far. If you ever have questions about whether or not your payment has gone through, just open a support request to ask, and I'll get to it as soon as possible. (This weekend there might be some delay, since my sister's getting married! But usually it's a pretty quick turnaround.)
[staff profile] denise
We've had a few intermittent issues lately that people have been running into:

1) Random "403 Forbidden" errors when loading pages, mostly journals but sometimes other site pages.

2) Errors when checking out a shop cart paid for by credit card, after which the cart is left in "waiting for payment" status.

3) Crossposts to LiveJournal failing to go through, with error messages in your inbox that read "Failed to connect to http://www.livejournal.com/interface/xmlrpc".

All three of these errors are only happening occasionally.

The first two are problems on our end that we're working to track down -- we've added some extra debugging code that should help us pinpoint the cause, and we'll get it fixed as quickly as possible after that. (Things that only happen occasionally are very hard to diagnose and fix, since you can't always know for sure what the cause is, and you can't usually verify that the fixes you're putting into place actually fix the problem.)

The crossposting problem is, unfortunately, on LJ's end and not on ours: that error means that LJ was unreachable at the time the crosspost attempt was sent. If you get the error, your crosspost attempt will retry up to five times, at increasingly longer intervals, before sending you a final failure notice in your inbox. Once that happens, you can edit the post, re-check the crosspost box, and save the post to start it trying again. Whether or not it succeeds depends entirely on whether we can reach LJ at the time it tries. (It doesn't matter whether you can load LJ on your own computer when that happens: the crosspost attempt is sent from our servers, so our servers have to be able to reach LJ, not your computer.)
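For the technically curious, the "retry up to five times, at increasingly longer intervals" behavior can be sketched as below. This is only an illustration, not Dreamwidth's actual crosspost code: the function names, the 60-second starting delay, and the doubling of each wait are all assumptions made for the example.

```python
import time
from typing import Callable


def retry_delays(max_retries: int = 5, base_seconds: float = 60.0) -> list[float]:
    """Wait before each retry. Doubling from 60s is an ASSUMED policy."""
    return [base_seconds * 2 ** n for n in range(max_retries)]


def crosspost_with_retries(
    send: Callable[[], bool],          # hypothetical: True if LJ accepted the post
    max_retries: int = 5,
    base_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then retry up to max_retries times with growing waits.

    Returns False if every attempt failed; that is the point where the
    final failure notice would go to the user's inbox.
    """
    if send():
        return True
    for delay in retry_delays(max_retries, base_seconds):
        sleep(delay)  # each wait is longer than the one before it
        if send():
            return True
    return False
```

If LJ stays unreachable the whole time, this makes one initial attempt plus five retries, waiting 60, 120, 240, 480 and 960 seconds in between (under the assumed settings), and then gives up. The point the post makes still holds in the sketch: whether any attempt succeeds depends only on whether the sending server can reach LJ at that moment.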

If you run into problems with a payment, open a support request in the Account Payments category, and I'll get things fixed up for you as soon as I possibly can. (There's only one of me, though, so it probably won't be an immediate response, unless you happen to catch me while I'm sitting right in front of the computer!)

If you run into either of the other two problems, you can let us know by leaving a comment here, just so we can get a rough sense of how often the problems are coming up. We might not be able to tell you anything more than what's in this post, though!

If you're having an issue that isn't one of these three, open a support request, describing the issue as thoroughly as possible, and somebody will help you troubleshoot it.
[staff profile] mark

Hi all! The code push is done.

As always, please let us know if you find anything awry and we'll get right on it!

[staff profile] mark
Hi all,

I will be doing a small code push tonight in about an hour or two. It hasn't been long since our last push, so this one isn't particularly large; hence the short notice.

I'll post on Twitter when we start. There should be no downtime for this.
[staff profile] mark
...and we're back!

Please let me know of any problems you see. We've got hands on deck ready to pounce on any issues and take care of them as quickly as we can.

Thanks for your patience!
[staff profile] mark

Hi all,

The code push and downtime will be happening in one hour. Remember, the site will be down for ~10 minutes while I do a database failover to move us completely on to the new hardware.

I will post again when the push happens, and as always, you can check out our [twitter.com profile] dreamwidth account on Twitter to keep up with the code push!

[staff profile] mark

Hi all!

I'd like to do a brief site maintenance and code push in about 32 hours. The scheduled time:

  • 2100 PDT Thursday
  • 0400 UTC Friday

This code push will involve a period of downtime. I'm estimating it will take about 10 minutes. During that window, I will be switching us from our old sb-db01 database master to our new sb-db05 cluster. I have to take the site offline for this since it's what we call a master failover, and our system isn't designed to do that without downtime.

After this maintenance, we will be completely on new hardware -- and all of the trusty hardware we've had for the past three years since moving to ServerBeach will be completely retired.

As always, there will be another post here as well as on our Twitter account when the time comes. Feel free to shoot me any questions or comments; I'll be watching this post!

PS HELLO PLURKERS.

alierak: (ninja)
[personal profile] alierak
I'm going to restart Varnish on the main loadbalancer at 8am CDT (13:00 UTC). Dreamwidth will be completely unavailable for a very brief time, and then maybe a little slower for a while as the cache gets refilled. But when it's done, some things should be faster than they are now. Since we have new servers with more RAM, it's time to use some of it to cache more user icons.
[staff profile] denise
We're getting multiple reports of crossposting to LiveJournal not working, with an error message of "Failed to crosspost entry to [username]@LiveJournal: Failed to connect to http://www.livejournal.com/interface/xmlrpc." We're looking into whether this is a problem on LJ's end or ours (and if it's ours, we'll do our best to fix it!).

EDIT: This problem has been fixed now!
[staff profile] mark

Hi all,

I did some database maintenance today -- moving our workers around! -- and this caused a glitch in the replication between our old databases and the new ones, so the new ones weren't getting all the updated data.

What this means to you: if you saw problems trying to update your access list or subscription filters, or with community invitations, or viewing support requests, that was caused by the glitch in replication. I'm really sorry for the inconvenience.

This particular issue won't recur, since it was caused by a very specific circumstance related to moving the workers around. Since I'm done moving them, the problem won't happen again.


[staff profile] mark
Hi all,

As part of our new hardware project, I'm going to be failing us over to our new load balancers. This will involve a brief downtime for the site while everything fails over, but it should be less than 60 seconds.

Thanks for your patience, and sorry for the interruption!
[staff profile] mark
The payment system is back online. It was my fault: I was moving it to our new hardware, but I didn't realize there was a code change I had to make. (For the curious: the underlying SSL module we use was upgraded, and it now requires some additional options when you use it.)

I have cleared out the pending queue of payments, so we shouldn't have charged for anything in the past 24 hours, which should mean there are no doubled (or worse) payments. Of course, if you do see a duplicate charge, please let us know and we'll take care of it!

Sorry for the trouble!
[staff profile] denise
The backend system that runs payments is temporarily unavailable, and will be fixed as soon as possible. If you've tried to make a payment at any time between last night & now and gotten an endless wait, your payment is almost certainly in the queue to be processed as soon as the backend is back up & running -- you don't need to submit it again.

If you wind up getting multiple charges when it comes back up (for instance, if you re-submitted the form, thinking that your internet connection was to blame) you can open a support request (in the Account Payments category) after the payment is processed and I'll issue a refund to your card for the extra charges.

We're really sorry about the downtime!

Code pushed

Jun. 7th, 2013 11:31 pm
[staff profile] mark
Hi all!

The code has been pushed. As always, please report problems here! We have lots of hands on deck and ready to jump on things that might be awry. Thanks!
[staff profile] denise
We've been hacking away in person at the conference we went to this week, and we'd like to share the fruit of our labors with you all! There'll be a code push tonight (6/7) at 9PM CDT, which is 10PM EDT/7PM PDT/2AM GMT (6/8). (Convert to your time zone!)

We don't consider this one "high risk", so (*knocks wood*) it should be pretty uneventful.
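If you'd like to check the time conversions in these announcements yourself, a small sketch using Python's standard `datetime` module is below. It assumes fixed summer (daylight-saving) offsets for the abbreviations used in the post, and it treats "GMT" as UTC, as the post does. The `ZONES` table and the `convert` helper are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for the daylight-saving abbreviations used in the post
# (an assumption: these are only correct while DST is in effect).
ZONES = {
    "CDT": timezone(timedelta(hours=-5), "CDT"),
    "EDT": timezone(timedelta(hours=-4), "EDT"),
    "PDT": timezone(timedelta(hours=-7), "PDT"),
    "GMT": timezone.utc,  # used here to mean UTC
}


def convert(when: datetime, names: list[str]) -> dict[str, str]:
    """Render an aware datetime in each named zone as 'M/D HH:MM'."""
    out = {}
    for name in names:
        local = when.astimezone(ZONES[name])
        out[name] = f"{local.month}/{local.day} {local:%H:%M}"
    return out


# The announced push: 9PM CDT on 6/7/2013.
push = datetime(2013, 6, 7, 21, 0, tzinfo=ZONES["CDT"])
print(convert(push, ["EDT", "PDT", "GMT"]))
# → {'EDT': '6/7 22:00', 'PDT': '6/7 19:00', 'GMT': '6/8 02:00'}
```

This matches the times in the post: 10PM EDT and 7PM PDT on 6/7, and 2AM GMT on 6/8. Starting from "PDT" in the same way gives the other announcement's pairing of 2100 PDT with 0400 UTC the following day.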
[staff profile] mark

Hi all!

The new database machines I ordered are now installed and spinning up. They're in the beginning phases of their life, which means I've moved a few test accounts (a few communities and some other random people) and will be watching how they behave over the next day or two to make sure that everything is happy.

The new database cluster has been christened Epsilon Eridani and will soon be the home for all of our users.

You shouldn't expect to see anything yet, but take this post as forewarning that sometime soon (I'll post again) I will start moving accounts in earnest. You can expect brief bouts of "read-only mode" when this happens, so if you see that starting to pop up around the site in the next few days, that's why!

[staff profile] mark

(For some California local definition of 'morning'!)

About 30 minutes ago one of our databases (sb-db03) locked up and stopped serving traffic. This was an active database, so the site quickly stopped when it could no longer serve requests. Alas.

I have failed us over to a backup database and now everything should be working again.

I'm not sure yet what happened to db03, but am currently investigating and will update this post if I come up with a root cause for the problem. Edit: It's back up and doesn't have any visible problems. Disks are fine, data's intact, etc. The graphs and logs show nothing. We'll have to keep an eye on it and see if it manifests further issues.

Sorry for the trouble! Please let me know if you still see any problems.

[staff profile] mark
Please comment and let us know of anything broken! It'll be live shortly.
[staff profile] mark
FYI: We'll be doing the code push in about 90 minutes. I'll post again when it happens, and you can always watch our [twitter.com profile] dreamwidth account on Twitter for updates.
Page generated Jan. 25th, 2015 08:17 pm
Powered by Dreamwidth Studios