denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

"Heartbleed" security vulnerability

For those who have seen reference today in the press to the "Heartbleed" security vulnerability in OpenSSL, we'd like to reassure you that although we (like a large portion of the internet) were running the affected software, we patched our servers last night and were no longer vulnerable from that point.

We have no reason to believe that anyone was exploiting this vulnerability against us or that any user data has been compromised. We'll be changing our security certificates for extra confidence.

On the other hand, the nature of this vulnerability means that it's impossible for a website to know with absolute certainty whether someone was exploiting it. If someone was exploiting the vulnerability, against us or against any other website, they potentially have access to any information you sent to the site, including your username/password for the site and any data you sent to the site under HTTPS. It's a good idea to change your passwords pretty much everywhere, but don't do it until you can verify that a site is no longer vulnerable.

If you have any questions, feel free to ask!
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2014-01-26 16:06

Database maintenance

Hi all!

Today we started getting some alerts on the database, so I'm going to do some maintenance to verify the health and wellness of the machine and get it back to a non-alerting status.

I should be able to do this without any downtime, but just in case, you might want to make sure to use your favorite text editor to save a copy of any long entries or comments you're working on.

Once I've got things sorted out, I'll update this with more details for the technically curious.

Update [4:50PM PST]: sb-db06 (the slave) has been rebooted and is recovering. I'm doing system updates on it, since the problem looks like a kernel bug (it struck both databases at the same time). Next: master failover, then recover the other database.

Update [5:05PM PST]: I'm doing what we call a "master failover" now. This means I'm shifting all traffic from the database that was active (sb-db05) to the spare database (sb-db06). I have to shut off "extra" services like imports, feeds, and searches while this happens.

Update [5:30PM PST]: Well, that was unexpectedly bumpy. Sorry for that. There should be no further bumping, as we're now on the spare database so I can take maintenance on the original master.

Update [6:20PM PST]: If you had userpics not loading, they should be back to normal.

alierak: (Default)

Database restart

Our database servers have been generating some alerts today, where the monitoring system can't log into the servers to check on them. I can't log in either, but as best I can tell the databases are still running, or you wouldn't be seeing this. There is probably an issue with memory usage or excessive disk I/O, but it's kind of hard to troubleshoot at the moment. I would expect DW to be down for a while at some point today in order to restart the databases and/or reboot the servers. More info when [staff profile] mark's had a chance to look.
denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

(no subject)

Code push starting now! We'll update this entry when we're finished.

EDIT: And we're done! We're watching for issues now, but if you spot anything, sing out.

We were working to diagnose and fix the issue of missing notifications, and this bug's now been fixed. If you were affected, you won't get notifications that you missed getting, but you will get notifications from here on out.
denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

code push

There will be a code push at midnight EST Sunday 24 Nov (9PM PST Sun 24 Nov, 5AM GMT Monday 25 Nov).

This push contains some sweeping backend changes, so you either won't notice anything at all, or things will be Very Broken. :) (We're pretty sure things won't be Very Broken, since things have been working out fine in testing, but there's always the chance of things getting screwy when the new changes get widespread adoption.) We'll have everyone on hand to make sure problems get dealt with quickly.
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-10-23 20:56

Importer update

Hi all --

Some have noticed that re-importing to get new comments hasn't been working for a while. This has been fixed; it was an operational issue (the importer cache wasn't being cleaned).

Anyway, if you have been having trouble getting recent comments to import onto DW, things should be working now. Please give it a shot.

Edited: Also, if you are still having importer troubles, please open a new request and let us know here:

http://www.dreamwidth.org/support/submit

Thanks and sorry for the trouble!

denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

Payment system problems

We're still having trouble tracking down a minor bug with the payment system:

* In some rare cases, trying to complete a payment will result in an error message. It seems to be even odds whether the failure happens before or after your card has been charged, but either way, the items won't be applied to your account. If this happens to you, the error message asks you to open a support request in the Account Payments category: please do so! I'll be able to check whether your card has been charged, and if so, make sure you get the items you paid for.

(EDIT: There was another point here about a different problem that had cropped up since last night, but further investigation turned up that it was only a variant of the above, and only a single payment was affected by it. So, false alarm there!)

I'm really sorry about the hassle, folks! We've been trying to work out what's causing this to happen, but no luck so far. If you ever have questions about whether or not your payment has gone through, just open a support request to ask, and I'll get to it as soon as possible. (This weekend there might be some delay, since my sister's getting married! But usually it's a pretty quick turnaround.)
denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

a few quick notes on ongoing, intermittent issues

We've had a few intermittent issues lately that people have been running into:

1) Random "403 Forbidden" errors when loading pages, mostly in journals but sometimes on site pages.

2) Shop carts paid for by credit card where you get an error when trying to check out, then the cart is set to "waiting for payment" status.

3) When crossposting to LiveJournal, the crosspost not going through and error messages in your inbox that read "Failed to connect to http://www.livejournal.com/interface/xmlrpc".

All three of these errors are only happening occasionally.

The first two are problems on our end that we're working to track down -- we've added some extra debugging code that should help us pinpoint the cause, and we'll get it fixed as quickly as possible after that. (Things that only happen occasionally are very hard to diagnose and fix, since you can't always know for sure what the cause is, and you can't usually verify that the fixes you're putting into place actually fix the problem.)

The crossposting problem is, unfortunately, on LJ's end and not on ours: that error means that LJ was unreachable at the time the crosspost attempt was sent. If you get the error, your crosspost attempt will retry up to five times, at increasingly longer intervals, before sending you a final failure notice in your inbox. Once that happens, you can edit the post and re-check the crosspost box, then save the post, to start it trying again. Whether or not it succeeds depends entirely on whether we can reach LJ at the time it tries. (It doesn't matter whether you can load LJ on your own computer when that happens -- the crosspost attempt is sent from our servers, so our servers have to be able to reach LJ, not your computer.)
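
For the technically curious, the retry behavior can be sketched roughly like this. It's only an illustration, not the actual crossposter code: the function names, the 60-second starting delay, and the doubling schedule are all assumptions made up for the example (the only details stated here are "up to five retries" and "increasingly longer intervals"). It also assumes one initial attempt followed by up to five retries.

```python
from typing import Callable, List

MAX_RETRIES = 5          # from the post: "retry up to five times"
BASE_DELAY_SECONDS = 60  # assumption: the real intervals are not stated


def retry_delays(max_retries: int = MAX_RETRIES,
                 base_delay: int = BASE_DELAY_SECONDS) -> List[int]:
    """Return the wait before each retry, doubling each time (assumed schedule)."""
    return [base_delay * (2 ** i) for i in range(max_retries)]


def crosspost_with_retries(send: Callable[[], bool],
                           sleep: Callable[[float], None]) -> bool:
    """Try `send` once, then retry on failure with increasingly longer waits.

    `send` is a hypothetical callable that returns True when the remote
    site was reachable and accepted the post. Returns False once every
    retry has failed, which is the point where a final failure notice
    would be sent.
    """
    if send():
        return True
    for delay in retry_delays():
        sleep(delay)
        if send():
            return True
    return False


if __name__ == "__main__":
    # Simulate a remote site that is unreachable for the first three attempts.
    outcomes = iter([False, False, False, True])
    waits: List[float] = []
    ok = crosspost_with_retries(lambda: next(outcomes), waits.append)
    print(ok, waits)  # True [60, 120, 240]
```

Passing `sleep` in as a parameter is just a convenience for the sketch: it lets the example record the waits instead of actually sleeping.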

If you run into problems with a payment, open a support request in the Account Payments category, and I'll get things fixed up for you as soon as I possibly can. (There's only one of me, though, so it probably won't be an immediate response, unless you happen to catch me while I'm sitting right in front of the computer!)

If you run into either of the other two problems, you can let us know by leaving a comment here, just so we can get a rough sense of how often the problems are coming up. We might not be able to tell you anything more than what's in this post, though!

If you're having an issue that isn't one of these three, open a support request, describing the issue as thoroughly as possible, and somebody will help you troubleshoot it.
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-07-26 23:28

Code push complete.

Hi all! The code push is done.

As always, please let us know if you find anything awry and we'll get right on it!

mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-07-26 21:18

Small code push tonight

Hi all,

I will be doing a small code push tonight in about an hour or two. It hasn't been long since our last push so this one isn't particularly large, hence the short notice.

I'll post on Twitter when we start. There should be no downtime for this.
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-07-18 21:11

Code push complete!

...and we're back!

Please let me know of any problems you see. We've got hands on deck ready to pounce on any issues and take care of them as quickly as we can.

Thanks for your patience!
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-07-18 20:02

Code push very soon (1 hour)

Hi all,

The code push and downtime will be happening in one hour. Remember, the site will be down for ~10 minutes while I do a database failover to move us completely on to the new hardware.

I will post again when the push happens, and as always, you can check out our [twitter.com profile] dreamwidth account on Twitter to keep up with the code push!

mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-07-17 12:08

Code push in ~32 hours

Hi all!

I'd like to do a brief site maintenance and code push in about 32 hours. The scheduled time:

  • 2100 PDT Thursday
  • 0400 UTC Friday
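
If you'd like to double-check the conversion between the two times above, here's a small sketch. The calendar date (18 July 2013) is an assumption inferred from this post's timestamp, and PDT is taken to be UTC-7.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7. The date is an assumption inferred from the post's timestamp.
PDT = timezone(timedelta(hours=-7), "PDT")

push_local = datetime(2013, 7, 18, 21, 0, tzinfo=PDT)
push_utc = push_local.astimezone(timezone.utc)

# Print both times in the same "HHMM ZONE Weekday" shape used in the list.
print(push_local.strftime("%H%M %Z %A"))
print(push_utc.strftime("%H%M %Z %A"))
```

To see the push in your own time zone, swap in your own UTC offset in place of `timezone.utc`.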

This code push will involve a period of downtime. I'm estimating it will take about 10 minutes. During that window, I will be switching us from our old sb-db01 database master to our new sb-db05 cluster. I have to take the site offline for this since it's what we call a master failover, and our system isn't designed to do that without downtime.

After this maintenance, we will be completely on new hardware -- and all of the trusty hardware we've had for the past three years since moving to ServerBeach will be completely retired.

As always, there will be another post here as well as on our Twitter account when the time comes. Feel free to shoot me any questions or comments; I'll watch this post!

PS HELLO PLURKERS.

alierak: (ninja)

Loadbalancer tuning

I'm going to restart Varnish on the main loadbalancer at 8am CDT (13:00 UTC). Dreamwidth will be completely unavailable for a very brief time, and then maybe a little slower for a while as the cache gets refilled. But when it's done, some things should be faster than they are now. Since we have new servers with more RAM, it's time to use some of it to cache more user icons.
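
As a rough illustration of why the site is slower right after a restart and why more RAM helps, here's a toy cache. It is not how Varnish is implemented; the class, its least-recently-used eviction, and the capacities are all made up for the example.

```python
from collections import OrderedDict
from typing import Callable


class IconCache:
    """A toy least-recently-used cache (hypothetical; not Varnish itself)."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch  # hypothetical slow backend lookup
        self.items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.fetch(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
        return value

    def restart(self) -> None:
        """Drop everything, as restarting an in-memory cache would."""
        self.items.clear()


def run(capacity: int) -> IconCache:
    """Request the same sequence of icons against a cache of the given size."""
    cache = IconCache(capacity, fetch=lambda name: name.encode())
    for name in ["a", "b", "a", "c", "b"]:
        cache.get(name)
    return cache


if __name__ == "__main__":
    small, large = run(2), run(3)
    print(small.hits, small.misses)  # 1 4
    print(large.hits, large.misses)  # 2 3
```

Every request right after `restart()` is a miss until the cache fills back up, and the larger cache serves more of the same requests without going to the backend.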
denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

(no subject)

We're getting multiple reports of crossposting to LiveJournal not working, with an error message of "Failed to crosspost entry to [username]@LiveJournal: Failed to connect to http://www.livejournal.com/interface/xmlrpc." We're looking into whether this is a problem on LJ's end or our end (and if it's our end, we'll do our best to fix it!)

EDIT: This problem has been fixed now!
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-07-01 17:17

Database maintenance

Hi all,

I did some database maintenance today -- moving our workers around! -- and this caused a glitch in the replication between our old databases and the new ones, so the new ones weren't getting all the updated data.

What this means to you: if you saw problems trying to update your access list or subscription filters, or with community invitations, or viewing support requests, that was caused by the glitch in replication. I'm really sorry for the inconvenience.

This particular issue won't recur, since it was caused by a very specific circumstance related to moving the workers around. Since I'm done moving them, the problem won't happen again.

mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-06-18 14:45

Planned load balancer failover shortly

Hi all,

As part of our new hardware project, I'm going to be failing us over to our new load balancers. This will involve a brief downtime for the site while everything fails over, but it should be less than 60 seconds.

Thanks for your patience, and sorry for the interruption!
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-06-17 10:59

Payments are back

The payment system is back online. It was my fault; I was moving it to our new hardware, but I didn't realize there was a code change that I had to make. (For the detail-curious, the underlying SSL module we use was upgraded, and it now requires you to add some more options when you use it.)

I have cleared out the pending queue of payments, so we shouldn't have charged for anything in the past 24 hours, and that should mean there are no doubled (or more) payments. If you do see a doubled charge, though, please let us know and we'll take care of it!

Sorry for the trouble!
denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

Payment processing temporarily down

The backend system that runs payments is temporarily unavailable, and will be fixed as soon as possible. If you've tried to make a payment at any time between last night & now and gotten an endless wait, your payment is almost certainly in the queue to be processed as soon as the backend is back up & running -- you don't need to submit it again.

If you wind up getting multiple charges when it comes back up (for instance, if you re-submitted the form, thinking that your internet connection was to blame) you can open a support request (in the Account Payments category) after the payment is processed and I'll issue a refund to your card for the extra charges.

We're really sorry about the downtime!
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)
[staff profile] mark 2013-06-07 23:31

Code pushed

Hi all!

The code has been pushed. As always, please report problems here! We have lots of hands on deck and ready to jump on things that might be awry. Thanks!