Mark Smith (mark) wrote in dw_maintenance, 2011-04-24 10:18 am
slowness/downtime
Hi all,
In the past 24 hours we've had some periods of slowness and outright downtime.
fu and I have managed to track it down to a single community that has a comment thread that causes our Apache workers to go into an infinite loop. After enough refreshes, all of our workers are dead, and the site no longer responds.
We're going to start looking into fixing the code so that this doesn't happen. Meanwhile, we're going to work with this community to make sure that they can't break the site while we get an actual, proper code fix working.
Thank you all for your patience as we sort this out.
no subject
no subject
That goes for me, too!
no subject
no subject
Think of it like having plates at a dinner party. You can put food on the plate, serve someone, and they eat it. Then you wash the plate and you can re-use it. That's roughly how Dreamwidth works, except we cycle through plates very, very quickly so the site runs on a small number of them (30 or so).
(Plates are the metaphor for Apache workers. A single worker is a thing that processes an incoming request and generates a response, one at a time.)
Anyway, the community we found causes the site to break in a certain way that means the plate fills up ... but never gets emptied. Eventually, we run out of plates, and the line of people who want to use the site backs up out of the door.
We figured out what was causing the plates not to empty and put a band-aid on it so everyone can still eat.
Okay. Darnit. Now I'm hungry.
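The plate analogy above can be sketched as a toy model. This is only an illustration, not Dreamwidth's or Apache's actual code: the `WorkerPool` class and the `refreshes_until_outage` helper are hypothetical names, and the pool size of 30 is taken from the "30 or so" figure mentioned above.

```python
class WorkerPool:
    """Toy model of a fixed pool of request workers (the "plates")."""

    def __init__(self, size):
        self.size = size
        # Workers tied up by a request that never finishes.
        self.stuck = 0

    def handle(self, request_finishes):
        """Try to serve one request; return True if a worker was free."""
        if self.stuck >= self.size:
            # Every worker is stuck, so nothing can be served.
            return False
        if not request_finishes:
            # The worker never returns to the pool (the plate is never washed).
            self.stuck += 1
        return True


def refreshes_until_outage(pool_size):
    """Count how many never-ending requests it takes to exhaust the pool."""
    pool = WorkerPool(pool_size)
    refreshes = 0
    while pool.handle(request_finishes=False):
        refreshes += 1
    return refreshes


if __name__ == "__main__":
    # Assumed pool size, based on the "30 or so" figure in the comment.
    print(refreshes_until_outage(30))  # prints 30
```

In this model, requests that finish can be served indefinitely, while each request that never finishes permanently removes one worker. Once the count of stuck workers reaches the pool size, every later request is turned away.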
no subject
First time I've ever heard of someone putting a band-aid on a full plate. *giggles*
Right. Dinner time...
no subject
no subject
no subject
Thanks for fixing this!
no subject
no subject
no subject
Thanks for letting us know what's up, though. You guys are made of awesome!
(frozen comment) no subject
(frozen comment) no subject
I'd understand this comment more if DW experienced significant outages (i.e. intermittent service for days at a time), but this is patently not the case. Also, the DW peeps are fantastic at explaining issues as soon as they know what's going on. Please try and develop some real-world expectations, with the understanding that not one internet service is 100% available to all users all the time.
(frozen comment) no subject
(frozen comment) no subject
Let me just say here that I have been watching DW ever since its conception and launch, and I'm REALLY happy with what you guys have been doing, and that I am choosing to buy a seed account to support DW because I believe in the sort of transparency you bring. ♥
(frozen comment) no subject
Look at it my way. I announce to my LJ friends list that I am moving to DW, and half of my friends list says, "Oh, I'm sorry to hear that, I guess I won't be commenting on your blog as much as I used to." And then I get to DW, I pay $200 to buy 2000 points for a seed account on Sunday morning, and then BOOM, I get hit by the outages. What would you think in my shoes?
Also, I don't know about you, but my real-world expectations are _high_ when I am paying money. (I don't pay Google anything, and I use Google even more than I ever used LJ or DW, as they host all three of my e-mails and other cloud things.) If it does not work, the company hears from me. And if I cannot get to them, or if I talk and no one listens, I leave and I don't come back as a customer.
(frozen comment) no subject
(frozen comment) no subject
Anyway, everybody has different perspectives and opinions and expectations, and that's okay. We'll live up to some of them and fall down on others, and all we can do is hold to our guiding principles, let people know what's going on, and fix things when they break.
(frozen comment) no subject
(frozen comment) no subject
ETA: thought better of what I said. Bad day at work, tired, grumpy, can't afford the $200 but doing so anyway. Forget I said anything, and I'd like someone to please screen or freeze this thread so no one else jumps on me? Thanks.
(frozen comment) no subject
no subject
no subject
no subject
no subject
no subject
no subject
no subject
no subject
no subject
no subject
no subject
no subject
^_^b
no subject
no subject
no subject
no subject
no subject
no subject
Thanks for doing a good job!
Best,
–Nic
no subject
I'm curious as to how a small comm manages to hog all the DW plates though. That seems like an odd one :D
no subject
no subject
It was a small RP community. They had gotten a post up to 1400 comments, which is really nothing, but they were arranged in a chain. Reply, reply, reply, reply. So instead of the normal view of things -- a wide tree -- we had a very, very deep tree.
The part of the code that checks to see if you are subscribed to a thread walks the tree from where you are up to the top to see if you're subscribed to any comment in the chain. Normally, this is simple -- you walk a few comments up and you're at the root and you're done.
In this case, it had to walk 1400 comments. The code was done in such a way that it had to do this frequently, and it caused an explosion of work on Apache, taking tons of memory and CPU. Ouch.
Aaaaaaaanyway. I know I didn't break that down into plate analogies, but I have to run. Maybe someone else can, if needed, and if not, I'll do it later if I remember. ^^;
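The tree walk described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Dreamwidth code: the `Comment` class, `is_thread_subscribed`, and the builder helpers are hypothetical names, and the check is assumed to run once per comment when a page is rendered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    """Hypothetical comment record; not Dreamwidth's real schema."""
    comment_id: int
    parent: Optional["Comment"] = None


def is_thread_subscribed(comment, subscribed_ids):
    """Walk from `comment` up to the root.

    Returns (subscribed, steps), where steps is the number of comments visited.
    """
    steps = 0
    node = comment
    while node is not None:
        steps += 1
        if node.comment_id in subscribed_ids:
            return True, steps
        node = node.parent
    return False, steps


def build_chain(length):
    """Reply, reply, reply: each comment answers the previous one."""
    comments, parent = [], None
    for i in range(1, length + 1):
        parent = Comment(i, parent)
        comments.append(parent)
    return comments


def build_wide(count):
    """A wide tree: every comment replies directly to the first one."""
    root = Comment(1)
    return [root] + [Comment(i, root) for i in range(2, count + 1)]


def total_steps(comments, subscribed_ids):
    """Total comments visited if the check runs once for every comment."""
    return sum(is_thread_subscribed(c, subscribed_ids)[1] for c in comments)


if __name__ == "__main__":
    print(total_steps(build_wide(1400), set()))   # prints 2799
    print(total_steps(build_chain(1400), set()))  # prints 980700
```

With the same 1400 comments, the wide tree needs a few thousand steps in total, while the single deep chain needs close to a million, because the cost of each check grows with the depth of the comment being checked.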
no subject
So, whenever you and your friends go out to dinner, you all share your food by picking up your plates and handing them around the table so everyone can get a taste. When you've got four people at the table, that's no problem, you can make one circuit of the table really quickly and finish your meal and the plates can get back to the kitchen to get washed and reused.
But, say you've got a really big party. And everyone wants a taste of everyone's dinner still. So, you pass the plates around the table just like you're used to doing, but since there are so many of you, it takes the plate forever to make its way around, and by the time it's only halfway around your group, the restaurant says, "Um, guys, can you speed this up? We totally need the plates."
no subject
no subject
So there's this person building a staircase who needs to be served as well. One of the workers runs up a plate to the guy who is at the top of the staircase, going step by step up. It takes far longer for that plate to get there and back again than for someone in the main dining hall, and then it might not even get washed. And the staircase keeps increasing in length, but the guy on the staircase still needs to get served, so it takes longer and longer and the person doing the plate-running starts to get very tired out.
This is a situation for a bucket on a pulley.
no subject
no subject
no subject
no subject
no subject
no subject
no subject
Thank you for pouncing on it!
no subject
no subject
Thanks for keeping us informed!
ETA: Also seems Facebook is down (or is that just me)?
no subject
no subject
Though it takes more for downforeveryoneorjustme to register a problem.
Both work with other services; LJ looked terrible during the DDOS. You could see the drop followed by a climb back up and then another drop.
no subject
no subject
no subject
no subject
no subject
Well done for the rapid diagnosis and the temporary workaround!
no subject
no subject
Interesting, though. I'll have to wait and see what the fix looks like, to see what the heck was going on.
no subject
no subject
no subject
*hides head in hands*
I AM SO SO SORRY.
no subject
(ps: even though we've put in a fix, and the fix is live now, you still might want to break a thread every 100 or 200 comments just because.)
no subject
(Yeaaah, I'm getting that idea. /o\ I'm sorry!)
no subject
(Obviously it would've been better without the downtime. But still. *G*)
(We'll unsuspend the entry as soon as we're sure the problem is fixed!)
no subject
(I didn't even know there was any downtime! My internet was buggy yesterday so EVERYTHING was slow. ...unless I was causing my own internet problems. *facepaaalm* I totally was, wasn't I?)
(Yay! I'm sorry!)
no subject
You did not cause a problem; you FOUND one. Finding the bits that are goofy so that they can be fixed = a GOOD thing.
*hands over BugFinder button*
no subject
no subject
I count that as a net win. :)
Seriously! We do not blame you guys! We do not blame anybody. The problem was there, whether or not we knew about it. Whenever you're dealing with edge cases or people using the site in a way that is way outside the way that most other people use it, there's always the chance for bugs to crop up, because nobody has ever tested those particular combinations. It's why there are always bugs that don't show up until we push code live, because our users use the site in ways we never would have dreamed, much less tested.
So do not feel like a moron! Feel like somebody who just contributed to making DW better for everybody, because you totally did. :)
no subject
Uh. Good thing it's Three Weeks, so my shame and related Issues aren't allowed to drive me away from the site for a month or so to recover? I have obligations. ^^;;
And, uh. I was going to buy a seed account anyway, 'cause apart from this whole breaking it thing I do love this site like burning - but instead of waiting to see if I can nab one I'm gonna go buy the DW points now, so even if I don't get one you guys get my support.
no subject
no subject
And yeah, Three Weeks obligations, so. Around anyway.
no subject
In the IRC channels where many of the developers hang out (I am not a developer, but I make bad puns with many of them) there is a bot that lets the whole chat know when a new public bug has been filed in the bug-tracker. Occasionally, not as often these days because there aren't as many good ones, but sometimes, there is cheering, actual cheering, when a new bug is filed. This is because the bug getting filed means that there is enough information about a problem to formally enter it in the system, and because filing the bug is the first step to getting it fixed, and having it fixed means that no one else will ever be able to (deliberately or accidentally) do the same thing and cause the same problem ever again.
This is definitely a \o/ YAAAAAY!! \o/ bug.
I don't know if you've ever had the experience when writing fic, where there's something, and you're working on making it the best writing you can possibly manage, and you hand it off to a beta, and the beta points out something that screws everything up, but when you fix it, it is going to make things more awesome than you dreamed was possible? And you're a little sorry that you made a mistake, but you know no writer's perfect so it's not like you're the only one who's done something similar, and it's going to be a lot of work to fix, but once you fix it, it's going to be better than you had thought you could write, and it's exciting and the writing to fix it will be the fun kind of challenge? I would compare some bug report experiences to that. And in this analogy, you would be the beta who pointed it out, not the writer who made the mistake or the writer who's going to have to fix it.
no subject
Um. If you say so? And yeah, at least it was this weekend, not next. That would have been. Um. Bad. Very very bad.
I will try and take your words to heart!
no subject
Once before open beta, I managed to partially break the importer because I'd copied an email into my LJ and between one thing and another LJ told the importer that it was sending an email to import, so the entry page confused the importer a whole, whole lot.
And then the devs fixed it and everything was ok again!
no subject
no subject
no subject
no subject
I'm currently spending a lot of time playing the alpha2 release of FreeCol, and trying to find bugs. The obvious ones I don't need to report ("um, guys, why aren't my horses breeding? That's kinda important"). So I'm playing with some batshit insane strategies to see if anything weird comes up that the Devs have never seen before. Thus far, I've found one thing.
The lead Dev has thanked me profusely (because, seriously, it took a lot of weird setup to find this thing, although I wasn't looking for that specifically; I was just trying to break the game).
Now, it's fixed in trunk so when 0.10.beta comes out the game might actually be fully playable.
Dreamwidth is still in beta, we're still testing the servers, the code, the whatever.
You have found a bug. One they hadn't seen before, at all, one that shouldn't exist.
This is a Good Thing.
Because now they can fix this bug, and possibly make sure that a) you can't do it again but more importantly, b) other people doing something similar but different don't do it again.
Congratulations, you have broken the site in a way they can fix. Really, that's a good thing to do.
no subject
no subject
♥
no subject
no subject
Don't get down on yourself! Now it's fixed and we can all play in the sunshine and rainbows (...and dinner parties). :D
no subject
no subject
no subject
no subject
no subject
no subject
no subject