Mark Smith ([staff profile] mark) wrote in [site community profile] dw_maintenance 2019-12-02 11:50 am

Notifications slow -- but recovering

Hi all,

Due to some behind the scenes maintenance last night, our notifications system got delayed. I've fixed the issue now and it's working on catching up.

For details -- I've been experimenting with Kubernetes as a way to make managing production easier (and hopefully reduce costs!), but it turns out that one of our worker jobs that handles notifications doesn't use much CPU (it mostly spends time waiting on the database).

This caused the pod autoscaler to reduce the size of that particular deployment below what we needed to sustain throughput on our notifications service. The temporary fix is to pin that deployment size to something much larger; the better fix will be to integrate Kubernetes' pod autoscaler with the ability to monitor the queue depth on our task queue.
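
To illustrate why scaling on CPU shrinks a worker pool that mostly waits on the database, here is a minimal Python sketch of the replica formula documented for the Kubernetes horizontal pod autoscaler (desired = ceil(current * currentMetric / targetMetric)). It ignores the autoscaler's tolerance and stabilization behavior, and every number in it (5% CPU, 4000 queued jobs per worker, the replica limits) is made up for illustration, not taken from Dreamwidth's setup.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100):
    """Simplified autoscaler rule: ceil(current * currentMetric / targetMetric),
    clamped to the configured replica range."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical numbers: 10 workers that mostly wait on the database.
# Scaling on CPU: average utilization is 5% against a 50% target,
# so the rule asks for far fewer workers even though the queue is growing.
cpu_based = desired_replicas(10, current_metric=5, target_metric=50)

# Scaling on queue depth: 4000 queued jobs per worker against a target
# of 500 per worker, so the same rule asks for more workers instead.
queue_based = desired_replicas(10, current_metric=4000, target_metric=500)

print(cpu_based, queue_based)
```

With a queue-depth metric, the same rule pushes the deployment up when work piles up, which is the behavior a waiting-heavy worker needs.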

Sorry for the trouble, and thank you to the person who pinged us on Twitter. When I checked last night, everything was working, but as traffic came back up we fell behind and I wasn't watching anymore. My bad.

[personal profile] ninetydegrees 2019-12-02 08:04 pm (UTC)(link)
I hate to be that user but I stopped getting notifications (email and inbox) almost 2 years ago and never got any reply to my request. Is there *anything* I can do on my side to solve this issue?

[personal profile] mildred_of_midgard 2019-12-02 08:08 pm (UTC)(link)
Glad you got it catching up!

Is there a particular database bottleneck? (asks the DBA)

[personal profile] mildred_of_midgard 2019-12-02 09:55 pm (UTC)(link)
Makes sense!

[personal profile] siderea 2019-12-02 08:37 pm (UTC)(link)
Oh thank goodness - I thought it was on my end and was getting a bit panicky that my email service (which I also use for business email) was bouncing incoming email (again!) Thanks for letting us know, very appreciated!

[personal profile] kore 2019-12-02 08:46 pm (UTC)(link)
Thank you for keeping us all in the loop!

Watchdog service

[personal profile] dennisgorelik 2019-12-02 08:48 pm (UTC)(link)
Could you add a "watchdog" service that checks every hour how far behind your message-sending queue is?
Then, if the queue is more than 2 hours behind, it would email a notification to you (to devops).
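
A minimal Python sketch of the watchdog idea described above, under stated assumptions: the function names, the alert address, and the way the oldest pending job's timestamp is obtained are all hypothetical, and nothing here reflects how Dreamwidth's task queue actually works. The sketch only computes the lag and builds the alert email; scheduling it hourly and actually sending the message are left out.

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from typing import Optional

MAX_LAG = timedelta(hours=2)  # threshold suggested in the comment above


def queue_lag(oldest_pending: Optional[datetime], now: datetime) -> timedelta:
    """How far behind the queue is, based on its oldest unprocessed job.

    `oldest_pending` is assumed to come from a query against the task
    queue (hypothetical); None means the queue is empty.
    """
    if oldest_pending is None:
        return timedelta(0)
    return max(now - oldest_pending, timedelta(0))


def build_alert(lag: timedelta, to_addr: str) -> Optional[EmailMessage]:
    """Return an alert email if the lag exceeds MAX_LAG, otherwise None."""
    if lag <= MAX_LAG:
        return None
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"Notification queue is {lag} behind"
    msg.set_content(f"Oldest pending job has been waiting {lag} (limit {MAX_LAG}).")
    return msg


# Example run with made-up timestamps: the oldest job is 3 hours old.
now = datetime(2019, 12, 2, 12, 0, tzinfo=timezone.utc)
lag = queue_lag(now - timedelta(hours=3), now)
alert = build_alert(lag, "devops@example.invalid")  # placeholder address
```

One caveat with this design: the alert should go out through a channel that does not depend on the queue being watched, otherwise the warning would be stuck behind the same backlog.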

Re: Watchdog service

[personal profile] dennisgorelik 2019-12-02 10:50 pm (UTC)(link)
> I hadn't ported it over to Kubernetes yet

That makes me wonder what other Dreamwidth backend services may be silently falling behind now?

Are you sure that the switch to Kubernetes was the right call?
If your bottlenecks are:
1) Database performance.
2) Complexity of monitoring your queues.
Then Kubernetes is probably not the right tool to address these issues, right?


[personal profile] trobadora 2019-12-02 08:55 pm (UTC)(link)
Thank you, so glad to hear it's catching up!

[personal profile] justice 2019-12-02 09:48 pm (UTC)(link)
I'm glad someone reached out to you on Twitter because I tried the support@dreamwidth.org method, and it turns out it just goes into the support queue. The Twitter said not to reach out to you there - but is it what you'd prefer in scenarios like this one where even the site notifications wouldn't be reaching you?

[staff profile] denise 2019-12-02 09:53 pm (UTC)(link)
It's fine to ping us on Twitter for a problem or outage that's affecting a bunch of people! It's more that we can't do individual troubleshooting in 280 characters for specific problems only affecting one person.

[personal profile] justice 2019-12-02 10:16 pm (UTC)(link)
Thanks! I'll keep that in mind for the future.

[personal profile] sovay 2019-12-02 09:51 pm (UTC)(link)
Thanks for the explanation!

[personal profile] niqaeli 2019-12-02 10:14 pm (UTC)(link)
Truly, one of the many best things about DW is that I can rely on y'all to consistently and without bullshit cop to whatever caused the last major problem! Even when it's "yeah, so today I have custody of the commit-and-ditch pony." <3

[personal profile] schematise 2019-12-02 11:06 pm (UTC)(link)
Bad pod autoscaler!

Kubernetes is brilliant when it works, though. We use it for our media streaming servers and its ability to just self-heal and self-manage is amazing once the setup is steady! I hope it continues to be everything you dreamed of.

[personal profile] devilbear 2019-12-03 12:06 am (UTC)(link)
Hi! I don't know if this is the right place to put this, since I basically rarely use this site yet. I'm assuming that it's maintenance related because this definitely didn't use to be a function of the site...?

Anyway, I had a lot of my journal layouts in a private community and the coding, put within textboxes, is now broken. It would seem that things like @ keyframes [without spaces; I just don't want to ping a person] have been turned into the raw username code for mentioning a user. Which means that now when I try to copypaste the layout codes featuring that, and presumably anything else using @ in it, I just get a completely broken layout I then have to go through and fix.

Is there a way to disable this and revert the changes without all of my entries needing manual fixing / being ruined?

[personal profile] ilyena_sylph 2019-12-03 01:24 am (UTC)(link)
Hey, this is from a few months ago when Mark added an "/@username" feature in the Markdown step.

Last I heard, they're working on a fix for the stuff in textboxes and such, so, speaking as just another DW-izen... hang on a bit still?
Edited 2019-12-03 01:24 (UTC)


[personal profile] kalloway 2019-12-03 01:24 am (UTC)(link)
I'd supposed that nobody wanted to go to work this morning, especially my tracking notifs. Thank you for the fix and transparency. ^_^

[personal profile] brickhousewench 2019-12-04 04:21 am (UTC)(link)
I'm starting to learn about Kubernetes at my job, so it's kinda cool to know that my blogging platform uses it. (Also cool whenever I read nerdy stuff and understand what people are talking about. Go me!)

[personal profile] darjeeling 2019-12-06 03:28 pm (UTC)(link)
Just an update as of 12/06, some notifications are still missing. It seems to only be for tracked things... like replies to your comments are coming through immediately as normal, but things like new posts in tracked comms, or replies to tracked posts, are not appearing at all.