Mark Smith (mark) wrote in dw_maintenance, 2019-12-02 11:50 am
Notifications slow -- but recovering
Hi all,
Due to some behind-the-scenes maintenance last night, our notifications system got delayed. I've fixed the issue now and it's working on catching up.
For details -- I've been experimenting with Kubernetes as a way to make managing production easier (and hopefully reduce costs!), but it turns out that one of our worker jobs that handles notifications doesn't use much CPU (it mostly spends time waiting on the database).
This caused the pod autoscaler to reduce the size of that particular deployment below what we needed to sustain throughput on our notifications service. The temporary fix is to pin that deployment size to something much larger; the better fix will be to integrate Kubernetes' pod autoscaler with the ability to monitor the queue depth on our task queue.
Sorry for the trouble, and thank you to the person who pinged us on Twitter. When I checked last night, everything was working, but as traffic came back up we fell behind and I wasn't watching anymore. My bad.
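As a rough illustration of the queue-depth idea described in the post, here is a minimal sketch of sizing a worker deployment from backlog instead of CPU. The function name, the `jobs_per_worker` target, and the floor and ceiling values are all hypothetical; nothing here reflects Dreamwidth's actual configuration or any Kubernetes API.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Pick a worker count from the task-queue backlog (hypothetical helper).

    queue_depth      -- number of jobs currently waiting in the queue
    jobs_per_worker  -- assumed backlog each worker can comfortably absorb
    min_workers      -- floor, playing the role of the "pinned" size
    max_workers      -- ceiling, to keep costs bounded
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    wanted = math.ceil(queue_depth / jobs_per_worker)
    # Clamp so an empty queue never scales below the floor.
    return max(min_workers, min(max_workers, wanted))
```

The floor matters for a job like this one: even when the queue is briefly empty, the deployment stays large enough to absorb traffic as it comes back up.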

no subject
Since I'm tied to a coworking space for the day, I'll see about figuring out what's up this afternoon. I can't make any promises though, because that sounds wacky, but I'll try.
no subject
Is there a particular database bottleneck? (asks the DBA)
no subject
We get throughput on this particular job by running lots of copies. Kubernetes decides how many copies to run by examining CPU usage ... but if your job isn't _capable_ of using lots of CPU, it tricks the autoscaler into getting rid of the jobs. Which ain't great.
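To make that concrete, the Kubernetes documentation describes the Horizontal Pod Autoscaler's core calculation as desired = ceil(current replicas × current metric / target metric). The sketch below applies only that formula; it ignores the tolerance band, min/max replica bounds, and stabilization windows the real autoscaler also applies, and the numbers in the comments are made up for illustration.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float) -> int:
    """Core HPA scaling formula, without tolerance or min/max clamping."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


# Illustrative numbers only: 20 workers idling at 5% average CPU
# against a 50% CPU target.
replicas = hpa_desired_replicas(20, 5, 50)
```

With workers that mostly wait on the database, average CPU sits far below the target, so the formula keeps asking for fewer copies regardless of how much work is queued.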
Watchdog service
Then if your messaging queue is more than 2 hours behind, send an email notification to yourself (to devops).
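A minimal sketch of that check, assuming the queue can report the enqueue time of its oldest pending job. The function name and the way the timestamp is obtained are hypothetical; actually sending the email is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold from the suggestion above: alert when more than 2 hours behind.
MAX_LAG = timedelta(hours=2)


def queue_is_behind(oldest_pending_at: Optional[datetime], now: datetime,
                    max_lag: timedelta = MAX_LAG) -> bool:
    """Return True when the oldest pending job has waited longer than max_lag.

    oldest_pending_at is None when the queue is empty, which never alerts.
    """
    if oldest_pending_at is None:
        return False
    return now - oldest_pending_at > max_lag
```

A watchdog would call this on a schedule and email devops whenever it returns True.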
Re: Watchdog service
Fool's hope, of course.
Re: Watchdog service
That makes me wonder what other Dreamwidth backend services may be silently falling behind now?
Are you sure that the switch to Kubernetes was the right call?
If your bottlenecks are:
1) Database performance.
2) Complexity of monitoring your queues.
Then Kubernetes is, probably, not the right tool to address these issues, right?
Legacy vs Kubernetes
Amazon SES - misconfigured SPF record?
Dreamwidth.org SPF record
Processing notifications and email deliverability
Notification types
Sending emails
no subject
Kubernetes is brilliant when it works, though. We use it for our media streaming servers and its ability to just self-heal and self-manage is amazing once the setup is steady! I hope it continues to be everything you dreamed of.
no subject
Anyway, I had a lot of my journal layouts in a private community and the coding, put within textboxes, is now broken. It would seem that things like @ keyframes [without spaces; I just don't want to ping a person] have been turned into the raw username code for mentioning a user. Which means that now when I try to copypaste the layout codes featuring that, and presumably anything else using @ in it, I just get a completely broken layout I then have to go through and fix.
Is there a way to disable this and revert the changes without all of my entries needing manual fixing / being ruined?
no subject
Last I heard, they're working on a fix for the stuff in textboxes and such, so, speaking as just another DW-izen... hang on a bit still?