Mark Smith (mark) wrote in dw_maintenance, 2020-06-27 04:02 pm
Code push happening shortly!
Hi all,
We're doing a code push as mentioned here.
As usual, a code push includes some visual changes (continuing our efforts to modernize common workflows) and also includes some conversions of old pages to be generated in a different way with different markup. As always, we've tested the changes in most of the major browsers and use cases, but some things may break when they meet real-world usage and different technical setups. If we broke it, we'll fix it as soon as we can.
We will be watching this post for comments, so please let us know if you see anything that needs to be addressed and we'll take a look!
Known Issues
- The spellchecker continues to go walkabout. This is intentional: we are deprecating and removing it. Modern browsers have spellcheck that works in multiple languages and has actually been updated since 1995, so there's no reason for Dreamwidth to continue to try to maintain one.
Fixed Issues
- FAQs had gone missing. Well, the answers. They're back now.
- Replying to comments was incorrectly routing you back to the top level view, rather than the view of the thread you were on. This is fixed.
As always, please do keep an eye on our Twitter account dreamwidth.
no subject
Wouldn't it be easier if someone just dropped you a Cyrillic comment (it should show up in your Inbox, I presume)?
E.g. ВОРОТ (Russian letters) should look identical to BOPOT (English letters).
no subject
I found like 1/4 of a clue in the code so far... will keep digging.
The fact that your browser shows it's using UTF-8 is real intriguing; I initially thought we were sending proper UTF-8 and the browser was switching into an old-style 8-bit encoding, but it looks like I had it backwards -- like, something can cause DW to send weird old 8-bit while telling the browser it's UTF-8. 🤨
no subject
Could you please copy and paste some of those garbage characters from the front page for me? Along with the real Cyrillic they were supposed to represent?
If I can't reproduce the bug for real, maybe I can at least figure out which encoding fuckup would result in the same garbage...
no subject
First, I was able to reproduce the bug, though in a weird way.
1) I've googled up this ticket https://www.dreamwidth.org/support/see_request?id=41787, and the https://www.dreamwidth.org/inbox/?view=sitenotices link did the trick for me (see screenshot: https://pritkiy-kaban.dreamwidth.org/file/25035.jpg).
2) The interesting part is that now, https://www.dreamwidth.org/ is displayed properly, while https://www.dreamwidth.org/inbox is not. I was able to reproduce it in Opera and FF.
3) In the FF Network tool, the headers are slightly different.
The Home page (Cyrillic is properly displayed) is as follows: https://pritkiy-kaban.dreamwidth.org/file/25451.jpg
The Inbox page (Cyrillic appears garbled) is as follows: https://pritkiy-kaban.dreamwidth.org/file/25190.jpg
Here are some examples of garbled characters:
Ex.1)
Garbled text: ÐеÑ. Ð Ñ.Ñ. ÑеÑез вÑÑÑоеннÑй VPN ÐпеÑÑ.
Proper text: Нет. В т.ч. через встроенный VPN Оперы.
Ex.2)
Garbled text: Ð¢ÐµÑ Ð½Ð¸ÑеÑкий вопÑÐ¾Ñ Ð¿Ð¾ DW
Proper text: ТЕХНИЧЕСКИЙ ВОПРОС ПО DW
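For anyone who wants to compare against their own garbage: here is a minimal Python sketch of one possible explanation, assuming the garbling is UTF-8 bytes being misread as ISO-8859-1. The function names `garble` and `repair` are made up for illustration and are not anything from the Dreamwidth codebase.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then misread those bytes as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo garble(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


proper = "Нет. В т.ч. через встроенный VPN Оперы."
garbled = garble(proper)

# repr() exposes the invisible control characters that a browser would hide.
print(repr(garbled))
print(repair(garbled))
```

Under this assumption every Cyrillic letter turns into two characters, the first being Ð or Ñ, and plain ASCII like "VPN" passes through untouched. Note that `repair` may fail on copy-pasted garbage, because the second character of a pair is often an invisible control character that gets lost when copying.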
no subject
Best guess is that there's something converting things into utf-8 which are already utf-8, and so characters end up double-encoded. Looking at it more closely, I see <br />s in my own comments where I put newlines, even though the comment had "Don't auto-format" checked, so maybe it's being run through the HTML cleaner twice?
Added: Following bug reports around, the utf-8 fixes itself under view=singleentry ("filter to this entry"), view=entrycomment, and view=unread for me as well, but the extra <br />s are still there, so those may be an unrelated issue.
no subject
I've been tearing my hair out trying to repro this, and eventually managed to force some mojibake on my dev server by... well, never mind. The point is, after
Basically, the site is accepting UTF-8 from the outside world, storing UTF-8 in the database, and outputting UTF-8 to the web... but it never admits as much to Perl, and always tries to treat text as a sequence of (mostly) opaque bytes. (In fact, it's not even telling MySQL it's storing UTF-8, so if I do a direct query for unicode content on my dev server, I get absolute garbage.) There's a bunch of code to check for unicode validity and convert old 8-bit encodings from the database (which isn't relevant to us, but was to LJ), but it all kinda does it on the down-low.
The problem is, if you ever combine "just bytes" text with a string that has, at some point, confessed to being real unicode, the "just bytes" text gets deserialized as ISO-8859-1 so that it can also Be Real Unicode Characters, resulting in garbage (because it was UTF-8 all along and was trying to stealth through the system without ever getting decoded).
So, something in the chain is outputting a string that's marked as being UTF-8... but only under SOME circumstances, for SOME users, on SOME pages. And to fix the bug, someone's gotta figure out exactly what's doing that, and have it re-encode that text back to "just bytes" before passing it on. (That, or launch a multi-year inquisition to make the entire 20-year-old codebase unicode-aware.)
And no one working on the site has ever been able to repro the damn thing. UGH.
(Yes, I also think the br tag thing is unrelated. I'm interested in that, but it seems less urgent; also a big patch that interferes with that whole area of HTML-mangling just got merged, so it might act totally different after the next code deploy anyway.)
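To make the "just bytes" versus "real unicode" mixing concrete, here is a rough Python simulation. Python refuses to mix `bytes` and `str` implicitly, so the silent ISO-8859-1 upgrade is spelled out by hand; the helper names and the `label` string are hypothetical stand-ins, and this is only a sketch of the failure mode, not the site's actual code path.

```python
def perl_style_concat(flagged_text: str, raw_bytes: bytes) -> str:
    """Mimic the implicit upgrade: raw bytes are read as ISO-8859-1 characters."""
    return flagged_text + raw_bytes.decode("latin-1")


def bytes_only_concat(flagged_text: str, raw_bytes: bytes) -> bytes:
    """The fix sketched above: re-encode the flagged text, then stay in bytes."""
    return flagged_text.encode("utf-8") + raw_bytes


stored = "Нет".encode("utf-8")  # UTF-8 bytes passing through the system undecoded
label = "Inbox: "               # hypothetical string that is marked as characters

# Writing the upgraded string out as UTF-8 encodes the payload a second time.
broken_page = perl_style_concat(label, stored).encode("utf-8")
fixed_page = bytes_only_concat(label, stored)

print(broken_page.decode("utf-8"))  # garbled: every original byte became a character
print(fixed_page.decode("utf-8"))   # intact
```

The broken output is longer than the fixed one because each byte of the already-encoded text gets encoded again, which matches the double-encoding guess earlier in the thread.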
no subject
Oh man, yeah. You have my sympathies. I have no idea why it's doing it for me, but not you. Or maybe it's a production-not-dev situation?
Digging through the code via Github (ugggh, why didn't I just clone it? It would have been so much easier.), I don't see much difference between the codepaths, either. The only thing that's obviously different is the translation texts, but that shouldn't... you know what, let me try that... Ah, interesting! /inbox/?uselang=debug gets the encoding right, which would seem to suggest that one of the translation strings used by /inbox/, but not by /inbox/?view=…, is bringing in that pesky SvUTF8 flag. (But...why?)
As for the newlines, it looks like that might be coming from LJ::Event::JournalNewComment->content() doing $comment_body = LJ::html_newlines($comment_body); for some reason, though I'm too tired to think of what that reason might be.
no subject
I can't even reproduce it on prod when I get unicode in my notifs! IDEK man. 😩
WHOA, now that IS interesting. 🤨 Good instinct. Not sure what to make of that yet, but it will likely be useful.
no subject
I don't think the strings are the entire answer anyway, because again, it's *not triggering for everyone.*
no subject
While I share roadrunnertwice's skepticism about that being a complete fix, it appears to have worked for me. My Inbox now looks fine both with and without uselang=debug!
no subject
https://www.dreamwidth.org/inbox/ is OK
https://www.dreamwidth.org/inbox/?view=sitenotices is not.
The latter gets loaded with Content-Type="text/html; charset=utf-8", while the former (good one) is just Content-Type="text/html"
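If it helps anyone compare headers without squinting at screenshots, here is a small Python sketch that pulls the declared charset out of a Content-Type value, using the two header values quoted above. The helper name `declared_charset` is made up; the parsing relies on the standard library's `email.message.Message`.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the charset named in a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


for page, header in [
    ("/inbox/", "text/html"),
    ("/inbox/?view=sitenotices", "text/html; charset=utf-8"),
]:
    print(page, "->", declared_charset(header))
```

A result of `None` only means the header declares nothing; the browser then picks an encoding some other way (for example from a meta tag or its own defaults), so the header by itself doesn't say which page will come out garbled.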