dw_maintenance | Code push happening shortly!

Hi all,

We're doing a code push as mentioned here.

As usual, a code push includes some visual changes (continuing our efforts to modernize common workflows) and also includes some conversions of old pages to be generated in a different way with different markup. As always, we've tested the changes in most of the major browsers and use cases, but some things may break when they meet real-world usage and different technical setups. If we broke it, we'll fix it as soon as we can.

We will be watching this post for comments, so please let us know if you see anything that needs to be addressed and we'll take a look!

Known Issues

The spellchecker continues to go walkabout. This is intentional: we are deprecating and removing it. Modern browsers have spellcheck that works in multiple languages and has actually been updated since 1995, so there's no reason for Dreamwidth to continue to try to maintain one.

Fixed Issues

FAQs had gone missing. Well, the answers. They're back now.
Replying to comments was incorrectly routing you back to the top level view, rather than the view of the thread you were on. This is fixed.

As always, please do keep an eye on our Twitter account dreamwidth.

>> I've subscribed to notifications on some Cyrillic posts

Wouldn't it be easier if someone just dropped you a Cyrillic comment (it should show up in Inbox, I presume)?

E.g. ВОРОТ (rus.letters) should look equivalent to BOPOT (eng.letters).

Ha, thanks! And ugh, and sure enough, those look exactly the same to me, on both inbox and front page! (I had to paste them into the character picker to be totally sure the first ones were Cyrillic. 😅)

I found like 1/4 of a clue in the code so far... will keep digging.

The fact that your browser shows it's using UTF-8 is real intriguing; I initially thought we were sending proper UTF-8 and the browser was switching into an old-style 8-bit encoding, but it looks like I had it backwards -- like, something can cause DW to send weird old 8-bit while telling the browser it's UTF-8. 🤨

OK, I have one more request, if you're not tired of this yet.

Could you please copy and paste some of those garbage characters from the front page for me? Along with the real Cyrillic they were supposed to represent?

If I can't reproduce the bug for real, maybe I can at least figure out which encoding fuckup would result in the same garbage...

No problem at all, glad to be of help.

First, I was able to reproduce the bug, though in a weird way.

1) I've googled up this ticket https://www.dreamwidth.org/support/see_request?id=41787, and the https://www.dreamwidth.org/inbox/?view=sitenotices link did the trick for me (see screenshot: https://pritkiy-kaban.dreamwidth.org/file/25035.jpg).

2) The interesting part is that now, https://www.dreamwidth.org/ is displayed properly, while https://www.dreamwidth.org/inbox is not. I was able to reproduce it in Opera and FF.

3) Under FF Network tool, headers are slightly different.
The Home page (Cyrillic is properly displayed) is as follows: https://pritkiy-kaban.dreamwidth.org/file/25451.jpg
The Inbox page (Cyrillic appears garbled) is as follows: https://pritkiy-kaban.dreamwidth.org/file/25190.jpg

Here are some examples of garbled characters:

Ex.1)
Garbled text: ÐÐµÑ. Ð Ñ.Ñ. ÑÐµÑÐµÐ· Ð²ÑÑÑÐ¾ÐµÐ½Ð½ÑÐ¹ VPN ÐÐ¿ÐµÑÑ.
Proper text: Нет. В т.ч. через встроенный VPN Оперы.

Ex.2)
Garbled text: Ð¢ÐµÑÐ½Ð¸ÑÐµÑÐºÐ¸Ð¹ Ð²Ð¾Ð¿ÑÐ¾Ñ Ð¿Ð¾ DW
Proper text: ТЕХНИЧЕСКИЙ ВОПРОС ПО DW

roadrunnertwice, you probably already recognize that, but juuust in case you don't: I'm pretty gosh-darned sure that's utf-8 being interpreted as ISO-8859-1 (Latin-1). Looking at my own Inbox, I see the same issue on non-ASCII characters such as em dashes (—).
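For anyone who wants to check that diagnosis against the examples above, here is a minimal Python sketch of the suspected misreading. The function names are made up for illustration. Latin-1 turns some UTF-8 continuation bytes into invisible C1 control characters, so the sketch strips those to match what actually shows on screen.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def visible_part(text: str) -> str:
    """Drop C1 control characters (U+0080..U+009F), which render as nothing."""
    return "".join(ch for ch in text if not 0x80 <= ord(ch) <= 0x9F)


def repair(garbled: str) -> str:
    """Undo the misreading, provided no characters were lost along the way."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = misread_as_latin1("Нет.")
print(visible_part(garbled))   # compare with the start of Ex.1 above
print(repair(garbled))         # back to the original Cyrillic
```

Note that `repair` only works on the full garbled string, control characters included. Text copied from a rendered page has usually lost them, which is why the pasted samples cannot be cleanly reversed.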

Best guess is that there's something converting things into utf-8 which are already utf-8, and so characters end up double-encoded. Looking at it more closely, I see <br />s in my own comments where I put newlines, even though the comment had "Don't auto-format" checked, so maybe it's being run through the HTML cleaner twice?

Added: Following bug reports around, the utf-8 fixes itself under view=singleentry ("filter to this entry"), view=entrycomment, and view=unread for me as well, but the extra <br />s are still there, so those may be an unrelated issue.

Edited 2020-07-01 04:52 (UTC)

Yeah — that Ð.

I've been tearing my hair out trying to repro this, and eventually managed to force some mojibake on my dev server by... well, never mind. The point is, after pritkiy_kaban's help with investigating the HTTP headers, I think the problem comes from the way the old LJ code tries to avoid ever invoking Perl's internal Unicode handling.

Basically, the site is accepting UTF-8 from the outside world, storing UTF-8 in the database, and outputting UTF-8 to the web... but it's never admitting as much to Perl, and tries to always treat text as a sequence of (mostly) opaque bytes. (In fact, it's not even telling MySQL it's storing UTF-8, so if I do a direct query for unicode content on my dev server, I get absolute garbage.) There's a bunch of code to check for unicode validity and convert old 8-bit encodings from the database (which isn't relevant to us, but was to LJ), but it all kinda does it on the down-low.

The problem is, if you ever combine "just bytes" text with a string that has, at some point, confessed to being real unicode, the "just bytes" text gets deserialized as ISO-8859-1 so that it can also Be Real Unicode Characters, resulting in garbage (because it was UTF-8 all along and was trying to stealth through the system without ever getting decoded).
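A rough way to picture this outside of Perl: Python refuses to mix `bytes` and `str` implicitly, so the sketch below imitates the upgrade by hand. Here `bytes` stands in for "just bytes" text and `str` for a string that has confessed to being unicode. The helper `perl_like_concat` is hypothetical and only models this one behavior, not Perl's string handling in general.

```python
def perl_like_concat(left, right):
    """Imitate Perl-style concatenation for this one case.

    If both sides are bytes, nothing changes. If either side is str,
    any bytes operand is upgraded as if it were Latin-1.
    """
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right

    def upgrade(value):
        return value.decode("latin-1") if isinstance(value, bytes) else value

    return upgrade(left) + upgrade(right)


comment = "Нет".encode("utf-8")   # stored and passed around as raw UTF-8 bytes

# All-bytes path: the UTF-8 passes through untouched.
ok_page = perl_like_concat(b"Inbox: ", comment)

# One decoded string in the mix: the comment bytes get upgraded as Latin-1.
bad_page = perl_like_concat("Inbox: ", comment)

print(ok_page.decode("utf-8"))
print(bad_page.encode("utf-8").decode("utf-8"))   # what a UTF-8 browser would show
```

The second line of output is mojibake even though every step after the concatenation handles UTF-8 correctly, which fits the symptom of a page that declares UTF-8 and still shows garbage.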

So, something in the chain is outputting a string that's marked as being UTF-8... but only under SOME circumstances, for SOME users, on SOME pages. And to fix the bug, someone's gotta figure out exactly what's doing that, and have it re-encode that text back to "just bytes" before passing it on. (That, or launch a multi-year inquisition to make the entire 20-year-old codebase unicode-aware.)
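The narrower of those two fixes might look roughly like this, again using Python's `bytes` for "just bytes" text and `str` for text that has been decoded. The helper names and the sample label are assumptions for illustration; the real change would live wherever the decoded string enters the chain.

```python
def to_raw_bytes(value):
    """Return value as UTF-8 bytes, encoding it first if it is a decoded str."""
    return value.encode("utf-8") if isinstance(value, str) else value


def build_page(*parts):
    """Join page fragments after forcing every one of them back to bytes."""
    return b"".join(to_raw_bytes(part) for part in parts)


label = "Входящие: "               # hypothetical string that arrives already decoded
comment = "Нет".encode("utf-8")    # comment body, still raw UTF-8 bytes

page = build_page(label, comment)
print(page.decode("utf-8"))
```

Because nothing decoded ever reaches the join, there is no implicit upgrade to go wrong. The catch is that every entry point has to be covered, which is the hard part when nobody knows which one is leaking.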

And no one working on the site has ever been able to repro the damn thing. UGH.

(Yes, I also think the br tag thing is unrelated. I'm interested in that, but it seems less urgent; also a big patch that interferes with that whole area of HTML-mangling just got merged, so it might act totally different after the next code deploy anyway.)

Oh man, yeah. You have my sympathies. I have no idea why it's doing it for me, but not you. Or maybe it's a production-not-dev situation?

Digging through the code via Github (ugggh, why didn't I just clone it? It would have been so much easier.), I don't see much difference between the codepaths, either. The only thing that's obviously different is the translation texts, but that shouldn't... you know what, let me try that... Ah, interesting! /inbox/?uselang=debug gets the encoding right, which would seem to suggest that one of the translation strings used by /inbox/, but not by /inbox/?view=…, is bringing in that pesky SvUTF8 flag. (But...why?)

As for the newlines, it looks like that might be coming from LJ::Event::JournalNewComment->content() doing $comment_body = LJ::html_newlines($comment_body); for some reason, though I'm too tired to think of what that reason might be.

>> Or maybe it's a production-not-dev situation?

I can't even reproduce it on prod when I get unicode in my notifs! IDEK man. 😩

>> Ah, interesting! /inbox/?uselang=debug gets the encoding right

WHOA, now that IS interesting. 🤨 Good instinct. Not sure what to make of that yet, but it will likely be useful.

Edited (formatting) 2020-07-01 16:43 (UTC)

I don't see any obvious WTF-8 in the strings for /inbox in prod, but if ?uselang=debug is not triggering the problem, that may be what's causing it. I'll go retype all the strings from scratch to make sure that's not it.

Noooo dooooon't 😱😂 (Unless that's somehow easy, but, sounds bad.)

I don't think the strings are the entire answer anyway, because again, it's *not triggering for everyone.*

Nah, it took me about 10 minutes and 8 of them were "what the fuck is this arcane and unhelpful error and how do I fix it" (ia ia translation system fth'agn). I figured at worst it could rule out an obvious possible cause.

Can you check both /inbox/ and /inbox?uselang=debug now and see if me retyping all the strings fixed the problem?

While I share roadrunnertwice's skepticism about that being a complete fix, it appears to have worked for me. My Inbox now looks fine both with and without uselang=debug!

.....I both love and hate that I appear to have been right. I'll check with pritkiy_kaban above, too.

Can you try loading your inbox now and tell me if it's fixed?

Hi,

https://www.dreamwidth.org/inbox/ is OK
https://www.dreamwidth.org/inbox/?view=sitenotices is not.

The latter gets loaded with Content-Type="text/html; charset=utf-8", while the former (good one) is just Content-Type="text/html"
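If it helps with comparing pages, the declared charset can be pulled out of a Content-Type value with Python's standard library. This is only a small diagnostic sketch; `declared_charset` is a made-up helper name, and the two header values are the ones quoted above.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


for header in ("text/html", "text/html; charset=utf-8"):
    print(f"{header!r} -> {declared_charset(header)!r}")
```

A result of `None` means the page leaves the choice of encoding to the browser, so two pages with identical bytes can still render differently.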

BTW, tried it on a different PC, under FF 77. Both the homepage and dreamwidth.org/inbox have that trademark "text/html; charset=utf-8" in the reply header, and non-ASCII UTF-8 text appears to be messed up.
