Mark Smith ([staff profile] mark) wrote in [site community profile] dw_maintenance 2020-06-27 04:02 pm

Code push happening shortly!

Hi all,

We're doing a code push as mentioned here.

As usual, this code push includes some visual changes (continuing our efforts to modernize common workflows) and converts some old pages so they are generated in a different way, with different markup. As always, we've tested the changes in most of the major browsers and use cases, but some things may break when they meet real-world usage and different technical setups. If we broke it, we'll fix it as soon as we can.

We will be watching this post for comments, so please let us know if you see anything that needs to be addressed and we'll take a look!

Known Issues

  • The spellchecker continues to go walkabout. This is intentional: we are deprecating and removing it. Modern browsers have spellcheckers that work in multiple languages and have actually been updated since 1995, so there's no reason for Dreamwidth to keep maintaining its own.

Fixed Issues

  • FAQs had gone missing. Well, the answers. They're back now.
  • Replying to comments was incorrectly routing you back to the top level view, rather than the view of the thread you were on. This is fixed.

As always, please do keep an eye on our Twitter account [twitter.com profile] dreamwidth.

[personal profile] pritkiy_kaban 2020-06-29 07:56 pm (UTC)
>> I've subscribed to notifications on some Cyrillic posts

Wouldn't it be easier if someone just left you a Cyrillic comment? (It should show up in your Inbox, I presume.)

For example, ВОРОТ (Russian letters) should look identical to BOPOT (English letters).
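Look-alikes like these can be told apart by comparing code points instead of glyphs. It also shows why the trick is a useful test: each Cyrillic letter is two bytes in UTF-8 (the part an encoding bug can mangle), while the Latin look-alike is plain one-byte ASCII. A minimal Python sketch; the variable names are illustrative, and the Cyrillic word is written with escapes so it can't silently turn into Latin letters when copied:

```python
# Two words that look the same in most fonts but share no code points.
cyrillic = "\u0412\u041e\u0420\u041e\u0422"  # ВОРОТ: Cyrillic VE, O, ER, O, TE
latin = "BOPOT"                              # plain ASCII capitals

print(cyrillic == latin)                # False
print([hex(ord(c)) for c in cyrillic])  # ['0x412', '0x41e', '0x420', '0x41e', '0x422']
print([hex(ord(c)) for c in latin])     # ['0x42', '0x4f', '0x50', '0x4f', '0x54']
print(len(cyrillic.encode("utf-8")), len(latin.encode("utf-8")))  # 10 5
```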

[personal profile] roadrunnertwice 2020-06-29 08:59 pm (UTC)
Ha, thanks! And ugh, sure enough, those look exactly the same to me, in both the inbox and the front page! (I had to paste them into the character picker to be totally sure the first ones were Cyrillic. 😅)

I found like 1/4 of a clue in the code so far... will keep digging.

The fact that your browser shows it's using UTF-8 is real intriguing; I initially thought we were sending proper UTF-8 and the browser was switching into an old-style 8-bit encoding, but it looks like I had it backwards -- like, something can cause DW to send weird old 8-bit while telling the browser it's UTF-8. 🤨

[personal profile] roadrunnertwice 2020-06-30 05:01 pm (UTC)
OK, I have one more request, if you're not tired of this yet.

Could you please copy and paste some of those garbage characters from the front page for me? Along with the real Cyrillic they were supposed to represent?

If I can't reproduce the bug for real, maybe I can at least figure out which encoding fuckup would result in the same garbage...
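One way to do that matching is brute force: encode the real text with one candidate encoding, decode the resulting bytes with another, and see which pair reproduces the garbage. A rough Python sketch; the candidate list and the find_mixups name are assumptions for illustration, not anything from the DW codebase:

```python
# Candidate encodings to try (an assumption; extend as needed).
CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp1251", "koi8-r"]

def find_mixups(proper, garbled):
    """Return (written_as, read_as) pairs that turn `proper` into `garbled`."""
    hits = []
    for written_as in CANDIDATES:
        try:
            raw = proper.encode(written_as)
        except UnicodeEncodeError:
            continue  # this encoding can't represent the text at all
        for read_as in CANDIDATES:
            if read_as == written_as:
                continue
            try:
                if raw.decode(read_as) == garbled:
                    hits.append((written_as, read_as))
            except UnicodeDecodeError:
                continue  # these bytes aren't valid in read_as
    return hits

sample = "Нет".encode("utf-8").decode("latin-1")  # simulated garbage
print(find_mixups("Нет", sample))  # [('utf-8', 'latin-1')]
```

A sample copied out of a real browser may not match exactly, since browsers apply their own mapping for a few bytes in the 0x80 to 0x9F range.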

[personal profile] pritkiy_kaban 2020-06-30 08:03 pm (UTC)
No problem at all, glad to be of help.

First, I was able to reproduce the bug, though in a weird way.

1) I found this ticket via Google: https://www.dreamwidth.org/support/see_request?id=41787, and the https://www.dreamwidth.org/inbox/?view=sitenotices link did the trick for me (see screenshot: https://pritkiy-kaban.dreamwidth.org/file/25035.jpg).

2) The interesting part is that now, https://www.dreamwidth.org/ is displayed properly, while https://www.dreamwidth.org/inbox is not. I was able to reproduce it in Opera and FF.

3) In the Firefox Network tool, the response headers are slightly different.
The Home page (Cyrillic is properly displayed) is as follows: https://pritkiy-kaban.dreamwidth.org/file/25451.jpg
The Inbox page (Cyrillic appears garbled) is as follows: https://pritkiy-kaban.dreamwidth.org/file/25190.jpg


Here are some examples of garbled characters:

Ex.1)
Garbled text: Нет. В т.ч. через встроенный VPN Оперы.
Proper text: Нет. В т.ч. через встроенный VPN Оперы. (English: "No. Including via Opera's built-in VPN.")

Ex.2)
Garbled text: Технический вопрос по DW
Proper text: ТЕХНИЧЕСКИЙ ВОПРОС ПО DW (English: "TECHNICAL QUESTION ABOUT DW")

[personal profile] pinterface 2020-07-01 04:21 am (UTC)

[personal profile] roadrunnertwice, you probably already recognize that, but juuust in case you don't: I'm pretty gosh-darned sure that's utf-8 being interpreted as ISO-8859-1 (Latin-1). Looking at my own Inbox, I see the same issue on non-ASCII characters such as em dashes (—).
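That misreading is easy to reproduce. Every Russian letter is two bytes in UTF-8, starting with 0xD0 or 0xD1, which Latin-1 displays as Ð or Ñ; that's why the garbage is full of them. A minimal Python sketch (the function names are illustrative):

```python
def latin1_misread(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")

def repair(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, then decode them properly."""
    return garbled.encode("latin-1").decode("utf-8")

print(repr(latin1_misread("\u2014")))  # em dash -> 'â\x80\x94'
print(repr(latin1_misread("Нет")))     # 'Ð\x9dÐµÑ\x82'
print(repair(latin1_misread("Нет")))   # Нет
```

Browsers treat the ISO-8859-1 label as windows-1252, so on a real page the em dash usually renders as the three characters â, € and ” rather than with invisible control characters.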

Best guess is that there's something converting things into utf-8 which are already utf-8, and so characters end up double-encoded. Looking at it more closely, I see <br />s in my own comments where I put newlines, even though the comment had "Don't auto-format" checked, so maybe it's being run through the HTML cleaner twice?
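Double encoding is that same misreading followed by another UTF-8 encode, so the damage ends up in the bytes themselves: each two-byte Cyrillic letter grows to four. A Python sketch; undo_double_encoding is a hypothetical heuristic, not something in the DW code:

```python
original = "Нет".encode("utf-8")  # 6 bytes: d0 9d d0 b5 d1 82

# The bug: treat already-UTF-8 bytes as Latin-1 characters, then encode to UTF-8 again.
double = original.decode("latin-1").encode("utf-8")

print(original.hex(" "))  # d0 9d d0 b5 d1 82
print(double.hex(" "))    # c3 90 c2 9d c3 90 c2 b5 c3 91 c2 82

def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one round of UTF-8 -> Latin-1 -> UTF-8, if the data looks double-encoded."""
    try:
        candidate = data.decode("utf-8").encode("latin-1")
        candidate.decode("utf-8")  # the undone bytes must themselves be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data  # doesn't look double-encoded; leave it alone
    return candidate
```

The final UTF-8 check matters because correctly encoded text made only of Latin-1-range characters (é, for instance) would otherwise get "repaired" into invalid bytes.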

Added: Following bug reports around, the utf-8 fixes itself under view=singleentry ("filter to this entry"), view=entrycomment, and view=unread for me as well, but the extra <br />s are still there, so those may be an unrelated issue.

Edited 2020-07-01 04:52 (UTC)

[personal profile] roadrunnertwice 2020-07-01 05:55 am (UTC)
Yeah — that Ð.

I've been tearing my hair out trying to repro this, and eventually managed to force some mojibake on my dev server by... well, never mind. The point is, after [personal profile] pritkiy_kaban's help with investigating the HTTP headers, I think the problem comes from the way the old LJ code tries to avoid ever invoking Perl's internal Unicode handling.

Basically, the site is accepting UTF-8 from the outside world, storing UTF-8 in the database, and outputting UTF-8 to the web... but it's never admitting as much to Perl, and tries to always treat text as a sequence of (mostly) opaque bytes. (In fact, it's not even telling MySQL it's storing UTF-8, so if I do a direct query for Unicode content on my dev server, I get absolute garbage.) There's a bunch of code to check for Unicode validity and convert old 8-bit encodings from the database (which isn't relevant to us, but was to LJ), but it all kinda does it on the down-low.

The problem is, if you ever combine "just bytes" text with a string that has, at some point, confessed to being real unicode, the "just bytes" text gets deserialized as ISO-8859-1 so that it can also Be Real Unicode Characters, resulting in garbage (because it was UTF-8 all along and was trying to stealth through the system without ever getting decoded).
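Python refuses to mix bytes and str outright, so Perl's behaviour has to be modelled explicitly. The sketch below is a Python model, not Perl: perl_style_concat is a hypothetical function that mimics how Perl upgrades a byte string, reading each byte as a Latin-1 character, when it is joined with a string that carries the UTF-8 flag.

```python
from typing import Union

Text = Union[bytes, str]

def perl_style_concat(left: Text, right: Text) -> Text:
    """Hypothetical model of Perl joining a byte string with a character string."""
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right  # both are "just bytes": nothing gets decoded
    # Any side that is still raw bytes is upgraded by reading each byte as Latin-1.
    if isinstance(left, bytes):
        left = left.decode("latin-1")
    if isinstance(right, bytes):
        right = right.decode("latin-1")
    return left + right

body = "Нет".encode("utf-8")  # UTF-8 bytes, kept opaque as in the old LJ code
label = "Inbox: "             # stands in for a string that carries the UTF-8 flag

page = perl_style_concat(label, body)
print(repr(page))  # 'Inbox: Ð\x9dÐµÑ\x82'
```

Once the whole page is then encoded as UTF-8 for output, this produces exactly the double-encoded bytes described above.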

So, something in the chain is outputting a string that's marked as being UTF-8... but only under SOME circumstances, for SOME users, on SOME pages. And to fix the bug, someone's gotta figure out exactly what's doing that, and have it re-encode that text back to "just bytes" before passing it on. (That, or launch a multi-year inquisition to make the entire 20-year-old codebase unicode-aware.)
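The fix described above amounts to encoding any flagged character string back to UTF-8 bytes before it meets the byte-oriented text. In Perl that step would usually be Encode::encode('UTF-8', $str) or utf8::encode($str); here it is modelled in Python with a hypothetical to_bytes helper:

```python
from typing import Union

def to_bytes(value: Union[bytes, str]) -> bytes:
    """Hypothetical helper: make sure whatever is headed for output is UTF-8 bytes."""
    if isinstance(value, str):
        return value.encode("utf-8")  # re-encode the "real Unicode" string back to bytes
    return value  # already bytes; assumed (as in the old LJ code) to be UTF-8

body = "Нет".encode("utf-8")  # opaque UTF-8 bytes from the database
label = "Inbox: "             # the string that had gained the UTF-8 flag

page = to_bytes(label) + to_bytes(body)  # bytes + bytes: no Latin-1 upgrade step
print(page.decode("utf-8"))  # Inbox: Нет
```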

And no one working on the site has ever been able to repro the damn thing. UGH.

(Yes, I also think the br tag thing is unrelated. I'm interested in that, but it seems less urgent; also a big patch that interferes with that whole area of HTML-mangling just got merged, so it might act totally differently after the next code deploy anyway.)

[personal profile] pinterface 2020-07-01 07:32 am (UTC)

Oh man, yeah. You have my sympathies. I have no idea why it's doing it for me, but not you. Or maybe it's a production-not-dev situation?

Digging through the code via Github (ugggh, why didn't I just clone it? It would have been so much easier.), I don't see much difference between the codepaths, either. The only thing that's obviously different is the translation texts, but that shouldn't... you know what, let me try that... Ah, interesting! /inbox/?uselang=debug gets the encoding right, which would seem to suggest that one of the translation strings used by /inbox/, but not by /inbox/?view=…, is bringing in that pesky SvUTF8 flag. (But...why?)

As for the newlines, it looks like that might be coming from LJ::Event::JournalNewComment->content() doing $comment_body = LJ::html_newlines($comment_body); for some reason, though I'm too tired to think of what that reason might be.
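If content() really does call LJ::html_newlines unconditionally, the stray <br />s follow directly. A rough Python stand-in, assuming html_newlines simply turns each newline into <br /> (the real Perl function may differ), plus a hypothetical guard that honours "Don't auto-format":

```python
def html_newlines(text: str) -> str:
    """Rough stand-in for LJ::html_newlines (assumed: every newline becomes <br />)."""
    return text.replace("\r\n", "\n").replace("\n", "<br />")

def notification_body(body: str, noautoformat: bool) -> str:
    """Hypothetical guard: leave the body alone if the comment opted out of auto-formatting."""
    return body if noautoformat else html_newlines(body)

raw = "<p>First paragraph</p>\n<p>Second paragraph</p>"
print(html_newlines(raw))            # <p>First paragraph</p><br /><p>Second paragraph</p>
print(notification_body(raw, True))  # printed unchanged, newline included
```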

[personal profile] roadrunnertwice 2020-07-01 04:42 pm (UTC)
>> Or maybe it's a production-not-dev situation?

I can't even reproduce it on prod when I get unicode in my notifs! IDEK man. 😩

>> Ah, interesting! /inbox/?uselang=debug gets the encoding right

WHOA, now that IS interesting. 🤨 Good instinct. Not sure what to make of that yet, but it will likely be useful.

Edited (formatting) 2020-07-01 16:43 (UTC)

[staff profile] denise 2020-07-01 05:19 pm (UTC)
I don't see any obvious WTF-8 in the strings for /inbox in prod, but if ?uselang=debug is not triggering the problem, that may be what's causing it. I'll go retype all the strings from scratch to make sure that's not it.

[personal profile] roadrunnertwice 2020-07-01 05:48 pm (UTC)
Noooo dooooon't 😱😂 (Unless that's somehow easy, but, sounds bad.)

I don't think the strings are the entire answer anyway, because again, it's *not triggering for everyone.*

[staff profile] denise 2020-07-01 05:52 pm (UTC)
Nah, it took me about 10 minutes and 8 of them were "what the fuck is this arcane and unhelpful error and how do I fix it" (ia ia translation system fth'agn). I figured at worst it could rule out an obvious possible cause.

[staff profile] denise 2020-07-01 05:34 pm (UTC)
Can you check both /inbox/ and /inbox/?uselang=debug now and see if me retyping all the strings fixed the problem?

[personal profile] pinterface 2020-07-01 06:37 pm (UTC)

While I share [personal profile] roadrunnertwice's skepticism about that being a complete fix, it appears to have worked for me. My Inbox now looks fine both with and without uselang=debug!

[staff profile] denise 2020-07-01 06:59 pm (UTC)
.....I both love and hate that I appear to have been right. I'll check with [personal profile] pritkiy_kaban above, too.

[staff profile] denise 2020-07-01 07:00 pm (UTC)
Can you try loading your inbox now and tell me if it's fixed?

[personal profile] pritkiy_kaban 2020-07-01 08:02 pm (UTC)
Hi,

https://www.dreamwidth.org/inbox/ is OK
https://www.dreamwidth.org/inbox/?view=sitenotices is not.

The latter is served with Content-Type="text/html; charset=utf-8", while the former (the good one) has just Content-Type="text/html".
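The visible difference between the two responses is the charset parameter. When it is absent, the browser has to fall back on other signals, such as a <meta charset> tag in the page or its own detection, so the same bytes can end up decoded differently. A minimal sketch that pulls the parameter out of a header value with Python's standard email.message module (declared_charset is an illustrative name):

```python
from email.message import Message

def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased charset, or None if not declared

print(declared_charset("text/html; charset=utf-8"))  # utf-8
print(declared_charset("text/html"))                 # None
```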

[personal profile] pritkiy_kaban 2020-07-03 09:46 am (UTC)
BTW, I tried it on a different PC, under Firefox 77. Both the homepage and dreamwidth.org/inbox have that telltale "text/html; charset=utf-8" in the response header, and non-ASCII UTF-8 text appears to be messed up.