Mark Smith ([staff profile] mark) wrote in [site community profile] dw_maintenance, 2020-06-27 04:02 pm

Code push happening shortly!

Hi all,

We're doing a code push as mentioned here.

As usual, a code push includes some visual changes (continuing our efforts to modernize common workflows) and also includes some conversions of old pages to be generated in a different way with different markup. As always, we've tested the changes in most of the major browsers and use cases, but some things may break when they meet real-world usage and different technical setups. If we broke it, we'll fix it as soon as we can.

We will be watching this post for comments, so please let us know if you see anything that needs to be addressed and we'll take a look!

Known Issues

  • The spellchecker continues to go walkabout. This is intentional: we are deprecating and removing it. Modern browsers have spellcheck that works in multiple languages and has actually been updated since 1995, so there's no reason for Dreamwidth to continue to try to maintain one.

Fixed Issues

  • FAQs had gone missing. Well, the answers. They're back now.
  • Replying to comments was incorrectly routing you back to the top level view, rather than the view of the thread you were on. This is fixed.

As always, please do keep an eye on our Twitter account [twitter.com profile] dreamwidth.

[personal profile] pinterface 2020-07-01 04:21 am (UTC)

[personal profile] roadrunnertwice, you probably already recognize that, but juuust in case you don't: I'm pretty gosh-darned sure that's utf-8 being interpreted as ISO-8859-1 (Latin-1). Looking at my own Inbox, I see the same issue on non-ASCII characters such as em dashes (—).

Best guess is that there's something converting things into utf-8 which are already utf-8, and so characters end up double-encoded. Looking at it more closely, I see <br />s in my own comments where I put newlines, even though the comment had "Don't auto-format" checked, so maybe it's being run through the HTML cleaner twice?
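For anyone who wants to see the double-encoding guess in byte terms, here's a minimal Python sketch. It's only an illustration of the general mechanism, not anything from the Dreamwidth code, and `undo_double_encoding` is a made-up helper name.

```python
# Illustration only: what "UTF-8 read as Latin-1, then encoded again" does
# to an em dash. Nothing here is taken from the site's codebase.
original = "\u2014"  # em dash

utf8_bytes = original.encode("utf-8")      # the correct three-byte encoding
# Misread those bytes as ISO-8859-1: every byte becomes its own character.
misread = utf8_bytes.decode("latin-1")
# Encoding the misread text as UTF-8 again gives the double-encoded form.
double_encoded = misread.encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Reverse one round of UTF-8 -> Latin-1 -> UTF-8 double encoding."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


print(utf8_bytes)      # b'\xe2\x80\x94'
print(len(misread))    # 3
print(double_encoded)  # b'\xc3\xa2\xc2\x80\xc2\x94'
print(undo_double_encoding(double_encoded) == original)  # True
```

The round trip at the end only works when exactly one extra encoding pass happened, which is part of why a blanket "repair" is risky.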

Added: Following bug reports around, the utf-8 fixes itself under view=singleentry ("filter to this entry"), view=entrycomment, and view=unread for me as well, but the extra <br />s are still there, so those may be an unrelated issue.

Edited 2020-07-01 04:52 (UTC)

[personal profile] roadrunnertwice 2020-07-01 05:55 am (UTC)
Yeah — that Ð.

I've been tearing my hair out trying to repro this, and eventually managed to force some mojibake on my dev server by... well, never mind. The point is, after [personal profile] pritkiy_kaban's help with investigating the HTTP headers, I think the problem comes from the way the old LJ code tries to avoid ever invoking Perl's internal Unicode handling.

Basically, the site is accepting UTF-8 from the outside world, storing UTF-8 in the database, and outputting UTF-8 to the web... but it's never admitting as much to Perl, and tries to always treat text as a sequence of (mostly) opaque bytes. (In fact, it's not even telling MySQL it's storing UTF-8, so if I do a direct query for unicode content on my dev server, I get absolute garbage.) There's a bunch of code to check for unicode validity and convert old 8-bit encodings from the database (which isn't relevant to us, but was to LJ), but it all kinda does it on the down-low.

The problem is, if you ever combine "just bytes" text with a string that has, at some point, confessed to being real unicode, the "just bytes" text gets decoded as ISO-8859-1 so that it can also Be Real Unicode Characters, resulting in garbage (because it was UTF-8 all along and was trying to stealth through the system without ever getting decoded).

So, something in the chain is outputting a string that's marked as being UTF-8... but only under SOME circumstances, for SOME users, on SOME pages. And to fix the bug, someone's gotta figure out exactly what's doing that, and have it re-encode that text back to "just bytes" before passing it on. (That, or launch a multi-year inquisition to make the entire 20-year-old codebase unicode-aware.)
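To make the failure mode and the proposed fix concrete, here's a rough simulation in Python. Big caveat: Python 3 flatly refuses to mix bytes and text, so the Latin-1 "upgrade" that happens behind the scenes in Perl is spelled out by hand here. All the names (`concat_with_implicit_upgrade`, `concat_as_bytes`, `comment_body`, `label`) are hypothetical and don't correspond to anything in the codebase.

```python
# Simulation only: the implicit Latin-1 upgrade described above is written
# out explicitly, because Python 3 will not mix bytes and str on its own.


def concat_with_implicit_upgrade(raw: bytes, text: str) -> str:
    """Mimic the buggy path: the raw bytes get treated as Latin-1 characters."""
    return raw.decode("latin-1") + text


def concat_as_bytes(raw: bytes, text: str) -> bytes:
    """Mimic the proposed fix: encode the unicode string back to bytes first."""
    return raw + text.encode("utf-8")


# UTF-8 text that travels through the system as "just bytes".
comment_body = "caf\u00e9".encode("utf-8")
# A hypothetical string that has "confessed" to being real unicode.
label = " \u2713"

broken = concat_with_implicit_upgrade(comment_body, label).encode("utf-8")
fixed = concat_as_bytes(comment_body, label)

print(broken.decode("utf-8"))  # the accented letter comes out as two junk characters
print(fixed.decode("utf-8"))   # the accented letter survives intact
```

Note that the check mark from the "real unicode" string is fine in both cases; only the text that was riding along as bytes gets mangled, which matches the way some parts of a page can look right while others don't.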

And no one working on the site has ever been able to repro the damn thing. UGH.

(Yes, I also think the br tag thing is unrelated. I'm interested in that, but it seems less urgent; also a big patch that interferes with that whole area of HTML-mangling just got merged, so it might act totally different after the next code deploy anyway.)

[personal profile] pinterface 2020-07-01 07:32 am (UTC)

Oh man, yeah. You have my sympathies. I have no idea why it's doing it for me, but not you. Or maybe it's a production-not-dev situation?

Digging through the code via GitHub (ugggh, why didn't I just clone it? It would have been so much easier.), I don't see much difference between the codepaths, either. The only thing that's obviously different is the translation texts, but that shouldn't... you know what, let me try that... Ah, interesting! /inbox/?uselang=debug gets the encoding right, which would seem to suggest that one of the translation strings used by /inbox/, but not by /inbox/?view=…, is bringing in that pesky SvUTF8 flag. (But...why?)

As for the newlines, it looks like that might be coming from LJ::Event::JournalNewComment->content() doing $comment_body = LJ::html_newlines($comment_body); for some reason, though I'm too tired to think of what that reason might be.


[personal profile] roadrunnertwice 2020-07-01 04:42 pm (UTC)
Or maybe it's a production-not-dev situation?

I can't even reproduce it on prod when I get unicode in my notifs! IDEK man. 😩

Ah, interesting! /inbox/?uselang=debug gets the encoding right

WHOA, now that IS interesting. 🤨 Good instinct. Not sure what to make of that yet, but it will likely be useful.

Edited (formatting) 2020-07-01 16:43 (UTC)

[staff profile] denise 2020-07-01 05:19 pm (UTC)
I don't see any obvious WTF-8 in the strings for /inbox in prod, but if ?uselang=debug is not triggering the problem, that may be what's causing it. I'll go retype all the strings from scratch to make sure that's not it.

[personal profile] roadrunnertwice 2020-07-01 05:48 pm (UTC)
Noooo dooooon't 😱😂 (Unless that's somehow easy, but, sounds bad.)

I don't think the strings are the entire answer anyway, because again, it's *not triggering for everyone.*

[staff profile] denise 2020-07-01 05:52 pm (UTC)
Nah, it took me about 10 minutes and 8 of them were "what the fuck is this arcane and unhelpful error and how do I fix it" (ia ia translation system fth'agn). I figured at worst it could rule out an obvious possible cause.

[staff profile] denise 2020-07-01 05:34 pm (UTC)
Can you check both /inbox/ and /inbox?uselang=debug now and see if me retyping all the strings fixed the problem?

[personal profile] pinterface 2020-07-01 06:37 pm (UTC)

While I share [personal profile] roadrunnertwice's skepticism about that being a complete fix, it appears to have worked for me. My Inbox now looks fine both with and without uselang=debug!


[staff profile] denise 2020-07-01 06:59 pm (UTC)
.....I both love and hate that I appear to have been right. I'll check with [personal profile] pritkiy_kaban above, too.