tinthedev 14 hours ago

Hah, I was about to criticise the text for conflating markup and punctuation far too lightly, only to see the afterword.

I actually do think the author has a point, in that most solutions today are inelegant, but I also don't think this is a problem that has a truly elegant solution. Where to draw the line? Why not encode fonts into the standard too, if we're doing bold? Etc.

I'm still mostly in favour of keeping everything markdown (in my own writing), however much it pollutes the "purity" of text.

  • astrobe_ 2 hours ago

    Yes, it's not markup but typesetting [1]. Well before 2013, people used *stars*, _underscores_ or /slashes/ in Usenet forums and on mailing lists to mimic typesetting, which led to Markdown.

    The name perpetuates the confusion, since Markdown tries to be an alternative to markup systems such as HTML, whose purpose was to introduce semantic cues for computers.

    We all know how it went; the semantic part was entirely thrown away and markup was thoroughly abused for layout (HTML tables before CSS - CSS which also has little to do with "style" and more to do with typesetting and layout), as no browser today can just show a table of contents based on the HTML title tags.

    [1] https://en.wikipedia.org/wiki/Typesetting

II2II 12 hours ago

You pretty much need to use markup (or control codes) for rich text. Take bold, italic, underline, strikeout: those four can be, and are, used in nearly any combination. You would need one bit for each of them. You would need two bits to specify four levels of headings. If you don't allow for that, you are back to using markup. You would also need one bit to specify proportional/fixed width font, because that is a thing too. That remaining bit would have to be used for superscript, since superscripts are commonly used for footnotes and simple mathematical expressions.
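
As a rough sketch of that bit budget: the layout below assumes the eight style bits sit in the high byte of a 32-bit unit with the code point in the low 24 bits. The flag names, the field order, and the `pack`/`unpack` helpers are made up for illustration and come from no standard. Note that a two-bit heading field holds four values, so unless body text counts as one of the levels, 0 has to mean "not a heading" and only three levels remain.

```python
from enum import IntFlag


class Style(IntFlag):
    """Hypothetical one-bit-per-attribute flags (names are illustrative)."""
    BOLD = 0x01
    ITALIC = 0x02
    UNDERLINE = 0x04
    STRIKEOUT = 0x08
    MONOSPACE = 0x40
    SUPERSCRIPT = 0x80


HEADING_SHIFT = 4          # bits 4-5 of the style byte hold the heading field
HEADING_MASK = 0x30
STYLE_SHIFT = 24           # assumed: style byte sits above a 24-bit code point
CODEPOINT_MASK = 0xFFFFFF


def pack(ch: str, style: Style = Style(0), heading: int = 0) -> int:
    """Combine one character, its style flags and heading field into 32 bits."""
    if not 0 <= heading <= 3:
        raise ValueError("heading field is two bits wide (0-3)")
    style_byte = int(style) | (heading << HEADING_SHIFT)
    return (style_byte << STYLE_SHIFT) | ord(ch)


def unpack(unit: int) -> tuple[str, Style, int]:
    """Split a 32-bit unit back into (character, style flags, heading field)."""
    style_byte = unit >> STYLE_SHIFT
    heading = (style_byte & HEADING_MASK) >> HEADING_SHIFT
    style = Style(style_byte & ~HEADING_MASK & 0xFF)
    return chr(unit & CODEPOINT_MASK), style, heading
```

With this layout, `pack("x", Style.BOLD | Style.ITALIC, heading=2)` gives `0x23000078`, and `unpack` returns the same three values. Because each attribute has its own bit, any combination of the four basic styles fits without enumerating them.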

Okay, you can now create passable rich text documents for a limited (though common) range of purposes with that 8/24-bit breakdown that was suggested. But you may have noticed the author mentioned subscripts, which weren't in my list. Well, it turns out that subscript and superscript have a terribly limited range of applications if you are specifying them per character: x^2^2 would be visually identical to x^22, and x^a_b would look different from x_b^a (with both presentations being nonsensical). The use of subscripts and superscripts in any technical applications would be severely limited. You need a much richer markup language to be truly expressive. So there really isn't much of a point in offering subscripts. Superscripts, sure, because they have a few non-technical uses.
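
A small sketch of why one superscript bit per character cannot express nesting. The tuple-based tree here is a made-up representation, used only for illustration: once it is flattened to (character, is_superscript) pairs, a superscript on a superscript and a plain two-digit superscript come out the same.

```python
# A node is either a plain string or ("sup", [children]). This shape is
# hypothetical and exists only to model nested superscripts.
def flatten(nodes, raised=False):
    """Reduce a nested tree to per-character (char, is_superscript) pairs."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.extend((ch, raised) for ch in node)
        else:
            _tag, children = node
            # A single bit can only say "raised"; the nesting depth is lost.
            out.extend(flatten(children, raised=True))
    return out


tower = ["x", ("sup", ["2", ("sup", ["2"])])]  # x^2^2, a superscript on a superscript
flat = ["x", ("sup", ["22"])]                  # x^22

same = flatten(tower) == flatten(flat)         # both reduce to x, raised 2, raised 2
```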

Yet the reality is that people want a much richer set of formatting options. At a minimum, they want to select fonts and font sizes. Some of the formatting options have semantics. I know I crammed four levels of headings in those eight bits, but that only makes sense in headings. It doesn't make sense to specify it per character. Then there are other common document elements, like tables. You can create decent tables using monospaced fonts, but that is limiting and would produce undesirable results in some cases (try displaying April 5^th sensibly, using a monospace font so that it won't affect the width of the columns). On top of that, you are ditching the concept of styles because that implies some sort of markup.

  • mmooss 7 hours ago

    Also, different languages have different formatting varieties. 256 combinations don't seem like nearly enough.

    Note that is 256 combinations. If you want both bold and italics, either it's one of the 256 combinations, separate from the bold-only combination and from the italics-only combination, or you need another 8 bits for each option.

    I think HN made a very aesthetically pleasing decision to exclude bold and underline. Imagine the appearance of comment pages if those were options.

ht_th 6 hours ago

The odd thing is, you can do quite a bit of bold/italics/superscript in Unicode nowadays. That's because, at least for the ASCII letter range, these forms are used symbolically in mathematics and elsewhere, and they have been added to Unicode as symbols rather than as bold variants of letters. For example:

, !

, !

ᴴᵉˡˡᵒ, ᵂᵒʳˡᵈ!

So, there's almost no bold/italic punctuation. And non-ASCII Unicode letters aren't "supported" this way either. But you can get quite far with "formatted" ASCII letters in Unicode, if you're so inclined.
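
A minimal sketch of how such "formatted" letters can be produced: ASCII letters are shifted into the Mathematical Alphanumeric Symbols block, where bold capitals start at U+1D400 and italic capitals at U+1D434. The function names are made up for illustration. Punctuation and non-ASCII letters pass through unchanged, which matches the limitation described above. Italic small h is a special case: U+1D455 is reserved, and the character lives at U+210E instead.

```python
import string

# Starting code points within the Mathematical Alphanumeric Symbols block.
BOLD_UPPER, BOLD_LOWER = 0x1D400, 0x1D41A
ITALIC_UPPER, ITALIC_LOWER = 0x1D434, 0x1D44E
# U+1D455 is reserved; italic small h is U+210E PLANCK CONSTANT instead.
ITALIC_EXCEPTIONS = {"h": "\u210e"}


def _table(upper_base, lower_base, exceptions=None):
    """Build a str.translate table mapping A-Z and a-z to a styled alphabet."""
    mapping = {}
    for i, ch in enumerate(string.ascii_uppercase):
        mapping[ord(ch)] = chr(upper_base + i)
    for i, ch in enumerate(string.ascii_lowercase):
        mapping[ord(ch)] = chr(lower_base + i)
    for ch, replacement in (exceptions or {}).items():
        mapping[ord(ch)] = replacement
    return mapping


BOLD = _table(BOLD_UPPER, BOLD_LOWER)
ITALIC = _table(ITALIC_UPPER, ITALIC_LOWER, ITALIC_EXCEPTIONS)


def math_bold(text: str) -> str:
    """Replace ASCII letters with Mathematical Bold symbols."""
    return text.translate(BOLD)


def math_italic(text: str) -> str:
    """Replace ASCII letters with Mathematical Italic symbols."""
    return text.translate(ITALIC)
```

Calling `math_bold("Hello, World!")` leaves the comma, space and exclamation mark as plain ASCII, since only the 52 letters have entries in the table.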

  • ht_th 6 hours ago

    Of course, hackernews (or the font it uses?) doesn't seem to support the bold and italics Unicode symbols, although it does seem to support the superscript ones.

    • Tomte 2 hours ago

      body { font-family:Verdana, Geneva, sans-serif; font-size:10pt; color:#828282; }

      td { font-family:Verdana, Geneva, sans-serif; font-size:10pt; color:#828282; }

timeflex 12 hours ago

Sad what things like Markdown have done to people. It's like they forgot about all the amazing semantic markup of HTML5 to create strong relations between their data. I'll take a Lexical editor with SQLite to store my data any day.

  • kstrauser 12 hours ago

    I don't think it’s that so much as that all that extra context is overkill in lots of situations. If I'm writing a blog post or a Slack message or my own internal-use note, I probably just want some lightweight formatting. Making rich semantic connections wouldn't have a good payoff for the extra work in those cases.

  • HankB99 8 hours ago

    > I'll take a Lexical editor with SQLite to store my data any day.

    Do you have tools that do this or an example?

    I'm pretty happy with Markdown and mkdocs (on Linux) to manage and format my notes. VS Code does a pretty good job with this providing both a preview and facilitating linking between documents (both file and heading links.) I'm always open to something better.

hello_computer 14 hours ago

This person is confused. He's citing a Ted Nelson paper about separating these things into layers (content, structure, & special effects), while personally advocating that we mash it all into unicode.

https://www.xml.com/pub/a/w3j/s3.nelson.html

  • LegionMammal978 10 hours ago

    Nelson's arguments sound odd to me. He says that embedded markup is bad for WYSIWYG editors since they have to maintain a connection between the raw and formatted text streams (which can have different character counts, etc.), but out-of-line styling would similarly need careful implementation work to keep it synchronized with the text stream at all times, even with concurrent editing and other such features.

    (Cf. how the cross-reference stream in PDF files makes it painful to edit objects in them, even when the files are nominally encoded in plaintext.)

    He then goes into how a separate styling layer can assist with transcluding text from other people's work while modifying the style. But style variations are hardly the only legitimate changes typically made to direct quotations: people often want to modify capitalization or punctuation, elide portions, or insert bracketed notes. And at that point, you're modifying the content as well as the styling, so style-only modifications would be very limiting for that use case.

    As for the structure layer, this would have the same issues as every other attempt in the last three decades to create a semantic web or whatever. Authors don't want to spend their time carefully curating metadata that 99.9% of readers won't care about, while bad actors want to game their relevancy metrics through any mechanism available.

    • hello_computer 8 hours ago

      I think anyone who has done the work quickly realizes all of that (i.e. no point kicking Ted while he’s down). Just thought it odd that the article is citing Ted to endorse the anti-Ted.

AlienRobot 14 hours ago

People are limited by their tools.

The author believes that plain text should encode bold, italic, etc., because that's all they had exposure to. Were the text written today, they would claim emojis belong in unicode as well.

Most social media don't support it, but on Tumblr, for example, you can specify the color of the text and even choose a different font. I think there was some other social media that allowed you to have animated effects on the text as well, but I forgot the name.

  • tomxor 14 hours ago

    > Were the text written today, they would claim emojis belong in unicode as well.

    Not sure what you mean, unicode does contain emojis. That's what most platforms use for emojis now.

    • nextos 13 hours ago

      Yes, Unicode even defines characters for subindex and superindex. It's quite capable for basic inline math equations.

      • mmooss 7 hours ago

        Weren't many of those formatting codes - maybe not sub/superindex? - deprecated (but preserved for backward compatibility)?

    • AlienRobot 11 hours ago

      But should it contain emoji? I can copy and paste bold text from one rich text editor to another just fine. Why not use XML to encode emoji?

      • rhet0rica 6 hours ago

        <emoticon type="graphical" value="PILE OF POO" entity="&#x1F4A9;" fallback="" fallback-encoding="utf-8" />