StrictMark: Markdown, Refactored

chipotle_coyote · on Jan 29, 2021

From the spec:

> No double symbols, i.e. *strong* not **strong**.

I understand the rationale. But, this is immediately no longer Markdown. "It's 'Markdown, Refactored!'" If "refactored" means "incompatible," sure? I guess? But still: incompatible.

"So what?" Good question, imaginary interlocutor. Here's so what: one of Markdown's defining features, love it or hate it, is that it deliberately doesn't make you choose between emphasizing with * characters or emphasizing with _ characters, other than being consistent with the start and end. Use a single * or _ character, get emphasis/italics; use double, get strong/bold; you don't get underlines because you don't use those in typography.

The same is true for lists: only "-" is acceptable, not "+" or "*". Fine, but there are a lot of Markdown documents out there that use "*" as the list character, and I am sure there is at least one weirdo inexplicably using "+".

So StrictMark breaks every Markdown document that uses boldface, and probably about half the Markdown documents that use italics. (I don't know about other Markdown users, but over the years I have not been consistent with how I mark up italics, for a variety of reasons.) And, it probably breaks about half the unordered lists in the wild, too.
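To make the list-marker point concrete, here is a minimal sketch of a one-way converter that rewrites "*" and "+" bullets to "-". The function name `normalize_bullets` is hypothetical and this is not StrictMark's own tooling; it assumes a marker is always followed by a space, skips fenced code blocks, and does not handle every corner case (a `* * *` horizontal rule, for instance, would be rewritten too).

```python
import re

# An unordered list marker (*, + or -) at the start of a line, after
# optional indentation, followed by at least one space.
BULLET = re.compile(r"^(\s*)[*+-](\s+)")


def normalize_bullets(text: str) -> str:
    """Rewrite '*' and '+' list markers to '-' outside fenced code blocks."""
    out = []
    in_fence = False
    for line in text.splitlines():
        # Toggle on every fence line so fenced content is left untouched.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if not in_fence:
            line = BULLET.sub(r"\1-\2", line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(normalize_bullets("* first\n+ second\n- third"))
```

Emphasis such as `*word*` at the start of a line is left alone because no space follows the asterisk.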

I understand that this is being done for well-intentioned and noble reasons, but: it ain't Markdown. It's a new, incompatible plaintext markup system. This creates a bit of an uncomfortable point: if I'm going to have to redo all my documents (or just do things The New Way going forward), then why not switch to AsciiDoc or Textile?

gritzko · on Jan 29, 2021

StrictMark breaks nothing. The arrow is one way: Markdown code mostly supports StrictMark. One may say that "mostly" is bad, but an arbitrary Markdown implementation mostly supports an arbitrary Markdown document, so nothing new here.

For ~full Markdown support there is CommonMark code.

splatcollision · on Jan 29, 2021

Markdown was engineered to be flexible _on purpose_ because it was designed for writers and readers, not programmers.

Writers are less likely to care about what symbol means "bulleted list item", as long as anything that makes sense for it, just works.

It's a mistake to think of Markdown as a programming language, or a "formal grammar".

That being said, if it works for you go for it...

nine_k · on Jan 29, 2021

If you want to render Markdown in a way other than plain text, you need formal rules for parsing it. No matter how informal you want Markdown to be, if you want e.g. a nicely rendered README on GitHub, you have to follow the formal rules.

ufo · on Jan 29, 2021

One downside of markdown being so flexible is that it's hard to remember if a given dialect implements a certain feature, or how it will handle a certain corner case.

I think I see the appeal of specifying a robust subset of markdown, something that you can trust will probably be parsed the same way for as many dialects as possible.

FriedrichN · on Jan 29, 2021

I totally agree with this. It's quite annoying that you never quite know what dialect some website is using. They never mention it, because it seems very few of these dialects have names.

What also annoys me is that they didn't base it off Wiki formatting. It would've been great if it had at least some compatibility for basic stuff like headings and links.

gritzko · on Jan 29, 2021

I have my agenda https://news.ycombinator.com/item?id=25961188

legulere · on Jan 29, 2021

Lists are one of Markdown's weak points though, with numbered lists completely ignoring the numbers.

admax88q · on Jan 29, 2021

That's optimized for never having to renumber your lists as you edit your document.

I would hate to have to change 30 numbers because I want to move an item from last to first.

hyperpape · on Jan 29, 2021

Optimizing for readers definitely excludes lists like:

setr · on Jan 30, 2021

Tbf it’d be trivial to have a button/flag to fix it in the source, and a warning to notify the user. It might be debatable to do so automatically, but on demand there’s no issue (like Word’s table of contents update/warning).

But the current definition of “it’s always sequential, that’s it” means you could do it automatically without changing meaning. Whereas do-as-I-say is stuck with being unsafe, always.
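A rough sketch of what such an on-demand "fix it in the source" pass could look like. The function name `renumber` is hypothetical, and the sketch assumes every list restarts at 1 and that a sublist is identified purely by its indentation width; real Markdown dialects differ on both points.

```python
import re

# An ordered list marker: optional indent, digits, '.' or ')', then spaces.
ITEM = re.compile(r"^(\s*)(\d+)([.)])(\s+)")


def renumber(text: str) -> str:
    """Rewrite ordered-list numbers so each list counts 1, 2, 3, ..."""
    counters = {}  # indentation width -> last number used at that depth
    out = []
    for line in text.splitlines():
        m = ITEM.match(line)
        if m:
            indent = len(m.group(1))
            # A shallower item ends any deeper sublists.
            for deeper in [k for k in counters if k > indent]:
                del counters[deeper]
            counters[indent] = counters.get(indent, 0) + 1
            line = (f"{m.group(1)}{counters[indent]}{m.group(3)}"
                    f"{m.group(4)}{line[m.end():]}")
        elif line.strip() and not line[0].isspace():
            # An unindented non-list line ends every open list.
            counters.clear()
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(renumber("1. first\n1. second\n7. third"))
```

Because the rendered output is sequential either way, running this changes only how the source reads, not what it means.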

gerikson · on Jan 30, 2021

This is the same as HTML's <ol> though...

e12e · on Jan 29, 2021

It's a good effort, but I'm afraid it isn't a good fit for me, three things stand out:

1) the use of # rather than underline for headings. That's why I feel you lose the beauty of the source form - headers disappear visually without a smart editor - and if you demand a smart/wysiwyg editor anyway... might just use an sgml/xml/html subset.

2) one symbol references. This strikes me as overly limiting - why not an alphanumeric string - or at the very least any number string? It feels a bit strange to be forced to count 1,2...9,a - and not be able to do things like a1, a2, b11.

3) "no way to put backticks inside a code span." So, no (full) examples of restricted markdown in codeblocks? Not to mention the many languages that use backticks?

I think you might be happier with reStructuredText than markdown: https://en.m.wikipedia.org/wiki/ReStructuredText

layer8 · on Jan 29, 2021

I always felt that the number of #s should increase with higher-level rather than with lower-level headings, for multiple reasons:

- More #s mean a larger visual footprint, which would be more appropriate for higher-level headings (cf. larger font size in rich-text rendering, and of course the = and - underlines). This is what bothers me the most.

- I tend to write documents bottom-up, starting with lowest-level headings, and higher-level headings are only introduced as the document evolves. Because I don’t know how many levels I’ll end up with, starting with a single # for the lowest level would work better.

- Lower-level headings are more frequent than higher-level headings, so having to type less for them makes sense, as well as reducing the likelihood to get the # count wrong (because you usually know what is a bottom-level heading, or the next one up, and you don’t have to know how many levels there are overall).

The existing convention mirrors the numbering levels of numbered headings, but I don’t find that visually convincing. It also matches the HTML h1-h6 elements of course, but nowadays that should be less of a concern.

Another thought is that maybe the headings hierarchy should be unlimited in both directions, so that you are never forced to shift all headings. One possibility to achieve that could be:

  ### even higher level
  ## higher level
  # base level
  .# sub level
  .## sub-sub level
  .### sub-sub-sub level

That syntax mirrors the magnitudes of decimal numbers. It’s just an idea I’ve been pondering, I’m not totally convinced myself. ;)
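The decimal-magnitude analogy can be sketched as a small parser that maps each marker to a signed level, with `#` as level 0. The function name `heading_level` and the choice of 0 for the base level are assumptions made for illustration; the proposal above does not specify a numbering.

```python
import re

# Optional leading dot, one or more '#', whitespace, then the title.
HEADING = re.compile(r"^(\.?)(#+)\s+(.*)$")


def heading_level(line: str):
    """Return (level, title), or None if the line is not a heading.

    '#' is level 0, '##' is +1, '###' is +2, and so on upward;
    '.#' is -1, '.##' is -2, and so on downward.
    """
    m = HEADING.match(line.strip())
    if m is None:
        return None
    dot, hashes, title = m.groups()
    n = len(hashes)
    level = -n if dot else n - 1
    return level, title


if __name__ == "__main__":
    for sample in ["### even higher level", "# base level", ".## sub-sub level"]:
        print(heading_level(sample))
```

The levels line up with the exponents of 100, 10, 1, .1, .01, .001, which is the sense in which the syntax mirrors decimal magnitudes.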

gerikson · on Jan 30, 2021

I like this idea, but that means you need to be able to decide what the "lowest" level is at the beginning (i.e. I seldom use anything below <h3>.)

layer8 · on Jan 30, 2021

Well, nothing keeps you from still starting with ###, so you have two spare levels to go down to if needed (## and #).

Alternatively, that’s where the “negative” levels would come in (i.e. the .# levels above).

codethief · on Jan 29, 2021

> 1) the use of # rather than underline for headings.

While in theory I understand the appeal of the "WYSIWYG" syntax of underlining headings, in practice I've always found it less readable than expected. Consider, for instance:

    I am a paragraph belonging to the previous section but it looks like the heading belongs to me.


    I am a heading indicating a new section
    =======================================

    I am a paragraph belonging to the heading but the heading is weirdly separated from me.

    I am another paragraph belonging to the heading.

In some editors/IDE, the underline even gets highlighted / boldfaced, thereby exacerbating this effect.

chipotle_coyote · on Jan 29, 2021

I very rarely see this style -- which comes from "Setext", IIRC, which might be patient zero of the "human readable plain text markup styles" -- in the wild anymore, and I suspect it's probably for the reasons you're describing here. I almost never use it myself, except in contexts where the "source file" is just as likely to be read as a rendered HTML version. README files are a good example.

> In some editors/IDE, the underline even gets highlighted / boldfaced, thereby exacerbating this effect.

I think a lot of editors just don't know about this style of heading, but they do know that a line of three or more dashes is used to create a horizontal rule, so they're trying to highlight that.

cannam · on Jan 29, 2021

I had never heard of Setext! I do know that, before Markdown became popular, lots of programmers had their own similar format going on - this was mine, from 1998:

Perl program https://sourceforge.net/p/rosegarden/code/HEAD/tree//branche...

Documentation https://sourceforge.net/p/rosegarden/code/HEAD/tree//branche...

I wonder now how much of this was picked up unknowingly from Setext-derived documents.

jasonpeacock · on Jan 29, 2021

There are many issues with the use of underline for headings:

1) Constantly fixing them to match the length of the heading, or just suck it up and always use the short form:

    My Heading
    ----

2) You're typing twice as much. I can easily start a line with # to make a heading, vs write the heading line, then carriage return, then add a bunch of dashes to make it a heading.

3) This is the fatal flaw - you only get 2 levels of headings when using underlines.
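On point 1, the length fixing is mechanical enough to automate. Below is a minimal sketch, with the hypothetical name `fix_underlines`, that stretches or trims each underline to match the heading above it. It assumes an underline is a full line of only `=` or only `-` starting in column 0, directly under a non-blank line, and it does not try to recognize code blocks.

```python
import re

# A line made only of '=' or only of '-', with optional trailing spaces.
UNDERLINE = re.compile(r"^(=+|-+)\s*$")


def fix_underlines(text: str) -> str:
    """Make every heading underline as long as the heading text above it."""
    lines = text.splitlines()
    for i in range(1, len(lines)):
        prev = lines[i - 1]
        # Only touch an underline that sits directly under heading text.
        if (UNDERLINE.match(lines[i]) and prev.strip()
                and not UNDERLINE.match(prev)):
            lines[i] = lines[i][0] * len(prev.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(fix_underlines("My Heading\n----"))
```

A row of dashes preceded by a blank line is left alone, since that is a horizontal rule rather than a heading underline.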

gritzko · on Jan 29, 2021

3) That is for an inline code span only; can interrupt the span, put an escaped backtick, reopen the span. There is an example in the document. In fenced blocks, anything is accepted.

2) Yes, that mostly implies [1],[2]... pattern. Have to think of other solutions.

1) Well, yes. Aesthetics are difficult to quantify. I use empty lines so headers stand out.

jasonpeacock · on Jan 29, 2021

At first glance, there's a lot of words and not many examples. It's hard to quickly see what's different and special about this version of Markdown, other than "it's a formal subset of the other formats".

I see some examples for specific elements, but I'd love to see "here's an example demonstrating the problem(s) this addresses", and "here's the fixed version of that example".

There could be sub-sections of contrasting examples, like in coding style guides, showing the Old/New (Bad/Good?) styles.

gritzko · on Jan 29, 2021

Yep. The [document][1] itself may be a basic example.

[1]: http://doc.replicated.cc/%5EWiki/strictmark.sm?@text "The source code of the doc"

jedimastert · on Jan 29, 2021

I kinda got that vibe, but the source should have been way easier to find

TrevorFSmith · on Jan 29, 2021

I don't even agree with all of the design but Thank Goodness for a regular grammar, domain specific syntax transclusion, and only one syntax per feature. Love it.

gritzko · on Jan 29, 2021

The author is here, AMA.

mklcp · on Jan 29, 2021

Why are headers limited to four levels? Do they have to be always padded with spaces?

Why are references limited to one symbol? It can just be more mnemonic to have `java_doc` than `j`.

Why are only full reference links allowed? It would be handy to have a shortcut for links without reference, and it's not like we're running out of symbols: `?[]` could be used for links without reference while being consistent with `![]`, which is used for transclusion.

Why is it impossible to put backticks inside a code span?

Why is there a `` in an example in the Lists section?

I really like the spirit of this markdown alternative

gritzko · on Jan 29, 2021

Thanks. It makes all block level formatting aligned to 4 char indents. Hence, paddings. Hence the one letter limitation.

The terrible HTML comment hack is straight from the CommonMark spec... it solved a Markdown syntax issue... hope to kill it somehow.

ufo · on Jan 29, 2021

How fundamental is the requirement that all the "block stuff" be exactly four characters wide?

The parts that were the strangest for me are that link references can only contain a single unicode character and that block quotes are ">" plus three spaces, instead of ">" plus one space.

mixmastamyk · on Jan 29, 2021

It's odd for a second, then seemed fine to me. Think if hexadecimal just kept going…

jtbayly · on Jan 29, 2021

> no arbitrary nesting

What are the restrictions? For example, I care about these (off the top of my head):

1. Can I nest a block quote in another block quote?

2. Can I nest a block quote in an (ordered or unordered) list? (And vice versa?)

3. Can I emphasize some text inside a strong? (And vice versa?)

gritzko · on Jan 29, 2021

Thanks for the question. "No arbitrary nesting" applies to inline formatting. Higher precedence formatting can be nested within lower. So, nothing can be nested within a code span, for example, as it is the highest. Just a way to order things.

Container blocks are nested arbitrarily, up to 16 levels. Quote > list > quote > code, for example, is OK.

cratermoon · on Jan 29, 2021

What problem were you hoping to solve with this?

gritzko · on Jan 29, 2021

A reliable format for my docs, which is already well supported (and I want a Ragel parser because requirements).

In the long run, I am a Decentralized Web guy, so making Markdown a proper language has its own value.

m___ · on Feb 4, 2021

Your theoretical and well contexted overview of the Web, your minimal formatting overhead (no googles and 5000 cubicles), (single person in charge of content and formatting) view is essential not trivial.

mixmastamyk · on Jan 29, 2021

I had to increase the line-height to read it, was too cramped at 1.2.

remram · on Jan 30, 2021

Can I put backticks in inline code now? :)

What about making some of my code bold?

newlikeice · on Jan 29, 2021

Did I miss a link, or is there a parsing script yet?

gritzko · on Jan 29, 2021

There is a C++ version which should be separated from a code base first... But the parser is Ragel-based, so contact me if you plan to make an implementation in any language Ragel supports.

aidenn0 · on Jan 29, 2021

> Yes, HTML is a widely supported standard, but it is hopelessly elephantine. Let's think, who can afford to develop/support a proper HTML engine? That is roughly one-and-a-half companies in the world. Hence the interest in a minimalistic hypertext markup language.

There are dozens of proper HTML engines. Implementing a proper DOM and CSS is the hard part, which is why we have few full-featured browsers.

geoelectric · on Jan 29, 2021

This won't work for my docs at all. I use all three unordered bullets to differentiate things in plaintext.

The opinionation that removes synonyms will kill any chances of this working with legacy IMO. Any significant doc base will hit myriad issues that come down to "I didn't like that form of it." I was initially OK with the idea of a parseable subset, which would be useful, but this goes too far with a "one way only" attitude. No thanks.

That said, someone should do what the project claims to do, without gutting the language. A formalized subset of CommonMark that only skips formally indescribable specs but otherwise preserves everything else would be useful in some situations. It might even become the de facto CommonMark spec if it's easier to implement.

jsilence · on Jan 29, 2021

Really wish the org format would gain more traction.

codethief · on Jan 29, 2021

> The unordered list markup symbol is a dash -. The other two Markdown options are * and +. But * is ambiguous and + is unpopular and there must be one way only!

THANK YOU for getting rid of the asterisk. As a former org-mode user it has always bugged me when people use the asterisk for lists, not headings. :)

chipotle_coyote · on Jan 29, 2021

I appreciate your viewpoint, but as someone who has used * out of habit for plain text lists long before Markdown was a thing, this is kind of a big ask. :)

ziml77 · on Jan 29, 2021

I like it. I mean I'm not sure how much I agree with some of the specific design decisions, but it's such an improvement to lose the ambiguity.

reificator · on Jan 29, 2021

Reference links as opposed to inline links is a huge plus, but like most others here I'm puzzled at the single-char restriction.

I tend to restructure documents often, writing things as they come into my head and moving pieces around to better convey my message. With named references, I can simply alphabetize them at the bottom and forget about them while I move sections around.

With single-char identifiers though, I'd feel an obligation to keep things in reference order, so one more thing to think about while I'm working.

On another note, I'm interested in this idea of reusing image links to preview CSVs and other formats. (graphviz comes to mind) Do you have any examples of this being used in a document and the tooling to stitch it together?

mst · on Jan 29, 2021

Name each link using an arbitrary emoji?

(I'm not even sure I'm joking)

svnpenn · on Jan 29, 2021

This is pointless. We can see the input and output:

http://doc.replicated.cc/%5EWiki/strictmark.sm?@text

http://doc.replicated.cc/%5EWiki/strictmark.sm

So obviously an implementation was made, but whoever wrote the article didn't bother to share the implementation.

kderbyma · on Jan 30, 2021

I appreciate all opinions as equally useless and this is just an opinionated, less useful markdown.....I don't see the point at all....

markdown is meant for rendering .....

it should be as flexible as possible to write content....not restricted. I get the pain points of the devs ... never was the point of markdown. I feel like this engineers a new problem.... by removing an old solution