
Ironically, one of the reasons markdown (and other text-based file formats) became popular was that you could use regular find/grep to analyze it, and version control to manage it.


I don't think anyone ever really expected to see widespread use of regexes to alter the structure of a Markdown document. Honestly, while something like "look for numbers and surround them with double-asterisks to put them in boldface" is feasible enough (and might even work!), I can't imagine that a lot of people would do that sort of thing very often (or want to) anyway.
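
For what it's worth, that particular substitution is a one-liner in most regex engines. A rough Python sketch, purely illustrative:

  import re

  text = "Revenue grew 40 percent across 3 regions."
  # wrap bare integers in ** so they render as bold in Markdown
  print(re.sub(r"\b(\d+)\b", r"**\1**", text))
  # -> Revenue grew **40** percent across **3** regions.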

If a document is supposed to have structure - even something as simple as nested lists of paragraphs - it doesn't seem realistic to expect ordinary text-manipulation tools to do a whole lot with it. Something like "remove the second paragraph of the third entry in the fourth bullet-point list" is well beyond any sane use of any regex dialect that might be powerful enough. (Keeping in mind that traditional regexes can't balance brackets; presumably they can't properly track indentation levels either.)

See also: TOML - generally quite human-editable, but still very much structured with potentially arbitrary nesting.


> (Keeping in mind that traditional regexes can't balance brackets; presumably they can't properly track indentation levels either.)

You're right: Regular expressions are equivalent to finite state machines [1], which lack the infinite memory needed to handle arbitrarily nested structures [2]. If there is a depth limit, however, it is possible (but painful) to craft a regex to describe the situation. For example, suppose you have a language where angle brackets serve as grouping symbols, like parentheses usually do elsewhere [3]. Ignoring other characters, you could verify balanced brackets up to one nesting level with

  /^(<>)*$/
and two levels with

  /^(<(<[^<>]*>|[^<>])*>)*$/
Don't do this when you have better options.
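
The better option, for reference, is just a depth counter (or a stack, if you also need positions). A minimal Python sketch for the same toy angle-bracket language, ignoring other characters:

  def balanced(s):
      depth = 0
      for ch in s:
          if ch == '<':
              depth += 1
          elif ch == '>':
              depth -= 1
              if depth < 0:   # a '>' with no matching '<'
                  return False
      return depth == 0       # every '<' was eventually closed

  balanced("<<><>>")  # True, nested two levels deep
  balanced("<<>")     # False, unclosed '<'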

---

[1] https://reindeereffect.github.io/2018/06/24/index.html

[2] As do any machines I can afford, but my money can buy a pretty good illusion.

[3] < and > are not among the typical regex metacharacters, so they make for an easier discussion.


I think that prospect (of programmatically, structurally editing markdown files) would have made everyone burst out laughing in 2000; if you want to programmatically alter stuff, put it into sexps or some other purpose-built syntax, like SGML. An apparently human-readable but actually tricky format leads to this sort of thing: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...


> because you could use regular find/grep to analyze it

They were meant to be analyzable in some ways: count lines, extract headers, maybe sed-replace some words. But operating on or analyzing multiline strings was never a strong point of Unix tools.
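
The line-oriented stuff really is trivial, though; a hedged Python stand-in for the grep side of that workflow (the file name is made up):

  import re

  # grep-style header extraction: one line in, at most one line out
  with open("README.md") as f:   # hypothetical input file
      for line in f:
          if re.match(r"#{1,6}\s", line):
              print(line.rstrip())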


Definitely, but it's neat nonetheless because more and more things are "structured Markdown" these days. Extremely useful for AI reasoning and outputs.


Man, if only we had some type of markdown meant for machines to understand, one specifically designed to handle arbitrary nesting and tagging of information; our lives would be so much better now.

We could have called it something like "extended markdown language" and used a wicked acronym like eXMaLa for it.

Shame no such technology exists or ever did.



