Use CommonMark parser for Markdown

• Mar 24, 2018 - 10:03
Type
musescore.org
Severity
S5 - Suggestion
Status
active
Project

Markdown is a simple markup language for adding formatting to plain text documents. It is easy to parse and convert to HTML or other rich text formats, and it is popular on coding websites.

As there was no official standard, different websites developed their own Markdown "flavours", which use different syntax and support different features, and are therefore incompatible with each other.

Standard

Now there is an (unofficial) standard, in the form of CommonMark. CommonMark is used by many websites, including GitHub, which rewrote its GitHub Flavoured Markdown parser to be based on CommonMark.

Problems with the current parser

The current Markdown implementation on MuseScore.org is quite buggy and lacks support for various Markdown features. Swapping to a CommonMark-compliant parser is likely the quickest way to fix the issues.

Implementations

CommonMark implementations are available in many languages, including PHP.

Work is in progress to convert Drupal's Markdown module to CommonMark.

BONUS: There is a JavaScript parser for CommonMark, which would allow users to see a live preview while writing comments!


Comments

What particular bugs/problems does the current Markdown parser have? You didn't specify any.

It is good to follow the market and web trends, but it would be nice to have a reason for each change.

1) The current parser leaves a gap when the list level decreases

- List item (level 1)
- List item (level 1)
  - sublist item (level 2)
  - sublist item (level 2)
      - sub-sublist item  (level 3) # this line also needed more indentation than expected
  - sublist item (level 2)
- List item (level 1)
  • List item (level 1)
  • List item (level 1)
    • sublist item (level 2)
    • sublist item (level 2)
      • sub-sublist item (level 3)
    • sublist item (level 2)
  • List item (level 1)

2) Table headings are not aligned correctly

Heading 1 | Heading 2
:---:|:---:
Cell 1|Cell 2
Cell 3 | Cell 4
Cell 5 | Cell 6
Heading 1 Heading 2
Cell 1 Cell 2
Cell 3 Cell 4
Cell 5 Cell 6
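
As a reference point for the expected result, here is a minimal Python sketch of how the delimiter row in the example above is usually interpreted under the GitHub Flavoured Markdown table extension: a colon on both sides of the dashes asks for a centred column. The function name `column_alignments` is made up for illustration and is not part of any real parser.

```python
def column_alignments(delimiter_row: str) -> list[str]:
    """Map each cell of a table delimiter row to an alignment keyword."""
    # Drop optional outer pipes, then split the row into its cells.
    cells = delimiter_row.strip().strip("|").split("|")
    alignments = []
    for cell in cells:
        cell = cell.strip()
        left = cell.startswith(":")
        right = cell.endswith(":")
        if left and right:
            alignments.append("center")
        elif right:
            alignments.append("right")
        elif left:
            alignments.append("left")
        else:
            alignments.append("default")
    return alignments


print(column_alignments(":---:|:---:"))
```

For the table above this reports centre alignment for both columns, so the headings would be expected to sit centred over their cells.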

3) Missing whitespace after code blocks

In my examples above I left a blank line between the code block and the rendered output, but the blank line itself is not being rendered, so code blocks kind of merge into the text below. (It would also be helpful if code blocks had a different coloured background to the rest of the text.)


4) Code blocks are not syntax-highlighted

So this is more of a feature request than a bug, but it would be nice to have, and CommonMark gives you this for free.

No problem.
It should be mentioned that strict CommonMark is lacking a few features that are already in use on MuseScore.org, such as:

  • Tables (see example above)
  • Custom anchor text for headings: # Heading {#anchor}
  • Automatic link creation from a plain URL: www.example.com

However, these features are commonly available in CommonMark libraries as "extensions" that go beyond the base specification. For example, GitHub Flavoured Markdown is now based on CommonMark, plus extensions.

Compliance with the CommonMark specification is basically the minimum requirement for any new Markdown parser.

There's another bug I was going to include in my list. For some reason it wasn't there yesterday, but now it's back again. Either that or it was masked by some peculiarity in my comment syntax.

5) Can't use "<word>" in code blocks, not even as "&lt;word&gt;"

Markdown deliberately lets HTML code through so that people can use HTML for formatting features not supported by Markdown, such as embedded videos. This means that when writing Markdown, you need to escape characters that have special meanings in HTML, otherwise they will not be displayed by the browser (and may cause other unwanted effects). However, within a code block Markdown is supposed to do the escaping for you, so that it is easy to share HTML or XML (or MSCX) code snippets.

Markdown environment | <word> | &lt;word&gt; | &amp;lt;word&amp;gt;
---|---|---|---
Plain text | hidden | rendered as <word> | rendered as &lt;word&gt;
Code block | hidden* | rendered as &lt;word&gt; | rendered as &amp;lt;word&amp;gt;

* incorrect behaviour, should be rendered as <word>

This means that it is impossible to write MSCX code in a code block on MuseScore.org.
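
To illustrate the expected behaviour, here is a minimal Python sketch of what a renderer is supposed to do with the contents of a code block. It is not the MuseScore.org implementation: `render_code_block` is a hypothetical name, and the standard library's `html.escape` stands in for whatever escaping routine a real parser uses.

```python
import html


def render_code_block(source: str) -> str:
    """Escape HTML-special characters so the browser shows the source literally."""
    # quote=False escapes only &, < and >, which is enough inside an element.
    return "<pre><code>" + html.escape(source, quote=False) + "</code></pre>"


print(render_code_block("<word>"))
print(render_code_block("&lt;word&gt;"))
```

With escaping done this way, typing `<word>` in a code block produces HTML that the browser displays as `<word>` rather than hiding it as an unknown tag.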

We experimented with CommonMark and compared it with the current PHP Markdown filter. Getting CommonMark working would take more time than fixing the few issues raised with PHP Markdown. Fixing them simply means tweaking the CSS. We'll keep you posted on the progress.

Status: active → fixed

@shoogle the issues with the PHP Markdown filter are fixed. There might be regressions with other things though.

Status: fixed → active

Well, as per @Thomas, the fix was to revert yesterday's changes, so I'm marking the issue active again.

I recently discovered the ability to add syntax highlighting to posts using these tags:
<code>, <pre>, <bash>, <cmake>, <cpp>, <html>, <js>, <php>, <python>, <qml>, <xml>.

For example...

<!-- Here is some XML code -->
<Foo bar="1">
  <Baz>Some text</Baz>
  <Flob/>
</Foo>

I never noticed this before. Is this a new feature or has it been around for a while?

It would be cool if this worked in Markdown code blocks too, like on GitHub:

```language
# this is a Markdown code block with
# syntax highlighting enabled for "language"
```
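
For what it's worth, the CommonMark specification does not mandate highlighting itself. Renderers typically pass the first word after the opening fence through to the HTML as a `language-` class, and a separate highlighter can then pick that class up. A rough Python sketch of that first step, assuming backtick fences only (`code_open_tag` is a hypothetical helper, not a function from any CommonMark library):

```python
import html

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example


def code_open_tag(opening_line: str) -> str:
    """Return the opening HTML for a fenced code block, given its fence line."""
    # Everything after the run of backticks is the "info string".
    info = opening_line.strip().lstrip("`").strip()
    if not info:
        return "<pre><code>"
    # Only the first word of the info string is treated as the language name.
    language = info.split()[0]
    return f'<pre><code class="language-{html.escape(language)}">'


print(code_open_tag(FENCE + "xml"))
```

A fence with no language name simply yields a plain `<pre><code>` with no class, so existing unlabelled code blocks would not be affected.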

Even if switching to a CommonMark parser is really not an option (I'm curious to know why), the CommonMark specification contains lots of helpful examples that could be used as unit tests to catch bugs in the current parser.

It should be fairly easy to write a script to extract the examples from the specification and then run them through the MuseScore.org parser to see which ones are not handled correctly.
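
A minimal sketch of such a script in Python, assuming the layout of the specification's `spec.txt` source file: each example starts with a line of 32 backticks followed by ` example`, a line containing only `.` separates the Markdown from the expected HTML, and another line of 32 backticks ends it. The `render` argument is a hypothetical callable that would wrap the MuseScore.org parser. How to invoke the site's PHP filter from a script is not covered here.

```python
from typing import Callable, Iterator

FENCE = "`" * 32      # assumed example delimiter in spec.txt
TAB_MARK = "\u2192"   # spec.txt shows tabs as a visible arrow character


def extract_examples(spec_text: str) -> Iterator[tuple[str, str]]:
    """Yield (markdown, expected_html) pairs found in the text of spec.txt."""
    state = "text"
    markdown: list[str] = []
    expected: list[str] = []
    for line in spec_text.splitlines(keepends=True):
        stripped = line.strip()
        if state == "text":
            if stripped == FENCE + " example":
                state, markdown, expected = "markdown", [], []
        elif state == "markdown":
            if stripped == ".":
                state = "html"
            else:
                markdown.append(line)
        else:  # state == "html"
            if stripped == FENCE:
                state = "text"
                yield ("".join(markdown).replace(TAB_MARK, "\t"),
                       "".join(expected).replace(TAB_MARK, "\t"))
            else:
                expected.append(line)


def failing_examples(
    spec_text: str, render: Callable[[str], str]
) -> list[tuple[str, str, str]]:
    """Return (markdown, expected, actual) for each example render gets wrong."""
    failures = []
    for source, expected in extract_examples(spec_text):
        actual = render(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures
```

Passing the contents of `spec.txt` and a suitable `render` function to `failing_examples` would then list every example whose output differs from the specification. The comparison here is an exact string match, so some reported differences may be harmless variations in the HTML rather than real bugs.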

BTW, I'm curious to know why it wasn't possible to switch to a CommonMark parser?

  • Is it too different to the current parser?
    • In that case you only need to use it on new posts. Old posts could continue using the old parser, or they could be permanently converted to HTML. The handbook would need to be converted to the new syntax, but it wouldn't take long for volunteers to do that manually.
  • Does it not support all the features currently in use on MuseScore.org?
    • It's true that certain features (e.g. tables) are not included in the CommonMark specification, but many of the existing CommonMark-compliant parsers implement these features anyway as extensions to the specification. You'd just need to pick a parser that has the features you need.