I realized today that some applications out there still use BBCode for non-legacy reasons. I really have no idea why anybody would do that in the year 2020. It’s a very questionable decision security-wise, and it has no usability benefits either.


BBCode is more than two decades old and it made perfect sense back when it was introduced. It was something of an “HTML light” because at the time safely enforcing only a subset of HTML was very complicated. In 2007 I still witnessed MySpace fail at it, repeatedly.

But that’s a thing of the past. Today, very reliable HTML sanitizers exist. So whether your markup is HTML, Markdown or anything else, the approach is always the same: you do any necessary conversions, and *then* you run the HTML code through an HTML sanitizer. And that’s safe.

Now back to BBCode. Because it seems simple, most implementations don’t bother with HTML sanitization. Instead, the expectation is that you run a bunch of regexps to produce HTML code and it will just be fine. Except that usually it’s not:

I’ve seen several processors that already failed at flagging javascript: links. Or that had obvious flaws, allowing injection of arbitrary HTML attributes or tags. But even when obvious issues are taken care of, processing of nested tags is extremely tricky to secure.
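
To illustrate the javascript: link problem, here is a minimal Python sketch of a scheme allowlist for link targets. The function name `is_safe_link` and the set of allowed schemes are assumptions made for this example, not taken from any of the processors mentioned. The point is only that checking against an allowlist is more robust than trying to spot "javascript:" in its many disguises.

```python
from urllib.parse import urlsplit

# Assumed allowlist for this sketch; a real application would choose its own.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_link(url: str) -> bool:
    """Return True only if the URL has an explicitly allowed scheme."""
    # Browsers ignore tabs and newlines inside URLs, so drop them first.
    cleaned = "".join(ch for ch in url if ch not in "\t\r\n").strip()
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 host) are rejected outright.
        return False
    # Relative URLs have an empty scheme and are rejected here as well.
    return scheme in ALLOWED_SCHEMES
```

Anything not on the list is refused, including relative URLs and `data:` links. That is stricter than many applications need, but it fails closed.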

It doesn’t mean that it is impossible. Also in 2007 I wrote a markup processor (custom syntax) with a proper parser producing a syntax tree, so that the generated HTML code would be inherently valid and safe. Lots of effort to solve a problem which has a generic solution today.
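
A rough idea of what the tree-based approach looks like, as a Python sketch. This is not the 2007 processor; the tag set (`b`, `i`, `u`), the `Node` class and the `parse`/`render` functions are all hypothetical and kept deliberately tiny. HTML is only ever generated from the tree, so tags are always balanced and every piece of text is escaped.

```python
import re
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional, Union

# Hypothetical minimal tag set: markup tag name -> HTML element name.
TAGS = {"b": "strong", "i": "em", "u": "u"}
TOKEN = re.compile(r"\[(/?)([a-z]+)\]")


@dataclass
class Node:
    tag: Optional[str]  # None marks the root of the tree
    children: List[Union["Node", str]] = field(default_factory=list)


def parse(text: str) -> Node:
    """Build a syntax tree; anything that is not a known tag stays literal text."""
    root = Node(None)
    stack = [root]
    pos = 0
    for match in TOKEN.finditer(text):
        closing, name = match.group(1) == "/", match.group(2)
        if name not in TAGS:
            continue  # unknown tag: remains part of the surrounding text
        if closing and name not in [node.tag for node in stack]:
            continue  # stray closing tag: remains part of the surrounding text
        if match.start() > pos:
            stack[-1].children.append(text[pos:match.start()])
        pos = match.end()
        if closing:
            # Pop up to and including the matching open tag; inner tags
            # left open are closed implicitly.
            while stack.pop().tag != name:
                pass
        else:
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
    if pos < len(text):
        stack[-1].children.append(text[pos:])
    return root


def render(node: Node) -> str:
    """Serialize the tree; text is always escaped, tags are always balanced."""
    inner = "".join(
        escape(child) if isinstance(child, str) else render(child)
        for child in node.children
    )
    if node.tag is None:
        return inner
    element = TAGS[node.tag]
    return f"<{element}>{inner}</{element}>"
```

For example, `render(parse("[b]bold [i]both[/b] tail"))` closes the dangling `[i]` on its own and yields `<strong>bold <em>both</em></strong> tail`.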

With BBCode not serving a security purpose today and, in fact, often being a security risk – why does anybody still use it for a new project? It has the usability of raw HTML but way more undocumented edge cases. If people write markup by hand, Markdown is the way to go today…


Most of these examples arise from bad parsers, however. You don't use a straight replacement regex here; you use a regex that matches only the kinds of values you want to allow.


For example, for the color tag, you're looking for either a word made of A-Za-z or an RGB value of a pound sign plus six hexadecimal digits.
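
The color example could look roughly like this in Python, taking the RGB form to be `#` plus six hexadecimal digits. The `[color=...]` syntax, the pattern names and the `render_color` function are assumptions for this sketch. It handles a single tag type and makes no attempt at proper nesting.

```python
import re
from html import escape

# Allowlist for color values: a plain color name, or '#' plus six hex digits.
COLOR_VALUE = re.compile(r"[A-Za-z]+|#[0-9A-Fa-f]{6}")
COLOR_TAG = re.compile(r"\[color=([^\]]*)\](.*?)\[/color\]", re.DOTALL)


def render_color(text: str) -> str:
    """Escape the input, then convert only color tags with an allowed value."""
    escaped = escape(text, quote=True)

    def replace(match: "re.Match[str]") -> str:
        value, body = match.group(1), match.group(2)
        if COLOR_VALUE.fullmatch(value) is None:
            # Value is not on the allowlist: leave the tag as (escaped) text.
            return match.group(0)
        return f'<span style="color: {value}">{body}</span>'

    return COLOR_TAG.sub(replace, escaped)
```

An input like `[color=red" onmouseover="alert(1)]hi[/color]` fails the `fullmatch` check, so it is left as escaped text and never reaches an attribute.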

Good article though!

@WPalant 🤔 Friendica and Hubzilla (+later projects) use BBCode.

I see #Zap implementing BBCode sanitizing improvements in the code within the last day. I assume related, but also wouldn't be surprised if pure coincidence.
@WPalant Would you earnestly like to know a reason why #Friendica still uses BBCodes?
@WPalant I'm on the core #Friendica development team so I believe I can provide the insight: We offer a few custom BBCode tags related to the fact that Friendica supports multiple protocols that we would have to port to whatever other markup language we would choose.

For applications that only require basic formatting, Markdown is hands down the better candidate for the job, but I have no experience adding custom markup for Markdown so I have no idea what it would entail for us.

Security-wise, we're looking into using an HTML sanitization library so that we can precisely control what's being produced. We've had a recent report about an XSS vulnerability, so I'm with you on that front.

@hypolite Ehm, I know, I created that report. Now guess what the context of this thread is. 😉

So it’s all about custom markup? Off the top of my head, a generic solution with Markdown would be adding some custom HTML tags. These can be processed independently of the Markdown processor, either before or after the processor runs (security-wise the former is preferable but might not be flexible enough).

@WPalant Ha, I didn't make the connection between your GitHub account and this Mastodon one.

It is about custom markup but it also is about legacy reasons (but your original post was not about them). The Friendica project was started in 2010 by someone who didn't care about the right way to program and cared more about privacy than security or usability. So we have been constantly wrestling with legacy code and behaviors, which have somewhat hampered our efforts to improve the software.

Thank you for the suggestion. We have also been looking into using a BBCode lexer, both to improve security and to allow tag nesting, but we've run into specific compatibility issues. The upstream library has reportedly fixed them, but I haven't taken the time to go back to it.

@hypolite I wonder why (the connection). @WPalant is doing this in exemplary fashion by listing profiles on the personal website to show that they belong to the same person 🤓

@WPalant @bekopharm Because this is how the post looks on my Friendica instance:

Notice that the display name and profile picture differ from his current ones on both Mastodon and GitHub.
@bekopharm There's nothing technical that can be improved about this. He decided to have two separate Mastodon accounts with different display names/profile pictures, I decided not to consult his Mastodon profile before replying to him.

Additionally, missing the connection was inconsequential because I would have replied the same to him whether I knew he posted the GitHub issue or not as we didn't go into the why on the issue.

@hypolite It was simply funny because I was certain that you looked up my Mastodon account coming from that issue I created. It never occurred to me that you stumbled upon this thread simply because somebody unrelated mentioned Friendica using BBCode. @bekopharm

@WPalant @bekopharm This post of yours was indeed brought to my attention because I follow @lightone. The coincidence is funny indeed!

@leip4Ier You probably missed these comments, might be of interest for you.

@WPalant Thank you for taking the time to look at some Fediverse projects and their #security! I've never noticed you mention any such cases via this Mastodon account, so am using this opportunity. Now, when there're so many interconnected projects, there may be multiple security issues. All the help from Fediverse #infosec community is much appreciated! 👾
