Merge Consecutive Blockquotes in Markdown

Original link: https://yihui.org/en/2023/09/merge-blockquotes/

logo.png

Older versions of Hugo use a library named Blackfriday to convert Markdown to
HTML, which is not compliant with CommonMark (unlike Goldmark in newer versions
of Hugo). One problem with Blackfriday is that blank lines inside blockquotes
are treated as part of the blockquotes. For example,

 > One > Two

will be converted to:

 <blockquote> <p>One</p> <p>Two</p> </blockquote>

With the CommonMark specs, it should be converted to two blockquotes because of
the blank line:

 <blockquote> <p>One</p> </blockquote> <blockquote> <p>Two</p> </blockquote>

To generate a single blockquote, the blank line must also start with > .

A lot of my old posts have this problem. I guess it might be solved by a Pandoc
(Lua) filter, but I do not need rigor, so I wrote an R function to process my
Markdown files instead ( like last time ):

 # add > to empty lines to merge consecutive blockquotes merge_bq = function(x) { if ((n <- length(x)) < 3) return(x) r = '^(>( >)* ?)(.*)$' i = grep(r, x) if (length(i) < 2) return(x) # exclude consecutive lines that start with > i = i[c(diff(i) > 1, TRUE)] if ((n <- length(i)) < 2) return(x) for (k in 1:(n - 1)) { i1 = i[k]; i2 = i[k + 1] # from line #i1 on, a line should not be blank unless # the rest of lines are all blank till line #i2 i3 = (i1 + 1):(i2 - 1) b = xfun::is_blank(x[i3]) if (any(b) && !all(tail(b, -which(b)[1]))) next p = sub(r, '\\1', x[i1]) # '> ' (or nested '> > ...') p = sub(' *$', ' ', p) # ensure trailing space x[i3] = paste0(p, x[i3]) } x }

The code is nowhere near elegant or easy to understand. I’m leaving a record
here nonetheless because I have written a lot of dumb code like this in my
career. I do not want to pretend that I’m always writing elegant code. On the
Contrary, most of code I write is perhaps quite dumb and boring.

Recently I talked with Peng Liu at JMP about our software development career
since he was hoping to organize a JSM session on this topic (which did not work
out, unfortunately). Although I did not plan to attend JSM, I thought for a
while what I could share if I were to give a talk. One thing that came to my
mind was how much formal training in computer science we need to be a successful
software engineer. I was thinking about this because I did not receive any
formal training, and I do not really know much about algorithms. If I were to
look for a software development job, I guess I’d be immediately rejected due to
my poor knowledge in computer science.

But I can write for loops (with care and patience) and use regular
expressions. I feel I have been making a living on them (thankfully the
imposter syndrome cannot beat
me). Usually I write code very slowly because my brain is slow. The above for
loop took me nearly two hours. On my mind, I was just wrestling with a few lines
of text—some start with > , and some are blank.

Some quick tests:

 merge_bq(c('> a', '', '> b'))
 > a > > b
 merge_bq(c('> a', 'c', '', '> b'))
 > a > c > > b
 merge_bq(c('> a', '', 'c', '> b'))
 > ac > b
 merge_bq(c('> a', '', '> > c', '', 'd', '', '> b'))
 > a > > > cd > b
 merge_bq(c('> a', '', '> > c', 'd', '', '', '> b'))
 > a > > > c > > d > > > > > b

All seem good, so I batch processed all my posts:

 # process all writable [R]md files files = blogdown:::list_rmds(pattern = blogdown:::md_pattern) files = files[file.access(files, 2) == 0] # writable for (f in files) xfun::process_file(f, merge_bq)

Thanks to Git, I could do a manual check on the
changes
before I commit them
(I did revert a few changes because blank lines actually meant to be blockquote
separators).

This article is reproduced from: https://yihui.org/en/2023/09/merge-blockquotes/
This site is only for collection, and the copyright belongs to the original author.