Author | Jonas Hietala
Translation | Nuka-Cola
Planning | Chu Xingjuan
I’ve been using Hakyll as my static site generator for nine years. Before that I mainly used Jekyll, and further back I built dynamic pages with Perl plus Mojolicious and PHP plus Kohana. But I only have vague memories of those; I wasn’t using Git at the time, so little trace of that development remains.
Now I’ve finally made up my mind to move to my own custom site generator written in Rust. With this rewrite, I mainly want to solve three problems:
First, slowness. On my low-end laptop, a full site rebuild takes about 75 seconds (no compilation involved, just site generation). I only have around 240 posts on this blog, so it shouldn’t be that slow. Hakyll does have a good caching system, and the watch command only rebuilds changed posts while I’m editing, but the overall execution speed is still too slow.
Second, external dependencies. While the site generator itself is written in Haskell, it relies on more than just the numerous Haskell libraries: my blog helper script is written in Perl, I use sassc for Sass conversion, Python’s pygments for syntax highlighting, and s3cmd to upload the resulting site to S3. Managing and updating so many dependencies is really annoying; I want to escape that hassle and get back to writing.
Third, setup problems. Related to the large number of dependencies: my site setup would sometimes break, and I had to spend time debugging and fixing it. Sometimes it broke exactly when I had a post idea in my head and just wanted to write, not repair my site generator.
You might ask: what can possibly break on such a simple site? Mostly it’s newer versions of dependencies, which tend to cause problems in unexpected ways. For example:
- After updating GHC, the cabal packages can no longer be found.
- Running the Haskell binary fails with:
[ERROR] Prelude.read: no parse
(Only on my desktop; it works fine on my low-end laptop.)
Or with this Perl error:
Magic.c: loadable library and perl binaries are mismatched (got handshake key 0xcd00080, needed 0xeb00080)
(Only on the laptop; it works fine on the desktop.)
- A Pandoc argument that changed between Hakyll versions broke the rendering of code in the Atom feed.
I know these aren’t huge problems, but I just want to write blog posts in peace, so keeping the site working is goal number one.
1 Haskell triggers my internal friction
I actually like Haskell, especially the pure functional parts. I also like the declarative approach Hakyll takes to site configuration. Take generating static (i.e. standalone) pages as an example:
match "static/*.markdown" $ do
route staticRoute
compile $ pandocCompiler streams
>>= loadAndApplyTemplate "templates/static.html" siteCtx
>>= loadAndApplyTemplate "templates/site.html" siteCtx
>>= deIndexUrl
Even if you don’t know what $ and >>= mean, you can still see that we take the files in the static/ folder, send them through pandocCompiler (to convert from the original markdown), then through the templates, and finally de-index the URLs (to avoid links ending in index.html).
How simple, how clear!
But I hadn’t used Haskell in years, so every time I needed to add slightly more complex functionality to the site, it took a lot of effort.
For example, I wanted to add next/previous links to posts, but there was no easy way to do it. In the end I had to take the time to relearn Haskell and Hakyll, and even then the solution I came up with is very slow, relying on a linear search to find the next/previous post. To this day I don’t know how to set this up properly in Hakyll.
I’m sure there is a good way, but for me such a small feature consumed far too much energy, and I couldn’t stand it.
2 Why choose Rust?
- I like using Rust, and for a side project like this, preference alone is basically enough to settle the implementation choice.
- Rust is very performant, and text transformation should be exactly what it is good at.
- Cargo gives great peace of mind. Once Rust is installed, you just run cargo build and wait for the result.
Why reinvent the wheel? Because I wanted to take the initiative and find out what kind of static site generator I could write myself. It shouldn’t be too difficult, it gives me full control over my site, and it offers flexibility far beyond what an off-the-shelf generator allows. Of course, I know that tools like cobalt already exist, and that flexible page transformations can be scripted in any language; I just wanted the flexibility, and the fun of solving the problem myself.
As for the implementation details, space doesn’t allow a full walkthrough of the build process in this article. If you’re interested, you can view the project source code here: https://ift.tt/lRyYIp1
Tackling the hard parts
At first, I was worried that I wouldn’t be able to reproduce the Hakyll features I’m used to, such as the template engine, syntax highlighting for many languages, or the watch command, which regenerates edited pages and doubles as a file server so I can preview posts in the browser while writing.
But it turns out that every hard problem has a well-suited tool. Here are a few libraries that have worked outstandingly well for me (a sketch of how the first two fit together follows the list):
- Use tera as the template engine. It is more powerful than Hakyll’s templating because it supports complex operations such as loops:
<div class="post-footer">
  <nav class="tag-links">
    Posted in {% for tag in tags %}{% if loop.index0 > 0 %}, {% endif %}<a href="{{ tag.href }}">{{ tag.name }}</a>{% endfor %}.
  </nav>
</div>
- Use pulldown-cmark to parse Markdown. It handles CommonMark, Markdown’s standard syntax specification, really well, and it’s fast, but it supports far less than Pandoc, so I had to extend it with the features I need myself. More on that later.
- Use syntect for syntax highlighting, with support for Sublime Text syntax definitions.
- Use yaml-front-matter to parse the metadata in posts.
- Use grass as a pure-Rust Sass compiler.
- Use axum to create a static file server for hosting the site locally.
- Use hotwatch to watch for file changes, so pages can be regenerated when their source changes.
- Use scraper to parse the generated HTML; I need it for some of my tests and for certain transformations.
- Use rust-s3 to upload the generated site to S3 storage.
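To give a feel for how the two most central libraries fit together, here is a minimal sketch of a render step that pipes Markdown through pulldown-cmark and the result through a tera template. The template name static.html and the content variable are illustrative choices, not necessarily how the real project names things:

use pulldown_cmark::{html, Options, Parser};
use tera::{Context, Tera};

fn render_page(markdown: &str) -> Result<String, Box<dyn std::error::Error>> {
    // Markdown -> HTML, with all of pulldown-cmark's extensions enabled.
    let parser = Parser::new_ext(markdown, Options::all());
    let mut body = String::new();
    html::push_html(&mut body, parser);

    // HTML -> full page, by feeding the body into a tera template.
    let tera = Tera::new("templates/**/*.html")?;
    let mut context = Context::new();
    context.insert("content", &body);
    Ok(tera.render("static.html", &context)?)
}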
Even with all these libraries, my own Rust code still comes to over 6,000 lines. I admit Rust can be a bit verbose, and I’m not an expert at it, but this project was also a lot more work than I expected. (Then again, that seems true of all software projects…)
Markdown conversion
If I had only used standard markdown in my posts I could have avoided this step, but over the years my posts have accumulated many features and extensions that pulldown-cmark doesn’t support, so I had to implement them myself.
Preprocessing
I set up a preprocessing step for creating figures that contain multiple images. It’s a generic processing step of the following form:
::: <type>
<content>
:::
I use it for different types of image collections like Flex, Figure and Gallery. Here’s an example:
::: Flex
/images/img1.png
/images/img2.png
/images/img3.png
Figcaption goes here
:::
It will be converted to:
<figure class="flex-33">
<img src="/images/img1.png" />
<img src="/images/img2.png" />
<img src="/images/img3.png" />
<figcaption>Figcaption goes here</figcaption>
</figure>
How is this achieved? Of course with regular expressions!
use lazy_static::lazy_static;
use regex::{Captures, Regex};
use std::borrow::Cow;

lazy_static! {
    static ref BLOCK: Regex = Regex::new(
        r#"(?xsm)
        ^
        # Opening :::
        :{3}
        \s+
        # Parsing the id type
        (?P<id>\w+)
        \s*
        $
        # Content inside
        (?P<content>.+?)
        # Ending :::
        ^:::$
        "#
    )
    .unwrap();
}

pub fn parse_fenced_blocks(s: &str) -> Cow<str> {
    BLOCK.replace_all(s, |caps: &Captures| -> String {
        parse_block(
            caps.name("id").unwrap().as_str(),
            caps.name("content").unwrap().as_str(),
        )
    })
}

fn parse_block(id: &str, content: &str) -> String {
    ...
}
(The image and figure parsing is too long to include here, so let’s skip it.)
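Purely to illustrate the shape of it, here is a hypothetical, heavily simplified parse_block covering only the Flex case; the real implementation handles more figure types, captions and edge cases:

fn parse_block(id: &str, content: &str) -> String {
    match id {
        "Flex" => {
            // Lines starting with '/' are image paths; the rest form the caption.
            let (images, caption): (Vec<&str>, Vec<&str>) = content
                .lines()
                .filter(|line| !line.is_empty())
                .partition(|line| line.starts_with('/'));
            let imgs: String = images
                .iter()
                .map(|src| format!(r#"<img src="{src}" />"#))
                .collect();
            format!(
                r#"<figure class="flex-{}">{imgs}<figcaption>{}</figcaption></figure>"#,
                100 / images.len().max(1),
                caption.join(" ")
            )
        }
        other => panic!("Unknown block type: {other}"),
    }
}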
Extending pulldown-cmark
I also extended pulldown-cmark with my own transforms:
// Issue a warning during the build process if any markdown link is broken.
let transformed = Parser::new_with_broken_link_callback(s, Options::all(), Some(&mut cb));
// Demote headers (e.g. h1 -> h2), give them an "id" and an "a" tag.
let transformed = TransformHeaders::new(transformed);
// Convert standalone images to figures.
let transformed = AutoFigures::new(transformed);
// Embed raw youtube links using iframes.
let transformed = EmbedYoutube::new(transformed);
// Syntax highlighting.
let transformed = CodeBlockSyntaxHighlight::new(transformed);
let transformed = InlineCodeSyntaxHighlight::new(transformed);
// Parse `{ :attr }` attributes for blockquotes, to generate asides for instance.
let transformed = QuoteAttrs::new(transformed);
// Parse `{ .class }` attributes for tables, to allow styling tables.
let transformed = TableAttrs::new(transformed);
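Each of these transforms is an iterator over pulldown-cmark events that wraps the one before it. As a rough sketch of the pattern (assuming pulldown-cmark 0.9’s event model; the struct name is my own, and the real TransformHeaders also adds ids and anchor tags), header demotion can look like this:

use pulldown_cmark::{Event, HeadingLevel, Tag};

struct DemoteHeadings<'a, I: Iterator<Item = Event<'a>>> {
    inner: I,
}

fn demote(level: HeadingLevel) -> HeadingLevel {
    match level {
        HeadingLevel::H1 => HeadingLevel::H2,
        HeadingLevel::H2 => HeadingLevel::H3,
        HeadingLevel::H3 => HeadingLevel::H4,
        HeadingLevel::H4 => HeadingLevel::H5,
        _ => HeadingLevel::H6,
    }
}

impl<'a, I: Iterator<Item = Event<'a>>> Iterator for DemoteHeadings<'a, I> {
    type Item = Event<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pass every event through unchanged, except headings, which are
        // pushed down one level (h1 -> h2, ..., h5 and h6 -> h6).
        Some(match self.inner.next()? {
            Event::Start(Tag::Heading(level, id, classes)) => {
                Event::Start(Tag::Heading(demote(level), id, classes))
            }
            Event::End(Tag::Heading(level, id, classes)) => {
                Event::End(Tag::Heading(demote(level), id, classes))
            }
            event => event,
        })
    }
}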
I had already implemented things like header demotion and embedding bare YouTube links before, and that was fairly straightforward. In hindsight, though, embedding YouTube links might fit better in a pre- or post-processing step.
Pandoc supports adding attributes and classes to any element, which is very useful. For example, this:
![](/images/img1.png){ height=100 }
can be converted to this:
<figure>
<img src="/images/img1.png" height="100">
</figure>
I use this functionality everywhere, so I decided to reimplement it in Rust, though this time in a less general (and admittedly more hacky) way.
Another Pandoc feature I relied on is that it evaluates markdown inside HTML tags. This no longer renders properly:
<aside>
My [link][link_ref]
</aside>
I had planned to implement this in a general preprocessing step, but the problem is the link reference. In the following example:
::: Aside
My [link][link_ref]
:::
[link_ref]: /some/path
link would no longer be converted into an actual link, because all parsing happens only inside the ::: block. So instead I extended the blockquote parsing:
> Some text
{ :notice }
This invokes the notice parser, which in the example above creates an <aside> tag (instead of a <blockquote>) while preserving the parsed markdown.
While existing crates use syntect to highlight code blocks, I wrote my own wrapper that emits a <code> tag, so that inline code like let x = 2; can be highlighted within a line as well.
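A minimal sketch of that inline variant, assuming syntect 5’s class-based API (the function name and the plain-text fallback are my own choices):

use syntect::html::{ClassStyle, ClassedHTMLGenerator};
use syntect::parsing::SyntaxSet;
use syntect::util::LinesWithEndings;

fn highlight_inline(code: &str, lang: &str, ss: &SyntaxSet) -> String {
    // Fall back to plain text if the language is unknown.
    let syntax = ss
        .find_syntax_by_token(lang)
        .unwrap_or_else(|| ss.find_syntax_plain_text());
    // Emit CSS classes instead of inline styles, so a stylesheet controls colors.
    let mut generator = ClassedHTMLGenerator::new_with_class_style(syntax, ss, ClassStyle::Spaced);
    for line in LinesWithEndings::from(code) {
        generator
            .parse_html_for_line_which_includes_newline(line)
            .expect("failed to highlight line");
    }
    format!("<code>{}</code>", generator.finalize())
}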
Performance boost
I didn’t spend much time optimizing performance, but I still found two easy wins.
First, if you use syntect with custom syntax definitions, you should compile the SyntaxSet down to a binary format (a sketch of this appears at the end of this section). The second is to use rayon for parallel rendering. Rendering here means parsing markdown, applying templates, and creating the output files. The beauty of rayon is that this task becomes limited only by CPU performance, and it is very easy to use (as long as the code is structured for it). Here is a simplified example of rendering:
fn render(&self) -> Result<()> {
    let mut items = Vec::new();

    // Add posts, archives, and all other files that should be generated here.
    for post in &self.content.posts {
        items.push(post.as_ref());
    }

    // Render all items.
    items
        .iter()
        .try_for_each(|item| self.render_item(*item))
}
To parallelize, we just need to change iter() to par_iter():
use rayon::iter::{IntoParallelRefIterator, ParallelIterator};

items
    .par_iter() // This line
    .try_for_each(|item| self.render_item(*item))
That’s it, very simple!
I’ll also admit that the gains from my own optimizations are limited; the real performance improvements come mostly from the libraries I use. For example, my old site shelled out to an external pygments process written in Python for syntax highlighting, while the new one uses a highlighter written in Rust. The latter is not only much faster, but also far easier to parallelize.
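Regarding the SyntaxSet point above, here is roughly what dumping and reloading the set looks like with syntect’s dumps module; the file and folder names are made up for the example:

use syntect::dumps::{dump_to_file, from_dump_file};
use syntect::parsing::{SyntaxSet, SyntaxSetBuilder};

// One-off step: compile the custom .sublime-syntax files and dump the
// resulting SyntaxSet into a binary file.
fn dump_syntaxes() -> Result<(), Box<dyn std::error::Error>> {
    let mut builder = SyntaxSetBuilder::new();
    builder.add_plain_text_syntax();
    builder.add_from_folder("syntaxes", true)?;
    dump_to_file(&builder.build(), "syntaxes.bin")?;
    Ok(())
}

// At build time: load the precompiled set, skipping the expensive parsing.
fn load_syntaxes() -> Result<SyntaxSet, Box<dyn std::error::Error>> {
    Ok(from_dump_file("syntaxes.bin")?)
}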
Sanity checks
From maintaining my own website I know how error-prone it is: accidentally linking to a page or image that doesn’t exist, or writing [my link][todo] without defining the link reference and forgetting to update it before publishing.
So, besides testing basic functionality like the watch command, I parse the entire generated site and check that all internal links exist and are correct (including that the some-title fragment in /blog/my-post#some-title actually resolves). External links are checked too, but via a manual command.
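A sketch of the core of such a check using scraper; the real version also collects heading ids and image sources, and the function name is illustrative:

use scraper::{Html, Selector};

// Collect all internal link targets from a rendered page, so they can be
// checked against the set of generated pages and their heading ids.
fn internal_links(html: &str) -> Vec<String> {
    let document = Html::parse_document(html);
    let selector = Selector::parse("a[href]").unwrap();
    document
        .select(&selector)
        .filter_map(|a| a.value().attr("href"))
        .filter(|href| href.starts_with('/'))
        .map(str::to_owned)
        .collect()
}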
During generation I also apply stricter checks, to avoid missing all kinds of weird errors as far as possible.
3 How did it turn out?
At the beginning of the article, I listed some problems with my previous setup. Let’s see how the new setup addresses them.
Now, on that same low-end laptop, a full site rebuild (excluding compile time) takes only 4 seconds. An 18x speedup in one go is pretty good. Of course, there is still room for improvement: file IO currently goes through rayon, and an asynchronous approach could probably do better; and there is no caching system, so every build regenerates all files (although in practice the build turned out to be fast enough without one).
Note that I’m not saying Rust is inherently faster than Haskell; I’m only comparing two specific implementations. I’m sure there are Haskell experts who could achieve the same speedup.
All functionality is now implemented in Rust, with no external scripts or tools to install and maintain.
As long as Rust is installed on the system, cargo build just works. I think that may be one of Rust’s most understated advantages: the build system simply gives you nothing to fight with.
No manually hunting down missing dependencies, no sacrificing features for cross-platform support, no havoc when the package manager pulls in updates automatically. Just lean back in your chair and wait for the code to compile.
4 Rust cured my internal friction
While I do find it much easier in Rust to build features like article series or previous/next links, I don’t mean that Rust is easier to use than Haskell, only that Rust is easier for me personally to understand than Haskell.
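For example, with posts kept in a plain Vec sorted by date, previous/next links reduce to neighbouring indices. A minimal sketch, with a made-up Post type:

// A made-up, minimal post type; the real one carries far more fields.
struct Post {
    title: String,
}

// With posts sorted by date, previous/next are just the neighbouring
// elements, looked up by index instead of a linear search.
fn neighbours(posts: &[Post], i: usize) -> (Option<&Post>, Option<&Post>) {
    let prev = i.checked_sub(1).and_then(|j| posts.get(j));
    let next = posts.get(i + 1);
    (prev, next)
}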
The biggest difference is probably hands-on experience. I’ve been using Rust a lot lately, whereas I’ve had little contact with Haskell since I first built the site with it years ago. If I left Rust untouched for ten years, picking it up again would surely be painful too.
Overall, I’m very happy with the attempt. It was a fun and rewarding project, and while it was more work than I expected, it eliminated a problem that had been bothering me for a long time. I hope my experience is helpful to you.