Design Ideas for a Front-end SSR System

In the past few months, besides writing back-end interfaces, I have spent most of my time on front-end infrastructure: establishing technical solutions and coordinating the monitoring and alerting system for the entire pipeline. Within that pipeline, SSR plays a large part.

There are plenty of articles (and library docs) dedicated to teaching you how to build an SSR demo. But when SSR becomes part of a real system, development is only the beginning — many more practical problems are waiting to be solved, and this article introduces some of the problems we ran into.

Why we need SSR

Historically, the Web evolved from MVC to SPA and then to SSR, which seems to confirm that fashion moves in circles — in the end it comes back to you. From another point of view, though, it is an upward spiral:

  • MVC: anyone past back-end Hello World has seen this — PHP, JSP, and other MVC development stacks. If the front-end developer also knows the corresponding language, work can start immediately.
  • SPA: the front end and back end are successfully separated, so the two sides no longer have to develop in lockstep. From the perspective of system decoupling, this was a big breakthrough.

So, if the front and back ends are already decoupled, why do we need a server to render our HTML again?

Two issues are at play here: the first is SEO, and the second is above-the-fold (first-screen) rendering.

SEO

The history of SEO goes back to 1997 — some readers of this article may not even have been born yet. SEO stands for Search Engine Optimization: with the birth of search engines, optimizing websites for them became a companion problem.

For students who have never actually run a website and only develop inside a company, the word may exist only in documentation — you know there is such a thing, and that SSR somehow solves it. But for the many "webmasters" who have built their own sites, SEO itself is an epic battle of wits and courage with the search engines.

A proper history of SEO could fill its own article, and I don't want to spend space here retelling the development history of Internet websites. I'll just quote the mantra that circulated after 2000 — "content is king, external links are emperor" — as a tribute to that era.

But I digress. The point of this article is not how to create content, but how to use technical means to get content better exposed — that is the leverage technology can provide for SEO.

So why do we need SEO at all? A search engine is the only free traffic source besides paid (PPC) ranking. The essence of SEO is occupying more keywords and ranking higher on them, which earns you free traffic. In a year of cost-cutting and efficiency drives, that is a tempting proposition — even large companies can save real marketing spend.

By now you may have a rough idea of SEO, but the question remains — what does any of this have to do with SSR?

Now consider a different problem: if you had to write a high-performance general-purpose crawler, what would you do?

  1. An IP pool plus multiple processes and threads
  2. Start from the home page, follow every a tag in the DOM, recurse, and save the data
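As a toy sketch of those two steps (illustrative only — extractLinks uses a naive regex, and a real crawler would add a proper HTML parser, politeness rules, and worker pools):

```javascript
// Toy link extractor: pull every <a href="..."> out of fetched HTML.
function extractLinks(html) {
  const links = []
  const re = /<a\s[^>]*href="([^"]+)"/gi
  let m
  while ((m = re.exec(html)) !== null) links.push(m[1])
  return links
}

// Breadth-first, deduplicated walk starting from the home page.
// fetchPage is injected (e.g. global fetch + .text() in Node 18+).
async function crawl(startUrl, fetchPage, maxPages = 100) {
  const seen = new Set([startUrl])
  const queue = [startUrl]
  const pages = []
  while (queue.length && pages.length < maxPages) {
    const url = queue.shift()
    const html = await fetchPage(url)
    pages.push({ url, html })
    for (const link of extractLinks(html)) {
      if (!seen.has(link)) { seen.add(link); queue.push(link) }
    }
  }
  return pages
}
```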

Generality matters here. When we write a crawler to scrape a specific website, we can customize it — hit the APIs directly, or render the first screen and read the DOM, depending on the site. But a search engine crawler obviously just wants to extract features quickly and accurately; it will not waste time on per-site customization.

Now you know the key point: the crawler needs to parse the DOM and follow the tags. So we need SSR to make sure our HTML actually contains data to crawl — not the bodiless shell that CSR mode produces:

 <html>
   <body>
     <div id="app"></div>
   </body>
 </html>
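For contrast, an SSR response for the same page already carries the content in the markup (illustrative example — the element names and text are made up):

```html
<html>
  <body>
    <div id="app">
      <h1>Product title</h1>
      <p>Description the crawler can index without running any JavaScript.</p>
    </div>
  </body>
</html>
```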

Google's crawler has become sophisticated and has learned to run some JavaScript, but reportedly it will not wait for data fetching to complete — it only runs synchronous scripts. In any case, the controllability of the system is a consideration in its own right, and when we say "search engine" we do not just mean Google. From this angle, SSR is a must-have for us.

By this point it is clear that SEO alone could fill a separate article, so I won't enumerate "what data I actually feed into SEO" — consider this a passing mention.

Above the fold rendering

I spent a lot of space introducing SEO, but front-end readers are probably more familiar with performance metrics (in fact, search engines also score your site on performance, so optimizing them has its own SEO effect — something a follow-up article could cover), and first-screen time is naturally one of them.

Likewise, this article is not about performance optimization, so I won't dwell on the definition and measurement of specific metrics — front-end readers already know them, which is why this section is much shorter than the SEO one (it should be, anyway).

If you are not familiar with performance metrics, here is the short version. Set aside definitions and numbers for a moment and consider first-screen rendering from the user's point of view: with CSR versus SSR, when does the user actually see valid data?

  • With CSR, the browser first receives an HTML page with no content, then runs the JavaScript, issues the fetches, waits for them to finish, and finally renders — only when loading ends does the user see the page content.
  • With SSR, the necessary first-screen data has already been written into the HTML. Once the HTML is downloaded, the page is essentially usable (even if you strip the JavaScript out). Images may still be loading, but the overall structure and content are already there, and the user sees the site content at once.

Ideally, then, the SSR user experience is clearly better (which also shows that performance metrics are neither arbitrary nor about chasing quick wins — everything follows from user experience). Why do I say "ideally"? I'll explain later.

SSR vs SSG

A problem you quickly run into when doing SSR is that colleagues who don't understand the difference between SSR and SSG will come over and ask: "Why don't you just use SSG?", "How come I've never hit the performance bottleneck you mention?", "Why do you burn so much CPU? I don't."

People who say this simply haven't seen the scenarios where SSR and SSG each actually apply. In the spirit of knowing both the what and the why, let's "discuss politely" (depending on whether the other party is polite): do you know the difference between SSR and SSG?

If you can answer "SSG statically generates the HTML ahead of time, while SSR renders dynamically per request", then our journey can begin.

Once you understand SSG, you see that these are two different things altogether. If a page can be produced by SSG, then after generation it can be dropped onto a CDN as a proper static page. That does not mean an SSG page must be immutable: if the data changes on a fixed schedule, SSG can still save resources and hit the two basic goals above — just run a scheduled job. Even if some content is personalized, as long as it does not have to appear in the first screen, SSG can still work.

SSR's responsibilities are entirely different: it moves all the API requests needed for the first screen into its own service, so it easily handles personalization and strong real-time requirements while still achieving the same two basic goals. (When you open bilibili, the entire homepage may be stitched together by recommendation algorithms — in that case only an SSR strategy works.)

Without a business scenario, the SSR-versus-SSG debate is meaningless. Neither is a silver bullet; only by analyzing the business can you know which to use where.

That said, considering the whole pipeline, SSG is much simpler, so as before I won't spend more time on it.

How to develop SSR

As mentioned at the start, there are plenty of articles on how to develop an SSR application, but for the completeness of this article, this section briefly walks through the SSR development process, using Vue 3 + Vite as the example.

Start with a demo

The demo part is covered clearly in the Vue and Vite documentation; at the very least, getting a demo running is not a problem.

During (demo) development, we only need to pay attention to lifecycle execution and make good use of useSSRContext. All the caveats are written in the Vue and Vite docs.

The question I actually want to discuss in this part is: how should I choose an SSR framework — or should I use one at all?

Vue itself offers several options: Nuxt and vite-plugin-ssr. For a large, continuously iterated consumer-facing project, a highly encapsulated open-source framework may not be a good choice (especially while Nuxt was still in beta). For a team with real engineering capacity and sufficient project scale, building its own thin layer on top of the low-level APIs may be better — controllability is important in large systems.

Engineering

Even for a demo, deploying it online inevitably raises front-end engineering questions. So even though this chapter started from a demo, let's look at how the whole pipeline should be handled from an engineering perspective.

Setting aside the primitive-society mode of "push files straight onto a VM and run them", what options do we have in the Docker age?

The first and most intuitive scheme: build the complete file list, then bake everything into the container image and start it. This is very server-side thinking — the SSR container is a fully self-contained, closed-loop system, usable for grayscale releases and rollbacks, and the container itself is an excellent unit of resource management. But this way I cannot guarantee that CSR and SSR are running the same code.

The second scheme: merge CSR and SSR — the SSR build produces only the server, and all other assets come from the CSR build. The advantage is that the version of the running code is always predictable and guaranteed identical.

You might ask: I said above that the running versions are expected to be the same copy — why do we need a guarantee mechanism for that?

Downgrade

This brings us to the first difference between real development and a demo: downgrading.

In real development, no service is 100% available. To maximize the availability users actually see, every service in the system gets the usual treatments: fault tolerance, circuit breaking, degradation. Introducing SSR visibly adds one more layer to the link. If the SSR service fails and there is no degradation path, users are hit directly (as the traffic entry point, the front-end service actually has extremely high availability requirements, even though front-end engineers rarely have to think about this in most scenarios) — and it isn't "one API is down", it's the entire website returning 500. With plain CSR, requests go straight to static assets, and the QPS that can be carried is qualitatively different (you may even serve static assets directly from file storage + CDN). The ideal degradation is: if the SSR service has a problem, fall back to serving the static assets directly (CSR rendering). After all, we introduced SSR only to optimize traffic and user experience; if SSR problems "optimize away" that traffic, the loss outweighs the gain.

If we want to downgrade, we must ensure that CSR and SSR run the same code; inconsistent versions can cause even more unexpected situations — boom!

This guarantee can come from the release system as a unified build capability — compile once, release to both environments — or it can be enforced by the application itself (scheme 2).

Someone will ask: worst case, keeping them in sync by hand is only a bug-fix or two of effort — surely not a big deal? If your static assets live on a CDN, it is a very big deal: the CSR version gets pushed, but because the SSR version differs, the corresponding static assets cannot be found, and the release blows up again.

Of course, we can also add resource checks to enforce version consistency. In general, at this stage the choice of technical scheme is really dictated by your infrastructure capabilities.

So how does scheme 2 chain CSR and SSR together? We need something like a configuration center to associate and fetch the static assets (and manage versions), download (inject) them locally during the container startup phase, and use them as the starting point. Since what is pulled is exactly what the CSR build produced, it is naturally guaranteed to be the same version.
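A sketch of that startup step might look like this (every name here — the manifest fields, the config-center fetch — is hypothetical):

```javascript
// Hypothetical boot step for scheme 2: before the SSR server starts
// listening, pull the asset manifest produced by the CSR build from a
// config center and validate it.
function loadManifest(rawJson) {
  const manifest = JSON.parse(rawJson)
  // Fail fast at startup if the manifest lacks what the server template
  // needs, rather than discovering it on the first user request.
  for (const key of ['version', 'entryJs', 'entryCss']) {
    if (!manifest[key]) throw new Error(`manifest missing "${key}"`)
  }
  return manifest
}

async function bootstrap(fetchManifestRaw) {
  // fetchManifestRaw would call the config center over the intranet.
  const manifest = loadManifest(await fetchManifestRaw())
  // The SSR HTML template then renders e.g.
  //   <script src="${manifest.entryJs}"> and <link href="${manifest.entryCss}">
  return manifest
}
```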

What other advantages does scheme 2 have? Suppose the service is big enough that we run 100 containers: on every web release, both the CSR and SSR assets need updating, and the configuration center's network I/O is much faster than re-releasing containers. Sounds like a good plan, doesn't it?

It has disadvantages too. For example, we mentioned using containers for grayscale experiments; with scheme 2, the configuration center itself must support grayscale releases.

Also, when dependencies change, the SSR service still has to be repackaged to install them. If dependencies change often, scheme 2 looks much less attractive — and if the consistency checks are weak, a dependency mismatch can take the SSR service down entirely.

Service decoupling

Thinking it through, scheme 1 cannot fully separate the SSR and CSR services, whereas scheme 2 decouples them by design (after all, assets are pulled over network I/O).

Which raises a new question: is decoupling actually necessary?

In modern design we habitually make each system or application as small as possible to reduce the blast radius of changes. That adds some link-management cost, but in most cases the benefits outweigh it. In the SSR scenario, though — do we really need the SSR to exist independently of the CSR code (say, two repos maintained and managed separately)?

Looking at the whole development flow, for SSR applications this may do more harm than good. We already mentioned one pain point above: dependencies must be updated in sync, and manual synchronization between two applications is unquestionably painful, with no visibility before launch. Yes, we can develop independently — but is independence necessary?

The SSR service is inherently attached to the CSR; theirs is a close cooperation. With "decoupling" we actually hit a serious problem: code developed against the CSR service turns out, at release time, to break in the SSR environment and won't run. By then a version's worth of code has piled up and is already in testing or even releasing. Most people start panicking at this point: what is going on?

If development happens in SSR mode from the start, such problems become far rarer (some inconsistencies between dev mode and production mode are unavoidable, but they are fewer and more controllable). We also only need to maintain one dependency list. On the surface this is a larger, more coupled system; in practice the overall maintenance and governance cost goes down.

Summary

A quick summary of the development-stage trade-offs:

Option 1: Full resources

  • Advantages:
    1. A complete, standalone system
    2. Docker-native rollback (roll back the container directly)
    3. Container-based grayscale releases
    4. The build package is always up to date (depends on the latest)
  • Disadvantages:
    1. Runs independently, so eventual consistency is not guaranteed
    2. Cannot verify that the downgrade path actually works (CDN resource consistency is not guaranteed)

Option 2: CSR + SSR integration

  • Advantages:
    1. Guarantees eventual consistency
    2. Can use the CSR template to check whether assets are valid
    3. Faster publish and rollback
    4. Supports service decoupling
  • Disadvantages:
    1. The CDN cannot guarantee resource access security
    2. Network I/O may cause startup failure
    3. Resource-level grayscale is not possible for SSR
    4. Forgetting to update dependencies can cause breakage

Meanwhile, during the development phase, develop in SSR mode as much as possible (while also keeping CSR mode available).

Of course, there is plenty of gray area between scheme 1 and scheme 2; customize according to your actual development and infrastructure.

How to improve system throughput

Let's assume your service actually has users — many of them. Then we hit a problem harder than any demo: how to carry ever-growing QPS. The term QPS (queries per second) may be unfamiliar to front-end developers; simply put, it is the number of requests per second. In interviews, if a candidate's résumé mentions SSR, I almost always ask about this, and most people's first instinct for handling growing volume is: scale out.

Of course, that answer kills the conversation. From a systems perspective, we should look at the problem holistically, find the bottlenecks in system throughput, and help the system run fast and well.

The essence of SSR

Before optimizing, let's think about what SSR essentially is. So far we have described it mostly from a business perspective; mechanically, it consists of:

  • CPU computation: running JavaScript to render the corresponding DOM structure
  • Network I/O: fetching the API data required for the first screen

As of now, the bottleneck of our SSR service is most of the time the CPU, which distinguishes it from our other back-end services — most services in a business system are I/O-bound, and CPU-bound ones are rare. This brings some additional considerations.

Network I/O

Start with the easy one. For network I/O we naturally want to shave time off the whole request. We said above that "ideally, the SSR user experience is better, because the first screen arrives first" — but if too much time is spent on internal network I/O, then from the user's standpoint the blank-screen time is merely better hidden, and SSR becomes a negative optimization.

Fortunately this is well-trodden ground; with I/O-bound applications we have plenty of experience:

  1. Call interfaces over the intranet: exactly what it sounds like.
  2. Reduce network payload size as much as possible: this takes joint effort between the back end providing the interface and the front end, negotiating the best format (including but not limited to sending only the necessary fields, using gRPC, etc.).
  3. Use caching effectively: caching is a standard tool, but in this scenario you need to think carefully about "should I cache at all?" and "how should I cache?".

To avoid the overhead of repeatedly establishing and tearing down connections, we can use keep-alive. To guard against extremes, we can use an httpAgent to cap the maximum number of sockets, keeping the whole system safe under abnormal traffic.

Note that enabling keepalive requires the other end of the connection to support it too. You can test whether keep-alive is actually working by following: https://stackoverflow.com/questions/4242145/how-to-test-http-keep-alive-is-actually-working

In front-end applications we also set a timeout, but be clear about where the clock starts: in most implementations, the timeout is measured from when the connection is established — and that conflicts with the SSR scenario.

We prepared a connection pool above, but if traffic surges and a request never even gets a socket, it should be cancelled where it stands. The budget is determined by the overall upstream/downstream timing: the SLB layer (that is, nginx) inevitably configures its own timeout — maybe 1s, maybe 800ms — and once that elapses, requests still waiting in the queue should be cancelled on the spot. (Even without a pool, the same problem can appear in the link.)

To summarize: we urgently need a fetch that carries per-request context. Its implementation can be anything, but at minimum it must support cancelling requests — very much like context.WithTimeout in Go.
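A minimal sketch of such a helper, assuming Node 18+ AbortController semantics (the exact shape in your codebase will differ):

```javascript
// A per-request deadline helper, in the spirit of Go's context.WithTimeout.
// `run` receives an AbortSignal so the underlying request (e.g. fetch) can
// actually be cancelled, not just abandoned, when the deadline fires.
function withDeadline(run, ms) {
  const controller = new AbortController()
  const work = run(controller.signal)
  const deadline = new Promise((_, reject) => {
    const timer = setTimeout(() => {
      controller.abort()
      reject(new Error(`deadline of ${ms}ms exceeded`))
    }, ms)
    // If the work settles first, stop the timer so the process can exit.
    work.finally(() => clearTimeout(timer)).catch(() => {})
  })
  return Promise.race([work, deadline])
}

// Usage (Node 18+, where global fetch honors the signal):
// const res = await withDeadline(signal => fetch(url, { signal }), 800)
```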

Another common pitfall: in a Node service, global variables are shared across all requests — unlike CSR, where state only ever belongs to one end user. So in this scenario, we need to create and use a fresh instance per request.

In Vue, we can create a separate app instance per request in SSR and use provide / inject to supply a request-scoped fetch. This grows more important with business scale (and the frequency of traffic attacks), and it also helps push the design toward proper server-side thinking.
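A framework-agnostic sketch of the per-request idea (names are illustrative; in Vue this object would be provided on a fresh app instance for each request):

```javascript
// Module scope is shared by every request hitting this Node process,
// so anything user-specific must be created fresh per request.

// BAD: a module-level `let currentUser = null` can leak user A's data
// into user B's render under concurrency.

// GOOD: one context object per incoming request.
function createRequestContext(req) {
  const cache = new Map() // request-scoped; dies with the request
  return {
    user: req.headers['x-user-id'] ?? 'anonymous',
    async fetchJSON(url, fetchImpl = fetch) {
      // Dedupe identical fetches within this request only.
      if (cache.has(url)) return cache.get(url)
      const data = await fetchImpl(url).then((r) => r.json())
      cache.set(url, data)
      return data
    },
  }
}
```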

CPU

Now consider the CPU side. CPU-bound work is a topic front-end engineers rarely meet. In principle the answer is "improve the algorithm" so the DOM structure is generated as fast as possible — but the core generation algorithm is encapsulated in the tooling you use (e.g., Vue/Vite). So what we can actually do is simplify the first-screen content structure as much as possible, spending every CPU cycle where it counts: useless DOM nodes and oversized data only add rendering overhead.

So, to summarize the CPU-optimization options:

  • Use the fastest possible build scheme
  • Optimize DOM structure and bundle size

It is also possible that your own business code contains high-time-complexity hot spots — another place to optimize.

To find where the system actually spends its resources, profile it: use Node.js tooling to generate a flame graph.

Briefly, how to use it:

  • Start: node --prof index.js runs the program with V8 sampling enabled and produces an isolate*.log file.
  • Generate the flame graph: node --prof-process --preprocess -j isolate*.log | npx flamebearer

After generating the flame graph you still need to learn to read it before you can find the system's bottleneck. There are many articles on reading flame graphs (even Ruan Yifeng has covered it), so I'll just leave the topic here; if needed, a future article may cover it in detail.

People also ask: can we cache SSR output, whether page-level or component-level? As a generic measure this is a fairly direct solution and can indeed cut CPU cost significantly. But as always, cache design must follow the concrete business model: does the business require real-time freshness? What is the cache granularity and timeout? And after caching, what hit rate do I actually get? Saying "just turn the cache on" means nothing — not to mention that caching affects your whole development model and may add development cost. It is not a silver bullet either.
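If, after weighing those questions, caching does fit, a component-level cache can start as simple as a TTL map — a sketch only, with hypothetical sizing:

```javascript
// Minimal TTL cache for rendered HTML fragments. A real deployment must
// still weigh freshness requirements, key granularity, and hit rate.
class RenderCache {
  constructor(maxEntries = 500, ttlMs = 5000) {
    this.max = maxEntries
    this.ttl = ttlMs
    this.map = new Map() // key -> { html, expires }
  }
  get(key, now = Date.now()) {
    const hit = this.map.get(key)
    if (!hit) return null
    if (hit.expires < now) { this.map.delete(key); return null }
    return hit.html
  }
  set(key, html, now = Date.now()) {
    this.map.delete(key) // refresh insertion order on overwrite
    if (this.map.size >= this.max) {
      // Evict the oldest insertion (Map preserves insertion order).
      this.map.delete(this.map.keys().next().value)
    }
    this.map.set(key, { html, expires: now + this.ttl })
  }
}
```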

For this scenario you can also combine the SSG described above: consider what your first screen really needs, make reasonable trade-offs, and arrive at the solution that fits you.

Going Further: The Birth of the System

Obviously, SSR is not a service that exists in isolation; it must be protected in combination with the entire link. We already touched on defensive and degradation capabilities above, but some general capabilities remain. These can be configured against your own infrastructure, so that anomalies are detected in time and do not cascade to downstream services:

  1. Monitoring and alerting
  2. Rate limiting and throttling
  3. Degradation

The details here depend heavily on company infrastructure, so I won't expand on them. One recommendation, though: when running a service, understand your upstream and downstream — it makes troubleshooting much easier.

Summary

This article has roughly summarized some of the challenges we met during actual development. As volume grows there will undoubtedly be new ones, and the theme repeated throughout applies: combine everything with your actual business scenarios and infrastructure — otherwise it is all empty talk. This article only aims to introduce some ideas, with a little popular science along the way.

Source: https://www.codesky.me/archives/frontend-ssr-system-design.wind