Original link: https://colobu.com/2022/12/04/Shopify-monolith-served-1-27-Million-requests-per-second-during-Black-Friday/
On this year’s Black Friday (comparable to the Double Eleven shopping festival in China), Shopify posted impressive numbers, and their engineering team shared the following technical figures:
- MySQL: 14 million queries per second (peak)
- Metrics: 20 billion metrics per minute, 27G of metrics data per second
- Peak traffic: 1.27 million requests per second, 75.98 million requests per minute
- 32 billion asynchronous jobs
- 24 billion webhooks
- Kafka: 20 million messages/second
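As a quick sanity check, the two peak traffic figures in the list above are consistent with each other: dividing the per-minute peak by 60 gives roughly the per-second peak. A minimal Ruby sketch using only the numbers quoted above (the variable names are mine):

```ruby
# Figures quoted in the list above; variable names are illustrative.
peak_per_second = 1_270_000     # 1.27 million requests per second
peak_per_minute = 75_980_000    # 75.98 million requests per minute

# Average rate during the peak minute, in requests per second.
avg_per_second = peak_per_minute / 60.0

# How close the per-minute average is to the per-second peak.
ratio = avg_per_second / peak_per_second

puts format("%.2f million requests/second on average during the peak minute",
            avg_per_second / 1_000_000)
puts format("that is %.1f%% of the per-second peak", ratio * 100)
```

The per-minute peak averages out to roughly 1.27 million requests per second, which suggests the load stayed near its peak for a whole minute rather than spiking for an instant (assuming both figures describe the same peak window).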
Shopify is a Canadian multinational e-commerce company headquartered in Ottawa, Ontario; Shopify is also the name of the company's e-commerce platform. It offers online retailers a suite of services “including payments, marketing, shipping, and customer engagement tools to simplify the process of launching an online store for small merchants.” Shopify was founded in 2004. The three founders originally planned to open an online store for snowboarding equipment (Snowdevil). Dissatisfied with the e-commerce products on the market, one of the founders, the programmer Tobias Lütke, decided to build his own. Lütke used the open source framework Ruby on Rails, and the Shopify platform launched in June 2006. In 2015, Shopify was listed on the New York and Toronto stock exchanges.
People often ask why large tech companies settle on particular languages: Tencent and Baidu prefer C/C++, Alibaba likes Java, ByteDance loves Go, some companies use Python, and so on. In many cases there is no deep reason. A company's first goal is to survive, so it uses the language its founders know best to build a prototype quickly and win market recognition. During the rapid expansion that follows, development generally continues on that original language and platform, and redesigns or fresh starts midway are rare. At most, some teams introduce other programming languages along the way, but the main language and platform still bear the mark of the choices made in the early days.
Back on topic, here is one of the figures Shopify published for this year’s Black Friday:
Peak traffic: 1.27 million requests per second, 75.98 million requests per minute
What you may not know is that Shopify, like Stack Overflow, is not a fan of microservices. Their commerce platform is a monolith, and all of its code lives in one repository.
Can a monolith really serve more than a million requests per second, especially for something as complex as e-commerce?
Many people assume it cannot, especially now that monolithic architecture tends to be labeled as “bad”. But monoliths still have their place in some scenarios, particularly for businesses like Shopify. Shopify has built a modular monolith. It is not deployed as a single application on a single server: hundreds of instances of the monolith can be deployed, and the system can also be scaled horizontally through sharding and other techniques.
Of course, I am not advocating monolithic architecture by writing this article. Shopify adopted a monolith for historical reasons, and modular, component-based development fits their current scale and business direction. But if the scale keeps growing and the company diversifies, they will run into scaling problems again, just as they did a few years ago. After more than a year of work they found a way to keep the monolith under control, but at some point in the future they will have to revisit it.
The rest of this article looks at their experience, based on what their engineering team has shared.
Monoliths vs Microservices
According to the definition on Wikipedia, the functions of a monolithic application are interwoven rather than split into architecturally independent components. In Shopify's monolith, the code that handles shipping calculations lives alongside the code that handles checkout, with little to stop them from calling each other. Over time, this led to a high degree of coupling between code that handles different business processes.
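To make the coupling concrete, here is a minimal Ruby sketch. The class names (`ShippingRateTable`, `Checkout`) and the rate values are hypothetical and are not Shopify's code; the point is only that nothing in a plain monolith prevents checkout code from reaching straight into shipping internals.

```ruby
# Hypothetical shipping internals: a rate table that only shipping code
# was ever meant to touch.
class ShippingRateTable
  RATES = { "CA" => 5.0, "US" => 8.0 }.freeze

  def self.lookup(country)
    RATES.fetch(country)
  end
end

# Hypothetical checkout code. It reads the shipping table directly, so any
# change to RATES or to lookup can break checkout as well.
class Checkout
  def total(subtotal, country)
    subtotal + ShippingRateTable.lookup(country)
  end
end

puts Checkout.new.total(20.0, "CA")
```

Because both classes live in the same process and the same codebase, this call is perfectly legal, which is exactly how unplanned dependencies accumulate.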
Advantages and disadvantages of monolithic programs
Advantages:
- A monolith is easy to implement and has no complicated architecture, especially with the Ruby on Rails (RoR) framework that Shopify uses. All code lives in one repository, so it is easy to reference.
- With just one codebase, managing and releasing are easy. All code and functionality can be searched in one repository, and there is only one test and release pipeline. The data is stored in a shared database, so cross-table queries are convenient.
- Because a monolith is released to one place as a whole, the deployment architecture is also simple, and a single standard environment is enough: database, web servers, background jobs, Redis, Kafka, Elasticsearch, and so on all come as one set.
- Because it is a monolith, components can call each other directly instead of going through RPC or web service APIs.
Disadvantages:
As it grows, a monolith also becomes hard to keep under control, and Shopify ran into these problems in 2016.
- The application becomes brittle: new code has unintended effects, and seemingly innocuous changes can trigger a cascade of unrelated test failures.
- Code is tightly coupled and lacks boundaries, which makes tests hard to write and slow to run on CI.
- Even a simple change requires understanding a lot of context, and new employees in particular are left completely in the dark. A complex monolith has a steep learning curve.
Shopify engineers believed that all the issues they encountered were a direct result of the lack of boundaries between different functions in the code. What they had to do was reduce the coupling between different business domains, but the question was how.
Of course, I think other questions remain. Some business domains may need to scale massively while others do not, so what happens when only one domain needs to scale? When one domain needs to be upgraded or downgraded, does the whole monolith have to be redeployed? If a problem in one domain crashes the application, do all domains go down with it? Shopify engineers did not address these questions, or perhaps they are simply not the most important ones for them.
Why not use microservices?
Shopify engineers also looked at microservices architecture, an approach to application development in which a large application is built as a set of smaller services that are deployed independently. They concluded that while microservices could solve the problems they had, they would create another set of problems.
After a move to microservices, they would have to maintain separate test and deployment pipelines and take on additional infrastructure overhead for each service. Each microservice is deployed independently, and calls between services have to cross the network, which adds latency and reduces availability. Large refactorings that span multiple services can also be tedious, requiring changes to all dependent services and coordinated deployments.
To summarize, their concerns were:
- Testing and operations become more complex
- Deployment costs go up
- Microservices bring network latency and reduced availability
- Converting everything to microservices at this point would be very painful
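The availability concern can be illustrated with simple arithmetic. If a request has to pass through several services in sequence and each one fails independently, the availabilities multiply. The figures below (99.9% per service, five services) are assumptions chosen for illustration and are not Shopify's numbers.

```ruby
# Availability of a request that must traverse `hops` services in sequence,
# assuming independent failures and the same availability for every service.
def chain_availability(per_service, hops)
  per_service**hops
end

in_process = chain_availability(0.999, 1)   # one deployment unit
five_hops  = chain_availability(0.999, 5)   # five networked services (assumed)

puts format("1 hop:  %.3f%%", in_process * 100)
puts format("5 hops: %.3f%%", five_hops * 100)
```

With these assumed numbers, five sequential hops lower availability from 99.9% to about 99.5%, before counting the extra latency of each network call.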
Items 1 and 2 are probably not real obstacles: for a listed company, the cost should be manageable. Item 3 is an unavoidable consequence of microservices, and every architectural choice involves trade-offs. Item 4 is a case of riding a tiger and finding it hard to get off: the architecture is already what it is, and tearing it down to start over is too risky for a listed company that serves so many customers. That said, they do not seem to have considered the larger-scale impact of future business growth or the needs of new lines of business.
To reduce the impact of item 4, microservices can be introduced gradually, for example by extracting the checkout service into a microservice first and evolving from there. Shopify engineers did not take this route; instead, they tackled the problem by modularizing the monolith.
Modular monolithic programs
They wanted a solution that increases modularity without increasing the number of deployment units, giving them the advantages of both monoliths and microservices without too many of the disadvantages. Isn’t that wanting to have your cake and eat it too? Can such a good thing exist?
A modular monolith is a system where all code serves a single application and there are strictly enforced boundaries between different business domains.
Shopify's way of implementing a modular monolith is componentization. At the beginning of 2017, they set up a dedicated team to tackle the problem.
They reorganized their code structure, dividing the monolith into multiple components by business domain. Each component is structured as a small RoR application of its own, with the eventual goal of turning each one into a Ruby module so that the components are no longer coupled to each other.
They then decoupled the business domains further and defined clear boundary interfaces between them: each domain's boundary is expressed through a public API, and each domain has exclusive ownership of its associated data. They also analyzed the calls between components and built a dedicated tracking system to measure and drive progress.
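A boundary of this kind can be sketched in plain Ruby. This is only an illustration under my own assumptions: the module and method names (`Shipping`, `rate_for`, `RateTable`) are hypothetical, and `private_constant` is a standard Ruby feature that I use here as a stand-in, since the article does not describe how Shopify actually enforces its boundaries.

```ruby
# Hypothetical component: everything shipping-related lives in one module.
module Shipping
  # Internal detail, hidden from code outside the module.
  class RateTable
    RATES = { "CA" => 5.0, "US" => 8.0 }.freeze

    def self.lookup(country)
      RATES.fetch(country)
    end
  end
  private_constant :RateTable

  # Public API: the only entry point other components are meant to use.
  def self.rate_for(country)
    RateTable.lookup(country)
  end
end

# Hypothetical checkout code that depends only on the public API.
class Checkout
  def total(subtotal, country)
    subtotal + Shipping.rate_for(country)
  end
end

puts Checkout.new.total(20.0, "US")

begin
  Shipping::RateTable.lookup("US")   # tries to bypass the boundary
rescue NameError => e
  puts "blocked: #{e.message}"
end
```

The call through `Shipping.rate_for` works, while the direct reference to the internal class raises `NameError`, so a boundary violation fails loudly instead of silently adding another dependency.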
It is fair to say that Shopify engineers have put a lot of work into the modular monolith.
In fact, my feeling is that by doing this they are evolving toward a microservice architecture. They may never actually convert to microservices, but the path looks very similar: first divide the business domains and clean up the boundaries between them. The difference is that everything is still deployed as a single program.
Shopify's data monolith
Shopify's database architecture is also very interesting. They handle an enormous volume of database traffic, such as the peak of 14 million queries per second on this year’s Black Friday. How do they cope?
Initially they used sharding so that the database could scale horizontally and keep growing.
```ruby
Sharding.with_each_shard do
  some_action
end
```
But operations that span every shard sacrifice performance and scalability, and they are fragile: if any one shard goes down, the entire operation becomes unavailable across the whole platform. In 2016, Shopify's engineers sat down to restructure the runtime architecture. They realized that simply sharding the database was not enough; each shard needed to be completely isolated so that failures did not turn into platform-wide outages. They introduced pods (not to be confused with Kubernetes pods) to solve this problem. A pod consists of a set of shops that live on a fully isolated set of datastores. Each unit of work (a web request or a delayed job) is assigned to a single pod, which means that only one pod needs to be online to process a request.
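The difference between an operation that spans every shard and a unit of work that is routed to a single pod can be sketched as follows. Everything here is a hypothetical model (the `Pod` struct, the shop IDs, the `pod_for` and `with_each_pod` helpers); it is not Shopify's routing code.

```ruby
# Hypothetical model of an isolated pod: a name, the shops it owns, and
# whether its datastores are currently reachable.
Pod = Struct.new(:name, :shop_ids, :online) do
  def handle(shop_id)
    raise "pod #{name} is offline" unless online
    "shop #{shop_id} served by pod #{name}"
  end
end

PODS = [
  Pod.new("pod-1", [1, 2], true),
  Pod.new("pod-2", [3, 4], false)   # simulate an outage
].freeze

# Platform-wide style: the action touches every pod, so one outage fails it.
def with_each_pod
  PODS.map { |pod| yield pod }
end

# Pod style: a unit of work is routed to the single pod that owns the shop.
def pod_for(shop_id)
  PODS.find { |pod| pod.shop_ids.include?(shop_id) }
end

puts pod_for(1).handle(1)   # succeeds even though pod-2 is offline

begin
  with_each_pod { |pod| pod.handle(pod.shop_ids.first) }
rescue RuntimeError => e
  puts "platform-wide action failed: #{e.message}"
end
```

A request for shop 1 only needs pod-1 to be healthy, whereas the platform-wide loop fails as soon as it reaches the offline pod.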
Shopify assigns a pair of data centers to each pod. At any time, one of them is the active data center and the other acts as a recovery site, that is, an active-standby disaster recovery setup. They also developed a tool called Pod Mover that lets them move a pod to its recovery data center within a minute without dropping requests or jobs.
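The active/recovery pairing can be pictured with a very small Ruby model. The class name `PodPlacement` and the data center names are hypothetical, and the sketch only swaps the two roles; it does not model how Pod Mover avoids dropping in-flight requests and jobs, which the article does not describe.

```ruby
# Hypothetical model: each pod has one active and one recovery data center.
class PodPlacement
  attr_reader :active, :recovery

  def initialize(active:, recovery:)
    @active = active
    @recovery = recovery
  end

  # Swap roles: the recovery site becomes active and vice versa.
  def move!
    @active, @recovery = @recovery, @active
    self
  end
end

placement = PodPlacement.new(active: "dc-east", recovery: "dc-west")
placement.move!
puts "active: #{placement.active}, recovery: #{placement.recovery}"
```

Because the move is scoped to one pod, other pods keep running in their own active data centers while this one fails over.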
If the business can be partitioned like this, what is there to fear from scaling? When more capacity is needed, just add pods; the pods are isolated from each other, and each pod is a monolith.
Generally speaking, Shopify took a pragmatic approach: working from its own historical architecture and growth characteristics, it found a set of methods that work, and they handled the traffic peak of this year’s Black Friday well.
References
- https://www.reddit.com/r/programming/comments/z90juf/shopify_monolith_served_127_million_requests_per/
- https://shopify.engineering/shopify-monolith
- https://shopify.engineering/deconstructing-monolith-designing-software-maximizes-developer-productivity
- https://shopify.engineering/a-pods-architecture-to-allow-shopify-to-scale
- https://stackshare.io/kirs/decisions