Original link: https://lawrenceli.me/blog/cloudflare
CloudFlare is a well-known CDN provider. Its services include DNS resolution, a WAF (web application firewall), CDN acceleration, and DDoS protection. It later launched a series of more developer-friendly products: CloudFlare Workers, KV, Zero Trust Tunnel, WARP… all aimed at a safe and fast Internet. If Vercel provides the infrastructure for front-end developers, then CloudFlare provides the infrastructure for the back-end traffic of tens of millions of websites.
Earlier this year, Cloudflare mitigated a record-breaking DDoS attack of 71 million requests per second.
Vercel & CDN
Three years ago, I migrated my blog from WordPress to Vercel. Long-time readers may remember that the domain was still *.now.sh back then. Using Vercel for the first time felt like finding a treasure, though that may sound naive today: unless you have gone through the whole process yourself, from registering a domain to deploying a full-stack web project, it is hard to appreciate how much complex work the platform does behind the scenes. As I understand it, this is a consistent habit of American companies: they wrap huge, sophisticated, complex technology and infrastructure in a simple, elegant product. Every time, I stay alert and ask myself: if I had to build this, how would I do it? That is what later led me to learn Kubernetes.
Back to CloudFlare. Ever since I bought my first domain in 2015, I have hosted DNS for all my domains on CloudFlare. In 2021, because of a network problem with Vercel (some kind of force majeure), networks in mainland China suddenly could not reach some of Vercel's domains. My blog only gets a handful of visitors a day, but since it is written mainly in Chinese, keeping it reachable from within China still matters. Following the new CNAME value Vercel officially provided, I updated the DNS record on CloudFlare and the problem was resolved. That was when I noticed an option in the DNS Records console that I had overlooked before: Proxied.
Out of curiosity, I switched it on, turning the gray "DNS only" into the orange "Proxied". At the time, I did not realize that from that moment on CloudFlare had fully taken over all of my site's traffic and was serving it over anycast; in other words, I had stacked a layer of CloudFlare CDN on top of the existing Vercel CDN. This brought three problems:
- The Vercel dashboard warned that the CNAME configuration was invalid
- Cache TTL issues
- All client IPs show up as CloudFlare IPs, because every request to Vercel now comes from a CloudFlare data center
Surely I am not the only one doing this? In fact, Vercel does not recommend putting another CDN layer in front of it. I then worked out the fixes one by one. For the CNAME warning: Vercel periodically requests resources under the site's .well-known path to verify the configuration and domain ownership, including the CNAME and the HTTPS certificate, so in CloudFlare's WAF you can allowlist this path and let such requests skip the other security rules. For the client IP, see the list of available Managed Transforms, which can add the original client information to the request headers. For cache TTLs, you need to know the standard HTTP caching and negotiation mechanisms documented on MDN, set different TTLs for different resources in a fine-grained way, and make the most of both CloudFlare's CDN cache and the browser's own cache. All this for a blog: isn't it a bit like using a cannon to swat a mosquito?
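For the client IP problem, here is a minimal Python sketch of what an origin that receives connections directly from CloudFlare could do. CloudFlare adds a CF-Connecting-IP header with the visitor's IP to proxied requests; the sketch trusts that header only when the TCP peer is inside a trusted proxy range. The TRUSTED_PROXIES networks are hypothetical placeholders (documentation ranges), not CloudFlare's real list, and client_ip is an illustrative name.

```python
import ipaddress
from typing import Mapping, Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical placeholders (RFC 5737 / RFC 3849 documentation ranges).
# Replace them with the IP ranges CloudFlare actually publishes.
TRUSTED_PROXIES: Sequence[Network] = (
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
)


def client_ip(remote_addr: str, headers: Mapping[str, str],
              trusted: Sequence[Network] = TRUSTED_PROXIES) -> str:
    """Return the visitor IP, trusting CF-Connecting-IP only from trusted proxies."""
    peer = ipaddress.ip_address(remote_addr)
    if not any(peer in net for net in trusted):
        # Connection did not come through the proxy: the header could be forged.
        return str(peer)
    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    forwarded = lowered.get("cf-connecting-ip", "").strip()
    try:
        return str(ipaddress.ip_address(forwarded))
    except ValueError:
        # Missing or malformed header: fall back to the peer address.
        return str(peer)
```

Trusting the header unconditionally would let anyone who reaches the origin directly spoof their address, so the peer check matters as much as the header itself.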
CloudFlare's defenses span every layer it can cover, from L3 to L7, and a request to a website proxied by CloudFlare passes through them in a fixed sequence.
The traffic is screened layer by layer, both by CloudFlare's internal checks and by the rules we define ourselves, and is finally reverse-proxied to the origin server. So whichever CDN product you choose, you must keep the origin server's IP properly hidden and avoid leaking it through historical DNS records; otherwise every defense is in vain, because attackers can simply bypass the CDN and hit the origin directly. If the origin IP has already been exposed, the only option is to move to a new address promptly. When new rules are entered, CloudFlare's global network applies them immediately and they take effect in real time. This is more or less thanks to OpenResty, the high-performance gateway open-sourced by agentzh (Zhang Yichun).
CloudFlare Workers & Serverless
We can deploy individual "functions" onto the public cloud's "edge computing nodes" and expose them over sockets, which lets us deploy scalable HTTP services directly without managing the underlying servers. Of course, this requires the functions to be as stateless as possible: when there are no requests and a function has been idle for a while, its process simply disappears to free up computing resources, until the next event triggers a quick restart and it resumes serving. This is what is called Serverless.
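To make the idle-then-restart behavior concrete, here is a toy Python model; it is not how any real platform schedules instances. It assumes a single instance that stays warm for idle_timeout seconds after its last request, requests that take no time and never overlap, and a hypothetical function name count_cold_starts.

```python
from typing import Iterable


def count_cold_starts(request_times: Iterable[float], idle_timeout: float) -> int:
    """Toy model of one serverless instance that is reclaimed after idling.

    Assumes requests take no time and never overlap, so one instance suffices.
    """
    cold_starts = 0
    last_active = None  # time of the previous request; None means no instance yet
    for t in sorted(request_times):
        if last_active is None or t - last_active > idle_timeout:
            # The instance was reclaimed (or never existed): start a new one.
            cold_starts += 1
        last_active = t
    return cold_starts
```

For example, count_cold_starts([0, 5, 100, 101], idle_timeout=30) returns 2: the first request and the one after the long gap both find no warm instance. Because an instance can vanish between any two requests, anything kept only in its memory may be lost, which is why functions should be as stateless as possible.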
Learning about Serverless for the first time was also a big surprise: AWS Lambda turned functions into a commercial product (FaaS), and Vercel built an out-of-the-box experience on top of it. Building on its existing data centers, CloudFlare launched its own Serverless offering, CloudFlare Workers. Compared with Vercel's original Serverless Functions, the difference is that CloudFlare Workers can handle long-lived connections such as Server-Sent Events and WebSocket. Vercel's later Edge Functions can do this too, but they support too few Node.js modules.
Not long ago, CloudFlare open-sourced workerd, the Workers runtime.
CloudFlare Workers has many use cases: a simple short-URL redirect service, a GitHub proxy, the many ChatGPT API proxies people have written… They have made things much more convenient for a great many users in China.
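A short-URL redirector boils down to a stateless lookup, which is why it fits Workers so well. Below is a minimal sketch of that logic in Python; a real Worker would be written in JavaScript or TypeScript and would most likely keep the table in KV. The LINKS table, its slugs, the host s.example, and handle() are all hypothetical.

```python
from typing import Dict, Mapping, Tuple
from urllib.parse import urlsplit

# Hypothetical slug -> destination table.
LINKS: Dict[str, str] = {
    "blog": "https://lawrenceli.me/blog",
    "cf": "https://lawrenceli.me/blog/cloudflare",
}


def handle(url: str, links: Mapping[str, str] = LINKS) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request URL such as https://s.example/cf."""
    slug = urlsplit(url).path.strip("/")  # query string is ignored
    target = links.get(slug)
    if target is None:
        return 404, {"Content-Type": "text/plain; charset=utf-8"}
    # 302 keeps the mapping editable; browsers may cache a 301 indefinitely.
    return 302, {"Location": target}
```

Since the handler keeps no state between calls, any instance on any edge node can answer any request.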
Deno, the newer JavaScript runtime that Node.js creator Ryan Dahl has built in recent years, also offers a similar serverless service; the experience is very friendly, and it supports Web Standard APIs.
To give Serverless functions more data persistence, these platforms have also launched their own KV storage services, or Serverless databases.
Profit model
Like Vercel and Netlify, CloudFlare uses a "free tier, paid add-ons" business model. CloudFlare CEO Matthew Prince once answered the question "How can CloudFlare offer a free CDN with unlimited bandwidth?" on StackOverflow:
- More free users means more data to better help protect paying users
- Many large customers come in because their employees use CloudFlare's free plan personally and recommend it at work
- The free plan doubles as promotion, which lowers recruiting costs and helps attract the best engineers in the world
- Free users trying out new features also help test them, shortening the iteration cycle
- The chicken-and-egg problem of bandwidth costs: only with a large user base do you have bargaining power with telecom operators around the world
CloudFlare went public on the New York Stock Exchange in 2019 under the ticker NET. The IPO price was US$15 and the current price is US$63, an increase of 320%. (Aside: is it too late to buy now?)
In China, it currently partners with JD Cloud, and the service is limited to enterprise customers. One-third of the Fortune 500 already use CloudFlare, so there is still plenty of room to grow. After OpenAI's ChatGPT went online, CloudFlare gained a lot of exposure by fending off large numbers of abusive users and potential threats for it.
Values
CloudFlare has received some criticism for adhering to the principles of net neutrality.
A typical example is CloudFlare terminating its service to 8chan under pressure from public opinion and the law. CloudFlare has pointed out that, as a private company, it is not bound by the First Amendment to the US Constitution, that half of its revenue comes from outside the United States, and that its customers span the entire Internet. Given the sheer volume of its business, some websites containing terrorist content and hate speech will inevitably use its services. This is a problem most large Internet companies face: as in the Kuaibo (Wang Xin) case, they are unwilling to act as arbiters of content. Ever since the birth of the Internet, the law has never managed to keep pace with technology.
CloudFlare’s efforts on the TLS protocol
Client Hello – SNI
Now for some technical progress. Many readers know about Server Name Indication (SNI): a field the client sends to the server in the initial Client Hello of the TLS/SSL handshake, containing the hostname (domain name) of the website. To quote CloudFlare's analogy:
SNI is a bit like mailing a package to an apartment building instead of a single-family home. When mailing a package to someone's single-family house, the street address alone is enough to get it to the recipient. But when a package goes to an apartment building, the apartment number is needed in addition to the street address; otherwise, the package may not reach the right recipient, or may not be delivered at all. Many web servers are more like apartment buildings than single-family houses: they host multiple domain names, so the IP address alone is not enough to indicate which domain a user is trying to reach… When multiple websites are hosted on a single server and share a single IP address, and each website has its own SSL certificate, the server may not know which SSL certificate to present when a client device tries to connect securely to one of them. This is because the SSL/TLS handshake occurs before the client device indicates, via HTTP, which website it wants.
It is somewhat similar to the Host request header in HTTP (if you have configured multiple virtual hosts with Nginx on one server, you will be familiar with it), but SNI lives at the TLS layer, below HTTP: it is sent after the TCP handshake, during the TLS handshake and before any HTTP request. It was not part of TLS originally; it was added as an extension in 2003 (RFC 3546) and is currently specified in RFC 6066. Modern clients such as browsers all support it.
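To see what this field looks like on the wire, here is a small Python sketch that builds and parses the server_name extension following the layout in RFC 6066: extension type 0, then a length-prefixed ServerNameList whose entries have name_type 0 (host_name). It covers only this one extension, not a full Client Hello, and the function names are illustrative. Note that nothing in it is encrypted: the hostname bytes are copied into the message as-is.

```python
import struct


def build_sni_extension(hostname: str) -> bytes:
    """Encode a server_name extension (RFC 6066) carrying one host_name entry."""
    name = hostname.encode("ascii")
    # ServerName: name_type (0 = host_name) + uint16 length + the raw name bytes.
    entry = struct.pack("!BH", 0, len(name)) + name
    # ServerNameList: uint16 length of all entries that follow.
    server_name_list = struct.pack("!H", len(entry)) + entry
    # Extension header: uint16 type (0 = server_name) + uint16 data length.
    return struct.pack("!HH", 0, len(server_name_list)) + server_name_list


def parse_sni_extension(ext: bytes) -> str:
    """Recover the hostname from a single-entry server_name extension."""
    ext_type, ext_len = struct.unpack("!HH", ext[:4])
    if ext_type != 0:
        raise ValueError("not a server_name extension")
    data = ext[4:4 + ext_len]
    # Skip the 2-byte ServerNameList length and read the first entry.
    name_type, name_len = struct.unpack("!BH", data[2:5])
    if name_type != 0:
        raise ValueError("unsupported name_type")
    return data[5:5 + name_len].decode("ascii")
```

For lawrenceli.me the extension is 22 bytes, and the 13-byte hostname sits inside it verbatim, readable by any observer on the path.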
We can capture this field with Wireshark. Apply the display filter ssl.handshake.extensions_server_name (tls.handshake.extensions_server_name in Wireshark 3.0 and later), start capturing, and send a TLS request:
openssl s_client -connect lawrenceli.me:443 -servername lawrenceli.me -state -debug < /dev/null
The results show that SNI is transmitted in plain text, which creates a problem: even with TLS/HTTPS encryption, the domain we are visiting is exposed in the clear. "So what? Isn't DNS exposed too?" Good question: DNS over HTTPS (DoH, RFC 8484) addresses the plaintext risk of DNS queries. So, in practice, this field is currently the main remaining data leak in TLS. CloudFlare has put forward two solutions: ESNI and ECH.
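For comparison, a DoH GET request (RFC 8484) carries an ordinary DNS wire-format query, base64url-encoded without padding, in the dns query parameter, so the name travels inside HTTPS. Here is a minimal Python sketch that only builds such a URL; sending it with Accept: application/dns-message and parsing the answer are left out. It sets the DNS message ID to 0, as RFC 8484 recommends for cache friendliness, and uses CloudFlare's public resolver endpoint https://cloudflare-dns.com/dns-query as the default. The helper names are illustrative.

```python
import base64
import struct


def dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (qtype 1 = A record, class IN)."""
    # Header: ID=0, flags=0x0100 (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        struct.pack("!B", len(label)) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def doh_get_url(name: str, endpoint: str = "https://cloudflare-dns.com/dns-query",
                qtype: int = 1) -> str:
    """RFC 8484 GET form: base64url without padding in the `dns` parameter."""
    encoded = base64.urlsafe_b64encode(dns_query(name, qtype)).rstrip(b"=")
    return f"{endpoint}?dns={encoded.decode('ascii')}"
```

For www.example.com the parameter comes out as dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB, the same value as the GET example in RFC 8484. An on-path observer sees only an HTTPS connection to the resolver, although, without ECH, the SNI of that connection and of the later one to the site itself are still visible.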
In Chrome, you can enable the browser's ECH client support with the flag chrome://flags/#encrypted-client-hello. The security details of HTTPS traffic can be viewed in the Security tab of Chrome DevTools, and an online test page can check whether the client supports ECH. Of course, the server side also has to be configured before it fully takes effect. That is all on this topic.
Client Hello – JA3
Another security use of the Client Hello is TLS client fingerprinting: JA3 & JA3S. The design was inspired by security researcher Lee Brotherston's work on TLS fingerprinting.
For implementation details, see Salesforce's open-source JA3.
In short, certain fields and extensions that the client sends in the Client Hello during the TLS handshake are concatenated in a fixed way, and the result is hashed with MD5 into a string called the JA3 fingerprint. Different browsers and TLS clients produce different fingerprints. By sampling large volumes of this data (JA3 and JA3S, where the latter fingerprints the server's Server Hello), CloudFlare can tell whether requests come from botnets, crawlers, Python libraries, ordinary browsers, or mobile devices. This also explains why many people writing crawlers find that spoofing the User-Agent request header in HTTP does not help: CloudFlare's defense operates at the lower-level TLS layer. ChatGPT's web frontend is also protected by CloudFlare's TLS-fingerprinting WAF. I found related implementations on GitHub that bypass this defense by swapping in a different TLS client. For most people, this is already a high anti-scraping barrier, and CloudFlare can change its WAF policy at any time to invalidate old fingerprints.
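The concatenation step can be sketched in Python following the field order described in Salesforce's JA3 README: SSLVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats, with the values inside each field joined by "-" and GREASE values (RFC 8701) dropped. It assumes the Client Hello has already been parsed into integers; the function names are illustrative, not the reference implementation's API.

```python
import hashlib
from typing import Iterable, Tuple


def is_grease(value: int) -> bool:
    """RFC 8701 GREASE values look like 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value >> 8) == (value & 0xFF) and (value & 0x0F) == 0x0A


def _field(values: Iterable[int]) -> str:
    # Values within one field are joined with "-", skipping GREASE.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3(version: int, ciphers: Iterable[int], extensions: Iterable[int],
        curves: Iterable[int], point_formats: Iterable[int]) -> Tuple[str, str]:
    """Return (JA3 string, MD5 hex digest) from already-parsed Client Hello fields."""
    ja3_string = ",".join([
        str(version),
        _field(ciphers),
        _field(extensions),
        _field(curves),
        _field(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()
```

Changing the User-Agent header leaves every input here untouched, so the digest stays the same; only a different TLS stack, with different cipher or extension lists, changes it.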
End
OpenAI's ChatGPT has been a great showcase for CloudFlare, and I recommend CloudFlare to readers: partly because it has always offered a permanently free plan for individuals, and partly for its ease of use and global vision. I have also used a WAF product from a certain Chinese vendor: the interface was confusing and messy, I could not tell what I was being charged for when I looked at the bill, the pricing tricks ran deep, and the price was high (maybe I am just too poor).
This site is only for collection, and the copyright belongs to the original author.