Detecting and identifying network proxy behavior from traffic and network-connection characteristics

Original link: https://www.blueskyxn.com/202205/6060.html


Foreword

Children's Day 2022 is coming, so let's celebrate ╰(○'◡'○)╮

This article looks at the topic from both sides: proxy technology, and the offensive and defensive side of network security.

? I've handed it to you, so what more? ? people and how to prevent ?: do you still want me to teach that too? (That said, current products don't seem to hold up very well.)

I hope you don't go down before Children's Day ╰(○'◡'○)╮. Fix what needs fixing, run what needs running, replace what needs replacing, and spare what can be spared.

Background on the related technologies

References:

  1. Analysis and understanding of the TLS handshake protocol: dissecting the traffic of an HTTPS request
  2. SSL/TLS protocol parsing: what is SNI, and how is SNI identified?
  3. Encryption basics, part 3: TLS/SSL and HTTPS
  4. How does SSL work? | SSL certificates and TLS

You can read these to learn the underlying technology yourself.

In short, for an ordinary user in the typical case today:

The following content falls within the scope of "encrypted and confidential":

  1. Path: the request path
  2. Content: the web page content
  3. Cookies and other session data
  4. User-Agent
  5. HTTP method (the request method) [not entirely sure; this is from memory]

The following content is not covered by "encrypted and confidential":

  1. The visitor's IP address and port
  2. The web server's IP address and port
  3. The host domain name of the page (exposed via SNI; ESNI, which tried to hide it, has effectively been abandoned)
  4. The DNS lookup for the page's domain name (unless encrypted DNS is used)
  5. Time of access
  6. The site's certificate, without the private key (sent in plaintext in TLS 1.2; TLS 1.3 encrypts it during the handshake, but anyone can still fetch it by connecting to the server)
  7. Data transfer volume
  8. Data transfer bandwidth
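Item 3 is easy to see in the raw bytes: the host name sits in the server_name extension (RFC 6066) of the ClientHello, which the client sends before any keys are agreed. The Python sketch below, standard library only, builds a toy ClientHello and reads the name back out. It assumes one complete, unfragmented TLS record without ECH, does almost no validation, and both helper names are made up for illustration.

```python
import struct
from typing import Optional


def build_client_hello(host: str) -> bytes:
    """Build a toy ClientHello that carries only an SNI extension (illustration only)."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name      # name_type=host_name, length, name
    sni_body = struct.pack("!H", len(entry)) + entry          # server_name_list
    extensions = struct.pack("!HH", 0x0000, len(sni_body)) + sni_body
    body = (
        b"\x03\x03" + bytes(32)                               # client_version (TLS 1.2) + zeroed random
        + b"\x00"                                             # empty session_id
        + struct.pack("!H", 2) + b"\xc0\x2f"                  # one cipher suite
        + b"\x01\x00"                                         # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body  # handshake type 1 = ClientHello
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake  # record type 22 = handshake


def extract_sni(record: bytes) -> Optional[str]:
    """Return the plaintext SNI host name from a ClientHello record, or None."""
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None                                           # not a handshake record carrying a ClientHello
    pos = 9                                                   # skip record header (5) + handshake header (4)
    pos += 2 + 32                                             # client_version + random
    pos += 1 + record[pos]                                    # session_id
    (cipher_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + cipher_len                                     # cipher_suites
    pos += 1 + record[pos]                                    # compression_methods
    if pos + 2 > len(record):
        return None                                           # no extensions block
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = min(pos + ext_total, len(record))
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0x0000 and ext_len >= 5:               # server_name extension
            name_type = record[pos + 2]                       # byte after the 2-byte list length
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            if name_type == 0:                                # host_name
                return record[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


print(extract_sni(build_client_hello("example.com")))  # example.com
```

Nothing in this path needs a key: any device on the wire that sees the first packet of the connection can run the same walk.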

At the same time, the site's certificate exposes the following:

  1. Issuer (the CA organization)
  2. Subject (the entity the certificate was issued to)
  3. Issue date and expiration date
  4. The domain names the certificate covers (especially multi-domain and wildcard certificates shared across sites)
  5. The certificate's public key

Network proxy behavior characteristics in traffic and network connections

To identify something, you have to know both yourself and your opponent. Among today's mainstream proxy tools, apart from outdated ones such as IPsec, OpenV*N and L2TP, the common ones are S*s, S*R, V*y, V*s, T*n and X*y. You all know what these are, so don't ask me; I don't know either.

(Below, "domestic" refers to mainland China under the jurisdiction of the People's Republic of China, excluding Hong Kong, Macao, Taiwan and some disputed territories; "overseas" refers to everywhere outside the domestic area, including Hong Kong, Macao and Taiwan.)

Link examples 1-4 are the ordinary ways of accessing sites:

1. A normal domestic user accessing an unrestricted domestic website

  • Zhang San –> Domestic ISP –> Domestic website server

2. A normal domestic user accessing a restricted domestic website

  • Zhang San –> Domestic ISP –> Security gateway (DNS hijacking/pollution, HTTP hijacking/blocking, IP:port/TCP/UDP/IP blocking or resets, BGP hijacking, man-in-the-middle attacks) –> Blocked
  • Zhang San –> Browser or operating system with built-in filtering –> Blocked

3. A normal domestic user accessing an unrestricted overseas website

  • Zhang San –> Domestic ISP –> Cross-border and other security gateways (allowed through) –> Overseas ISP –> Overseas website server

4. A normal domestic user accessing a restricted overseas website

  • Zhang San –> Domestic ISP –> Cross-border and other security gateways (DNS hijacking/pollution, HTTP hijacking/blocking, IP:port/TCP/UDP/IP blocking or resets, BGP hijacking, man-in-the-middle attacks) –> Blocked
  • Zhang San –> Browser or operating system with built-in filtering –> Blocked

Link examples 5-8 are unconventional ways of reaching restricted overseas websites:

5. Using an overseas server directly as a proxy to reach the restricted website

  • Zhang San –> Domestic ISP –> Cross-border and other security gateways (allowed through) –> Overseas ISP –> Overseas proxy server –> Overseas ISP –> Overseas website server

6. Relaying through a domestic server to an overseas proxy server to reach the restricted website

  • Zhang San –> Domestic ISP –> Domestic server –> Domestic ISP –> Cross-border and other security gateways (allowed through) –> Overseas ISP –> Overseas proxy server –> Overseas ISP –> Overseas website server

7. Using a cross-border dedicated line to reach the restricted website

  • Zhang San –> Domestic ISP –> Domestic transit server –> Cross-border dedicated line segment –> Overseas server –> Overseas ISP –> Overseas website server

8. Using VNC or other remote desktop technology to reach the restricted website remotely

  • Zhang San –> Domestic ISP –> Domestic transit server –> Cross-border and other security gateways (allowed through) –> Overseas remote desktop server –> Overseas ISP –> Overseas website server

Of course, besides the cross-border security gateways, domestic ISPs themselves also have some censorship and blocking functions, such as DNS hijacking.

From link examples 1-8, a reader should now see how traffic travels along each path.

From the earlier part, a reader should also know which content is encrypted and which is not in an HTTPS/TLS environment.

To detect and identify proxy use, then, one needs targeted breakthroughs at the key nodes on these paths, at the bypass methods, and at the non-encrypted content.

Algorithm ideas and example methods

Traffic patterns

In any proxy scenario, downloading or streaming large media files means long connections, large volume and high bandwidth.

There are several angles of attack:

  1. Whether the domain is a common, heavily used one (for example, Bilibili's overseas media nodes or Steam's international download domains)
  2. Whether the IP is a common, heavily used one (for example, Cloudflare CDN IPs or Akamai CDN IPs)
  3. Average bandwidth over the point-to-point session (for example, 1080p video runs at about 5-10 Mbps, 4K video at 20-50 Mbps, and IDM downloads saturate the link)
  4. Traffic symmetry (requires monitoring at the server or its upstream network) (for example, a proxy server's inbound and outbound traffic and bandwidth are generally roughly equal)
  5. Who visits the domain + IP, how it is used, and when
  6. Active probing to check whether the connection is ordinary, normal, compliant use (for example: checking what content the server returns; checking whether the domain and its associated domains/IPs/ASNs are indexed, have blacklist or whitelist records, contain keywords, or are real-name registered; where the registrar is; the risk level of the domain extension)
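Items 3 and 4 reduce to simple arithmetic on byte counters. The sketch below assumes hypothetical per-window counters taken on a suspected relay, split into the client side and the upstream side; `WindowCounters`, the field names and both helpers are illustrative, not from the original. Symmetry is scored as the smaller-over-larger ratio in each direction, taking the weaker of the two.

```python
from dataclasses import dataclass


@dataclass
class WindowCounters:
    """Byte counters for one observation window on a suspected relay (hypothetical schema)."""
    seconds: float
    from_clients: int    # bytes received from the client side
    to_clients: int      # bytes sent to the client side
    from_upstream: int   # bytes received from remote sites
    to_upstream: int     # bytes sent to remote sites


def average_mbps(total_bytes: int, seconds: float) -> float:
    """Average rate in megabits per second."""
    return total_bytes * 8 / seconds / 1_000_000


def _ratio(a: int, b: int) -> float:
    return min(a, b) / max(a, b) if max(a, b) else 0.0


def relay_symmetry(w: WindowCounters) -> float:
    """1.0 means every byte in on one side went out on the other, in both directions."""
    downstream = _ratio(w.from_upstream, w.to_clients)   # remote site -> relay -> client
    upstream = _ratio(w.from_clients, w.to_upstream)     # client -> relay -> remote site
    return min(downstream, upstream)


w = WindowCounters(60, 1_000_000, 75_000_000, 74_000_000, 1_100_000)
print(round(average_mbps(w.to_clients, w.seconds), 1))  # 10.0
print(round(relay_symmetry(w), 2))                      # 0.91
```

Encryption and framing overhead mean the two sides never match exactly, and an ordinary NAT gateway is just as symmetric as a proxy, so a high ratio is one signal among several rather than a verdict.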

Practical countermeasures include, but are not limited to:

  1. Block unusual domain suffixes (such as the five free suffixes cf, ga, gq, tk and ml; cheap ones such as xyz and de; and suffixes with a long history of abuse such as me, cc and top)
  2. Block unusual keywords in the primary domain (e.g., airport, v*n, FQ, v*y)
  3. Block unusual subdomain keywords (such as HK, TW, SG, US, Azure, AWS, Hinet, IPLC, IEPL, v*y, lv, yun, cu, cm, ct, ddns, az, cn2, gia, 9929, dmit, do, vu, vir, rn, pr, cloud, emby, drive, cdn, gd)
  4. Block connections whose average bandwidth stays very close across multiple time periods (for example, a sustained 5-50 Mbps, or a link that is continuously saturated)
  5. Block IPs and ASNs with a history of abuse (such as Cloudflare CDN, AWS, Linode, DigitalOcean, Vultr and Oracle, which are abused daily)
  6. A small number of individuals consistently visiting an obscure website
  7. Sites whose content is trivial but whose traffic is abnormal (such as periodic-table pages, SpeedTest, the BaoTa panel default page, or WordPress and WHMCS sites, none of which should produce sustained or large traffic)
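Item 4, "average bandwidth very close across multiple periods", can be expressed as a coefficient of variation (standard deviation divided by mean) over per-window averages. This is a minimal sketch assuming you already have one average rate per time window in Mbps; the function name and all three thresholds are hypothetical.

```python
import statistics
from typing import Sequence


def steady_high_bandwidth(window_mbps: Sequence[float], min_mbps: float = 5.0,
                          max_cv: float = 0.15, min_windows: int = 3) -> bool:
    """True if every window is at least min_mbps and the windows barely vary.

    Variation is the coefficient of variation: population standard deviation
    divided by the mean. All thresholds are hypothetical.
    """
    if len(window_mbps) < min_windows or min(window_mbps) < min_mbps:
        return False
    mean = statistics.fmean(window_mbps)
    cv = statistics.pstdev(window_mbps) / mean
    return cv <= max_cv


print(steady_high_bandwidth([20, 21, 19.5, 20.5]))  # True
print(steady_high_bandwidth([10, 40, 10, 40]))      # False
```

A 4K stream or a large download also looks steady, which is why this kind of check is paired with domain and IP reputation rather than used alone.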

Meanwhile, on machines that can be monitored (including but not limited to Tencent Cloud's domestic and overseas servers, which are regulated by China, and the domestic and overseas servers of China's three major carriers), traffic can be identified by deploying network management, network security and firewall equipment. Checking whether upstream and downstream traffic are symmetric is not difficult at all; anyone can do it.

Network connection patterns

First, following the link examples, any "domestic transit server" that may be in use can be detected first.

Whether it is a dedicated line or an ordinary data center, there is always network management, network security or firewall equipment, and this layer can be used for analysis. For example, look at these careless users:

[Screenshots (not reproduced): three captured requests from such users]

These users are reckless. Some put the path straight into plain HTTP/WS, and the damn path even contains keywords. That really is a "master move". And as for those using HTTPS, who do they think they're fooling with that domain name?

So with HTTP/WS nothing is secret. If you use it in mainland China, whether or not it crosses the border, everything is plaintext and on full display. Of course, not everyone gets caught; after all, plenty of people are still running WS for mobile "free data" tricks.

For HTTPS/WSS, part of the content is encrypted, but a lot is still not, for example:

  1. The certificate (whether it is a junk or high-risk certificate: e.g., Cloudflare, self-signed, test, or free LE/TA certificates; expired; domain name mismatch; shared with a blacklisted domain)
  2. The domain name and its content (same lines of attack as for domain names above: public data, blacklist/whitelist history, negative records, active probing, primary/subdomain keywords, manual labeling, etc.)
  3. Server IP and ASN (same lines of attack as for IPs/ASNs above: blacklist/whitelist history, abuse, connection patterns, active probing and port scanning, etc.)
  4. Visitor IP (for example, whether that IP's recent DNS queries and HTTP requests touched sensitive terms or restricted content such as Google, VPN or YouTube, and whether the IP's access pattern matches a proxy access model*)

Note: the typical proxy access models* are:

  1. Global proxy: almost no domestic access; continuous access to a single (or a few) overseas IP or domain; there may be plaintext DNS queries for both domestic and overseas websites
  2. Bypass mainland: normal domestic access; continuous access to a single (or a few) overseas IP or domain; there may be plaintext DNS queries for both domestic and overseas websites
  3. Bypass restricted websites only: normal access to both domestic and overseas sites; occasional access to a single (or a few) overseas IP or domain; there may be plaintext DNS queries for both domestic and overseas websites
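The three models in the note can be written as a toy classifier. The sketch below assumes per-visitor flow records that already carry a time-window index and an overseas/domestic label (how endpoints are geolocated is out of scope); the `Flow` record, the function name and every threshold are hypothetical. It only looks at byte shares and how often the top overseas endpoint appears, not at the DNS queries the note also mentions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    """One observed flow from a single visitor (hypothetical record format)."""
    window: int       # index of the time window the flow was seen in
    endpoint: str     # remote IP or domain
    overseas: bool    # True if the endpoint is outside the domestic area
    nbytes: int


def access_model(flows: List[Flow], concentrated: float = 0.8, continuous: float = 0.8,
                 global_max_domestic: float = 0.05) -> str:
    """Map a visitor's flows onto the three models in the note (thresholds hypothetical)."""
    overseas_bytes = Counter()
    endpoint_windows = defaultdict(set)
    active_windows = set()
    total = 0
    for f in flows:
        total += f.nbytes
        active_windows.add(f.window)
        if f.overseas:
            overseas_bytes[f.endpoint] += f.nbytes
            endpoint_windows[f.endpoint].add(f.window)
    overseas_total = sum(overseas_bytes.values())
    if overseas_total == 0:
        return "no_overseas_traffic"
    top, top_bytes = overseas_bytes.most_common(1)[0]
    concentration = top_bytes / overseas_total                      # share of overseas bytes to one endpoint
    continuity = len(endpoint_windows[top]) / len(active_windows)  # share of active windows using it
    domestic_share = 1 - overseas_total / total
    if concentration >= concentrated and continuity >= continuous:
        # continuous traffic to one overseas endpoint: models 1 and 2
        return "global_proxy" if domestic_share <= global_max_domestic else "bypass_mainland"
    # Occasional single-endpoint use (model 3) looks much like ordinary browsing.
    return "bypass_restricted_or_ordinary"
```

The last bucket is intentionally vague: with only volume and timing, model 3 is hard to tell apart from a normal user, which matches the note's description of it as occasional access.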

Unknown (encrypted) traffic is dancing on a knife's edge: its risk of being identified is second only to plaintext HTTP.

Summary of the algorithm

You can refer to Stripe Radar's multiplicative risk-scoring rules and assign weights according to your actual situation.

Reference scoring items:

  1. Server IP
  2. Server ASN
  3. Other websites on the same server IP
  4. CDN provider identification
  5. Domain name suffix
  6. Search engine indexing of the domain
  7. Real-name registration of the domain
  8. ICP filing of the domain
  9. Registrar of the domain
  10. Domain registration date and validity period
  11. DNS provider of the domain
  12. Set of related DNS records of the domain
  13. CNAME identification in the domain's DNS records
  14. Primary domain keywords
  15. Subdomain keywords
  16. Multi-period bandwidth analysis
  17. Response code from active probing
  18. Keyword recognition in the response content from active probing
  19. Feature recognition in the response content from active probing
  20. Certificate issuer
  21. Certificate issue date and validity period
  22. Certificate subject (the entity it was issued to)
  23. Domain names sharing the certificate
  24. Visitor IP
  25. Visitor DNS queries
  26. Visitor historical behavior
  27. Visitor proxy access model
  28. Unusually large traffic volume
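The article points to Stripe Radar's multiplicative rules without giving details, and the sketch below does not reproduce Radar. It only shows the general idea of a multiplicative score: start from a base value and multiply by a weight for each triggered item, so one strong signal can outweigh several weak ones and a reassuring signal (such as an ICP filing) pulls the score down. Every weight, signal name and the review threshold are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical weights: a factor above 1 raises risk, below 1 lowers it, 1 is neutral.
# Signal names loosely follow the numbered items above; tune them to your own data.
WEIGHTS: Dict[str, float] = {
    "abused_asn": 1.4,                  # items 1-2
    "cheap_or_free_suffix": 1.5,        # item 5
    "indexed_by_search_engines": 0.6,   # item 6
    "icp_filed": 0.3,                   # item 8
    "subdomain_keyword": 1.8,           # item 15
    "steady_high_bandwidth": 2.0,       # item 16
    "free_or_self_signed_cert": 1.3,    # item 20
    "proxy_access_model": 2.5,          # item 27
}


def risk_score(signals: Iterable[str], base: float = 1.0) -> float:
    """Multiply the base score by the weight of every triggered signal."""
    score = base
    for signal in signals:
        score *= WEIGHTS.get(signal, 1.0)   # unknown signals are treated as neutral
    return score


def verdict(score: float, threshold: float = 4.0) -> str:
    """Hypothetical cut-off between letting traffic pass and flagging it for review."""
    return "review" if score >= threshold else "pass"
```

A multiplicative score is easy to read (a factor of 2 doubles the risk), but correlated signals such as IP and ASN reputation effectively get counted twice, so related items may need to be grouped or capped.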

In addition, traffic symmetry, response-content identification, proxy access models and similar topics need further research by the reader.

Single-point breakthrough against cross-border dedicated lines

Many people tout dedicated lines as stable, low-risk, and unaffected by peak hours or periods of heavier blocking, but on the core issue, sellers understand far more than buyers.

As one cross-border dedicated line seller's announcement put it:

Recently I have frequently received users' concerns about IPLC disconnections and blocking. Here is a unified explanation.

For the Cloud IPLC dedicated line (including forwarding), the internal network and the public network are managed by professional switching and routing equipment; the internal network is stable and immune to attacks, and the IPLC runs stably from entrance to exit. We have repeatedly stated that protocol blocking can occur between the user's local network and the entrance on the public network (symptom: the entrance cannot be reached). In that case, if switching to a different local network restores the connection, then 90% of the time your local carrier is blocking you (this includes cases where your location and the entrance are in the same place). You can also test with a mobile data connection. In such cases, you need to change the protocol and your local IP to fix it. (Encrypt, encrypt, encrypt the segment from your local network to the entrance on the public network! Don't think that using a dedicated line provider means you won't be blocked; don't forget that your traffic travels over the public network from your location to the entrance.)

Coincidentally, a provider called Nathosts was being advertised recently (I don't know how it was advertised). In short, the lines it provided were pulled and became unusable. This is the main risk of cross-border dedicated lines: the dedicated line is simply shut down.

Because the single point is conspicuous: a cross-border dedicated line must have a domestic entrance, and that entrance very likely has a real-name registration. The real name may well be fake, purchased or not genuine (and there are also sales managers and other companies who will open a dedicated line for you for the money; after all, as long as you don't overdo it, cutting off your machine in time is not a big problem for them). But in China, this means the risk is within reach of controls such as data-center inspections and seizures by the cyber police. Even with N layers of forwarding, there must still be a domestic entrance for users to connect to, and the path from the user to that entrance is a public-network path that can be monitored. The classic case is the users sending plaintext HTTP requests in the screenshots above.

On false positives (wrongful blocking)

Once a risk-control system raises its risk level, misjudgments are inevitable, like the cancelled orders, declined charges and refused payments common with Stripe and other foreign payment processors.

Generally speaking, restriction can be achieved in a variety of ways, and stepwise escalation is the main behavior of the current GFW (except during sensitive periods).

For example, it starts with irregular targeted blocking, DNS pollution, IP:port blocking, domain blocking and so on, and finally blocks the IP and domain outright.

As long as there is no major impact, they will do it, and right now they are not afraid at all. GitHub's raw content domain, Cloudflare's CDN/Pages/Workers and jsDelivr's main domain have all been blocked for a while, and some have since been released. If they want to, they simply shut it down and then add DNS pollution, domain blocking and the high-frequency anti-fraud interception seen in Jiangsu, Zhejiang and Quanzhou. Clearly they don't take foreign forces seriously (˵¯͒ བ¯͒˵)

End

Have a hap♂py Children's Day, and also think about how to guard against this rather than waiting to be blocked outright.

This article is reprinted from the original link given at the top; this site only archives it, and the copyright belongs to the original author.
