Research on Privacy on the Internet

Original link: https://mabbs.github.io/2022/07/16/privacy.html

No right to privacy? Then we will protect it ourselves.

Background

A while ago we were living through an era of doxxing. Organizations and companies were practically competing to leak their users' private data, to the point where anyone who had taken no precautions was effectively online under their real name. (I had not taken any either.) In practice, though, if you are prepared, it is very hard to find the real person behind an online identity unless the other side has enormous resources. So this time I want to discuss, in principle, how privacy can be protected.

Premise of protection

Because privacy leaks so easily, it is generally hard to cover everything. We can only protect a particular identity or particular activities. Ensuring complete privacy would probably mean leaving the earth before going online. This is even harder in a place like China, where so many things require a real name that linking them together is very simple.

Reasons and Channels of Privacy Leakage

Leaks to ordinary visitors

Generally, merely visiting a website rarely leaks anything. The main way to extract private information is to link key identifiers, such as a mobile phone number, an email address, or an IM account, which connect easily to a real person's contact details, and those are invisible to other visitors in the first place. What other visitors can see is basically limited to what you put on the site yourself: the account you register, your posts, the messages you send. And the usual reason ordinary people get doxxed is that something they posted attracted attention; if you never post anything, there is no target.

In this case the leaks are simple to trace and easy to avoid: an investigator typically just cross-checks what you posted against your public personal information. Not everyone publishes a mobile phone number, but most people reveal an IM account. Perhaps because Tencent's systems are not very secure, there are plenty of social-engineering databases that map a QQ number to the phone number bound to it, and for a QQ number that has been in regular use for a long time, the phone number has almost certainly leaked. From a phone number it is even easier to find a person: for ordinary people, payment-transfer lookups and leaked courier waybills reveal almost everything. I have tried this on myself and was easy to find; add the username I commonly use, and I have to admit I am effectively online under my real name.

Beyond these ordinary social-engineering methods there are more advanced technical attacks, such as man-in-the-middle (MITM) attacks, phishing, and leaks through embedded third-party resources. These are still quite hard to pull off, mainly because today's mature encryption greatly raises their difficulty.

Leaks to webmasters

Every visit you make is naturally visible to the webmaster: the site's program must know about your request, otherwise how could it return the right content? Here more information leaks and defense is harder. The first thing exposed is your IP address: a TCP connection cannot be established without the server knowing the client's address, just as the visitor can easily learn the site's IP. For most webmasters an IP address reveals little, mainly which city you are in; even the most precise estimate only narrows it down to a neighborhood. And since IPv4 addresses have run out, most carriers use NAT, so many users may share one IP. Carriers are a different story: from the time of the address allocation and the port used, they can pinpoint a visitor's physical location. Incidentally, if a site lets users embed third-party images or other resources, those users can also collect visitors' IPs from the servers hosting the resources.
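To illustrate why a shared NAT address hides you from a website but not from the carrier, here is a minimal sketch assuming a hypothetical carrier-grade NAT log in which each subscriber is allocated a block of public ports for a period of time. The `NatBinding` record, its fields, and the sample addresses (from the 203.0.113.0/24 documentation range) are invented for illustration; real carriers keep such logs in their own formats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class NatBinding:
    """One entry in a hypothetical carrier-grade NAT log."""
    subscriber: str   # internal customer ID, known only to the carrier
    public_ip: str    # shared public address that websites see
    port_start: int   # first public port allocated to this subscriber
    port_end: int     # last allocated public port (inclusive)
    start: datetime   # when the allocation began
    end: datetime     # when the allocation was released


def find_subscriber(log: List[NatBinding], public_ip: str,
                    public_port: int, seen_at: datetime) -> Optional[str]:
    """Return the subscriber holding (public_ip, public_port) at seen_at, if any."""
    for b in log:
        if (b.public_ip == public_ip
                and b.port_start <= public_port <= b.port_end
                and b.start <= seen_at < b.end):
            return b.subscriber
    return None


day_start = datetime(2022, 7, 16, 8, 0)
day_end = datetime(2022, 7, 16, 20, 0)
log = [
    NatBinding("customer-A", "203.0.113.7", 1024, 2047, day_start, day_end),
    NatBinding("customer-B", "203.0.113.7", 2048, 3071, day_start, day_end),
]

# The website only sees 203.0.113.7, shared by both customers; the carrier,
# given the source port and time from the website's logs, narrows it to one.
print(find_subscriber(log, "203.0.113.7", 2500, datetime(2022, 7, 16, 12, 0)))  # customer-B
```

The website alone cannot tell customer-A from customer-B, but a website log entry with port and timestamp, combined with the carrier's records, identifies exactly one subscriber.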

Besides the IP, there is the information users enter themselves. Even information hidden from other visitors is fully visible to the site administrator; this hardly needs explaining. Phishing sites work the same way: what an unsuspecting person types in is completely accurate, which is also why the data leaked when a site is breached is accurate.

The information above is explicit. There is also implicit private information, such as browsing habits, visit times, and details of the device used. Taken individually these mean little, but combined and analyzed together they can pinpoint a person; this is the principle behind much precisely targeted advertising. This implicit information is also harder to scrub, though correspondingly harder to exploit.
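The point that individually meaningless attributes become identifying when combined can be sketched as a simple fingerprint: serialize the attributes in a canonical order and hash them. The attribute names and values below are hypothetical examples; real tracking systems use far more signals and fuzzier matching than an exact hash.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser/behavior attributes into a short identifier."""
    # sort_keys gives a canonical serialization, so key order does not matter
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Two visits from different networks (the IP is deliberately not an input)
visit_home = {
    "user_agent": "Mozilla/5.0 (X11; Linux aarch64; rv:102.0) Gecko/20100101 Firefox/102.0",
    "screen": "1920x1080",
    "timezone": "Asia/Shanghai",
    "language": "zh-CN",
    "active_hours": "22-02",
}
visit_cafe = {
    "language": "zh-CN",
    "active_hours": "22-02",
    "timezone": "Asia/Shanghai",
    "screen": "1920x1080",
    "user_agent": "Mozilla/5.0 (X11; Linux aarch64; rv:102.0) Gecko/20100101 Firefox/102.0",
}

print(fingerprint(visit_home) == fingerprint(visit_cafe))  # True
```

Because the IP address is not part of the input, the same fingerprint appears even after the visitor switches networks, which is why changing IP alone does not remove this kind of leak.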

Leaks to ISPs

As the intermediary between users and websites, an ISP obtains private information through what amounts to a man-in-the-middle position. Back when traffic was unencrypted, everything was completely transparent to the ISP and there was no privacy at all. Fortunately, mature encryption now effectively keeps them from reading user data. But even in the age of encryption, careless handling can still expose you. First we need to know what they can see. Although the content you access may be opaque to the carrier, most people still do not use encrypted DNS, so the domain name of every site you visit leaks at the very first step. Even with encrypted DNS, the TLS ClientHello carries the SNI (Server Name Indication) field in plaintext (in TLS 1.2 and earlier, and in TLS 1.3 as well unless ECH is used), so the carrier can still tell which site you are visiting. Then there is the destination IP: with the data collected today, it is easy to tell which domain an IP belongs to. Fortunately, with CDN providers such as Cloudflare, many websites share the same IPs, so the IP alone does not reveal which site a user is visiting (the CDN itself, of course, knows exactly, including the content you view and everything you send and receive). So through the IP alone, carriers have almost no way to tell which website a user is visiting.
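To show why SNI gives the site away even when DNS is encrypted, here is a sketch that builds the `server_name` extension as defined in RFC 6066, the part of a TLS ClientHello that carries the host name. It only constructs the extension bytes, not a full handshake; `example.com` is a placeholder host.

```python
import struct


def sni_extension(hostname: str) -> bytes:
    """Build the server_name extension (RFC 6066) as it appears in a ClientHello."""
    name = hostname.encode("ascii")
    # ServerName entry: name_type 0 (host_name), 2-byte length, raw host name
    entry = struct.pack("!BH", 0, len(name)) + name
    # ServerNameList: 2-byte total length followed by the entries
    server_name_list = struct.pack("!H", len(entry)) + entry
    # Extension header: type 0x0000 (server_name), 2-byte length of the data
    return struct.pack("!HH", 0x0000, len(server_name_list)) + server_name_list


ext = sni_extension("example.com")
print(ext.hex())
print(len(ext))               # 20
print(b"example.com" in ext)  # True: an on-path observer can read the host name
```

Nothing in these bytes is encrypted, so anyone on the path can match the host name directly; ECH exists precisely to encrypt this part of the handshake.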

Solutions to privacy breaches

Forging an identity

Looking at the problems above, most privacy leaks are caused by users themselves, mainly through the information they enter. When a website insists on personal information, we can prepare a complete virtual online identity in advance. Chinese mobile numbers are registered under real names; even a foreign number not registered to you can still be located by the carrier through its cell towers, and many Chinese services require a phone number to log in. This is the so-called "anonymous on the front end, real name on the back end" model, so Chinese services are out of the question; consider foreign ones such as Google Voice instead. Email is easier, since registering a mailbox is not complicated. Bank cards are hard to obtain anonymously, but most websites should not need one; if you really must pay, consider digital currency, which many foreign payment platforms now accept. Once this information is ready, simply fill in the prepared details wherever private information is required. Also, use a password manager and give every website a random password. Ideally, protect the accounts with a hardware security key, which also defends against phishing, and choose one that is easy to destroy.
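Generating a random password per site needs a cryptographically secure random source. Below is a minimal sketch using Python's standard `secrets` module; the 24-character length and the 94-symbol alphabet are assumptions rather than requirements, and in practice a password manager does this for you.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 ASCII punctuation characters = 94 symbols
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 24) -> str:
    """Generate a password using the OS's cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int) -> float:
    """Entropy of a uniformly random password of this length over ALPHABET."""
    return length * math.log2(len(ALPHABET))


print(len(random_password()))   # 24
print(round(entropy_bits(24)))  # 157
```

`secrets` is used instead of `random` because the latter is a predictable pseudo-random generator not meant for security purposes.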

Disguising the connection

We can also see that, apart from information we leak ourselves, a large share of leaks comes from the act of connecting. There are many solutions; the most common is a proxy. One layer is not enough, though: a single proxy is little better than none, because it is easy to correlate. You need at least two layers so that the IP seen by the website is decoupled from the IP you actually use. Otherwise, once the carrier learns which IP visited the target website, it can look at which of its users sent traffic to that IP and identify the visitor.
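The argument for at least two proxy layers can be modeled by listing what each party on the path observes: every hop sees only where traffic came from and where it goes next. This is a deliberately simplified sketch with hypothetical node names; it ignores traffic-timing correlation, which a carrier that can watch both sides of a single proxy can also exploit.

```python
def observers(chain):
    """For a path [user, hop1, ..., website], map each party to (source, destination) it sees."""
    # The ISP sits on the first link, between the user and the first hop
    views = {"ISP": (chain[0], chain[1])}
    # Each intermediate hop sees its previous and next neighbor only
    for i in range(1, len(chain) - 1):
        views[chain[i]] = (chain[i - 1], chain[i + 1])
    # The website sees the connection arriving from the last hop
    views[chain[-1]] = (chain[-2], chain[-1])
    return views


def linking_parties(chain):
    """Parties that directly see both the real user and the final website."""
    user, site = chain[0], chain[-1]
    return [p for p, (src, dst) in observers(chain).items() if src == user and dst == site]


print(linking_parties(["user", "site"]))                      # ['ISP', 'site']
print(linking_parties(["user", "proxy1", "site"]))            # ['proxy1']
print(linking_parties(["user", "proxy1", "proxy2", "site"]))  # []
```

With one proxy, that proxy alone can link you to the site; with two, no single party sees both ends, although parties that collude or correlate timing still can.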

Of course, for those of us without much money, building multiple proxy layers is hard, since servers are not cheap. In that case we can use Cloudflare as the middle layer (although Cloudflare itself can easily track you). This makes tracking by the ISP very difficult: basically it can only see that the user is visiting some website behind Cloudflare, and nothing else.

If you do not even have a server, you can use Tor. It is essentially a multi-layer proxy network maintained by a community, and because so many people use it, it may well be more secure than a setup of your own. If possible, you can also chain your own server after Tor, which avoids websites that discriminate against Tor exit nodes and the risk of running into a honeypot exit node. I2P is actually more secure by comparison, but the experience is a bit of a stretch; however much we care about privacy, we still have to consider usability.
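A minimal sketch of sending requests through Tor from Python, assuming a local Tor daemon listening on its default SOCKS port 9050 and the third-party `requests` library installed with its SOCKS extra (`pip install "requests[socks]"`). The `socks5h` scheme asks the proxy to resolve host names, so DNS lookups go through Tor as well.

```python
# Default SOCKS port of a standalone Tor daemon (Tor Browser uses 9150 instead)
TOR_SOCKS = "socks5h://127.0.0.1:9050"


def tor_proxies(socks_url: str = TOR_SOCKS) -> dict:
    """Proxy mapping for requests that sends both HTTP and HTTPS through Tor."""
    # "socks5h" (rather than "socks5") lets the proxy resolve host names,
    # so DNS lookups travel through Tor instead of going to the local resolver.
    return {"http": socks_url, "https": socks_url}


def fetch_via_tor(url: str, timeout: float = 60.0) -> str:
    """Fetch a URL through Tor; needs a running Tor daemon and requests[socks]."""
    import requests  # imported here so the module loads even without requests installed

    resp = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    resp.raise_for_status()
    return resp.text
```

For example, `fetch_via_tor("https://check.torproject.org/api/ip")` should report whether the request arrived from a Tor exit. Chaining your own server after Tor would additionally require a proxy running on that server and is not shown here.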

For extra safety, it is best to reach the server you buy for the proxy through Tor when purchasing it, and to pay with digital currency, so that neither a betrayal by the hosting provider nor a leak of its records can expose you.

Another very important issue is DNS and SNI. For DNS, the first step is to send lookups through the proxy, and then it is fine. There is also DNS over HTTPS (DoH), which provides some protection: at least the carrier can no longer see your lookups. The DoH provider still can, though, so routing DNS through the proxy is better. SNI remains visible mainly because some websites do not support ESNI or ECH, and there is nothing we can do about that; it matters little as long as the exit node cannot be linked to you. Also, it is best to use a transparent proxy rather than a system proxy, because some software deliberately bypasses the system proxy, for example features that use NAT traversal such as live streaming or voice calls, which would cost more traffic if relayed through the server. A transparent proxy avoids the proxy failures these cases cause.
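DoH wraps an ordinary DNS query in HTTPS, so the carrier sees only a TLS connection to the resolver. Here is a sketch of the RFC 8484 GET form: build the DNS query in wire format, base64url-encode it without padding, and pass it as the `dns` parameter. The Cloudflare endpoint is just one public resolver chosen for illustration; the resolver itself still sees every query.

```python
import base64
import struct


def dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (qtype 1 = A record)."""
    # Header: ID 0 (RFC 8484 recommends 0 for cache friendliness), flags with
    # recursion desired, one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def doh_get_url(resolver: str, hostname: str) -> str:
    """RFC 8484 GET form: the wire query, base64url without padding, as ?dns=."""
    encoded = base64.urlsafe_b64encode(dns_query(hostname)).rstrip(b"=").decode("ascii")
    return f"{resolver}?dns={encoded}"


print(doh_get_url("https://cloudflare-dns.com/dns-query", "www.example.com"))
# https://cloudflare-dns.com/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

In a plain DNS packet the same bytes (`\x03www\x07example\x03com`) travel unencrypted over UDP port 53, which is exactly what the carrier reads when encrypted DNS is not used.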

Disguising behavior

This is the more complicated part, because it can cost more. To keep website administrators from learning about a visitor's behavior, we generally need to go beyond the steps above: for example, use a clean system on which nothing connected to your real life is ever done, keep the transparent proxy running at all times, and destroy the system promptly after any mistake. Subgraph OS is said to be good (avoid Windows if you can). Administrators can sometimes link you to your real identity through details you overlook, so it is best to use a virtual machine or a separate physical machine. My suggestion is a Raspberry Pi: the system lives on a TF (microSD) card, so when necessary you can pull the card out and snap it in half, which is very convenient. Burning the card with a lighter makes the data disappear (dangerous).

The methods above avoid leaks from the access environment, but timing remains. When I publish something, it goes live the moment I press the publish button, so someone watching the page sees the article appear on their screen the instant you press Enter, and your connection to the post is easy to establish. If you are serious about privacy, timing matters a great deal too. For example, when I publish this article I do not want others to know when I sent it, so I can write a script on my server that automatically commits and pushes it at a chosen time, and nobody knows when I actually sent it. Comments work similarly, but the process is more complicated and may require browser automation tools, which are more troublesome to operate.
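A minimal sketch of the delayed-publishing idea for a Git-based blog: wait until a chosen time, then commit and push. The function names are hypothetical, and the optional random `jitter_s` offset and forcing `TZ=UTC` (so the commit's timezone offset does not hint at your location) are extra assumptions, not part of the original description. It also assumes `git` is installed and the repository already has a push remote configured.

```python
import os
import random
import subprocess
import time
from datetime import datetime
from typing import Optional


def seconds_until(target: datetime, now: datetime, jitter_s: float = 0.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before publishing, plus an optional random offset."""
    delay = (target - now).total_seconds()
    if jitter_s > 0:
        if rng is None:
            rng = random.Random()
        delay += rng.uniform(0, jitter_s)
    return max(0.0, delay)


def publish_later(repo: str, message: str, target: datetime, jitter_s: float = 0.0) -> None:
    """Sleep until the chosen time, then commit all changes in repo and push."""
    time.sleep(seconds_until(target, datetime.now(), jitter_s))
    # On Unix-like systems git takes the commit's timezone offset from TZ
    env = dict(os.environ, TZ="UTC")
    for cmd in (["git", "-C", repo, "add", "-A"],
                ["git", "-C", repo, "commit", "-m", message],
                ["git", "-C", repo, "push"]):
        subprocess.run(cmd, check=True, env=env)
```

Because the commit is created at publish time rather than when the post was written, its timestamp does not reveal when you finished writing. Run the script in the background on the server (for example with `nohup`) so it keeps waiting after you log out; cron or a systemd timer would work just as well.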

Summary

As shown above, perfect privacy is very hard to achieve; staying private at every step is basically unrealistic on this earth. If you only want privacy while publishing or viewing certain information, the scheme above works. We can invent an identity, install a Linux distribution we like on a Raspberry Pi, go to a coffee shop, connect to public Wi-Fi, pay with digital currency over Tor to buy a VPS from a hosting provider, connect to it through Tor, set up a proxy on it, and then use this Tor + proxy chain to visit the website. Register with the invented identity, find where you want to post, and have a script on the VPS send out what you want to say automatically at a time of your choosing. When necessary, find an incinerator and throw the Raspberry Pi in, and our privacy campaign is over.

This article is reprinted from: https://mabbs.github.io/2022/07/16/privacy.html
This site is for inclusion only, and the copyright belongs to the original author.
