LinkedIn finally loses its five-year lawsuit: crawling LinkedIn data is “completely legal”, and an army of trillions of crawlers is ready to move

The internet crawler wars never end.

This can be regarded as a landmark ruling in the history of the crawler wars. On Monday, a U.S. court sided with data analytics firm HiQ in its case against LinkedIn, finding that collecting personal data from public websites is entirely legal.

LinkedIn is a professional social networking platform under Microsoft. Users can create personal profiles on the LinkedIn website, including educational background, work experience, skills and other information. HiQ is a data analysis company that crawls public data from LinkedIn, organizes and analyzes it, and sells the processing results to related companies.

LinkedIn holds the data, but the data itself is provided to LinkedIn by its users. In the era of big data, some Internet platforms have accumulated large amounts of user data and turned it into a resource advantage: in competition with other Internet companies and platforms, the more user data a platform has, the easier it is to attract more users, which puts it in a more favorable position. This snowball effect is why Internet companies often regard data as a core competitive asset.

The case went through several rounds, and the outcome finally favored the public interest

Before the case, the data was available to anyone who visited LinkedIn’s website. After HiQ had been crawling data from LinkedIn’s website for a long time, LinkedIn sent it a cease-and-desist letter citing the Computer Fraud and Abuse Act (“CFAA”).

In 2017, HiQ preemptively sued LinkedIn as the plaintiff, seeking to stop LinkedIn from using legal, technical and other means to prevent it from copying the public profiles of LinkedIn users, and also asked the court for a preliminary injunction.

Although HiQ had run a web crawler against the LinkedIn website, the U.S. court held that this kind of crawling did not violate the law, because the data on the LinkedIn website is public. For public data, crawling should be permitted by law even if it violates the robots protocol (robots.txt) set by the other party.

It is like walking into an unlocked store during the day, which does not count as trespassing. The court therefore not only declined to find HiQ’s crawling illegal, but even found LinkedIn’s anti-crawler measures unlawful.

The judge in charge of the case granted HiQ a preliminary injunction preventing LinkedIn from interfering with HiQ’s data scraping during the trial. The judge found that the Computer Fraud and Abuse Act, which criminalizes accessing a protected computer “without authorization” or in a way that “exceeds authorized access”, does not apply to HiQ’s collection of publicly available data from LinkedIn’s website.

Faced with this unfavorable outcome, LinkedIn chose to appeal. In 2019, the Court of Appeals upheld the lower court’s 2017 ruling in HiQ v. LinkedIn that web crawling was not “unauthorized access to protected computers”. LinkedIn again opted to appeal. But two years later, the Ninth Circuit still sided with HiQ and sent the case back to the Northern District of California.

LinkedIn, of course, refused to accept the 2019 ruling and took the case to the U.S. Supreme Court. In March 2020, LinkedIn asked the Supreme Court to review the Ninth Circuit ruling. The company argued that its technical measures to block web crawling, together with its cease-and-desist letter, should be deemed to meet the requirements of a normal authorization mechanism. In fact, as a social media site owned by Microsoft, LinkedIn has been working hard to keep the content on its site from being accessed directly by outsiders, but it does not want to cut itself off from search engines by locking the site down too tightly.

LinkedIn’s lawyers wrote in their petition to the Supreme Court, “According to the Ninth Circuit, unless the site is completely blocked with a password mechanism, any business that decides to partially disclose the content of its site, including Ticketmaster, online retailers like Amazon, and even social networking platforms like Twitter, will be exposed to invasive bots deployed in batches.”

“Once you choose to block with a password, the website can no longer be indexed normally by search engines, so people cannot discover the information through the most important channel for finding information on the Internet.”

On June 3, 2021, the U.S. Supreme Court narrowed the scope of the Computer Fraud and Abuse Act in a related case, Van Buren v. United States. Nathan Van Buren was a Georgia police officer with authority to search computer records of license plates for law enforcement purposes. He was caught in an FBI sting after searching the records for private purposes (at the request of an FBI informant who offered to pay thousands of dollars for the information). In the end, a U.S. court sentenced him to 18 months in prison. The act has been criticized for not clearly defining “without authorization” and “exceeds authorized access”.

The U.S. Supreme Court said in Van Buren that a mere violation of the terms of service did not meet the “exceeds authorized access” condition set forth in the Computer Fraud and Abuse Act. However, the Supreme Court has yet to give a clear answer on whether credential-based locking mechanisms are the only way to determine “unauthorized” access.

Two weeks later, the U.S. Supreme Court sent HiQ v. LinkedIn back to the Ninth Circuit, asking it to re-examine the scope of the Computer Fraud and Abuse Act in light of the Van Buren precedent. In the end, although the Court of Appeals took Van Buren into account, it upheld its original judgment from 2019.

In its ruling, the Ninth Circuit noted that “an essential characteristic of public websites is that publicly visible portions of them are not restricted; in other words, those portions will be open to any visitor with a web browser.”

“That is to say, if the computers hosting public pages are thought of as houses, a public website has no “front door” from the moment it is deployed, so there is naturally nothing to raise or lower to control access. Therefore, the Van Buren case reinforces our finding that the concept of “unauthorized” does not apply to public websites.”

However, the court ruling did not settle the dispute between HiQ and LinkedIn. It simply prohibited LinkedIn from continuing to interfere with HiQ’s collection of data from its public website, and said it did not support claims against HiQ’s analytics business under the Computer Fraud and Abuse Act. The real core issues behind the case, such as unfair competition and privacy violations, have not yet been resolved.

In an emailed statement, a LinkedIn spokesperson said the company would not drop its lawsuit and would continue to seek a reasonable outcome in court. “We are disappointed with the results, but this is a preliminary ruling and the case is far from over. We will continue to work hard to protect LinkedIn members, especially their ability to control their personal information on the site.”

Impact of the case

Data scraping is now widely used, not only commercially but also in academic research and other fields, so the judgment in this case has received great attention. The ruling was cheered by the U.S. media, which saw the Ninth Circuit’s decision as a “major victory” for archivists, academics, researchers and journalists.

The case also touches, to some extent, on the contested question of who owns data and how privacy is protected. The Ninth Circuit’s ruling supports the view that the user is the owner of the data, and that the platform only uses the data under the user’s authorization rather than fully owning it.

On Reddit, netizens ridiculed the LinkedIn spokesperson’s explanation for continuing the fight: “Such an explanation is presumptuous, if not absurd; users who provide data never get anything back from the platform”, “The claim to be protecting customer privacy is exaggerated”, “Who would now believe that such an explanation makes sense?”…

On the other hand, data scraping is also an important part of the modern Internet ecosystem. According to Akamai’s statistics, nearly 40% of global Internet traffic is generated by crawlers. In the second quarter of 2021, the number of crawler attacks worldwide reached 70 billion, a year-on-year increase of 15%. The U.S. court’s ruling also means that it is consistent with U.S. law for tens of billions of crawlers to scrape public information from online retailers and social networking platforms.

The laws of China and the United States are different, and crawler technology should be used with caution

Perhaps it is precisely because data is so important that disputes over it have multiplied in recent years, both in China and abroad. In China, there are quite a few unfair competition disputes caused by crawlers. DeHeng Law Firm once published an article titled “Climbing into the “unfair competition” bug is expensive”. In the article, the authors said they had searched and screened the PKULaw (Beida Fabao) legal database with keywords such as “crawlers”. They found 49 crawler-related cases since 2016, most of them criminal cases involving copyright infringement, illegal business operations, infringement of citizens’ personal information, fraud, extortion and so on, along with some civil and commercial cases, mainly copyright and unfair competition disputes.

One of the typical cases is the Dianping v. Baidu case.

In 2016, Baidu used a large number of crawlers to scrape user reviews from Dianping and displayed them in Baidu Maps, and was later sued by Dianping. The court held that Baidu’s actions violated generally accepted principles of business ethics and good faith, and constituted unfair competition.

In the second-instance judgment of Dianping v. Baidu, the judge clearly pointed out: “In a free and open market economic order, business resources and business opportunities are scarce, and the rights and interests of operators cannot be protected as strongly as legal property rights. Operators must reasonably tolerate damage that results from competition. In this case, the protected interests claimed by Hantao are not absolute rights, and suffering damage does not necessarily mean that legal relief should be granted. As long as the competitive behavior of others is justified in itself, the conduct is not blameworthy.”

While technology is neutral, there are boundaries to how it can be applied. At present, platform data ownership cannot be clearly defined, so the process of assigning legal responsibility is still relatively complicated. As Internet technology has developed, the word “crawler” has therefore gradually taken on a pejorative tone in the Chinese context.

For programmers who write web crawlers, crawling data that should not be crawled can mean breaking the law. The popular joke that “the better you write your crawler, the sooner you eat prison food” also shows that crawling technology needs to be used with caution. As with the LinkedIn platform, there are generally two options for obtaining public data: using a crawler or scraper (free but risky), or using an API (not free but safe). If we must use such public data, we need to choose carefully.
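As a minimal sketch of what caution can look like in practice, a crawler can consult a site's robots protocol before fetching a page. The example below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `example-bot` user agent, the `example.com` URLs, and the helper names `build_parser` and `is_allowed` are all hypothetical and chosen only for illustration. Passing this check does not by itself make a crawl lawful in any jurisdiction.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let user_agent fetch url."""
    return build_parser(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/public/page",
        "https://example.com/private/data",
    ):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "example-bot", url) else "disallowed"
        print(f"{url}: {verdict}")

    # The site may also ask crawlers to wait between requests.
    delay = build_parser(SAMPLE_ROBOTS_TXT).crawl_delay("example-bot")
    print(f"requested delay between requests: {delay} seconds")
```

In a real crawler the robots.txt text would be downloaded from the target site (for instance with the parser's `set_url` and `read` methods) rather than hard-coded, and the requested delay would be honored between requests.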

Reference links:

https://www.theregister.com/2022/04/19/scraping_public_data_linkedin/

https://news.ycombinator.com/item?id=31075396

“Where are the boundaries of data scraping?”: http://rmfyb.chinacourt.org/paper/html/2020-03/19/content_166271.htm?div=-1

“Climbing into the “unfair competition” bug is expensive”: http://www.dehenglaw.com/CN/tansuocontent/0008/023370/7.aspx?MID=0902

The text and pictures in this article are from InfoQ

This article is reprinted from https://www.techug.com/post/five-years-of-lawsuit-finally-lost-crawling-to-get-the-uk-data-is-completely-legal-and-the-trillion- reptile-army-is-ready-to-move.html
This site is for inclusion only, and the copyright belongs to the original author.
