
Major Victory for Social Media Scraping

In a recent United States district court case, a federal judge in the Northern District of California ruled that companies can harvest publicly available, user-supplied data on social media sites. Judge Edward M. Chen defended the interests of the public and the free flow and use of information on the open web in a lawsuit between hiQ Labs, Inc. and LinkedIn Corporation. The ruling creates many opportunities for anyone who works with public web data.

It enables data scientists, media analysts, and developers to continue to access publicly available data without burdensome technical or legal restrictions from social media sites. This is great news for companies that have the foresight to build innovative products and solutions on the sea of public information that social media brings to the internet.

Scraping Isn’t Hacking

hiQ is a startup that has built software whose algorithms determine when someone on LinkedIn is seeking new employment. These determinations are based on information that users provide or change on their public LinkedIn profiles.

LinkedIn took issue with hiQ’s acquisition of data made publicly available through its service and took measures to restrict hiQ’s activities. hiQ went to court to defend its data harvesting as an integral part of its business model.

When hiQ scrapes publicly accessible data from certain areas of LinkedIn, it aggregates this data into a report that is sold to employers. With these reports, employers can gain a certain level of insight about which of their employees might be seeking new employment.

LinkedIn viewed hiQ’s acquisition and use of this data as a violation of the Computer Fraud and Abuse Act (CFAA), a piece of legislation aimed at fighting hackers that was signed into law in 1986. On that basis, LinkedIn sent hiQ a cease-and-desist letter objecting to its activity and cautioning that continued scraping could expose hiQ to liability under the CFAA. In court, LinkedIn pursued this angle, claiming that hiQ violated the CFAA by accessing the servers that host LinkedIn’s data and harvesting that data illegally.

The court interpreted the CFAA narrowly, likening a public website to a storefront business with open signage: when the store is open for business, those who walk in are not trespassers. On this view, hiQ’s web scraping bots were not intruding on LinkedIn’s servers but had as much right to the information as any other member of the public. The court ruled that hiQ has the right to scrape any information that is publicly accessible, as long as it isn’t on a password-protected webpage.

If It’s Public, Then It Can Be Scraped

The Northern California federal district court explained: “A user does not ‘access’ a computer ‘without authorization’ by using bots, even in the face of technical countermeasures, when the data it accesses is otherwise open to the public.”

Judge Chen also rejected LinkedIn’s argument that hiQ’s bots breach the privacy of LinkedIn users. On this point, the court responded: “LinkedIn argues that it acted solely out of concern for member privacy, but, as discussed above, that argument is put in question by the fact that LinkedIn itself makes user data available to third parties.”

Judge Chen warned against applying the CFAA in the manner LinkedIn sought: “Conferring on private entities such as LinkedIn, the blanket authority to block viewers from accessing information publicly available on its website for any reason, backed by sanctions of the CFAA, could pose an ominous threat to public discourse and the free flow of information promised by the Internet.”

The ruling reflects Judge Chen’s interpretation of the CFAA, which protects data harvesters from legal liability provided that they restrict themselves to data that is generally open to the public. Data hidden behind password authentication, on the other hand, does not fall into this category.

In this case, LinkedIn was prohibited from “preventing hiQ’s access, copying, or use of public profiles on LinkedIn’s website (i.e., information that LinkedIn members have designated public).” In addition, LinkedIn was barred from putting in place any mechanism, whether technical or legal, to block that access, and was ordered to remove any already in place. hiQ is therefore allowed to harvest data from LinkedIn profiles that can be accessed without logging in with a user name or password.

Because this ruling is a preliminary injunction, it serves only to protect the rights of the parties while the case progresses. The Ninth Circuit Court of Appeals holds jurisdiction over this case, and it ruled on a similar case involving Facebook last year. In that case, a cease-and-desist letter from Facebook to a data scraper resulted in the scraper being found liable in 2016 for violating the CFAA. For this reason, it is possible that the Ninth Circuit could apply a different interpretation, to the detriment of hiQ.

Judge Chen nevertheless sees merit in hiQ’s claims, noting that the company scraping Facebook was mining data from password-protected portions of the site. Although this was done with the permission of the password holders, the court still held it against the scraper. hiQ’s data scraping, by contrast, is limited to publicly accessible information and does not breach users’ privacy by collecting non-public information.

 
