Data Crawling: How To Stay On The Legal Side
With the recent Facebook data scandals and the GDPR coming into effect, you might be wondering: “Is web crawling legal or not?”
After all, you have a business. And every business, whether it is a SaaS startup, an eCommerce store or any other service provider, needs access to valuable data.
Of course, accessing restricted, private and unauthorized data is wrong and will always be illegal. But what about the public data on the internet that’s freely available for anyone to access? For example:
1. The product inventory list of your competitors on their website.
2. Prices of inventory items on display across various eCommerce stores.
3. Public reviews that mention your own business and are meant to be seen.
If that’s the kind of data you want to collect, is it still legal?
The answer, unfortunately, is not a simple yes or no.
To understand the whole picture regarding what makes data crawling services legal or illegal, we first need to take a quick look at what data scraping actually is.
Intro To Data Scraping: A Highly Common Activity That Takes Place On The Internet
Data scraping is the act of downloading a web page’s data and extracting specific information from it.
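To make that definition concrete, here is a minimal sketch in Python of the "extracting specific information" half. It assumes the page has already been downloaded; the HTML, the class names (name, price) and the helper extract_items are all made up for illustration and do not come from any real website.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would first download this HTML.
SAMPLE_HTML = """
<html><body>
  <h1>Store inventory</h1>
  <ul>
    <li class="item"><span class="name">Desk lamp</span> <span class="price">19.99</span></li>
    <li class="item"><span class="name">Office chair</span> <span class="price">89.50</span></li>
  </ul>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collects the text inside <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # field we are currently inside, if any
        self.current_item = {}     # fields gathered for the current <li>
        self.items = []            # one dict per finished <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self.current_field:
            self.current_item[self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self.current_field:
            self.current_field = None
        elif tag == "li" and self.current_item:
            self.items.append(self.current_item)
            self.current_item = {}


def extract_items(html):
    """Return a list of {'name': ..., 'price': ...} dicts found in the HTML."""
    parser = PriceParser()
    parser.feed(html)
    return parser.items


items = extract_items(SAMPLE_HTML)
for item in items:
    print(item["name"], item["price"])
```

Everything else on the page (the heading, the layout) is ignored; only the two fields we asked for are kept. Real pages are messier, so production scrapers usually rely on a dedicated parsing library rather than a hand-written parser like this one.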
For example, suppose you wanted to start your own movie streaming service. For that, you’d need data such as each movie’s bio, cast list, year of release, rating and so on.
But an enormous number of movies have been released since the medium came into existence.
What are you going to do? Write up the bio, cast list and year of release for every one of them by typing or copy-pasting the information yourself?
Instead, you can use a web scraping service to extract that data from a public source like IMDb and automate the process of adding that information to your movie streaming service.
This means a data scraping service simply copies data from an existing source to a file or database of your choosing.
That’s its whole purpose: to copy data from one source to another.
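The "copy to a database of your choosing" step can be sketched in a few lines of Python. This is only an illustration under some assumptions: the records below are placeholders rather than real movies, the table layout and the save_movies helper are invented for this example, and SQLite is used simply because it ships with Python.

```python
import sqlite3

# Placeholder records standing in for scraped data (not real movies).
SCRAPED_MOVIES = [
    {"title": "Example Movie A", "year": 1999, "rating": 7.5},
    {"title": "Example Movie B", "year": 2010, "rating": 8.1},
]


def save_movies(records, db_path=":memory:"):
    """Store the scraped records in a SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "title TEXT PRIMARY KEY, year INTEGER, rating REAL)"
    )
    # INSERT OR REPLACE keeps a re-run from creating duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO movies (title, year, rating) "
        "VALUES (:title, :year, :rating)",
        records,
    )
    conn.commit()
    return conn


conn = save_movies(SCRAPED_MOVIES)
rows = conn.execute("SELECT title, year FROM movies ORDER BY year").fetchall()
print(rows)
```

Passing a file name instead of ":memory:" would write the same table to disk, where your own application could read it.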
Why Are Data Crawling Services Accused of Being Illegal?
Just like everything else in the world, data crawling can be used for malicious and unethical purposes as well.
Here are some of the ways:
1. It can be used to access and acquire private, unauthorized data that is not public.
2. Data crawling can be done without asking for the permission of the data’s owner and in complete violation of a website’s terms of service.
3. Data scrapers can put heavy loads on a website’s servers by requesting data far more often than a human visitor would.
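The third point, server load, is the one a scraper can address most directly in code, by pausing between requests. Below is a minimal sketch of that idea in Python. The Throttle class and its parameters are hypothetical names chosen for this example, and the two-second interval mentioned in the comment is an arbitrary assumption, not a recommended or legally meaningful value.

```python
import time


class Throttle:
    """Enforces a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds to leave between requests
        self.clock = clock                # injectable so it can be tested
        self.sleep = sleep
        self.last_request = None          # time of the previous request

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.min_interval - (now - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request = now
        return now


# Typical use: call wait() before every download, for example
#   throttle = Throttle(min_interval=2.0)
#   for url in urls:
#       throttle.wait()
#       ... download url here ...
```

A scraper built this way never asks for pages faster than the chosen interval, no matter how quickly the rest of the program runs.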
However, it is important to note that reputable data scraping service providers don’t engage in the activities listed above.
They are always careful not to break any rules or laws because, in the case of a violation, it is the data crawling company that will suffer the consequences in the form of a cease and desist letter or a lawsuit.
This is one of the reasons why you shouldn’t be afraid of requesting data from them.
As a company, they’re more aware of what’s right and wrong and will tell you if web crawling is legal for the data you want – and if it’s not legal, they will politely refuse to collect said data and suggest alternative ways for you to get the information you want.
If you want to know what kind of data these services are not allowed to collect, here is a list of reasons why your data scraping service provider might refuse to collect data for you:
1. If the data is copyrighted: If the data you want collected is copyrighted, it can’t be scraped, because scraping it means ‘copying’ it. However, if the data is part of a creative work, it is generally the form in which the data is presented that is copyrighted, not the underlying facts. If you are scraping ‘facts’ from the work and presenting them in an original way, that is generally allowed.
2. If the owner secures the data: You cannot have data collected that is secured behind a digital obstacle such as a username and password or an access code. Collecting that kind of data can result in a lawsuit.
3. If the TOS (Terms of Service) explicitly says you can’t scrape data: If a website states in its terms of service that data collection is not allowed, you risk legal consequences for collecting that data, since it is done without the owner’s permission.
Why Your Data Crawling Service Is Within The Law
Have you heard of the legal battle of hiQ Labs vs. LinkedIn?
First, here’s an introduction to hiQ Labs in their own words:
“hiQ Labs is a data science company, informed by public data sources, applied to human capital.”
Basically, hiQ is a company that scrapes the public profiles of LinkedIn members and turns that data into insights that help companies identify key employees who are at risk of being recruited away and map the skill sets of existing employees.
In 2017, LinkedIn sent hiQ Labs a cease and desist letter asking it to stop scraping LinkedIn’s data, using the Computer Fraud and Abuse Act of 1986 (CFAA) and LinkedIn’s terms of service as its grounds.
The CFAA is an old law meant to stop hackers from illegally trespassing into private computers to access unauthorized, private data.
But here’s the thing:
hiQ Labs only scrapes data from the public profiles on LinkedIn, data that’s easily accessible even without logging in to LinkedIn.
This means that hiQ was not in violation of the CFAA. It was, however, in violation of LinkedIn’s terms of service.
But since hiQ was accessing only public data that LinkedIn shared with other people and services as well, the court sided with hiQ, the data scraping company, and granted a preliminary injunction that barred LinkedIn from blocking its access to public profiles.
The biggest lesson to take away from this case is that scraping data that’s publicly available on the internet is, in general, a legal practice and should not get you into trouble.
You can learn more about this case by searching for it on Google. Hopefully, a little reading will answer your question: is web crawling legal or not?
So if you need valuable data and insights that can help your company grow, or data that lets you provide a more personalized service to your customers, don’t hesitate to ask your data scraping provider to collect that data for you.