Disclaimer: this article looks at a legal topic from the viewpoint of a content researcher. Nothing I say here should be taken as a legal decision; it is only the outcome of the research I made. As such, an attorney should be consulted for more clarification.
Let's first get familiar with the two terms below:
What are web scraping and web crawling?
1. Web Scraping
This is a form of downloading a website's data automatically and extracting the needed data from it. The extracted data may be stored anywhere, such as in databases or files, and may also be sold or transferred.
Example: a web scraper used to extract weather forecast information from the national weather forecast centre's website in order to perform some further analysis.
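To make the idea concrete, here is a minimal sketch of the extraction step using only Python's standard library. The HTML snippet, the `temp` class name and the `extract_temps` helper are all made up for illustration; a real forecast page will have a different layout, and downloading the page is left out on purpose.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real forecast page will look different.
SAMPLE_HTML = """
<table>
  <tr><td class="day">Mon</td><td class="temp">21.5</td></tr>
  <tr><td class="day">Tue</td><td class="temp">19.0</td></tr>
</table>
"""


class ForecastParser(HTMLParser):
    """Collects the number inside every <td class="temp"> cell."""

    def __init__(self):
        super().__init__()
        self._in_temp = False
        self.temps = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "td" and ("class", "temp") in attrs:
            self._in_temp = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_temp = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_temp and text:
            self.temps.append(float(text))


def extract_temps(html):
    """Return the temperatures found in the given HTML string."""
    parser = ForecastParser()
    parser.feed(html)
    return parser.temps


print(extract_temps(SAMPLE_HTML))
```

The returned list could then be written to a file or database for the further analysis mentioned above.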
2. Web Crawling
This is also a means of downloading a website's information using an automated process, but the aim is to follow and extract the hyperlinks it contains.
Data collected through web crawling is stored within a database or index for easier access.
Example: Googlebot, which crawls and indexes web pages for the Google search engine.
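The link-following part of a crawler can be sketched in a few lines of Python. This is only an illustration and not how Googlebot works; the `LinkCollector` class, the `extract_links` helper and the example.com addresses are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all hyperlinks found in the given HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = '<a href="/about">About</a> <a href="https://example.org/x">Other</a>'
print(extract_links(page, "https://example.com/"))
```

A crawler would add these links to a queue, visit each one in turn and repeat the process, storing what it finds in its index.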
I believe you can now see the difference between a Web Crawler and a Web Scraper.
So let's get back to our main topic (web scraping) and find out why it is perceived as illegal.
Why does web scraping look illegal?
Web scraping is increasingly perceived as illegal, especially as more and more people take part in it.
Some of the reasons are:
1. Business motives
A good example is email extractor software. Email marketers sometimes use it to harvest thousands of email addresses for marketing purposes. As the demand for email addresses grows, so does the use of email extractor software. Such tools are available as plugins for the Chrome browser and from sites such as webemailextractor.com.
2. Copyright Infringement
Email scraping is sometimes done without regard to the terms and conditions of the source webpage.
Scrapers also tend to abuse the method, especially in the number of requests they send to the target website. If the requests are much heavier than normal human traffic, a denial of service could occur.
Is web scraping legal or illegal?
You can't conclude that it's totally legal or totally illegal; it depends on the individual circumstances. Scraping your own website, for instance, is not a problem.
However, crawling web pages that do not belong to you is where the issue arises, as in most cases you need to seek permission and be familiar with the site's terms of service.
Not many website owners have the time to sue you for scraping their pages, but a few could do so, or they could block your session.
Case study: LinkedIn v. Doe Defendants.
LinkedIn sued unnamed defendants (Does 1-100) for anonymously scraping its website, raising the following claims:
- Violation of California Penal Code.
- Breach of contract.
- Violation of the Digital Millennium Copyright Act (DMCA).
- Violation of the Computer Fraud and Abuse Act (CFAA).
The outcome of the case is worrisome considering the number of charges raised just for scraping a website.
When you look at it from a legal perspective, legal jargon could be used to get someone convicted even when technicalities and common sense suggest otherwise.
How to do it the proper way?
- Use a reasonable crawl rate, i.e. about 1 request every 8-15 seconds.
- Adhere to their Terms of Service (ToS).
- Respect the rules of their robots.txt files.
- When the terms of service or robots.txt file do not allow you to go ahead, seek the site owner's permission.
- Do not fully trust advice coming from the Internet (this article included) until you have sought the advice of a qualified lawyer.
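The crawl rate and robots.txt points above can be checked in code. Below is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `my-bot` user agent, the example.com URLs and the helper names are assumptions for illustration, and the 10 second fallback is simply a value picked from the 8-15 second range suggested above. Note that this covers only robots.txt; the terms of service still have to be read by a human.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real one is served at <site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a rule set (no network access needed)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_delay(rules, user_agent, default_delay=10.0):
    """Seconds to wait between requests: the site's Crawl-delay if it
    declares one, otherwise our own conservative default."""
    delay = rules.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_delay


def allowed_urls(rules, user_agent, urls):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [url for url in urls if rules.can_fetch(user_agent, url)]


rules = build_rules(ROBOTS_TXT)
candidates = [
    "https://example.com/index.html",
    "https://example.com/private/data",
]
print(allowed_urls(rules, "my-bot", candidates))
print(polite_delay(rules, "my-bot"))
```

In a real scraper you would call `time.sleep(polite_delay(rules, "my-bot"))` between requests and skip every URL that `allowed_urls` filters out.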
I hope you now understand that web scraping isn't illegal in itself, but you need to follow the right steps to do it the proper way.
Knowledge is the key.