Smart Ways of Image Scraping for Web Harvesters
Gone are the days when a picture was worth a thousand words. Now a picture is worth a few hundred likes, thumbs up, tweets and shares, and the more, the better. People are generally very protective of their images and do not take kindly to finding them used online without permission.
This blog post explains why web harvesters scrape images from websites, to what extent that is legal, and which industries can benefit from scraped images. In addition, I will illustrate how to use copyrighted images in a way that is legally sound while still respecting the creator’s ownership rights. Shall we?
Image scraping is probably one of the most neglected subjects in the computer vision field. Yet having that skill in your tool belt can give you many advantages, whether your intention is to build an image search engine, find relevant images in a collection of photos, develop a computer vision application, or take on larger projects like scraping product images from e-commerce websites. It all starts with the images themselves.
And where do these images come from? Well, if you’re lucky, you might be able to find the images you need in an existing image dataset like Caltech-256, ImageNet, or MNIST. But as many of you know, it’s hard to find a dataset that suits all your needs, and sometimes you are given the task of creating your own custom dataset. In that case, thank goodness for image scraping, as it may be the only option you are left with.
Typically, competing websites in industries like e-commerce, hotel and travel, real estate, and event planning benefit most from sustainably scraped images. However, scraping images in bulk can be an issue: it is always best to be reasonable about the volume of images you obtain and use this way.
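To make the idea of a reasonable volume concrete, here is a minimal sketch in Python of collecting image URLs from a page's HTML while capping how many are kept. It uses only the standard library and works on an HTML string, so fetching the page is left out. The names `ImageCollector`, `collect_image_urls`, the `limit` parameter, and the `example.com` URLs are all assumptions made for illustration, not part of any particular scraping tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects absolute image URLs from <img src="..."> tags."""

    def __init__(self, base_url, limit=20):
        super().__init__()
        self.base_url = base_url
        self.limit = limit  # hypothetical cap to keep the volume reasonable
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Ignore non-image tags and stop collecting once the cap is reached.
        if tag != "img" or len(self.urls) >= self.limit:
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL and skip duplicates.
            absolute = urljoin(self.base_url, src)
            if absolute not in self.urls:
                self.urls.append(absolute)


def collect_image_urls(html, base_url, limit=20):
    """Return up to `limit` unique image URLs found in `html`."""
    parser = ImageCollector(base_url, limit)
    parser.feed(html)
    return parser.urls


# Illustrative input only; a real page would be fetched separately.
sample = '<img src="/a.jpg"><img src="b.png"><img src="/a.jpg"><img alt="no source">'
print(collect_image_urls(sample, "https://example.com/shop/", limit=2))
```

The cap is deliberately part of the collector rather than an afterthought, so the harvester never gathers more than it intends to use. Whether a given limit is reasonable still depends on the site and on the legal points discussed in the rest of this post.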
Every online creator has a very vivid idea of the value of images: they grab the reader’s attention, add a visual component to a commentary, and illustrate a point of view with an infographic. Every creator also knows how difficult it is to come across the perfect image to prove a point. Having the right image and being able to use it in the right place can boost your post at remarkable speed, helping to create a storyline that could not be created with words alone.
Unless, of course, you are presenting images taken by you, you will have to use those created (and therefore owned) by someone else. The number of sources you can scrape an image from is practically unlimited. As any scraping beginner will say, if the image is online, it is scrapable. While you will generally not be allowed to use copyrighted work without the owner’s authorization, there is one small but important legal construct that lets millions of people see and share images online: it is called fair use.
The main factors assessed to determine fair use of an image usually include:
- the purpose of use, which can be nonprofit, educational, scholarly or research use;
- market effect: whether the use has any prospective effect on the market for the original, or whether it was not possible to obtain permission to use the image.
Besides these factors, it is worth considering that images of people may involve privacy rights under state or federal law. To avoid this legal maze, consider using photographs of people taken in larger public scenes, and avoid images of famous people, especially those engaged in private activities. Beware: publicity rights limit commercial uses.
When you have the slightest doubt about whether a scraped image is copyrighted, it is better to play it safe and ask for permission. You will be surprised to find out how many people are OK with allowing their website’s images to be used by a non-competing source. Always make sure your use is covered by the protected purposes of fair use, or seek your attorney’s advice (especially when there is significant investment in your project). Fair use might allow you to use copyrighted images, but it’s not worth having your site taken down if the owner disagrees, as there are no significant cases that establish clear-cut rules for the fair use of images on the internet.
It is essential to understand that people like photographers and graphic artists license and sell their work for a living. By scraping and using images left and right, we interfere with their legal right to own and distribute the result of their hard work.
Fair use laws were created long before digital communication made the sharing of images this easy. While it is unlikely that an average blogger will be sued for using a copyrighted image, be careful not to become the exception to the rule.
If you have decided to be “smart” and try to get around the law by making changes to a copyrighted image, it most certainly won’t work. Altering the image by cropping it, adding text, or running it through editing software doesn’t negate the existing copyright, and neither does giving credit to the owner. Copyright law gives the copyright holder the right to decide where their work is published, and maybe they don’t want their work on your site, in your book, included in your newsletter or distributed to your social media network. So don’t assume that a simple “shout out” or link back will do the trick.
In conclusion, as convenient and easy as it is to scrape images from websites or hire a service that will do it for you, it comes with its share of responsibility and legal issues that cannot be neglected. One thing is clear: if you respect copyright law and stay away from using the images to benefit a competing source, image scraping, like website scraping as a whole, can be a valuable skill for your business.
What issues do you think can be associated with scraping images from websites and what can be done to avoid them? Comment below and let us know!