Smart Ways of Image Scraping for Web Harvesters

Gone are the days when a picture was worth a thousand words. Now a picture is worth a few hundred likes, thumbs-ups, tweets and shares, and the more, the better. People are generally very protective of their images and do not like finding them used online without permission. This blog post explains why web harvesters scrape images from websites, to what extent that is legal, and which industries can benefit from scraped images. In addition, I will illustrate how to use copyrighted images in a way that is legal while still respecting the creator’s ownership rights. Shall we?

Image scraping is probably one of the most neglected subjects in the computer vision field. Yet having that skill in your tool belt gives you many advantages, whether your intention is to build an image search engine, find relevant images in a collection of photos, develop a computer vision application, or take on larger projects like scraping product images from e-commerce websites. It all starts with the images themselves. And where do these images come from? Well, if you’re lucky, you might find the images you need in an existing image database like CALTECH-256, ImageNet, or MNIST. But as many of you know, it’s hard to find a dataset that suits all your needs, or perhaps you have been given the task of creating your own custom dataset. In that case, God bless image scraping, as it’s the only solution you are left with. Typically, competitor websites in industries like e-commerce, hotel & travel, real estate, event planning, etc. benefit most from sustainably scraped images. However, scraping...
4 Factors to Remember Before Scraping News Websites

In our increasingly digital age, more and more business is conducted online. Oftentimes, a company’s website is the main conduit through which consumers keep in contact with the company. This is especially true for news websites, as news is one of the areas that has been digitized the most in recent years. The decline in print sales and advertising has put significant pressure on news groups and companies to find new sources of income. After an initial spike, revenue from digital advertising has been plummeting year after year, a decline connected with low mobile costs. Although on the one hand this means greater volume and accessibility for customers, on the other hand it means greater risk and growing complexity of content “stealing”. In this blog post we will talk about the practice of scraping news websites: its purpose and intentions, legal ways of obtaining content, and the importance of staying a “good web scraping citizen”.

Most people think that you need to learn a programming language to start scraping news websites for information, but that’s not necessarily true. The first thing you will learn as a journalist is that you can have two competitive advantages over your colleagues: being able to work faster and managing to get more relevant information than others. In both cases, scraping is the quickest solution. Of course, using a website’s API can be an alternative. In fact, if all you need is to interact with the system, APIs are great, because almost every major system you come across has a developed API or at least the intention to develop one, but if your main purpose is...