Friday, 3 May 2013

Web Scraping: How It Affects Your Site (and Business)

Web scraping is when a site's content is "scraped," or mined, to be reposted on another site. Essentially, Web scraping is stealing.
How Your Content is "Scraped"

There are really just two ways that your content will be scraped.

    Manually - by simple copy and paste by one of your readers
    Automatically - by a tool or program (commonly called a "bot") created to crawl the web and harvest all content that fits within certain parameters

How to Protect Your Content

Although a number of tools and applications can help limit site scraping, there really is no way to stop it entirely.
Technical Ways to Slow Down the Web Scraping Bots

    Block an IP address
    Block bots with tools like CAPTCHA services that verify a human is the operator
    Use commercial anti-bot services
    Limit entry by many bots with well-written JavaScript and robots.txt files
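
Two of the items above, IP blocking and robots.txt, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blocked ranges, the robots.txt rules, the bot names, and the helper names is_blocked and allowed_by_robots are all hypothetical and do not come from the original article. Note that robots.txt is only a request, so it restrains only bots that choose to honor it.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical blocklist, using address ranges reserved for documentation.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole range
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


# Hypothetical robots.txt: shut out one named bot, hide drafts from the rest.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def allowed_by_robots(user_agent: str, url: str) -> bool:
    """Return what a well-behaved bot would conclude from ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked("203.0.113.45"))
    print(allowed_by_robots("BadBot", "https://example.com/post"))
```

In practice the IP check would run in the web server or firewall rather than in application code, and a scraper can sidestep it by switching addresses.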

The Problem: There is a way around every technical block. And there is no way to stop a reader from simply copying and pasting your carefully crafted blog post and publishing it on their own site.
The Only Real Way to Beat the Web Scrapers

The best thing to do is to include links to your own site within the text copy, so that any copy of the post sends traffic back to your site. When scrapers copy and paste a post, they almost never remove the links, so with in-copy links you'll actually benefit. Who can't benefit from new in-bound links and traffic? A little SEO help never hurt anyone.
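
One way to check a post before publishing is sketched below in Python. It rests on an assumption the article does not state: a relative link such as /about resolves against whatever page hosts it, so only absolute links to your own domain would still point home after a copy and paste. The names LinkCollector and audit_links, and the example.com host, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def audit_links(post_html: str, site_host: str):
    """Split a post's links into those that survive copying and those that break.

    survives: absolute links whose host is site_host
    breaks:   relative links, which would resolve against the copying site
    Links to other hosts are ignored.
    """
    collector = LinkCollector()
    collector.feed(post_html)
    survives, breaks = [], []
    for href in collector.hrefs:
        parsed = urlparse(href)
        if parsed.netloc == site_host:
            survives.append(href)
        elif not parsed.netloc and not parsed.scheme:
            breaks.append(href)
    return survives, breaks


if __name__ == "__main__":
    html = (
        '<p>Read the <a href="https://example.com/guide">guide</a> '
        'or the <a href="/about">about page</a>.</p>'
    )
    print(audit_links(html, "example.com"))
```

Any link that lands in the second list is a candidate for rewriting as a full URL before the post goes live.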

Discovering my articles and blog posts reposted all over the internet used to fire me up. But there really isn't any need to worry about it. As long as you publish your post first, Google will usually index your post as the original and theirs as the copy or duplicate content.

My content gets copied all over. Sometimes it's a compliment; other times the copiers are trying to benefit from my content. Either way, it's impossible to stop. Even though you have the legal right to your content, it is usually too much work to actually enforce it.

Some bloggers and writers ask readers not to copy - or to at least give attribution back to the main site. While this might work sometimes, the fact is that most web scrapers don't really care about polite requests. That's why I like to take matters into my own hands and embed numerous links into each piece. Not only does it do wonders on my sites, it also helps balance the scales when a web scraper lifts my content and publishes it on their own site.

Source: http://onlinebusiness.about.com/od/searchengines/a/Web-Scraping-How-It-Affects-Your-Site-And-Business.htm
