I’ve been a loyal reader of Paul for quite some time now, and I’m really happy he made it to Yahoo! as an intern. I was digging through my RSS feeds today when I spotted Paul’s post on the new Yahoo! Social Media Podcast on Yodel Anecdotal. Amid the expected talk about social media, Richard Crowley suddenly said: “The Internet doesn’t forget does it?” and Paul agreed.
The short dialog went like this:
RICHARD CROWLEY: The Internet doesn’t forget does it?
PAUL STAMATIOU: That’s right.
DOREEN BLOCH: No, it doesn’t. Go ahead.
I was shocked by the statement, and I’m afraid I have to disagree with all three of them. The Internet does forget if we want it to.
Remove sites or individual URLs from the search index
It’s quite easy to remove a website or individual pages from a search engine’s index. Google offers a service for removing a URL from its index. There’s also the option to remove Usenet posts from Google Groups. Google states that pages submitted through its URL removal system are removed from the Google index temporarily, for six months, which gives plenty of time to make things such as outdated URLs or embarrassing pictures vanish.
On the Yahoo! campus, there are easy-to-follow suggestions on how to remove pages from the index.
There’s always the option to use a robots.txt file to prevent web spiders and bots from accessing all or part of a website. Again, Google offers help and tools on robots.txt. So does Yahoo!, with info on robots.txt.
There’s also a free online robots.txt generator here.
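Before uploading a robots.txt file, it can help to check that the rules block what you expect. Here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the example.com URLs, and the bot names "ExampleBot" and "BadBot" are made up for illustration and are not taken from Google's or Yahoo!'s documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of /private/,
# and keep one crawler (called "BadBot" here) out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("ExampleBot", "http://example.com/index.html"),
        ("ExampleBot", "http://example.com/private/photo.jpg"),
        ("BadBot", "http://example.com/index.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:10} {url:40} {verdict}")
```

Keep in mind that this only tells you how a well-behaved crawler would read the file. Bots that ignore robots.txt are not stopped by it.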
Use the noindex meta tag
One way to keep specific pages from getting indexed is to use the noindex meta tag. Place the meta tag in the <head> section of the web page you want to keep out of the index.
Here’s an example that prevents all robots from indexing a page and following its outgoing links: <meta name="robots" content="noindex, nofollow">
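To confirm that a page really carries the directive, you can parse its HTML and look for a robots meta tag. The sketch below uses Python's standard html.parser module. The class name RobotsMetaParser, the helper robots_directives, and the sample page are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Made-up sample page for illustration.
PAGE = """<html><head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head><body><p>Not for search engines.</p></body></html>"""

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("noindex set:", "noindex" in found)
    print("nofollow set:", "nofollow" in found)
```

In practice you would feed it the HTML of your own page instead of the sample string.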
The Internet does forget (if you want it to)
The Internet is exploding with thousands, if not millions, of pages created every day. User-generated content is a must and will continue to grow now that broadband and online tools for file sharing and socializing are easier than ever to use.
Some want to keep their pages secret, away from the public eye and the search indexes of Google or Yahoo!. Thankfully, there are tools and techniques to easily remove individual URLs or pages while also preventing bots from crawling specific parts of a website.