1) Here's a basic "robots.txt":
User-agent: *With the above declared, all robots (indicated by "*") are instructed to not index any of your pages (indicated by "/"). Most likely not what you want, but you get the idea.
Disallow: /
2) you may not want Google's Image bot crawling your site's images and making them searchable online, if just to save bandwidth. The below declaration will do the trick:
User-agent: Googlebot-Image3) The following disallows all search engines and robots from crawling select directories and pages:
Disallow: /
User-agent: *4) You can conditionally target multiple robots in "robots.txt." Take a look at the below:
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
User-agent: *This is interesting- here we declare that crawlers in general should not crawl any parts of our site, EXCEPT for Google, which is allowed to crawl the entire site apart from /cgi-bin/ and /privatedir/. So the rules of specificity apply, not inheritance.
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
5) There is a way to use Disallow: to essentially turn it into "Allow all", and that is by not entering a value after the semicolon(:):
User-agent: *Here all crawlers should be prohibited from crawling our site, except for Alexa, which is allowed.
Disallow: /
User-agent: ia_archiver
Disallow:
6) Finally, some crawlers now support an additional field called "Allow:", most notably, Google. As its name implies, "Allow:" lets you explicitly dictate what files/folders can be crawled. However, this field is currently not part of the "robots.txt" protocol, so use it only if absolutely needed, as it might confuse some less intelligent crawlers.
Per Google's FAQs for webmasters, the below is the preferred way to disallow all crawlers from your site EXCEPT Google:
User-agent: *Finally this file (robot.txt) must be uploaded to the root accessible directory of your site, not a subdirectory (eg. www.mysite.com/robot.txt) it is only by following the above rules will search engines interpret the instructions contained in the file.
Disallow: /
User-agent: Googlebot
Allow: /
Free, facebook, tips, Links, blogging, Downloads, Google, facebookTips, money, news, apps, Social, Media, Website, Tricks, games, Android, software, PIctures, Internet, Security, Web, codes, Review, bloggers, SAMSUNG, Worldwide, Contest, Exitic, Phones, facebookTricks, hacking, London, Olympics, SEO, Youtube, iOS, Adsense, gadgets, iPHONE, widgets, Doodle, twitter, video, Deals, technology, Aircel, Airtel, iPAD, Angry, Birds, BSNL, TechLife, GMAIL, Idea, Microsoft, SmartPhones, Stress, Buster, Windows, Yahoo, Infolinks, Nokia, Scam, Uninor, browsers, Amazon, Euro, CUP, Chat, IDM, JOBS, Modem, Music, Reliance, Results, SSC, Tata, Docomo, bing, freebie, mobile, placements, AIEEE, AlertPay, Chrome, College, Competetive, Exam, Dehradun, Extension, FireFox, GPRS, HTC, IMPACT, Info, MTS, Mark, Zukerberg, Paypal, Promotional, Post, Torrent, UTU, Unlocking, VodaFone, Wall, Paper, apple, books, engineering, iCAR, iTunes, pinterest, rovio, AVG, Admit, Card, Adobe, Affiliate, Marketing, Akhilesh, Amul, Girl, BlackBerry, ChromeBook, Clixsense, Coupon, Digitallife, Discovery, Emoticons, Festival, GATE, GIMP, Income, Tax, International, JSS, JailBreaking, Kindle, Linux, Local, MAX, PAYNE, Mac, Mango, Memory, Speed, Nexus, Online, Shopping, Raakhi, Report, Rising, Stars, Sample, Science, Sony, Syllabus, TabletBooK, Teamviewer, Templates, Dark, Knight, Rises, USA, UPMT, Virgin, Xperia, ZTE, challan, counselling, course, btech, funny, iMOVE, registration
source:http://linuxpoison.blogspot.com/2009/02/13578175711797.html