Many times in hacking you need to know a site's web directories, but it's hard to find them, because a server might contain a countless number of directories.
But a major hole in this security is robots.txt.
What is Robots.txt
robots.txt is a file in the [ wwwroot ] of a server that defines, for each bot, what it may and may not do on the website.
There are many bots on the internet; the most famous are the Google search engine bot, aka the Google Spider, the Yahoo search engine bot, and many others.
What robots.txt does is give these bots orders on how to spider the website.
Now you may ask: what's the use of a robots.txt file?
Well, webmasters use it to control how incoming bots behave on their website, and also to mark the directories on the site where a bot should not go and spider, which is exactly what makes it interesting to us, as the sketch below shows.
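If you want to grab a site's robots.txt yourself instead of using the browser, a few lines of Python will do it. Here is a minimal sketch; example.com is only a placeholder, point it at any site you like:

import urllib.request

# Fetch and print a site's robots.txt straight from its webroot.
# "example.com" is just a placeholder target for this sketch.
url = "http://example.com/robots.txt"
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8", errors="replace"))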
Analyzing Robots.txt For Hacking Stuff
Well, it's really simple. The first question you would ask is: where is robots.txt located?
The answer is that it's in the [ WWWROOT ].
Still don't understand? It's in the site's main directory.
Let's take the example of the Hacker The Dude website ;)
http://kaleem-hacking.blogspot.com/robots.txt
Go ahead and type it into the address bar of your browser. What do you see?
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Sitemap: http://kaleem-hacking.blogspot.com/feeds/posts/default?orderby=updated
Do you see that? This is the robots.txt for the Hacker The Dude website. Now let's analyze this robots.txt line by line.
First Line :-
User-agent: Mediapartners-Google
This means that the rules which follow are addressed to Google's Mediapartners-Google bot. (Strictly speaking, this is the crawler Google AdSense uses; the main Google search spider identifies itself as Googlebot.)
Second Line :-
Disallow:
This means that nothing is disallowed to this Google bot. Remember, these orders are given to this Google bot only, not to other bots.
Third Line :-
User-agent: *
This means that all the other bots coming to the blog will follow the rules that come next. Note that the previous rules were for the Google bot only.
Fourth Line :-
Disallow: /search
This means that none of those bots will spider the files under the /search directory of this blog.
Fifth Line :-
Sitemap: http://kaleem-hacking.blogspot.com/feeds/posts/default?orderby=updated
This is basically my blog's sitemap. Not very important for our purposes. (You can check how all of these rules combine using the sketch below.)
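By the way, you don't have to reason through these rules by hand: Python's standard urllib.robotparser evaluates them for you. Here is a small sketch feeding it the exact five lines we just analyzed:

from urllib.robotparser import RobotFileParser

# The five directives from the blog's robots.txt, as analyzed above.
rules = """\
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Sitemap: http://kaleem-hacking.blogspot.com/feeds/posts/default?orderby=updated
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The Mediapartners-Google group has an empty Disallow, so /search is open to it...
print(rp.can_fetch("Mediapartners-Google", "/search"))  # True
# ...while every other bot falls under the "*" group and is told to stay out.
print(rp.can_fetch("SomeOtherBot", "/search"))          # False

Keep in mind that robots.txt is only a request: the parser tells you what a polite bot would do, but nothing actually stops a crawler from ignoring it.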
Working Demo
Now let's test the robots.txt files of various well-known websites.
1. My Favorite = Google !!
http://www.google.com/robots.txt
Now you will see some very useful paths in it. For example, in terms of hacking, entries like these are very useful for learning more about our victim:
Allow: /profiles
Disallow: /katrina?
Disallow: /tbproxy/
Hell, Google Knows Katrina Kaif :D
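You don't have to eyeball the file either. Here is a short sketch that pulls out every Disallow entry automatically, using nothing fancier than plain string matching:

import urllib.request

# Download Google's robots.txt and list every path it asks
# crawlers to stay out of: a ready-made list of directory hints.
url = "http://www.google.com/robots.txt"
with urllib.request.urlopen(url) as resp:
    body = resp.read().decode("utf-8", errors="replace")

for line in body.splitlines():
    if line.lower().startswith("disallow:"):
        print(line.split(":", 1)[1].strip())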
Conclusion
Now that you have seen the working demo and the uses, and read the whole article, you should be pretty clear on how we can find the weak spots in a website without even hacking it first.