It doesn't seem to be documented very well, but it is in the wget FAQ:

    wget -erobots=off http://your.site.here

When you get right down to it, it's impossible to prevent someone from automatically downloading your site, unless of course you pull the ethernet cable. ;-)

Josh

"James P. Gray" <gray_james@dwc.edu> writes:
You could create a robots.txt file in the HTTP root directory to tell automated tools not to look at some or all of your content. wget, for instance, does not have a flag to ignore robots.txt (based on the output of wget --help). Of course, this assumes the client is written to respect robots.txt, so it won't stop someone who is really serious. Just google robots.txt and you'll find more info. Be aware that search engines will then also ignore your content.
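For concreteness, a minimal robots.txt along those lines might look like this (the /private/ path is only a placeholder for whatever you actually want to keep crawlers out of):

    # Served from the web root, e.g. http://your.site.here/robots.txt
    # Ask all compliant crawlers to stay out of /private/
    User-agent: *
    Disallow: /private/

    # Or, to ask them to stay away entirely (this is also what hides the
    # site from search engines):
    # User-agent: *
    # Disallow: /

As noted above, this is purely advisory; a client that ignores robots.txt sees everything.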
Best of luck!
Mike Frysinger wrote:
On Thursday 05 January 2006 00:47, Aramico wrote:
Of course not. People can still see the web pages, but they cannot grab all the files (copy the files and their subdirectories) all at once.
you didn't understand what he was telling you
you can't lock out wget but still allow a web browser; there's no way for you to know for sure that the client is wget ... the program allows you to easily change the reported user agent to say 'firefox'
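To make that concrete, here is a sketch of the kind of command Mike is describing, using wget's -r, -e, and --user-agent switches against the placeholder URL from Josh's message (the Firefox string is just an example value):

    # Mirror the site while ignoring robots.txt and reporting a browser-like
    # user agent; the server only ever sees the "Firefox" string below.
    wget -r -e robots=off \
         --user-agent="Mozilla/5.0 (X11; Linux i686) Firefox/1.5" \
         http://your.site.here/

From the server's point of view this is indistinguishable from an unusually fast Firefox user.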
you could use mod_rewrite or something to check the User-Agent header, but that would only stop people who don't know their tools
-mike
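For the record, a mod_rewrite rule of the sort Mike mentions might look like the following (a sketch only, assuming Apache with mod_rewrite enabled; the pattern list is illustrative):

    # In the vhost config or an .htaccess file: refuse requests whose
    # User-Agent still advertises a common download tool.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (wget|curl|libwww) [NC]
    RewriteRule .* - [F]

As Mike says, one --user-agent switch on the client side walks straight past it.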
_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug