URLiSt
URLiSt generates a list of full URLs for every object on your website. This is
useful if you want to submit all your URLs to search engines, or if you want to
build a "Site Map" page. It's a dirty hack, so it might not work on your system.
Requirements:
- PHP 3.0 or newer
- Unix server with the "find" command enabled
- The larger your website, the more RAM this script will suck up
Documentation
Set the $docroot variable, then upload the script in ASCII mode to your
website's document root (e.g. your public_html directory). Then execute the
script, either through the web or via telnet. URLiSt will perform a recursive
"find" command and use the output to create a complete list of URLs for your
site. The results will be saved in a file called urls.txt, with one URL per line.
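As a rough illustration of the path-to-URL step described above, here is a
minimal sketch in Python rather than URLiSt's own PHP. It is not the script's
actual code: it walks the directory tree with os.walk instead of calling
"find", it lists files only, and the function names, the base_url parameter
and the example.com address are assumptions made for the example.

```python
import os
from urllib.parse import quote


def build_url_list(docroot, base_url):
    """Return a sorted list of URLs, one for each file under docroot."""
    urls = []
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            # Path relative to the document root, e.g. "docs/index.html".
            rel_path = os.path.relpath(os.path.join(dirpath, name), docroot)
            # Use forward slashes and percent-encode unsafe characters.
            url_path = quote(rel_path.replace(os.sep, "/"))
            urls.append(base_url.rstrip("/") + "/" + url_path)
    return sorted(urls)


def write_url_list(urls, out_path="urls.txt"):
    """Write one URL per line, matching the urls.txt layout described above."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

Calling write_url_list(build_url_list("/home/you/public_html",
"http://www.example.com")) would write urls.txt to the current directory. Both
the path and the address are placeholders. Like the approach described above,
this sketch keeps the whole list in memory before writing it out.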
If you have a large site (hundreds of pages or more) on a shared/virtual server,
I don't suggest using URLiSt because it tends to eat a lot of system resources.
The script runs for as long as it takes to perform a recursive find command on
your entire directory structure, then loads the whole list into RAM for
processing. With a large site, your find command could take several minutes and
generate megabytes of results. That means the script will run for a while and
use several megabytes of RAM!
Eventually I might redo URLiSt so that the output of the find command is saved
to a file (instead of RAM), then processed line by line (instead of all at
once). That would be much less resource-intensive.
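The file-based idea above could look something like the following Python
sketch. It is only an illustration under stated assumptions, not a planned
URLiSt implementation: the function names, the listing file and the base_url
parameter are hypothetical, the "find" command is assumed to be on the PATH,
and file names are assumed to be UTF-8 and free of newline characters.

```python
import subprocess
from urllib.parse import quote


def save_find_output(docroot, listing_path):
    """Run a recursive find and send its output straight to a file."""
    with open(listing_path, "w", encoding="utf-8") as listing:
        # find writes directly to the file, so the list never sits in RAM.
        subprocess.run(["find", docroot, "-type", "f"], stdout=listing, check=True)


def convert_listing(listing_path, docroot, base_url, out_path):
    """Turn each path in the listing into a URL, one line at a time."""
    prefix = docroot.rstrip("/") + "/"
    count = 0
    with open(listing_path, encoding="utf-8") as listing, \
            open(out_path, "w", encoding="utf-8") as out:
        for line in listing:  # Only the current line is held in memory.
            path = line.rstrip("\n")
            if not path.startswith(prefix):
                continue  # Skip anything outside the document root.
            out.write(base_url.rstrip("/") + "/" + quote(path[len(prefix):]) + "\n")
            count += 1
    return count
```

Memory use then stays roughly constant however large the site is, at the cost
of one temporary listing file on disk. The find command still has to walk the
whole directory tree, so the running time is not reduced.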