Creating a mirror of a website in Linux.

GNU's wget command-line program is very popular for downloading single files from a server. It is much more powerful than that, though, and offers some really cool features. One of them is the mirror feature. Suppose you want to mirror a website, say http://www.abc.com. In its most basic form, you can use it as follows:

$ wget -m http://www.abc.com

However, this can be troublesome, because the links in the mirrored pages will still point to the original site rather than to your local copies. To fix this, add the -k option, which converts the links for local viewing:

$ wget -mk http://www.abc.com
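
In a script, the same command can be spelled out with long option names, which are easier to read later. This is only a minimal sketch, assuming a POSIX shell; mirror_site is a made-up helper name and not part of wget.

```shell
# mirror_site is a hypothetical wrapper name used only for illustration.
# --mirror is the long form of -m, --convert-links is the long form of -k.
mirror_site() {
    wget --mirror --convert-links "$1"
}

# Usage (this starts a real download):
#   mirror_site http://www.abc.com
```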

Another issue is bandwidth. You are going to put some strain on the remote server if you are planning to mirror a website directly. One way to deliberately slow down your download is the -w option:

$ wget -mk -w 20 http://www.abc.com

This makes wget wait 20 seconds between requests to the server. Add the suffix m for a delay in minutes, h for hours, or d for days.
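
If you mirror sites regularly, the delay can be turned into a parameter. The sketch below assumes a POSIX shell; polite_mirror and its 20-second default are illustrative choices of mine, not wget defaults.

```shell
# polite_mirror is a hypothetical helper; $2 is passed straight to wget's -w option.
polite_mirror() {
    url=$1
    delay=${2:-20}    # plain number = seconds; 2m = two minutes, 1h = one hour
    wget -m -k -w "$delay" "$url"
}

# Usage (these start real downloads):
#   polite_mirror http://www.abc.com        # 20 seconds between requests
#   polite_mirror http://www.abc.com 2m     # 2 minutes between requests
```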

Rsync is equally good at mirroring websites, but it normally requires ssh access to the remote server. With wget, you can mirror the public files on the remote server without any login.
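
If you do have ssh access, an rsync mirror might look roughly like the sketch below. The helper name rsync_mirror, the user name, and the remote path /var/www/ are all placeholders I am assuming for illustration; substitute whatever applies to your server.

```shell
# rsync_mirror is a hypothetical helper; source and destination are placeholders.
# -a preserves permissions and timestamps, -z compresses data in transit,
# -e ssh runs the transfer over ssh.
rsync_mirror() {
    rsync -az -e ssh "$1" "$2"
}

# Usage (requires an ssh login on the remote host; path is an assumption):
#   rsync_mirror user@www.abc.com:/var/www/ ./abc-mirror/
```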
