I recently needed to sunset my old Drupal 6 blog at saml.rilspace.com, since Drupal 6 has long since stopped receiving security updates and might not even keep working with newer versions of PHP and MySQL. I was thus delighted to find that one can create a full static copy of such a site using the well-known wget tool, available in pretty much all *nix operating systems. With a static HTML archive of the site, I can keep the website online for years to come and keep links from breaking, without the risk of getting hacked through some vulnerability in my rusty old Drupal installation.
The command to create a mirror into a folder under the current directory:
wget -P . -mpck --html-extension -e robots=off --wait 0.5 <URL>
To understand the flags, you can check `man wget` of course, but some explanations follow here:
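- -P - Set the directory under which the site is stored (here `.`, the current directory)
- -m - Create a mirror: download recursively with unlimited depth, with time-stamping turned on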
- -p - Download all the required files (.css, .js) needed to properly render the page
- -c - Continue getting partially downloaded files
- -k - Convert links to enable local viewing
- --html-extension - Add the .html extension to the names of downloaded HTML files that lack it. This is important because, when serving the plain files, a web server such as Nginx needs the .html extension to know that the files should be sent to the user's browser as web pages, not offered as files to download. Since wget 1.12 the option is also known as --adjust-extension (-E). See below for how to redirect from old to new links.
- -e robots=off - Run the wgetrc-style command robots=off, which makes wget ignore the site's robots.txt file. I got a lot of errors when I did not include it.
- --wait 0.5 - Wait half a second between downloads. It is better not to overwhelm the web server where your site is hosted.
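If the command ends up in a script, the long option names can make it easier to read later. The following is only a sketch of the same command: the function name mirror_site and the DRY_RUN switch are made up for this example, and --adjust-extension is used as the newer name of --html-extension, assuming wget 1.12 or later.

```shell
# Sketch: the mirror command from above, written with long option names.
# mirror_site and DRY_RUN are hypothetical names, not part of wget.
mirror_site() {
  url="$1"
  # --directory-prefix=.   same as -P .  (store under the current directory)
  # --mirror               same as -m
  # --page-requisites      same as -p
  # --continue             same as -c
  # --convert-links        same as -k
  # --adjust-extension     newer name of --html-extension
  # --execute robots=off   same as -e robots=off
  # --wait=0.5             same as --wait 0.5
  set -- \
    --directory-prefix=. \
    --mirror \
    --page-requisites \
    --continue \
    --convert-links \
    --adjust-extension \
    --execute robots=off \
    --wait=0.5 \
    "$url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show the command that would be run.
    echo "wget $*"
  else
    wget "$@"
  fi
}

# Example: print the command without downloading anything.
# DRY_RUN=1 mirror_site "http://example.com/"
```

Running it once with DRY_RUN=1 first is a cheap way to check the flags before starting a long download.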
Once the command has finished, you will have a folder of static HTML files and other files that you can simply upload to your web server in place of your CMS.
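Before uploading, it can be worth checking that every page actually got an extension, since files without one are the ones a web server may offer as downloads. A minimal sketch, assuming the mirror ended up in a folder named after the site's host name (wget's default when mirroring); list_extensionless is a hypothetical helper name:

```shell
# List files in a mirror folder whose names contain no dot at all,
# i.e. files that did not get an extension such as .html.
list_extensionless() {
  find "$1" -type f ! -name '*.*' | sort
}

# Example (the folder name is an assumption; use the one wget created):
# list_extensionless saml.rilspace.com
```

If this prints nothing, every file in the mirror has some extension.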
Finally, you might want to add this rule to the Nginx config, to make sure the old non-.html URLs are redirected to the .html variant:
location / {
    if ($request_filename !~* (/|(.+)\.(html|css|js|gif|png|jpg))$ ) {
        rewrite ^(.+)$ $1.html permanent;
    }
}
Add these lines inside the appropriate server block in the relevant config file, such as /etc/nginx/sites-enabled/default.
The rule applies to every URL that ends neither in a slash (as the home page, /, does) nor in one of the common static file extensions listed in the rule. For those URLs, Nginx answers with a permanent redirect to the same URL with '.html' appended.
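To get a feel for which URLs the rule touches, its condition can be imitated in the shell. This is only an approximation: it applies the same case-insensitive regular expression to a plain URL path with grep rather than to Nginx's $request_filename, and redirect_target is a made-up helper name.

```shell
# Imitate the Nginx rule: print the redirect target for a URL path,
# or print nothing if the rule would leave the request alone.
redirect_target() {
  # Same pattern as in the Nginx "if", matched case-insensitively like !~*.
  if printf '%s\n' "$1" | grep -qiE '(/|(.+)\.(html|css|js|gif|png|jpg))$'; then
    return 0               # ends in / or a listed extension: no redirect
  fi
  printf '%s.html\n' "$1"  # everything else gets .html appended
}

redirect_target /node/12         # prints /node/12.html
redirect_target /node/12.html    # prints nothing
redirect_target /                # prints nothing
redirect_target /files/logo.PNG  # prints nothing
```

Only the extensions listed in the rule are exempt, so any other file type your old site served would need to be added to the list.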
That's it! Visit my now archived old blog at saml.rilspace.com for an example if you wish!
Samuel