Scraping Archive.org is a method many people use to build private blog networks quickly. The general idea is to replicate old domains that still carry some authority (PR, DA, PA, etc.) and use that authority both to re-rank the old domain and to rank new domains. Old websites on Archive.org can also be mined for other resources such as content and images. The best tool I’ve found for this task is Archive.org Scraper by BlazingSEO. This tool downloads all of the files from the web archive and strips out the extra code added by Archive.org. The output is a .zip file with all of the site’s files (in the correct folder hierarchy). After unzipping the file, you can simply drag and drop the files using FTP to replicate the old website. This works best with static sites. WP sites are a bit harder to replicate and will most likely require additional steps.
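To give a feel for what "stripping out the extra code" involves, here is a minimal Python sketch. It is not how the BlazingSEO tool works internally (that is not documented here). It assumes the two things archived pages commonly contain: a toolbar wrapped in `BEGIN/END WAYBACK TOOLBAR INSERT` comments, and links rewritten to the form `/web/<timestamp>/<original URL>`. The function name `strip_wayback` is made up for illustration.

```python
import re

# Assumption: the injected toolbar sits between these two HTML comments.
TOOLBAR_RE = re.compile(
    r"<!-- BEGIN WAYBACK TOOLBAR INSERT -->.*?<!-- END WAYBACK TOOLBAR INSERT -->",
    re.DOTALL,
)

# Assumption: archived links look like
#   /web/20130929135048/http://example.com/page.html
# optionally prefixed with the web.archive.org host and optionally carrying
# a two-letter flag such as "im_" or "js_" after the 14-digit timestamp.
REWRITTEN_URL_RE = re.compile(
    r"(?:https?://web\.archive\.org)?/web/\d{14}(?:[a-z]{2}_)?/(https?://[^\s\"'>]+)"
)


def strip_wayback(html: str) -> str:
    """Remove the toolbar block and restore the original link targets."""
    html = TOOLBAR_RE.sub("", html)
    # Keep only the captured original URL from each rewritten link.
    return REWRITTEN_URL_RE.sub(r"\1", html)
```

A real archived page usually carries more injected scripts and styles than this handles, so treat it as a starting point rather than a complete cleaner.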
For this example, I am going to use the domain “http://hats-shop.com”. I browsed to the site and saw that no website is currently up. Since it seems like a pretty good domain name, I’m assuming there will be Archive.org data for it.
There are a few different input formats supported by the tool. Pasting just the domain will result in the oldest Archive.org entry being scraped. If you need a specific year, add a space and the year at the end. To scrape an exact entry, add a space and the Archive.org ID at the end.
So for this example, here are 3 different formats that can be used:
- http://hats-shop.com (oldest entry)
- http://hats-shop.com 2011 (random entry from year)
- http://hats-shop.com 20130929135048 (exact ID entry)
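If you keep your domain list in a file and want to sanity-check it before pasting it into the tool, the three formats above can be told apart with a few lines of Python. This is only a sketch of the format as described here. The names `ScrapeTarget` and `parse_entry` are hypothetical, and the tool's own parsing may differ.

```python
import re
from typing import NamedTuple, Optional


class ScrapeTarget(NamedTuple):
    url: str
    year: Optional[int]         # set for "domain YYYY"
    snapshot_id: Optional[str]  # set for "domain <14-digit ID>"


def parse_entry(line: str) -> ScrapeTarget:
    """Classify one domain-list line as oldest / year / exact-ID."""
    parts = line.split()
    if not parts or len(parts) > 2:
        raise ValueError(f"expected 'domain' or 'domain suffix': {line!r}")
    url = parts[0]
    if len(parts) == 1:
        return ScrapeTarget(url, None, None)      # oldest entry
    suffix = parts[1]
    if re.fullmatch(r"\d{4}", suffix):
        return ScrapeTarget(url, int(suffix), None)  # entry from that year
    if re.fullmatch(r"\d{14}", suffix):
        return ScrapeTarget(url, None, suffix)       # exact ID entry
    raise ValueError(f"suffix is neither a year nor a 14-digit ID: {suffix!r}")
```

For example, `parse_entry("http://hats-shop.com 2011")` yields a target with `year=2011` and no snapshot ID.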
When scraping based on a specific year, make sure there is an entry for that year in the archive. To find an exact ID, search for the domain on the web archive and click on the entry you’d like to scrape. The exact ID is the 14-digit number (for example, 20130929135048) that appears in the URL in the browser’s address bar (highlighted in the image below).
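Both checks can also be done from a script. The sketch below pulls the 14-digit ID out of a copied Wayback URL and builds a query against the Wayback Machine's CDX endpoint to see whether any capture exists for a given year. The helper names are hypothetical, and the network call in `has_capture_in_year` is written from my reading of the CDX parameters (`url`, `from`, `to`, `output`, `limit`), so verify it against the current API before relying on it.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
SNAPSHOT_ID_RE = re.compile(r"web\.archive\.org/web/(\d{14})")


def snapshot_id_from_url(wayback_url):
    """Return the 14-digit ID from a Wayback URL, or None if absent."""
    match = SNAPSHOT_ID_RE.search(wayback_url)
    return match.group(1) if match else None


def cdx_query_url(domain, year, limit=1):
    """Build a CDX query restricted to captures from a single year."""
    params = {"url": domain, "from": year, "to": year,
              "output": "json", "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def has_capture_in_year(domain, year):
    """Assumption: JSON output is a header row followed by capture rows."""
    with urlopen(cdx_query_url(domain, year), timeout=30) as resp:
        body = resp.read().decode("utf-8").strip()
    rows = json.loads(body) if body else []
    return len(rows) > 1
```

Calling `snapshot_id_from_url` on the URL you copied from the address bar gives you the value to append after the domain.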
For this example, I’ll be scraping the oldest entry. Enter the domain with “http://” into the “Domain List” input box and click “Submit”.
The tool will start scraping the site, and the log will show the progress.
When the scraping is complete, the “Saved Files” box will show a downloadable .zip file for the site.
From here, you can download the file to your PC, unzip it, and either upload it via FTP or use the files as resources.
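If you would rather script the unzip-and-upload step than drag and drop, something like the following works with Python's standard library alone. It is a minimal sketch: the host, credentials, and the `/public_html` remote folder are placeholders you would replace with your own hosting details, and the function names are hypothetical.

```python
import zipfile
from ftplib import FTP, error_perm
from pathlib import Path


def unzip_site(zip_path, dest_dir):
    """Extract the archive and return the extracted files, sorted."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return sorted(p for p in Path(dest_dir).rglob("*") if p.is_file())


def upload_site(local_root, host, user, password, remote_root="/public_html"):
    """Mirror local_root to the FTP server, keeping the folder hierarchy."""
    local_root = Path(local_root)
    with FTP(host) as ftp:
        ftp.login(user, password)
        # Sorted order visits each folder before the files inside it.
        for path in sorted(local_root.rglob("*")):
            remote = f"{remote_root}/{path.relative_to(local_root).as_posix()}"
            if path.is_dir():
                try:
                    ftp.mkd(remote)
                except error_perm:
                    pass  # most likely the folder already exists
            else:
                with path.open("rb") as fh:
                    ftp.storbinary(f"STOR {remote}", fh)
```

A typical run would call `unzip_site("hats-shop.com.zip", "site")` and then `upload_site("site", host, user, password)` with your own values. The .zip file name here is just an example.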