Technology used in Web Archiving

Technical information about our web archive.

The University of Melbourne Web Archiving Service is technically supported by the Internet Archive's Archive-It service. Archive-It uses a number of open-source components, including the Heritrix web crawler to collect web content, the NutchWAX indexing engine to provide search services, and Wayback to provide the user interfaces. Content is collected and stored according to international standards for digital preservation and access.

The overall approach complies with the Public Record Office Victoria (PROV) requirements documented in PROV Advice to Agencies 20a: Web-generated Records, Version 1 (2007), and with relevant national standards and best practices developed by the National Archives of Australia and the National Library of Australia.

Archive-It is a web harvesting solution that systematically retrieves each page on a specified domain and saves a copy of it.
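
The harvesting idea described above can be illustrated with a minimal sketch. This is not how Archive-It or Heritrix is implemented; it is a simplified breadth-first crawl that assumes a caller-supplied `fetch` function (hypothetical, returning a page's HTML or None on failure) and keeps saved copies in a dictionary rather than in a preservation format. The names `harvest`, `LinkExtractor`, and the `example.edu` addresses are all illustrative.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(seed_url, fetch, max_pages=100):
    """Breadth-first harvest of pages on the seed URL's host.

    `fetch` is a hypothetical callable: URL -> HTML text, or None on failure.
    Returns a dict mapping each retrieved URL to its saved copy.
    """
    host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    saved = {}

    while queue and len(saved) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; move on
        saved[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment
            link, _fragment = urldefrag(urljoin(url, href))
            # Stay on the specified host and avoid revisiting pages
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)

    return saved


# In-memory stand-in for a website (illustrative data only)
SITE = {
    "https://example.edu/": '<a href="/about">About</a> <a href="https://other.org/">Other</a>',
    "https://example.edu/about": '<a href="/">Home</a> <a href="/contact#top">Contact</a>',
    "https://example.edu/contact": "<p>Contact us</p>",
}

pages = harvest("https://example.edu/", SITE.get)
print(sorted(pages))
```

In this sketch the link to `other.org` is skipped because it is outside the specified host, and each page on the host is retrieved only once. A production crawler would also need to respect robots.txt, pace its requests, and handle non-HTML content, none of which is shown here.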

The limitations of this method of web archiving are detailed by the Internet Archive on its "5 Challenges of Web Archiving" page.