Overview

What is the web archiving program?

Since 2008, the University of Melbourne's Web Archiving Program has captured and preserved over 800 University websites using the Internet Archive’s Archive-It service.

The program is currently administered by Records & Information.

How do I search for archived content?

Archived content can be found on the Internet Archive's website using its Wayback Machine tool. Enter the URL you wish to find and press Enter to see any available captures.

You can also browse and search via the University of Melbourne collections in the Archive-It portal.
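Besides the interactive search, the Internet Archive also exposes a public "availability" API that reports the closest capture of a given URL. The short Python sketch below is an illustration only, not part of the University's archiving tooling, and simply checks whether a page has been archived:

    import json
    import urllib.parse
    import urllib.request

    def latest_snapshot(url):
        """Return the URL of the most recent Wayback Machine capture of
        `url`, or None if the page has not been archived."""
        api = "https://archive.org/wayback/available?" + urllib.parse.urlencode({"url": url})
        with urllib.request.urlopen(api, timeout=30) as resp:
            data = json.load(resp)
        closest = data.get("archived_snapshots", {}).get("closest")
        return closest["url"] if closest and closest.get("available") else None

    print(latest_snapshot("www.unimelb.edu.au"))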

How is content selected for archiving?

All key University domains are currently included in regular captures. New content is generally proposed for capture by University staff.

Note: Content selection will be under review during 2024. Existing domains (URLs) will continue to be captured. For new domains, only content which is publicly available and which is classed as 'Permanent' in the University Records Retention and Disposal Authority (RDA) will be captured. Duplicate domains will be deactivated when identified and where appropriate.

How is content captured?

Collection groups

Domains are grouped into collections based on whether the website belongs to a Faculty, Department, School, Centre or Institute, or to an administrative function such as Student Administration, Student Services, External Relations or Information Management.

To find a website, you do not need to know which collection it belongs to; simply enter its URL to find out whether or not it has been archived.

Capture timings and quality assurance

Captures (or crawls) are scheduled to run for different lengths of time so that each crawl can capture all the pages on all the websites within a collection on a regular basis.

What technology is used?

Tools

The program uses the Archive-It web harvesting solution, which systematically retrieves each page on a specified domain and saves a copy. Content is collected and stored according to international standards for digital preservation and access.
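As an illustration of what "systematically retrieves each page on a specified domain" involves, the following toy Python sketch performs a small breadth-first crawl confined to a single domain. It is a simplified sketch only, not the Archive-It or Heritrix implementation, which adds politeness rules, scheduling and preservation-grade storage.

    import urllib.parse
    import urllib.request
    from collections import deque
    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        """Collect href values from anchor tags on a fetched page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links += [value for name, value in attrs if name == "href" and value]

    def crawl(start_url, limit=25):
        """Breadth-first crawl of pages on the start URL's domain.
        Returns a dict mapping each fetched URL to its HTML."""
        domain = urllib.parse.urlparse(start_url).netloc
        queue, seen, saved = deque([start_url]), {start_url}, {}
        while queue and len(saved) < limit:
            url = queue.popleft()
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    if "text/html" not in resp.headers.get("Content-Type", ""):
                        continue
                    html = resp.read().decode("utf-8", errors="replace")
            except OSError:
                continue  # skip pages that cannot be fetched
            saved[url] = html  # a real crawler writes this to preservation storage
            parser = LinkExtractor()
            parser.feed(html)
            for href in parser.links:
                absolute, _ = urllib.parse.urldefrag(urllib.parse.urljoin(url, href))
                # stay within the specified domain, as crawl scoping does
                if urllib.parse.urlparse(absolute).netloc == domain and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return saved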

The Archive-It service uses a number of open source components including:

  • Heritrix web crawler to collect web content
  • NutchWAX indexing engine to provide search services
  • Wayback to provide the user interfaces.
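The international standard referred to above is, in Archive-It's case, the WARC container format (ISO 28500). As a brief illustration only, and assuming the open-source warcio library is installed (it is not part of the University's tooling), records in a downloaded WARC file can be inspected like this:

    # Minimal sketch using the open-source warcio library (pip install warcio).
    # 'example.warc.gz' is a hypothetical filename for a downloaded crawl file.
    from warcio.archiveiterator import ArchiveIterator

    with open("example.warc.gz", "rb") as stream:
        for record in ArchiveIterator(stream):
            # 'response' records hold the captured HTTP responses (pages, images, etc.)
            if record.rec_type == "response":
                uri = record.rec_headers.get_header("WARC-Target-URI")
                payload = record.content_stream().read()
                print(uri, len(payload), "bytes")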

Limitations

Archive-It works most efficiently when capturing static pages of text. Although Archive-It is flexible, it has some limitations, and the following types of content cannot be captured by the tools:

  • Pages requiring single sign-on (SSO) access
  • Some media, such as video and images
  • Pages with dynamic content such as databases and directories with search features.

Is University web content archived by other organisations?

Yes. A number of external agencies collect publicly accessible web content for a variety of purposes, including:

  • Internet Archive: use the URL of interest as a search term, e.g. http://web.archive.org/web/*/www.unimelb.edu.au, or use the Wayback Machine to locate content of interest
  • Google Cache: use the query string cache:www.unimelb.edu.au (replacing www.unimelb.edu.au with the URL of your choosing)
  • National Library of Australia's PANDORA web archive: use the National Library's Trove discovery service to search for content.
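For the Internet Archive in particular, the Wayback Machine's public CDX API can also list individual captures of a URL programmatically. The short Python sketch below is an illustration only and not a University service:

    import json
    import urllib.parse
    import urllib.request

    # List a few Wayback Machine captures of a URL via the public CDX API.
    params = urllib.parse.urlencode({"url": "www.unimelb.edu.au", "output": "json", "limit": "5"})
    with urllib.request.urlopen("http://web.archive.org/cdx/search/cdx?" + params, timeout=30) as resp:
        rows = json.load(resp)

    if rows:
        header, captures = rows[0], rows[1:]  # the first row lists the field names
        for capture in captures:
            fields = dict(zip(header, capture))
            print(fields["timestamp"], fields["original"], fields["statuscode"])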

Note: The University of Melbourne exercises little if any control over the behaviour of these organisations, and is not responsible for their information management policies and procedures, or the availability of their services.