Web Archiving Program Overview

Part 8 of the University’s Records Management Procedures states:

“…Records Services is responsible for implementing a web archiving program to ensure public University web pages are preserved as University records. Budget divisions and work units responsible for web pages must check that their sites are archived as part of this program. They should also notify Records Services before removing sites.

Content on internal University sites are records, and content owners must manage it according to the provisions that apply to all records…”

Records Services uses Archive-It, the Internet Archive’s subscription-based service, as its web archiving tool. Technical details about Archive-It are available on the Archive-It website.

A whole-of-domain archive, covering the unimelb domain and its hundreds of subdomains, is captured quarterly, in January, April, July and October. Each quarterly capture records only what was on a University website at the time of the capture; it does not show how many times a page was updated between one quarter’s capture and the next.
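The limitation described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each quarterly capture is treated as belonging to its calendar month as a whole; the source names the capture months but not the day on which a capture runs. The function name capture_window is hypothetical and is not part of Archive-It.

```python
from datetime import date

# Capture months named in the text: January, April, July and October.
QUARTERLY_CAPTURE_MONTHS = (1, 4, 7, 10)


def capture_window(changed_on: date) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the (year, month) of the quarterly captures on either side of a change.

    Assumption: a capture is identified only by its month, because the
    actual day of the capture is not stated in the source.
    """
    year, month = changed_on.year, changed_on.month
    # Latest capture month at or before the month of the change.
    previous = (year, max(m for m in QUARTERLY_CAPTURE_MONTHS if m <= month))
    # First capture month after the month of the change, rolling over to January.
    later = [m for m in QUARTERLY_CAPTURE_MONTHS if m > month]
    following = (year, later[0]) if later else (year + 1, 1)
    return previous, following


# A page edited several times in February and March falls between the
# January and April captures, so only its April state would be archived.
window = capture_window(date(2024, 2, 15))
```

Any number of edits made inside one window would be indistinguishable in the quarterly archive, which is why more frequent selective captures are used for some sites.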

Selective captures are run for websites that need to be captured more frequently, or for websites that are decommissioned between quarterly captures. Examples of selective captures are the home page, which is captured daily, and the social media capture, which runs weekly.

To be captured, websites have been grouped into collections according to whether they belong to a Faculty, Department, School, Centre or Institute, or to an administrative function such as Student Administration, Student Services, External Relations and Information Management. Organising websites into collections is both a requirement of Archive-It and a practical necessity: capturing all 300-plus websites in a single crawl would take weeks to run. Grouping websites into smaller collections allows the captures to be completed quickly. You do not need to know which collection a website belongs to in order to find it; you only need to enter its URL to see whether or not it has been archived.
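The reason a URL alone is enough to find a website can be sketched in a few lines of Python. This is an illustration only, not a description of how Archive-It works internally: the collection names, the example.edu hostnames and the helper functions below are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical collections and seed hosts, for illustration only.
COLLECTIONS = {
    "Faculties": ["arts.example.edu", "science.example.edu"],
    "Student Services": ["services.example.edu"],
}


def build_host_index(collections):
    """Invert the collection -> hosts mapping into a host -> collection index."""
    index = {}
    for name, hosts in collections.items():
        for host in hosts:
            index[host.lower()] = name
    return index


def find_collection(url, index):
    """Return the collection holding the URL's host, or None if it is not listed."""
    # urlsplit only recognises a host when a "//" is present, so add one
    # for URLs typed without a scheme.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname  # hostname is returned in lower case
    if host is None:
        return None
    return index.get(host)


index = build_host_index(COLLECTIONS)
```

With the index built once, find_collection("services.example.edu/help", index) resolves the collection from the URL, so the person searching never needs to know how the sites were grouped.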

Captures (or crawls) are set to run for different lengths of time so that each crawl has time to capture every page on every website in its collection on a regular basis.

After a crawl has been completed, a quality assurance audit is conducted on each archived seed (the starting URL for a website) to check that it appears in the archive as it does live on the web.

Web Archiving Program Award

Each year, the Public Records Advisory Council (PRAC) of Victoria offers the Sir Rupert Hamer Records Management Awards, recognising excellence and innovation in the Victorian public sector. The Awards are named after Sir Rupert Hamer, who was the Victorian Premier when the Public Records Act was passed in 1973 and when the Public Record Office Victoria opened its first office and repository. In a ceremony held at Parliament House on Thursday 28 May 2009, the University of Melbourne Web Archiving Program was awarded a Certificate of Commendation in the large government agency category.

http://prov.vic.gov.au/government/sir-rupert-hamer-awards/past-hamer-award-winners

Web Archiving Program Background

The Web Archiving Program was initiated by Records Services and began life in 2002 as the Web Archiving Working Group (WAWG), which recommended the establishment of the Web Archiving Strategy Project (WASP). WASP commenced in mid-2003, ran until 2007 and had three major phases. The first phase involved research and development of a Web Archiving Policy; the second phase focussed on a pilot of software solutions (such as PageVault, TRIM and Pandas); and the third phase, which was to be the implementation phase, was postponed for some time. In 2007 a business case was developed for a Technical Web Archiving Solution, which led to the purchase of a subscription to the Archive-It web application from the Internet Archive. The Web Archiving Program in its current form officially began in January 2008.