reading notes: what exactly are we trying to capture?

Davis, Corey. “Archiving the Web: A Case Study from the University of Victoria.” Code4Lib Journal, no. 26 (2014). Accessed November 4, 2014.

‘It’s not your grandfather’s web anymore.’

-Negulescu and Rosenthal qtd. in Davis

I found this article to be a great starting point for my exploration of web archiving. Davis provides excellent background on web archiving as well as the areas of interest in developing a web archiving program. In particular, I appreciated Davis’ explanation of why dynamic websites are so difficult to capture well. In light of this difficulty, Davis wonders if we might be able to encourage website creators to build sites that are “optimized for web archiving.” Overall, that kind of task seems daunting. However, it might be possible to work with local website creators (at one’s institution) regarding needs for web archiving.

Davis also briefly discusses the nature of web documents and websites as objects to collect – are they archival objects with original order or discrete objects? This is a key question to grapple with when collaborating with colleagues (from library land and archive land) on development of web archiving initiatives.

Things for future consideration:

Description and arrangement: What kind of metadata fully captures the context of the site over time? How would web archives of a university domain fit into existing institutional records? Would they need to? How should web archives be represented in archival description? Are most web archives at this point just stand-alone topical collections?

Use cases for web archives: Do you need an expressed need for web archives before investing in the effort? If you build it, will they come? I think it’s important to look at existing collecting strengths and policy – and archive the web accordingly. Maybe the really big question is – how should frequency of crawls be determined?

The big question raised: Davis asks, “what exactly are we trying to capture? … This database—which represents the majority of the project’s human effort—arguably has more value than the website itself.”

I’m just going to keep thinking about that for now…

The reading notes posts found on this blog are intentionally question-filled and casual. Each notes post serves as a sort of open journal record of my professional development reading as the MIT Libraries Fellow for Digital Archives. See the introduction post for more on this series. I welcome suggestions for future readings—current or archival!


