Back in early November I attended a two-day meeting on web archiving that was fantastic. To top it off, the meeting took place at the University of Michigan, so I got to marvel at the changes to little downtown A2, catch up with friends, and visit my family.
The meeting provided introductions to web archive analysis methods and technology. I also left with new questions regarding the ways we document, describe and use web archives. The meeting had four keynote speakers and several concurrent panel sessions. I wish that this had been a single-track meeting! So many good conversations were happening simultaneously, and there was lots of active Q/A time.
Jefferson Bailey, of the Internet Archive, discussed how the social and technical success of the internet might inform web archiving work and community. He emphasized distributed communities, how web archiving interacts with old and new methodologies of archives, and APIs for facilitating better networks. He also discussed several things happening at the Internet Archive. He showed a new crawler in development at IA called Brozzler. He discussed possibilities for seed-level WARCs, crawl logs, better understanding of metadata needs, and researcher services.
Abigail Grotke, Library of Congress, talked about the past 15 years of development of LC web archives. It was so interesting to learn how LC tracks nominated URLs, permission emails, responses and collections. She also discussed how LC is focusing on ‘entity’-based collection rather than URL-based collection, as the URLs for an entity change over time.
Juan Cole, University of Michigan professor, opened the second day with his perspective on historical analysis, research dissemination online, and the web as an archive. The talk also led to a discussion of stability of archives, citations that include unstable URLs, citation chains, and more.
The closing keynote was Ian Milligan from the University of Waterloo. He discussed the critical importance of web archives to historians studying the 1990s. The internet provides unprecedented access to the lives of ordinary people, but also many challenges in terms of analyzing and working with a large corpus of web archive data. I really enjoyed this talk, and for a hot second I was frantic that my teenage LiveJournal might still be live and available to researchers… it’s not, thank goodness.
Privacy and permissions weren’t discussed explicitly by speakers, but the topic came up several times during Q/A. If there is a Web Archives 2016 meeting, privacy and ethics of collecting might be a central topic. I also hope there will be more discussion of preservation of web archives.
Thanks to the organizers and all the speakers for a great event.
On the Tweets!