This list of resources is shared as a complement to a presentation I gave at the NDSA New England meeting on September 25, 2015. The presentation discussed the MIT Institute Archives’ efforts to acquire websites without a hosted service. I talked about how technology is important, but policy development and planning are key activities that can be accomplished even if new technology isn’t possible right away. The presentation also highlighted the tools we’re finding useful that are easy for an archivist with limited programming skills to use (Web Recorder, wget, and Web Archive Player). I’ve previously talked about some of these activities on ArchiveHour; see that post here.
P.S. Every time I think I’ve got a handle on the essential web archiving resources, I find out about something new. I also realize that a lot of work went into web archiving development long before I first learned about it in 2013. With this in mind, it’s quite possible that a lot of good stuff is missing from the following list. Please add resources you love in the comments or alert me to my ignorance via the contact page. =) Thank you!
- International Internet Preservation Consortium (IIPC) website – What is web archiving?
- IIPC blog post (2015), Ian Milligan – “So You Want to Get Started In Web Archiving?” Provides an excellent list of blogs to follow.
- Archive-It Web Archiving Life Cycle – the examples are specific to the Archive-It service and its partners, but the life cycle breakdown and concepts are still helpful for thinking about the range of activities and policies that go into a web archiving program.
- DPC Technology Watch 13-01 (2013), Maureen Pennock, “Web-Archiving”
- NDSA 2013 Web Archiving in the United States survey report
- Columbia hosted a web archiving meeting in June 2014 and recorded some sessions, which are available on YouTube
- The Future of Web Archiving panel at the Digital Preservation 2014 meeting is available on YouTube.
- “Capture all the URLs” (2014) paper by Alexis Antracoli, Steven Duckworth, Judith Silva, Kristen Yarmey.
- Archiving the Web: A Case Study from the University of Victoria (2014) Code4Lib paper by Corey Davis
- Development of University of Michigan web archives, 2011 SAA paper by Mike Shallcross. Find it here.
- IIPC NetPreserve blog
- SAA Web Archiving round table
- #webarchiving hashtag
- British Library web archiving blog
- Internet Archive blog
Tool Information & Guides
Web Recorder & other tools created by Ilya Kreymer
- WebRecorder.io Twitter
- See how Rhizome does embedded WebRecorder captures of Vines
- Web Archive Player
wget – if you’re like me, this tool might require some googling to figure out! Use version 1.14 or newer, which added the command options for creating WARC file output.
- wget manual
- ArchiveTeam wget information – here and here
- Wget command suggestions
- Wget install on Mac blog post
- Homebrew can be used to install wget (on Mac). I haven’t tried this yet.
- Nicholas Taylor presentation slides on a variety of web archiving tools (2012)
- Wayne State University graduate student project – assessing web archiving tools and services – presentation by Kim Schroeder at Digital Preservation 2014
- IIPC tools & software list
- Practical E-Records blog posts about using wget (2010) and heritrix (2010)
- Archive-It (hosted service from the Internet Archive)
- Blog post on partnership between Archive-It and CDL Web Archiving Service (WAS)
- NCSU Social Media Archives toolkit
- NCSU Lentil Instagram capture tool
- Memento project
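For anyone who wants a starting point for the wget WARC output mentioned above, here is a minimal sketch in Python that only assembles a wget command line. The flags are standard wget options (the `--warc-*` ones need version 1.14 or newer), but the function names, the example URL, and the capture name are made up for illustration, and the choice of crawl options is an assumption rather than a recommended recipe. Adjust them to your own collection policy.

```python
import shutil
import subprocess


def build_wget_warc_command(url, warc_name):
    """Return a wget argument list that crawls `url` and writes WARC output.

    `warc_name` is a hypothetical base name; wget appends .warc.gz to it.
    """
    return [
        "wget",
        "--mirror",                  # recursive download with timestamping
        "--page-requisites",         # also fetch images, CSS, and scripts the pages need
        "--no-parent",               # do not climb above the starting directory
        "--wait=1",                  # pause one second between requests
        f"--warc-file={warc_name}",  # write the capture to <warc_name>.warc.gz
        "--warc-cdx",                # also write a CDX index next to the WARC
        url,
    ]


def run_capture(url, warc_name):
    """Run the crawl, but only if wget is actually installed."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget was not found on this machine")
    subprocess.run(build_wget_warc_command(url, warc_name), check=True)


if __name__ == "__main__":
    # Placeholder URL and capture name; this only prints the command.
    command = build_wget_warc_command("https://example.org/", "example-capture")
    print(" ".join(command))
```

Running the script prints the command so you can review it before crawling anything; calling `run_capture` would start the actual download.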
The following institutions provide collection policies, frequently asked questions, and other program information via their websites. Thank you, folks!
- Stanford University Library Web Archiving website. Check out the archivability guide.
- Columbia University Library Web Resources program website
- NYARC FAQ page & Quality Assurance wiki & blog posts
- University of Michigan Bentley Historical Library – Check out the guidelines, the information on describing web archives in finding aids, and the web archiving curation guide
- University of North Texas Libraries software and processing section of web archiving website
- Duke University Libraries – web content policy
- Emory University MARBL collection policy includes web archives
- Michigan State University Archives – collection policy for websites
Know of others? Add them in the comments, please!
- Web Science and Digital Libraries research from Old Dominion University – blog
- University of Waterloo historian and researcher – Ian Milligan blog
- Web Archives for Historians blog – maintained by Ian Milligan and Peter Webster
- Bibliography of academic works related to preserving the web and/or using web archives
Web Archives 2015: Curate, Capture, Analyze, hosted by the University of Michigan Library in November.
Thanks to UMass Dartmouth and Brown for hosting the NE NDSA meeting this year.
Updates – resources found after this post went live:
- ALA connect webinar in April 2015 by Lisa Snider.