records, wikipedia, & digital libraries

I’ve been fortunate to attend a workshop and two conferences this month — see below for some quick recaps of the events.

Records Management in the Round: Re-purposing your Archival Expertise to Start a Program

  • This was a New England Archivists workshop led by Sarah R. Demb, Senior Records Manager/Archivist, Harvard University Archives, and Sarah A. Polirer, CA, CRM, Manager Corporate Research, Cigna Corporation. The day-long workshop provided an introduction to a variety of topics, including the role of records management, the benefits of an RM program, identifying records, retention schedules, planning for an RM program and records surveys, and more. I learned so much!

Mass History 2016 – Putting History on the Map Together

  • This was a one-day meeting at the College of the Holy Cross in Worcester, MA, organized by Mass Humanities. My colleague Greta and I participated in a session on “digital tools,” talking a bit about the value of Wikipedia edit-a-thons for archives and libraries. You can view our poster here.

Joint Conference on Digital Libraries

  • This conference was a great opportunity to learn about researchers’ perspectives on using digital collections and on evaluating and improving digital library systems. In particular, I enjoyed the web archive related presentations and the WADL (web archiving and digital libraries) workshop. There was also an interesting session on archiving “born-digital” (meaning web-based) news.
  • Some odds and ends from this workshop include:
    • A mention of using Storify to summarize a web archive collection, which prompted a tweet linking to this slide deck about the project.
    • The Unshorten utility for expanding shortened URLs.
    • Stephen Bury talked about the Frick’s new “digital lightbox” access system. I didn’t catch a link to the system or whether it is open to the public, but here is some info on the tool.
    • Vinay Goel from the Internet Archive showed off the new, soon-to-be-released keyword search for the Wayback Machine. The feature will search website homepages.
    • Laura Wrubel gave some updates on the development of Social Feed Manager. The project is developing functionality that will incorporate provenance information into the metadata output!



fellow update: archives roadshow II

Authors of this post are: Dana Hamlin, Greta Kuriger Suiter, Jessica Venlet, and Chris Tanguay.

In December 2015, a few of my colleagues put together a fun event for our fellow library colleagues called the Archives Roadshow. The goal was to share some information about the work we do and the collections we steward. The first “episode” explained finding aids and gave examples of what it’s like to process collections, from neat and easy to messy and time-consuming. This post recaps the second installment (episode two, if you will) of the Archives Roadshow, which took place on April 28, 2016, for Preservation Week.

This was a fun event and I’m grateful to my colleagues for asking me to present. And, yes, the presentation definitely included the Antiques Roadshow theme song. Read on for a recap of our presentations!


A very staged presentation photo of me. 🙂



Back in early November, I attended a fantastic two-day meeting on web archiving. To top it off, the meeting took place at the University of Michigan, so I got to marvel at the changes to little downtown A2, catch up with friends, and visit my family.


You know you’re in A2 when…

The meeting provided introductions to web archive analysis methods and technology. I also left with new questions about the ways we document, describe, and use web archives. The meeting had four keynote speakers and several concurrent panel sessions. I wish this had been a single-track meeting! So many good conversations were happening simultaneously, with lots of active Q&A time.

web archiving resources for NDSA NE crew (and anyone else reading this!)

This list of resources is shared as a complement to a presentation I gave at the NDSA New England meeting on September 25, 2015. The presentation discussed the MIT Institute Archives’ efforts to acquire websites without a hosted service. I talked about how technology is important, but policy development and planning are key activities that can be accomplished even if new technology isn’t possible right away. The presentation also highlighted the tools we’re finding useful that are easy for an archivist with limited programming skills to use (Webrecorder, wget, and web archive player). I’ve previously talked about some of these activities on ArchiveHour; see that post here.

P.S. Every time I think I’ve got a handle on the essential web archiving resources, I find out about something new. I also realize that a lot of work went into web archiving development long before I first learned about it in 2013. With this in mind, it’s quite possible that a lot of good stuff is missing from the following list, so please add resources you love in the comments or alert me to my ignorance via the contact page. =) Thank you!

Get Started

  • International Internet Preservation Consortium (IIPC) website – What is web archiving?
  • IIPC blog post (2015), Ian Milligan – “So You Want to Get Started In Web Archiving?” Provides an excellent list of blogs to follow.
  • Archive-It Web Archiving Life Cycle – the examples are specific to Archive-It service and partners, but in any case the life cycle breakdown and concepts are helpful to think about the range of activities and policy that go into a web archiving program.
  • DPC Technology Watch 13-01 (2013), Maureen Pennock, “Web-Archiving”
  • NDSA 2013 Web Archiving in the United States survey report


fellowship update: SAA 2015


Sunrise run in Cleveland to start the conference day off right.

Last week was the annual meeting of the Society of American Archivists in Cleveland. This was my first SAA experience. It was overall good, but I really do not love conferences that consist of only concurrent sessions. So much FOMO, it’s not even right. But I managed to see several good sessions over the three days I was in Cleveland. The following highlight a few of the sessions I attended and some tweets.

One of the plenary speakers was Daniel Horowitz Garcia from StoryCorps. He gave a wonderful and moving talk about the power of the stories and diverse voices that archives can preserve and share. The theme of the talk reminded me of something the plenary speaker at NEA 2015 said — “focus on what is made possible by the work.” Rather than talking endlessly about tasks, rules and tools, archivists need to talk most about what is made possible by the work we do.


fellowship update: tool time

One of my goals for the fellowship is to become more familiar with tools useful for digital archives and to better understand their practical applications. I try to set aside time each week for testing and learning. Some of this testing relates to my work with the Digital Sustainability Lab. Below are a few tools I’ve worked with recently and some others on my “up next” list.


Digital archives toolkits aren’t all that different from gardening toolkits: gathering, planting and weeding, watering, harvesting… (image: Flickr user Biodiversity Heritage Library)

Recently Explored and Tested:

Archivematica (1.4) – Archivematica has changed a bit since I last used it in 2013! I’ve been learning more about the storage service options and trying out the new arrangement feature.

ePADD – Email archiving, processing and access from Stanford Libraries. Once we’ve had a chance to work with ePADD more, I’m sure I’ll do some posts here and on Engineering Future of the Past. This tool has me dreaming about a future where all digital archives appraisal and processing incorporates natural language processing and data visualization.

wget (with WARC) – Configuring this tool nearly defeated me. But I got it working on a Mac and have successfully crawled a website with WARC file output! This tool was part of a series of tests for the DS Lab, so I’ll definitely post a more detailed account soon.

Tableau – This visualization software is something my fellow Fellow, Christine, and I are working with for our joint project. Once we’re done, I’ll probably share more about our project and the work Christine did to get our data visualizin’ with Tableau.

Webrecorder – web archiving for all! I talked about Webrecorder last month in this post.
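For anyone curious about what the wget test above produces: wget (1.14 and later) can write its crawl output to a WARC file via the `--warc-file` option. The sketch below is a hypothetical, standard-library-only Python illustration of the WARC/1.0 record layout, not how wget itself works. The URL and captured response are made up, and real tools also write `warcinfo` and `request` records and usually gzip each record.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response(target_uri: str, http_payload: bytes) -> bytes:
    """Build one uncompressed WARC/1.0 'response' record (illustrative sketch).

    http_payload is the raw HTTP response (status line, headers, body)
    exactly as received from the server.
    """
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",
    ]
    # Header lines end in CRLF, a blank line separates headers from the
    # content block, and two CRLFs close the record.
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return header_block + http_payload + b"\r\n\r\n"


# Hypothetical captured response, for illustration only.
payload = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
record = build_warc_response("http://example.com/", payload)
print(record.decode("utf-8"))
```

The point of the format is that the server's response is stored byte for byte inside the record, with the crawl context (URL, capture date, unique ID) in the WARC headers, which is what lets replay tools rebuild the page later.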

Up next:

BitCurator – MIT Libraries is a member of the BitCurator consortium and I’ve used BitCurator a bit in the past. I think it’s high time I increase my familiarity with what’s included in BitCurator and how it might fit into different processing situations.

Lunchbox from NPR – This isn’t a tool that’s really specific to digital archives workflows, but the Waterbug tool could be really useful for prepping images for sharing on social media.

OpenRefine – messy data is something that is likely here to stay and I want to know more about how to clean up data efficiently.

MDQC and BWF MetaEdit – AVPreserve tools for checking and adding metadata.

reading notes: podcast edition

Now that I’ve officially left 2008 behind and upgraded to a smartphone (and apparently contributed to bonkers Apple profits), I’ve been on the hunt for archive and library related podcasts. An obvious first choice is the More Podcast, Less Process podcast from the Keeping Collections project (a METRO project). So far there are ten episodes, and I elected to start listening with episode seven because I’m a rebel!

The episode from early 2014, Humans.txt.mp3-The Web Archivists are Present, is focused on web archiving pursuits and challenges. All the usual suspects are discussed: staffing needs and skills, difficulty of crawling dynamic sites, challenges with getting full captures, deciding what to capture and how to scope crawls, topical vs. institutional web collections, facilitating searching and access, permissions and robots.txt decisions.

Even better, the discussion places these topics in two specific institutional contexts: Columbia University Library web archives (an established program) and the New York Art Resources Consortium (a new program). Both institutions are Archive-It partners, so web collecting is discussed within the Archive-It model. At the very end, the group raised a couple of questions I found particularly worthy of consideration: Are website collections really archival collections? Can web archives ever be organic collections, or are they always artificial collections created by the web archivist? …Which, for me, raises the question: how much does it really matter either way? Another way to approach this discussion might be to consider the merits, similarities, and differences between web archiving as collection development and web archiving as records management.
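One of the topics the episode covers, robots.txt decisions, is easy to explore hands-on. Python’s standard library includes `urllib.robotparser`, which reports whether a given crawler may fetch a URL under a site’s robots.txt rules. This is a minimal sketch, assuming a made-up robots.txt and hypothetical crawler names (`ExampleArchiveBot`, `BlockedBot`). Whether to honor these rules at all remains a policy decision for the archive; the tool only tells you what the site asks for.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is kept out of /private/,
# and one made-up crawler is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BlockedBot
Disallow: /
"""


def crawl_permissions(robots_lines, agents, urls):
    """Return {(agent, url): allowed} for every agent/URL pair."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # read rules from text instead of fetching them
    return {(agent, url): parser.can_fetch(agent, url)
            for agent in agents for url in urls}


agents = ["ExampleArchiveBot", "BlockedBot"]
urls = ["http://example.com/about/", "http://example.com/private/report.html"]
results = crawl_permissions(ROBOTS_TXT.splitlines(), agents, urls)
for (agent, url), allowed in results.items():
    print(f"{agent:18} {url:42} {'allowed' if allowed else 'blocked'}")
```

Run as-is, ExampleArchiveBot falls under the `*` rules (allowed on `/about/`, blocked from `/private/`), while BlockedBot matches its own entry and is blocked everywhere.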

I don’t have much else in the way of critique or further discussion, but I always enjoy learning about how other professionals are developing web archive programs (because it’s no simple task!). Go listen! Other web archiving things I’ve viewed, read or scanned recently:

Do you know of any other archive-library related podcasts? This is what I’ve found so far:

Today’s coffee: Starbucks Veranda

The reading notes posts found on this blog are intentionally question-filled and casual. Each notes post serves as a sort of open journal record of my professional development reading as the MIT Libraries Fellow for Digital Archives. See the introduction post for more on this series. I welcome suggestions for future readings—current or archival!

reading notes: dynamic modes of recordkeeping

Anderson, Kimberly. 2013. “The Footprint and the Stepping Foot: Archival Records, Evidence, and Time.” Archival Science 13 (4): 349–371. DOI: 10.1007/s10502-012-9193-2

If archivists can learn to recognize records in all their forms, perhaps archivists can stop trying to acquire or create records that can be separated from their community of origin. If we move towards the citizen or community archivist model, the archives becomes a clearinghouse of sorts in which seekers are referred to the community for access, rather than capturing or translating records for use in the archives. P. 361

Article Overview

There is so much going on in this article! I recommend taking some time to read the full article if it’s of interest to you. Here goes my attempt at summarizing the article as well as my take-aways:

This article proposes a redefinition of what constitutes an archival record. By bringing into focus non-Western modes of thinking on concepts of time, record keeping, and relationships between creator/expert and record, Anderson questions the inclusiveness of archival collections (in the United States) and calls for archivists to acknowledge the need to begin identifying records without imposing Western concepts of ‘recordness’ on dynamic practices of recordkeeping. An example of a dynamic practice of recordkeeping would be dance or ritual (Anderson refers to this as kinetic or oral records, see p. 364).

reading notes: what exactly are we trying to capture?

Davis, Corey. “Archiving the Web: A Case Study from the University of Victoria.” Code4Lib Journal, no. 26 (2014). Accessed November 4, 2014.

 ‘It’s not your grandfather’s web anymore.’

-Negulescu and Rosenthal qtd. in Davis

I found this article to be a great starting point for my exploration of web archiving. Davis provides excellent background on web archiving as well as the areas of interest in developing a web archiving program. In particular, I appreciated Davis’ explanation of why dynamic websites are so difficult to capture well. In light of this difficulty, Davis wonders if we might be able to encourage website creators to build sites that are “optimized for web archiving.” Overall, that kind of task seems daunting. However, it might be possible to work with local website creators (at one’s institution) regarding needs for web archiving.

Davis also briefly discusses the nature of web documents and websites as objects to collect – are they archival objects with original order or discrete objects? This is a key question to grapple with when collaborating with colleagues (from library land and archive land) on development of web archiving initiatives.

Things for future consideration:

Description and arrangement: What kind of metadata fully captures the context of the site over time? How would web archives of a university domain fit into existing institutional records? Would it need to? How should web archives be represented in archival description? Are most web archives at this point just stand-alone topical collections?

Use cases for web archives: Does there need to be an expressed need for web archives before investing the effort? If you build it, will they come? I think it’s important to look at existing collecting strengths and policy, and archive the web accordingly. Maybe the really big question is: how should the frequency of crawls be determined?

The big question: Davis asks, “what exactly are we trying to capture? … This database—which represents the majority of the project’s human effort—arguably has more value than the website itself.”

I’m just going to keep thinking about that for now…

Today’s coffee: New England Coffee        
