farewell, fellow.

Well, you probably surmised from the title of this post that my fellowship with MIT Libraries is coming to an end this month. I’m happy to announce that this fall I will begin working as the Assistant University Archivist for Records Management and Electronic Records at UNC – Chapel Hill!


The Wilson Library at UNC

My time as a fellow has been full of opportunities to grow as a professional through a mix of mentorship, guided work, and independent exploration. It is bittersweet to leave the MIT community (and Boston and my lovely Somerville neighborhood), but I’m so looking forward to taking the next step in my career and getting to know the folks at UNC Libraries.

This post marks the last fellowship post for Archive Hour! I may post here again from time to time, but for now I will take a break until 2017. In the meantime, check out the fun and informative UNC Archives blog – For the Record!

Thanks for reading and farewell!


fellow update: archives roadshow II

The authors of this post are: Dana Hamlin, Greta Kuriger Suiter, Jessica Venlet, and Chris Tanguay.

In December 2015 a few of my colleagues put together a fun event for our fellow library colleagues called the Archives Roadshow. The goal was to share some information about the work we do and the collections we steward. The first “episode” explained finding aids and gave examples of what it’s like to process collections, from neat and easy to messy and time-consuming. This post recaps the second installment (episode two, if you will) of the Archives Roadshow, which took place on April 28, 2016 for Preservation Week.

This was a fun event and I’m grateful to my colleagues for asking me to present. And, yes, the presentation definitely included the Antiques Roadshow theme song. Read on for a recap of our presentations!

A very staged presentation photo of me. 🙂

fellowship update: viz for strategy

There are many workflow and policy decisions to be made in the acquisition, processing and preservation of digital archival content. When it comes to preservation, determining a strategy for file format preservation is very important. Here at IASC, we’ve recently implemented Archivematica and with this tool comes the need to make specific decisions about ingest and preservation actions for various file types in order to build our processing workflow.

This requires making sense of a large amount of existing digital archival content (a backlog, if you will). We want to easily see things about the collections like: file format types in all collections, file format type by collection, and mismatched extensions. By easily identifying file formats, we can begin to work with Nancy McGovern (in the Libraries’ preservation unit) to determine workflow options for Archivematica and consider digital preservation strategy with a detailed understanding of formats already in our collections.

So, how’d we do it? Well, Kari initially ran a DROID report of one of our storage areas and wanted to visualize the data. Excel was used to create a pie chart of PUIDs (PRONOM unique identifiers). When I saw this, I thought that Tableau Desktop (visualization software I’m using for another project) could show the data in a more dynamic way. Using Open Refine, I cleaned up the DROID report a bit and parsed collection IDs from file paths into a separate column. From there, I used Tableau to create several different views of the data. The visualizations are interactive and allow a user to filter and hover over data points for further detail. The images below provide two examples.


This shows formats within a specific collection.


A look at last modified dates by year for a variety of Microsoft Office file types across all collections.

In addition to giving quick insight about our collections, the visualizations also raise a lot of questions regarding seemingly strange files or mismatched extension issues. One nice thing about Tableau is that the underlying data is always just a click away. We can go to the spreadsheet and take a closer look at specific files if needed.
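For anyone who wants a quick look at the same kinds of questions without Tableau, here is a minimal Python sketch that tallies formats per collection and lists mismatched extensions from a DROID CSV export. It is not the workflow described above (that used Open Refine and Tableau). It assumes the export has columns named FILE_PATH, TYPE, EXTENSION_MISMATCH and PUID, and it assumes a hypothetical convention where a collection ID such as "MC-0001" appears as a folder name in each path. The sample rows are made up for illustration.

```python
import csv
import io
import re
from collections import Counter, defaultdict

# Hypothetical convention: a collection ID such as "MC-0123" appears as a
# folder name somewhere in each file path. Adjust the pattern to local practice.
COLLECTION_ID = re.compile(r"[\\/]((?:MC|AC)-\d+)[\\/]")


def summarize_droid(csv_file):
    """Count PUIDs per collection and list files flagged as mismatched."""
    formats_by_collection = defaultdict(Counter)
    mismatched = []
    for row in csv.DictReader(csv_file):
        if row.get("TYPE") == "Folder":
            continue  # skip folder rows; only files are tallied
        path = row.get("FILE_PATH", "")
        match = COLLECTION_ID.search(path)
        collection = match.group(1) if match else "unknown"
        puid = row.get("PUID") or "unidentified"
        formats_by_collection[collection][puid] += 1
        if row.get("EXTENSION_MISMATCH", "").lower() == "true":
            mismatched.append(path)
    return formats_by_collection, mismatched


# Made-up sample rows standing in for a real DROID export.
sample = io.StringIO(
    "FILE_PATH,TYPE,EXT,EXTENSION_MISMATCH,PUID\n"
    "/storage/MC-0001/box1,Folder,,false,\n"
    "/storage/MC-0001/box1/report.doc,File,doc,false,fmt/40\n"
    "/storage/MC-0001/box1/photo.doc,File,doc,true,fmt/43\n"
    "/storage/AC-0200/memo.pdf,File,pdf,false,fmt/18\n"
)
formats, mismatches = summarize_droid(sample)
for collection, counts in sorted(formats.items()):
    print(collection, dict(counts))
print("mismatched:", mismatches)
```

To try it on a real report, replace the sample with an opened CSV file and check the column names against your own export first.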

Tableau has been pretty easy to learn so far. It’s all drag-and-drop: you arrange the underlying data into a variety of visualization options. Tableau even suggests the best visualizations based on the dimensions and measures used. I still have a lot to learn about Tableau. My fellow Library Fellow, Christine, is organizing an MIT Libraries Tableau group. I hope the group and continued experimenting with Tableau will help IASC make the most of these visualizations. Next up might be some of our reference and reading room stats!

(Also – check out the U-M Bentley Historical Library post for more ideas, tools and techniques for identifying and characterizing sets of files. I’m hoping to try their methods out too.)

year one: new england

October marked the first year of my two-year fellowship! I thought about making this ‘year one’ post a recap of my work, but I posted quite a bit about work already over the last few months. What I haven’t done enough of … is read for professional development. Alas, this isn’t a reading-related post either. It’s a picture tour of some New England highlights so far.


winter happened in February. my poor little car.

web (dot) mit (dot) edu

As I’ve spent time looking over portions of the http://www.mit.edu domain, I’ve noticed that some websites are located at web.mit.edu and some are mit.edu. Just based on looks, the web.mit.edu websites seemed to be older and as sites were updated the URL was also updated. But why was web.mit.edu ever in use? Well, a librarian colleague who has been part of the MIT community for many years helped solve this mystery for me!

The story goes that when the World Wide Web arrived on the scene in the 1990s, the MIT student group SIPB snagged the www.mit.edu URL right away! SIPB, which is a volunteer student computing group (around since 1969), created a wonderful site that you can view via the Internet Archive (snapshot from 1997).

It’s hard to say if this IA playback of the site is completely accurate in design, but the information is fun to look through (like this timeline – web fever has hit!). Only later did the group give over the www.mit.edu domain to MIT… thus the mix of web.mit.edu and mit.edu URLs. I don’t know the exact date when MIT started using http://www.mit.edu as the homepage URL (or at least redirecting http://www.mit.edu to web.mit.edu), but in the Wayback Machine the change seems to occur around late 1999 – 2000.
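One way to narrow down that date would be to list the captures of www.mit.edu from those years instead of clicking through the calendar. The sketch below is not something I ran for this post. It only builds a query URL for the Wayback Machine’s CDX search endpoint, and the choice of fields and the limit of 50 are assumptions made for illustration.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start_year, end_year, limit=50):
    """Build a CDX search URL listing captures of `url` between two years."""
    params = {
        "url": url,
        "from": str(start_year),
        "to": str(end_year),
        "output": "json",
        # Fields to return for each capture (an illustrative selection).
        "fl": "timestamp,original,statuscode",
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


query = build_cdx_query("www.mit.edu", 1999, 2000)
print(query)
```

Opening the printed URL in a browser should return one row per capture. A run of redirect status codes might hint at when www.mit.edu started pointing somewhere else, though the capture record alone won’t settle the question.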

Web history, it’s fun!

While perusing the archived webpages, I noticed that the MIT homepage used to feature some really fun and pretty designs and logos. Sometimes the homepage was designed by someone from the MIT community. This isn’t something the current website does. So glad IA captured the homepage over the years.

fellowship update: summer presentation series

Have you ever noticed just how many tools and projects include the word archives? ArchivesSpace, Archivists’ Toolkit, Archivum, Archivematica, Archive-It, Archon… And as if those aren’t enough to keep up with, there are a plethora of other tools to consider… ePADD, BitCurator, AtoM, Aeon, ContentDM…

The features and functionality of the various tools can overlap or can be different yet complementary. The software development support and options for hosted services vary widely. The use cases and placement within a workflow are fluid, often depending on institutional context and content types. This is no huge issue for professionals actively engaged with learning, testing and implementing the various tools. But what about folks who don’t work with these tools every day, but need to know about them? How can our colleagues keep it all straight?


communicating the shades of digital archives and preservation tools through summer presentation series. (Flickr user Alex Ford)

Well, there is likely no one solution to this, but communication is a big deal in complex organizations. One communication effort for IASC has been the digital archives blog, Engineering the Future of the Past (EFP). This summer, Kari also launched a series of presentations for MIT Libraries staff on digital archives and preservation tools. Kari opened the series by talking about the overall digital archives ecosystem, possible workflow options, and tool integration ideas. Then she hosted a few sessions focused on the following tools: Archivematica, ArchivesSpace, AtoM, and BitCurator. For slides and other details, check out the post on EFP.

On August 28, I presented on ePADD as part of this summer series. I discussed how email can be challenging for archivists and then gave an overview of my experience testing ePADD so far. I hope to share my slides soon (probably on EFP).

If you’re actively communicating with colleagues about emerging digital curation workflows and software, I’d love to hear about your strategies.

fellowship update: tool time

One of my goals for the fellowship is to increase overall familiarity and understanding of practical application for various tools useful to digital archives. I try to set aside time each week for testing and learning. Some of the testing I’ve done relates to my work with the Digital Sustainability Lab. Below are a few tools I’ve worked with recently and some others on my “up next” list.

image of gardening tools from 1920s magazine ad

Digital archives toolkits aren’t all that different from gardening toolkits. Gathering, planting and weeding, watering, harvesting… (image: flickr user biodiversity heritage library)

Recently Explored and Tested:

Archivematica (1.4) – Archivematica has changed a bit since I last used it in 2013! I’ve been learning more about the storage service options and trying out the new arrangement feature.

ePADD – Email archiving, processing and access from Stanford Libraries. Once we’ve had a chance to work with ePADD more, I’m sure I’ll do some posts here and on Engineering the Future of the Past. This tool has me dreaming about a future where all digital archives appraisal and processing incorporates natural language processing and data visualization.

wget (with WARC) – Configuring this tool nearly defeated me. But I got it working on a Mac and have successfully crawled a website with WARC file output! This tool was part of a series of tests for the DS Lab, so I’ll definitely post a more detailed account soon.

Tableau – This visualization software is something my fellow Fellow, Christine, and I are working with for our joint project. Once we’re done, I’ll probably share about our project and the work Christine did to get our data visualizin’ with Tableau.

Webrecorder – web archiving for all! I talked about Webrecorder last month in this post.
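Since the wget entry above mentions how fiddly the setup was, here is a minimal sketch of what a crawl with WARC output can look like. This is not the exact configuration I used for the DS Lab tests. The URL, the output name, and the one-second wait are placeholders, and the flags shown are just one reasonable combination. The Python wrapper only assembles and prints the command.

```python
import shlex


def build_wget_warc_command(url, warc_name, wait_seconds=1):
    """Return a wget argument list that crawls `url` and writes a WARC file."""
    return [
        "wget",
        "--mirror",                    # recursive crawl with timestamping
        "--page-requisites",           # also fetch images, CSS, and scripts
        f"--wait={wait_seconds}",      # pause between requests to be polite
        f"--warc-file={warc_name}",    # write <warc_name>.warc.gz
        "--warc-cdx",                  # also write a CDX index
        url,
    ]


# Placeholder URL and output name, for illustration only.
command = build_wget_warc_command("https://example.org/", "example-crawl")
print(shlex.join(command))
```

To actually run the crawl, the list could be passed to subprocess.run(command, check=True), assuming the installed wget is a build that includes WARC support.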

Up next:

BitCurator – MIT Libraries is a member of the BitCurator consortium and I’ve used BitCurator a bit in the past. I think it’s high time I increase my familiarity with what’s included in BitCurator and how it might fit into different processing situations.

Lunchbox from NPR – This isn’t a tool that’s really specific to digital archives workflows, but the Waterbug tool could be really useful for prepping images for sharing on social media.

Open Refine – messy data is something that is likely here to stay and I want to know more about how to clean up data efficiently.

MDQC and BWF MetaEdit – AV Preserve tools for checking and adding metadata.

fellowship update: getting on the same (web) page

“Web Archiving” DCP SPRUCE Digital Preservation Illustrations

Web archiving is a new endeavor for the MIT Institute Archives and Special Collections (IASC) and I am lucky enough to be able to take the lead on developing a website acquisition process for the archives. As with any other initiative, the work involves a lot of collaboration with colleagues. The following sections highlight some of the activities currently in progress for this project….

Outline; or, evolving list(s) of next step activities

I’ve created a loose project outline that is basically an evolving list of activities grouped into some categories (e.g. making the case for web archiving, collaboration and communication, policy and procedures, planning for acquisition and metadata integration, web acquisition tools and services, access). The tasks and categories don’t necessarily represent a strictly linear process, but help me remember the wide range of elements that help to create a thorough web acquisition workflow. (This blog post, for example, fits within communication!)

In order to make the case for web archiving and set groundwork for moving forward, the first task was to write an informational document that defines types of web archiving and explores the IASC’s vision for how web archiving can be part of the digital archives program. The following is an excerpt from the document:

The International Internet Preservation Consortium (IIPC) states that archiving internet content is “the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.” In general, the goal of a web archiving program should be to “capture and preserve the dynamic and functional aspects of Web pages – including active links, embedded media, and animation – while also maintaining the context and relationships between files” (Antracoli, et al. 2014). The use of tools, software or hosted services to collect (also referred to as copy, harvest, or crawl) websites is an essential technical step of a web archiving program, but selection and use of a tool(s) shouldn’t be the first or only step of a holistic web archiving effort.

The document goes on to explore some policy and visioning considerations that should ideally come before tool selection or acquisition. One of the most important steps is understanding the purpose of collecting websites. Two common types of web archive initiatives are general collections/subject area development and archival records collections. For the IASC, the approach for web acquisition is within the area of archival records. This means that we are interested primarily in websites that are records of MIT (mostly, the mit.edu domain). Beyond this initial document describing web archiving, we’ll need to continue to document things like vision and scope, appraisal guidelines (which are, of course, informed by existing collection policy), rights and permissions procedures, workflow and frequency of capture, access for researchers, and preservation. We’re using the Archive-It Web Archiving Life Cycle Model to help structure the planning and documentation development.

Appraisal of mit.edu domain; or, understanding the place of website records 

In addition to exploring a web archiving vision for IASC, I’ve also begun to survey parts of the www.mit.edu domain for sites that are good candidates for acquisition (intellectually, not necessarily technically). In some ways this process is a gap analysis as I consider things like: does this website fit an existing collection? when did IASC last receive materials from the office or department? does IASC have any digital content for this collection currently? is this website a replacement for the physical record types that IASC used to receive? It’s not realistic to expect to cover the entire domain with this method, but I still think it’s worth spending some time appraising (and documenting the appraisal of) a range of websites within the domain. I hope that this process can help us prioritize websites to capture and better understand how websites fit into our collections and finding aids. Throughout this process, I’ve been checking in with Liz, our Archivist for Collections, as her knowledge of the collections and organizational history of the Institute is invaluable!

Technology exploration; or, how this project plan isn’t linear! 

It seems like most U.S. archives and libraries engaged in web archiving are using a hosted service. And that is probably because web acquisition is so complex and difficult to do at scale (e.g. all of mit.edu). MIT Libraries doesn’t currently contract with a web archiving service and we are still exploring options for web acquisition, access and preservation.

But that doesn’t mean we don’t have immediate web archiving needs to address in the meantime! We recently had a request from an Institute office to start archiving a student handbook website. This request pushed us ahead, so even though not all of the planning was in place, we deviated slightly from the plan and found a solution to meet immediate, small-scale needs. While we were considering a few options, WebRecorder.io released a beta version in May. This tool is super easy to use, creates WARCs, and has a partner project for offline WARC playback (Web Archive Player). We are currently using WebRecorder.io Beta for small-scale web acquisition on a selective basis.

I am so very excited about this, and even though this tool is providing a timely solution, we are not abandoning technology exploration, documentation or policy work! I will be posting more on WebRecorder.io and program documentation on this blog and Engineering the Future of the Past over the next few months.


connect the clues! exploring a personal archiving workshop

In April I organized a personal digital archiving workshop for the MIT Libraries IAPril period and Preservation Week. The events were open to the MIT community and visitors. A mix of participants from across the community and visitors from Simmons GSLIS attended. The basis of the workshop comes from the Find the Person in the Personal Digital Archive murder mystery activity from the Society of Georgia Archivists, ARMA Atlanta and Georgia Library Association (hereafter the SGA workshop). Kari learned about the SGA workshop through a post on The Signal and we’ve been looking for a chance to try it out. The workshop is great because it provides an opportunity to share personal digital archiving guidance and a chance to discuss the role of digital archives at MIT Libraries with the broader community.

Screenshot of the workshop handout that provided a few high level strategies for getting started with personal archiving


The SGA workshop is a murder mystery activity that provides participants with a USB drive containing files put together by the workshop organizers. The scenario is that the USB was found at the scene of a crime and the participants must explore it to learn about the crime and the person who lost the USB drive. The file set contains password protected content, obsolete formats, and is totally unorganized with file names and dates that don’t make sense. The SGA materials include the set of files, activity handouts and a presentation on personal digital archiving. All materials are available on the SGA website.

I love this activity – but we decided to remix the materials and create a genealogy example instead of the murder mystery (see workshop details below). This involved writing a new scenario, editing some of the activity prompts and reworking some of the discussion questions. We used the file set from SGA, but I renamed it “Jean old computer” and added a link to a defunct blogging platform (posterous.com) to try to incorporate social media issues into the mix of photos, documents, and other digital files.