web (dot) mit (dot) edu

As I’ve spent time looking over portions of the http://www.mit.edu domain, I’ve noticed that some websites are located at web.mit.edu and some at mit.edu. Judging by looks alone, the web.mit.edu websites seemed to be older, and as sites were updated the URL was updated too. But why was web.mit.edu ever in use? Well, a librarian colleague who has been part of the MIT community for many years helped solve this mystery for me!

The story goes that when the World Wide Web arrived on the scene in the 1990s, the MIT student group SIPB snagged the www.mit.edu URL right away! SIPB, which is a volunteer student computing group (around since 1969), created a wonderful site that you can view via the Internet Archive (snapshot from 1997).

It’s hard to say if this IA playback of the site is completely accurate in design, but the information is fun to look through (like this timeline – web fever has hit!). Only later did the group hand over the www.mit.edu domain to MIT… thus the mix of web.mit.edu and mit.edu URLs. I don’t know the exact date when MIT started using http://www.mit.edu as the homepage URL (or at least redirecting http://www.mit.edu to web.mit.edu), but in the Wayback Machine the change seems to occur around late 1999 – 2000.

Web history, it’s fun!

While perusing the archived webpages, I noticed that the MIT homepage used to feature some really fun and pretty designs and logos. Sometimes the homepage was designed by someone from the MIT community. This isn’t something the current website does. So glad IA captured the homepage over the years.
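
For anyone who wants to retrace the date hunt described above, here is a minimal sketch (not how I actually did it, I just clicked around) that builds Wayback Machine playback links and availability lookups for a page. It assumes the public web.archive.org playback URL pattern and the archive.org availability endpoint; the function names are made up for illustration, and nothing here touches the network.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed public endpoints of the Internet Archive's Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def wayback_timestamp(day: date) -> str:
    """Format a date as the YYYYMMDD timestamp used in Wayback URLs."""
    return day.strftime("%Y%m%d")


def snapshot_url(page_url: str, day: date) -> str:
    """Build a playback link; the Wayback Machine serves the capture nearest the date."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(day)}/{page_url}"


def availability_query(page_url: str, day: date) -> str:
    """Build an availability lookup URL for the capture closest to the date."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(day)})
    return f"{AVAILABILITY_API}?{params}"


def yearly_checkpoints(page_url: str, first_year: int, last_year: int) -> list[str]:
    """One playback link per year (January 1), handy for spotting when a site changed."""
    return [
        snapshot_url(page_url, date(year, 1, 1))
        for year in range(first_year, last_year + 1)
    ]


if __name__ == "__main__":
    # Step through the years around the apparent www.mit.edu changeover.
    for link in yearly_checkpoints("http://www.mit.edu/", 1997, 2001):
        print(link)
```

Opening the printed links in order is a quick way to narrow down the year a homepage changed hands or design, after which you can step month by month.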


web archiving resources for NDSA NE crew (and anyone else reading this!)

This list of resources is shared as a complement to a presentation I gave at the NDSA New England meeting on September 25, 2015. The presentation discussed the MIT Institute Archives’ efforts to acquire websites without a hosted service. I talked about how technology is important, but policy development and planning are key activities that can be accomplished even if new technology isn’t possible right away. The presentation also highlighted the tools we’re finding useful that are easy for an archivist with limited programming skills to use (web recorder, wget and web archive player). I’ve previously talked about some of these activities on ArchiveHour; see that post here.

P.S. Every time I think I’ve got a handle on the essential web archiving resources, I find out about something new. I also realize that a lot of work went into web archiving development long before I first learned about it in 2013. With this in mind, it’s quite possible that a lot of good stuff is missing from the following list – please add resources you love in the comments or alert me to my ignorance via the contact page. =) thank you!

Get Started

  • International Internet Preservation Consortium (IIPC) website – What is web archiving?
  • IIPC blog post (2015), Ian Milligan – “So You Want to Get Started In Web Archiving?” Provides an excellent list of blogs to follow.
  • Archive-It Web Archiving Life Cycle – the examples are specific to Archive-It service and partners, but in any case the life cycle breakdown and concepts are helpful to think about the range of activities and policy that go into a web archiving program.
  • DPC Technology Watch 13-01, (2013), Maureen Pennock “Web-Archiving”
  • NDSA 2013 Web Archiving in the United States survey report

fellowship update: summer presentation series

Have you ever noticed just how many tools and projects include the word archives? ArchivesSpace, Archivists’ Toolkit, Archivum, Archivematica, Archive-It, Archon… And as if those aren’t enough to keep up with, there is a plethora of other tools to consider: ePADD, BitCurator, atom, Aeon, ContentDM…

The features and functionality of the various tools can overlap or can be different yet complementary. The software development support and options for hosted services vary widely. The use cases and placement within a workflow are fluid, often depending on institutional context and content types. This is no huge issue for professionals actively engaged with learning, testing and implementing the various tools. But what about folks who don’t work with these tools every day, but need to know about them? How can our colleagues keep it all straight?


communicating the shades of digital archives and preservation tools through summer presentation series. (Flickr user Alex Ford)

Well, there is likely no one solution to this, but communication is a big deal in complex organizations. One communication effort for IASC has been the digital archives blog, Engineering the Future of the Past (EFP). This summer, Kari also launched a series of presentations for MIT Libraries staff on digital archives and preservation tools. Kari opened the series by talking about the overall digital archives ecosystem, possible workflow options, and tool integration ideas. Then she hosted a few sessions focused on the following tools: Archivematica, ArchivesSpace, atom, and BitCurator. For slides and other details, check out the post on EFP.

On August 28, I presented on ePADD as part of this summer series. I discussed how email can be challenging for archivists and then gave an overview of my experience testing ePADD so far. I hope to share my slides soon (probably on EFP).

If you’re actively communicating with colleagues about emerging digital curation workflows and software, I’d love to hear about your strategies.

fellowship update: SAA 2015


Sunrise run in Cleveland to start the conference day off right.

Last week was the annual meeting of the Society of American Archivists in Cleveland. This was my first SAA experience. It was good overall, but I really do not love conferences that consist of only concurrent sessions. So much FOMO, it’s not even right. But I managed to see several good sessions over the three days I was in Cleveland. The following are highlights from a few of the sessions I attended, along with some tweets.

One of the plenary speakers was Daniel Horowitz Garcia from StoryCorps. He gave a wonderful and moving talk about the power of the stories and diverse voices that archives can preserve and share. The theme of the talk reminded me of something the plenary speaker at NEA 2015 said — “focus on what is made possible by the work.” Rather than talking endlessly about tasks, rules and tools, archivists need to talk most about what is made possible by the work we do.

fellowship update: tool time

One of my goals for the fellowship is to increase overall familiarity and understanding of practical application for various tools useful to digital archives. I try to set aside time each week for testing and learning. Some of the testing I’ve done relates to my work with the Digital Sustainability Lab. Below are a few tools I’ve worked with recently and some others on my “up next” list.

image of gardening tools from 1920s magazine ad

Digital archives toolkits aren’t all that different from gardening toolkits. Gathering, planting and weeding, watering, harvesting… (image: flickr user biodiversity heritage library)

Recently Explored and Tested:

Archivematica (1.4) – Archivematica changed a bit since I last used it in 2013! I’ve been learning more about the storage service options and trying out the new arrangement feature.

ePADD – Email archiving, processing and access from Stanford Libraries. Once we’ve had a chance to work with ePADD more, I’m sure I’ll do some posts here and on Engineering Future of the Past. This tool has me dreaming about a future where all digital archives appraisal and processing incorporates natural language processing and data visualization.

wget (with WARC) – Configuring this tool nearly defeated me. But I got it working on a Mac and have successfully crawled a website with WARC file output! This tool was part of a series of tests for the DS Lab, so I’ll definitely post a more detailed account soon.

Tableau – This visualization software is something my fellow Fellow, Christine, and I are working with for our joint project. Once we’re done, I’ll probably share about our project and the work Christine did to get our data visualizin’ with Tableau.

Webrecorder – web archiving for all! I talked about web recorder last month in this post.
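
Since the wget entry above is light on specifics, here is a minimal sketch of what a wget crawl with WARC output can look like, written as a small Python helper that only assembles the command. This is not the exact configuration I used for the DS Lab tests. It assumes a GNU Wget build with WARC support (version 1.14 or later), and the seed URL and WARC file name are hypothetical placeholders.

```python
import shlex


def build_wget_warc_command(seed_url: str, warc_name: str, wait_seconds: int = 1) -> list[str]:
    """Assemble a GNU Wget command that crawls a site and writes a WARC file.

    Assumes GNU Wget 1.14+ (the first release with the --warc-* options).
    """
    return [
        "wget",
        "--mirror",                   # recursive crawl with timestamping
        "--page-requisites",          # also fetch images, CSS and scripts
        "--no-parent",                # stay at or below the seed directory
        f"--wait={wait_seconds}",     # pause between requests, to be polite
        f"--warc-file={warc_name}",   # wget appends .warc.gz to this name
        "--warc-cdx",                 # also write a CDX index of the records
        seed_url,
    ]


# Hypothetical seed URL and output name, for illustration only.
command = build_wget_warc_command("https://example.edu/handbook/", "handbook-2015")

# Print a copy-and-paste version for the terminal.
print(shlex.join(command))
```

To actually run the crawl from Python, the same list can be passed to subprocess.run(command, check=True). Keeping the options in one function also gives you a record of exactly how each capture was made, which is handy for documentation.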

Up next:

BitCurator – MIT Libraries is a member of the BitCurator consortium and I’ve used BitCurator a bit in the past. I think it’s high time I increase my familiarity with what’s included in BitCurator and how it might fit into different processing situations.

Lunchbox from NPR – This isn’t a tool that’s really specific to digital archives workflows, but the Waterbug tool could be really useful for prepping images for sharing on social media.

Open Refine – messy data is something that is likely here to stay and I want to know more about how to clean up data efficiently.

MDQC and BWF MetaEdit – AV Preserve tools for checking and adding metadata.

reading notes: the tough stuff

This month I chose three readings that are rather different, yet each takes a look at some of the tough stuff that comes up in the information profession — collaboration, digital preservation and web archives, and e-waste and ethical consumerism.

1… The first is a report from OCLC by Jackie Dooley addressing management of born-digital library material. When it comes to navigating born-digital content, digitized materials, digitally published and delivered content, and open web based content — the best course for acquisition, access, and preservation actions is not always clear or simple.

fellowship update: getting on the same (web) page

“Web Archiving” DCP SPRUCE Digital Preservation Illustrations

Web archiving is a new endeavor for the MIT Institute Archives and Special Collections (IASC) and I am lucky enough to be able to take the lead on developing a website acquisition process for the archives. As with any other initiative, the work involves a lot of collaboration with colleagues. The following sections highlight some of the activities currently in progress for this project….

Outline; or, evolving list(s) of next step activities

I’ve created a loose project outline that is basically an evolving list of activities grouped into some categories (e.g. making the case for web archiving, collaboration and communication, policy and procedures, planning for acquisition and metadata integration, web acquisition tools and services, access). The tasks and categories don’t necessarily represent a strictly linear process, but help me remember the wide range of elements that help to create a thorough web acquisition workflow. (This blog post, for example, fits within communication!)

In order to make the case for web archiving and set groundwork for moving forward, the first task was to write an informational document that defines types of web archiving and explores the IASC’s vision for how web archiving can be part of the digital archives program. The following is an excerpt from the document:

The International Internet Preservation Consortium (IIPC) states that archiving internet content is “the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.” In general, the goal of a web archiving program should be to “capture and preserve the dynamic and functional aspects of Web pages – including active links, embedded media, and animation – while also maintaining the context and relationships between files” (Antracoli, et al. 2014). The use of tools, software or hosted services to collect (also referred to as copy, harvest, or crawl) websites is an essential technical step of a web archiving program, but selection and use of a tool(s) shouldn’t be the first or only step of a holistic web archiving effort.

The document continues on to explore some policy and visioning considerations that should ideally come before tool selection or acquisition. One of the most important steps is understanding the purpose of collecting websites. Two common types of web archive initiatives include: general collections/subject area development and archival records collections. For the IASC, the approach for web acquisition is within the area of archival records. This means that we are interested primarily in websites that are records of MIT (mostly, the mit.edu domain). Beyond this initial document describing web archiving, we’ll need to continue to document things like vision and scope, appraisal guidelines (which are, of course, informed by existing collection policy), rights and permissions procedures, workflow and frequency of capture, access for researchers, and preservation. We’re using the Archive-It Web Archiving Life cycle model to help structure the planning and documentation development.

Appraisal of mit.edu domain; or, understanding the place of website records 

In addition to exploring a web archiving vision for IASC, I’ve also begun to survey parts of the www.mit.edu domain for sites that are good candidates for acquisition (intellectually, not necessarily technically). In some ways this process is a gap analysis as I consider things like: does this website fit an existing collection? when did IASC last receive materials from the office or department? does IASC have any digital content for this collection currently? is this website a replacement for the physical record types that IASC used to receive? It’s not realistic to expect to cover the entire domain with this method, but I still think it’s worth spending some time appraising (and documenting the appraisal of) a range of websites within the domain. I hope that this process can help us prioritize websites to capture and better understand how websites fit into our collections and finding aids. Throughout this process, I’ve been checking in with Liz, our Archivist for Collections, as her knowledge of the collections and organizational history of the Institute is invaluable!

Technology exploration; or, how this project plan isn’t linear! 

It seems like most U.S. archives and libraries engaged in web archiving are using a hosted service. And that is probably because web acquisition is so complex and difficult to do at scale (e.g. all of mit.edu). MIT Libraries doesn’t currently contract with a web archiving service and we are still exploring options for web acquisition, access and preservation.

But that doesn’t mean we don’t have immediate web archiving needs to address in the meantime! We recently had a request from an Institute office to start archiving a student handbook website. This request pushed us ahead, so without all of the planning in place, and deviating slightly from the plan, we found a solution to meet immediate, small-scale needs. While we were considering a few options, WebRecorder.io released a beta version in May. This tool is super easy to use, creates WARCs, and has a partner project for offline WARC playback (Web Archive Player). We are currently using WebRecorder.io Beta for small-scale web acquisition on a selective basis.

I am so very excited about this, and even though this tool is providing a timely solution – we are not abandoning technology exploration, documentation or policy work! I will be posting more on WebRecorder.io and program documentation on this blog and Engineering the Future of the Past over the next few months.


connect the clues! exploring a personal archiving workshop

In April I organized a personal digital archiving workshop for the MIT Libraries IAPril period and Preservation Week. The events were open to the MIT community and visitors. A mix of participants from across the community and visitors from Simmons GSLIS attended. The basis of the workshop comes from the Find the Person in the Personal Digital Archive murder mystery activity from the Society of Georgia Archivists, ARMA Atlanta and Georgia Library Association (hereafter the SGA workshop). Kari learned about the SGA workshop through a post on The Signal and we’ve been looking for a chance to try it out. The workshop is great because it provides an opportunity to share personal digital archiving guidance and a chance to discuss the role of digital archives at MIT Libraries with the broader community.

Screenshot of the workshop handout that provided a few high level strategies for getting started with personal archiving

The SGA workshop is a murder mystery activity that provides participants with a USB drive containing files put together by the workshop organizers. The scenario is that the USB was found at the scene of a crime and the participants must explore it to learn about the crime and the person who lost the USB drive. The file set contains password protected content, obsolete formats, and is totally unorganized with file names and dates that don’t make sense. The SGA materials include the set of files, activity handouts and a presentation on personal digital archiving. All materials are available on the SGA website.

I love this activity – but we decided to remix the materials and create a genealogy example instead of the murder mystery (see workshop details below). This involved writing a new scenario, editing some of the activity prompts and reworking some of the discussion questions. We used the file set from SGA, but I renamed it “Jean old computer” and added a link to a defunct blogging platform (posterous.com) to try to incorporate social media issues into the mix of photos, documents, and other digital files.

fellow update: personal digital archiving 2015

In April I attended the Personal Digital Archiving 2015 conference. The two day series of talks provided a mix of perspectives–documentary film makers, doctoral students, archivists, librarians, and contemporary artists. You can find slides from the event in the Internet Archive. Following are a few highlights from the event:

Personal Digital Archiving 2015 was hosted by NYU.

Washington Square. Personal Digital Archiving 2015 was hosted by NYU Moving Image Archiving and Preservation program, NYU Libraries and CNI.

Wendy Hagenmaier discussed the development of the Find the Person in the Personal Digital Archive activity and resources. The project was a joint effort by the Society of Georgia Archivists, ARMA Atlanta, and Georgia Library Association. I used these materials for my own personal digital archiving workshops recently, so it was great to learn more about how and why the content was developed. (Slides) Similarly, a panel of librarians from Bryn Mawr, Wheaton, Vassar, Amherst, and Brown discussed their individual efforts to host personal digital archiving workshops.

Julie Swierczek talked about the preservation of technological and social communication forms online. Focusing on the octothorpe (#) and ‘ironic metadata’, she questioned how future historians will understand the irony and cynicism in posts, memes and hashtags. Her essential question – how can individuals and social media companies record context for our social digital activity? (slides)

The opening keynote speaker, Don Perry, talked about the power of bringing communities together, digitizing photographs, and sharing stories about the family history each photo represents. Through the Digital Diaspora Family Reunion roadshows, Perry and his team are using digitized and born-digital photographs to recover collective pasts and build new communities. (website) (slides)

Lauren Algee discussed the development of the DC Punk Archive at the DC Public Library. Lauren noted that while the library’s basement isn’t good for much — it is good for punk shows. As part of the overall effort to reach the punk community and build the archive, the library has hosted a series of shows in the basement.

Jessica Bushey, a doctoral student at the University of British Columbia, talked about her research on how individuals manage digital content and share or store it via online platforms. Of her 502 survey respondents, 20% believe content will be available for 4-6 years and 24.49% believe that content in social platforms will still be available in 20 or more years. I am shocked by that kind of trust: 20 years! Especially given stories like: this and this and this and this and this. The good news is that 91% also have a local copy of their content stored on a laptop or external hard drive. (slides provide full data from her survey)

Tod Wemmer gave an inspiring talk about the value and importance of audio in preserving context for photographs. I’m currently digitizing some 35mm slides for my grandma, mom and aunts — and decided during this talk that I should have my grandma tell me about the photos and record it. (slides – and the audio clips)

Jason Kovari from Cornell University Library talked about his work to provide web archiving services that can support teaching. He described his collaboration with a faculty member who uses websites related to art and artists in her classes. By working with the Library to archive the websites (using Archive-It), the faculty member can fully document courses and ensure access to web content used in the courses. (slides)

Yvonne Ng talked about assessment of PDA resources. She works at WITNESS and they provide video archiving guidance for activists. Assessment is so important for understanding community needs and purposefully prioritizing effort and resources. (slides)

And last, but not least – ePADD, a fantastic tool for email acquisition, processing, discovery and delivery! I cannot wait for the official ePADD release in June 2015.

Odds and Ends:

Much more was covered – check out the conference website for details.