Preservation Practicum and Prototyping Databases: A Review

As part of the stipulations of our grant funding, we are required to complete a practicum project at a cultural heritage information institution here in the Washington, DC area. I opted to complete my project over the summer at the Library of Congress Preservation Research and Testing Division (which is primarily why you have seen next to no blog activity from me during the whole summer). I’ve just completed my project and will summarize some of its larger points here.

The project was a portion of the CLASS-D development initiative that is ongoing at the Library of Congress PRTD. CLASS-D, which is an acronym for the Center for the Library’s Analytical Science Samples — Digital, is tasked with developing a functioning database prototype to provide access to sample and analysis metadata for the materials contained within the PRTD’s CLASS collection.

A little background: The CLASS collection is a small body of diverse materials that have been set aside for preservation laboratory research via both noninvasive and invasive (i.e. destructive) techniques. These materials include books (the Barrow books, acquired from the W.J. Barrow Research Laboratory), standard paper samples, TAPPI fiber samples, magnetic tape, and more. These samples undergo a wide variety of laboratory analyses such as microscopic imaging, environmental scanning electron microscope imaging, pH analysis, spectrometry, accelerated aging, &c. Through this ongoing analysis, these samples have generated a great deal of information on the physical characteristics and aging profiles of materials of different ages, periods, and production techniques, all of which bears directly on the preservation of cultural heritage materials.

The problem: Now that PRTD has all of this information, how do we disseminate it?

Initial work on the project was completed by Doug Emery. This work ultimately produced a final report filed with the LOC that makes recommendations for data modeling and DB architecture. Based on the work completed for this report, I was tasked with completing the initial prototyping of the actual database in order to prove 1) the appropriateness of the initial data modeling work and 2) the feasibility of the database itself.

Through quite a bit of on-the-job review of DB design methodology, data wrangling, and ham-handed SQL coding, I was able to produce a functional database architecture model that accommodated the variety of sample metadata for all sample types to be included in the database. While I could go into the nitty-gritty details of the database architecture (which I totally could, but won’t), I think a more valuable point would be some of the things that I took away from my experience at the PRTD:

Be Your Own Manager & Advocate

For those of you unfamiliar with what summer at the LOC is like, let me set the scene: imagine scores of junior fellows and interns meandering the halls, using laboratory space, needing supervisory assistance, all working on different projects across all departments at the LOC. Sounds crazy, right? Within the Preservation department, there were at least a dozen scholars on-site throughout the summer working on a variety of in-depth research projects. While this sort of jam-packed work environment is great for innovation and learning new things, it’s not that great for being able to meet one-on-one with a supervisor. Dr. Fenella France, my most gracious host at the LOC, was pulled a million different ways throughout the summer due to her own professional responsibilities and the overabundance of junior researchers. For me, this was a bit of a wake-up call, as my professional background has been in environments in which supervisors exerted close control over the work being done by their underlings. For the first time, I found myself doing a large amount of self-guided work without regular in-depth check-ins from the higher-ups. This meant that I not only had to consciously guide my own schedule and progress, but also had to push to have my work reviewed in order to ensure that my work and our overall project goals were in alignment. While this took a bit of schedule wrangling on my part, it did lead me to realize that you have to campaign for yourself and your project so that others will give both the attention and consideration they require.

Don’t Expect Non-LIS Professionals to Care as Much as You Do About LIS Topics

The PRTD is largely a laboratory research institution. And while they have made huge strides towards serving their information needs and those of external researchers (i.e. CLASS-D), others within the institution simply are not as conscious of LIS issues as I am. This means that when specific researchers are asked for sample metadata, they aren’t necessarily going to provide it in neat, ordered, standard-compliant formats. This, at first, was extremely frustrating because it meant a great deal of data massaging and manual ingest in order to introduce the pilot data into the prototype. Is there a way to raise awareness on these issues? Yes. Can you use those tactics in every scenario? No. Do you need to understand that part of interdisciplinary system development is going to be dealing with others’ disciplinary focuses? Absolutely. Rather than being standoffish on these topics, think of yourself as a conduit through which data can be organized and provided with usable value. Be willing to communicate openly with others in order to help create the greatest level of user service.
And be prepared for a lot of blank looks over lunch as you try to explain data modeling to chemists…

Check Your Data Model. Check it Again. Putting in Pilot Data? Check it Again.

So you’ve created your database architecture and you’re ready to ingest your pilot data? You’ve rigorously designed your data model and all the appropriate queries to automate record creation. So, you get cracking on ingesting ~1,900 sample records with nearly 93,000 associated records for physical characteristics. What’s this? You accidentally combined one of the characteristic fields, thereby invalidating all of the physical characteristics you just ingested for over 900 books? Bad words and exasperated looks ensue…

It may go without saying, but always, always, always double, triple, and quadruple check your model before you begin ingesting actual data. You never know what simple mistake you’ve made that will ultimately require several hours of undoing down the road. Luckily, this mistake only ate about 8 hours of my time. But, I would rather not have to waste time because of simple oversight.
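For the curious, here is roughly the kind of pre-ingest check I wish I had run first. This is only a minimal sketch in Python with SQLite, not the actual CLASS-D schema: the table layout, the field names in `FIELD_MAP`, and the two pilot rows are all made-up placeholders. The idea is to confirm that no two source fields collapse into the same characteristic, and then to load a handful of rows into a throwaway in-memory database and count what came out before touching the real data.

```python
import sqlite3

# Hypothetical mapping from source spreadsheet columns to characteristic
# names in the model. None of these names come from the real CLASS-D schema.
FIELD_MAP = {
    "pH": "ph",
    "Fold Endurance": "fold_endurance",
    "Tear Resistance": "tear_resistance",
}


def find_collisions(field_map):
    """Return each target name that more than one source field maps to."""
    sources_by_target = {}
    for source, target in field_map.items():
        sources_by_target.setdefault(target, []).append(source)
    return {t: s for t, s in sources_by_target.items() if len(s) > 1}


def dry_run_ingest(rows, field_map):
    """Load rows into a throwaway in-memory database and return the number
    of characteristic records created per characteristic name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
    )
    # The UNIQUE constraint makes a combined field fail loudly on insert.
    conn.execute(
        "CREATE TABLE characteristic ("
        " sample_id INTEGER NOT NULL REFERENCES sample(id),"
        " name TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " UNIQUE (sample_id, name))"
    )
    for row in rows:
        cur = conn.execute(
            "INSERT INTO sample (label) VALUES (?)", (row["label"],)
        )
        sample_id = cur.lastrowid
        for source, target in field_map.items():
            if source in row:
                conn.execute(
                    "INSERT INTO characteristic VALUES (?, ?, ?)",
                    (sample_id, target, str(row[source])),
                )
    counts = dict(
        conn.execute("SELECT name, COUNT(*) FROM characteristic GROUP BY name")
    )
    conn.close()
    return counts


# Two invented pilot rows, purely for illustration.
pilot_rows = [
    {"label": "sample-0001", "pH": 5.2, "Fold Endurance": 12, "Tear Resistance": 40},
    {"label": "sample-0002", "pH": 4.8, "Fold Endurance": 3, "Tear Resistance": 35},
]

collisions = find_collisions(FIELD_MAP)
counts = dry_run_ingest(pilot_rows, FIELD_MAP)
print("collisions:", collisions)
print("records per characteristic:", counts)
```

If the collision report is non-empty, or if any characteristic has fewer records than there are pilot samples, something in the mapping deserves a second look before the full ingest.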

You’ll Never Expect to Find Interdisciplinary Collaboration… Until You Do.

To be perfectly honest, I did not want to work at the LOC. I do not have a science background and was much more interested in several projects more associated with codicology, rare book cataloging, and the like. However, having accepted the LOC project out of necessity, I later discovered that here, in the field of laboratory science, is fertile ground for collaboration with the LIS discipline. The innovative approaches that our field is bringing to the development of information systems and to the practice of data sharing have found a great deal of support and buy-in from the humanities and social science disciplines. And while there is not a lack of interest from the life sciences, there exists a gap between our professional discourse and our applied exercises. Once I arrived at the PRTD, I found several people who were extremely interested in taking advantage of new approaches to data management and sharing. However, given the disciplinary focus of their institute and their own professional responsibilities, they had not yet been able to seek out partnerships or support from LIS professionals. After talking with several PRTD representatives about the possible implementations of the CLASS-D initiative, I found that they were extremely interested in the benefits that could come from having an open access database and the possibilities of implementing RDF-compliant data modeling to promote innovative reuse of data. Despite coming into the workspace with my own professional preconceptions, I found the PRTD to be an excellent institution, filled with possibilities for creative collaboration between the laboratory science and LIS disciplines. While we may harbor hopes and dreams for what type of institution we may end up working for, be sure to remain open to unforeseen opportunities that may offer you a chance to dramatically impact the mission of a collection, an institution, or a profession.


There will be more to come from the CLASS-D project. Up next, Nick Schwartz will begin working on modeling for the analysis metadata architecture that will attach to the existing architecture. Keep an eye out for more updates!

If you are interested in learning more about the CLASS-D project, feel free to contact me at 23koivisto@cardinalmail.cua.edu.
