More About the Digital Lab Notebook
This page contains more in-depth information about the vision and development of the digital lab notebook. You can also read our summary description of the digital lab notebook.
Overview: The Digital Lab Notebook
The digital lab notebook has always been a core project at CHI. Its development is key to the successful implementation of many of the guiding principles of CHI’s technologies. In its simplest form, the digital lab notebook contains the record of the means used to digitally capture information about our world and the history of events describing this information’s subsequent processing until it reaches its final state as a completed digital representation.
The digital lab notebook serves the same function as a scientist’s written lab notebook did before the digital age. For centuries scientists wrote down their experiences and the subsequent analysis that described the evidence and results of their inquiries. This notebook then became an integral element of their published results. Scientific information cannot be understood in the absence of the meaning of the data and the history of its generation. The term ‘metadata’, meaning data about data, is frequently used interchangeably in scholarly discourse with the term ‘provenance’.
Empirical observations, the experience of our senses, and lab notebook provenance accounts that describe their acquisition and processing are the core components of scientific activity. These ideas are also explored in several CHI publications. See “Image-Based Empirical Information Acquisition, Scientific Reliability, and Long-Term Digital Preservation for the Natural Sciences and Cultural Heritage”.
A “digital lab notebook” associated with a digital representation provides transparency, enabling people to assess its reliability and have confidence they can rely on it for their own research purposes. CHI’s current methodologies, capture, and processing tools are designed to collect all of the information necessary for a scientific lab notebook. Anyone can examine the collected files and find a complete provenance account of the means and circumstances surrounding the digital representations.
The transparency and accessibility of the digital lab notebook will soon be dramatically enhanced because it will generate “Linked Open Data”. Linked Open Data makes it easier to read the digital lab notebook. Linked Open Data encodes relationships between entries in the digital lab notebook and simplifies access to its information. The digital lab notebook is necessary for scholarly and scientific digital imaging and documentation. Scientifically useful information requires a provenance account. This provenance explains where the information came from and permits replication experiments, central to scientific practice, to confirm the information’s quality.
“Empirical Provenance”: The Digital Lab Notebook in Authentic Scholarly and Scientific Imaging
In traditional science, provenance is carefully recorded in a lab notebook or a similar record during the scientific inquiry, and it then becomes an integral element of the published results. Such provenance may include descriptions of equipment used, mathematical and logical operations that were applied, controls, oversight operations, and any other process elements necessary to make both the inquiry and its results clear and transparent to scientific colleagues and the interested public.
For a digital photograph, the lab notebook information would include data such as the camera make and model, firmware version, shutter speed and aperture, as well as parameters used to convert the raw sensor data into an image, such as color temperature.
The second essential part of scientific activity is the systematic gathering of observations about the world through the senses. Today, we extend our senses through the use of sensing devices such as digital cameras. In the very old and still vigorously pursued epistemological discussion that examines the nature of human knowledge, the observations of the senses are labeled “empirical.”
CHI uses the term “empirical provenance” to recognize the fundamental scientific nature of the digital lab notebook.
In recent natural science work, many examples of Internet-based scientific projects, often found in the biological sciences, have demonstrated the necessity of documenting how digitally represented information is generated. The Open Microscopy Environment is a prime example. These collaborations rely heavily on empirical provenance accounts to assess the quality of information contributed by the collaborators and to make their work transparent to others.
Digital Lab Notebook Architecture
In collaboration with computer scientists and archiving specialists, CHI is now developing its next generation of software tools that will generate digital lab notebooks containing Linked Open Data with advanced knowledge management features. These next-generation digital lab notebooks will foster the adoption of computational photography imaging tools and methods for the acquisition and generation of scientifically useful digital surrogates. These tools and methods are explicitly designed to produce transparent digital representations that enable scientific evaluation of the representation’s quality and suitability for reuse for novel purposes.
Two technologies are undergoing development and improvement by CHI and its open-source collaborators. The first is Reflectance Transformation Imaging (RTI). RTI is currently undergoing widespread adoption, and demand from the RTI user base is driving this development. The second is the Collaborative Algorithmic Rendering Engine (CARE) tool (currently under construction), which applies Algorithmic Rendering (AR) to the RTI image acquisition process and entails similar archival and long-term preservation issues. This work is funded by a National Science Foundation (NSF) grant shared by Princeton University and CHI.
The NSF grant supports the development of the new digital lab notebook architecture and many parts of the software development work. These lab notebooks will be far easier to query and access, making scientific evaluation of their digital representations easy and practical. They will also enable “born archival” imaging, the assembly of complete scientific information packages ready for archival submission (see “Born Archival” Imaging below).
CHI and its collaborators are now developing a knowledge architecture that will apply to these RTI and Algorithmic Rendering (AR) tools. This architecture will clearly describe procedures for generating the digital lab notebook using Linked Open Data. Linked Open Data uses the Resource Description Framework (RDF), a standard model for data interchange on the Web. This Linked Open Data will be represented as RDF in the digital lab notebook. It will be machine-readable and can be made accessible to search engines. These lab notebooks will enable qualitative evaluation of their associated scientific images in two ways: improved access to the lab notebook’s information within the worldwide storage and archiving environment, and advanced, semantically organized querying of the stored information. In short, this approach can lead to the integration and interrogation of large amounts of separately stored, but related, information.
The architecture and the set of tools, once they are implemented, will produce Linked Open Data, written using RDF, where the relationships between the data inscribed in the lab notebook conform to the international metadata knowledge management standard ISO 21127.
ISO 21127 is the Conceptual Reference Model (CRM) developed by the Documentation Committee (CIDOC) of the United Nations Educational, Scientific and Cultural Organization’s (UNESCO) International Council of Museums (ICOM). The CRM is a metadata knowledge management structure that, when encoded into the Linked Open Data, describes the rich variety of interrelationships within the Linked Open Data lab notebook entries.
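To make the idea of CRM-structured RDF more concrete, here is a minimal sketch of how a single capture session might be written out as RDF triples in the N-Triples syntax. It is an illustration only, not CHI's architecture: the example.org URIs, the function names, and the choice of which CRM classes and properties to use (E7 Activity, E21 Person, E19 Physical Object, P14 carried out by, P16 used specific object) are assumptions made for this example.

```python
# Illustrative sketch only. The example.org URIs and the selection of CRM
# terms below are assumptions for this example, not CHI's published mapping.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/notebook/"  # hypothetical namespace for notebook entries


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(text):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def capture_session_triples(session_id, camera_id, camera_label, operator_id):
    """Describe a capture session, the person who ran it, and the camera used."""
    session = uri(EX + session_id)
    camera = uri(EX + camera_id)
    operator = uri(EX + operator_id)
    return [
        (session, uri(RDF_TYPE), uri(CRM + "E7_Activity")),
        (session, uri(CRM + "P14_carried_out_by"), operator),
        (session, uri(CRM + "P16_used_specific_object"), camera),
        (operator, uri(RDF_TYPE), uri(CRM + "E21_Person")),
        (camera, uri(RDF_TYPE), uri(CRM + "E19_Physical_Object")),
        (camera, uri(RDFS_LABEL), literal(camera_label)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = capture_session_triples(
    "session-001", "camera-01", "Example camera body", "operator-01"
)
print(to_ntriples(triples))
```

Because every statement names its subject, relationship, and object explicitly, a query tool can follow the links from a session to its equipment and operator without knowing anything about the software that wrote them.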
An essential feature of the digital lab notebook’s architecture is a very high level of automation. This keeps metadata management “under the hood” and frees the user to concentrate on the cultural or scientific task at hand. The synergistic combination of lab notebook provenance and automated digital processing, requiring minimal operator configuration, offers advantages for the organization, communication, and preservation of digitally generated knowledge. Once the process used to construct a digital surrogate is largely automated, a lab notebook log describing this process can be automatically produced. The technologies CHI develops are specifically selected to enable high levels of automatic processing. This translates directly into ease of use, a short learning curve, and enhanced technology adoption.
How the Knowledge Management Works in the Digital Lab Notebook
Here is a general outline of how the digital lab notebook architecture will work. While many of these operations are complex and significant work remains to be done, the following sketch conveys the overall strategy.
» A new digital image capture tool for RTI and the CARE Tool will capture information about the imaging subject, the people involved in the imaging session, and features of the imaging acquisition equipment configuration not present in the electronic imaging record. For example, to generate a digital lab notebook account of the imaging equipment used during a capture session, the tool will first prompt the capture team to provide a one-time description of all the equipment they use for image capture. The tool enables the capture team to organize the listed equipment into templates reflecting frequently used capture configurations. For example, cameras and tripods can be grouped into templates and lighting equipment grouped into other frequently used configurations. These templates can then be combined into master templates describing the equipment configuration used in a specific capture session. During a capture session, all the imaging team needs to do is select the template in use. The information describing that configuration will then automatically generate Linked Open Data, stored in the digital lab notebook as RDF and incorporating the knowledge management structures of the CRM.
The capture metadata will also be validated for conformance to RTI and CARE Tool requirements. For example, the Highlight RTI capture method requires that the radius from the center of the imaging subject to the light source be at least twice the longest diameter of the subject area being imaged. The light positions at this radius provide the hemispherical sample of light directions used by RTI and the CARE Tool. If a validation problem is detected, the software will alert the user.
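The template grouping and the Highlight RTI distance check described in this step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned capture tool: the class names, field names, and equipment labels are hypothetical, and the radius and diameter are assumed to be given in the same unit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Equipment:
    """One piece of gear from the team's one-time equipment description."""
    name: str
    kind: str  # e.g. "camera", "tripod", "light" (hypothetical categories)


@dataclass
class Template:
    """A reusable configuration; master templates nest other templates."""
    name: str
    items: list = field(default_factory=list)      # Equipment entries
    templates: list = field(default_factory=list)  # nested Template entries

    def all_equipment(self):
        """Flatten this template and any nested templates into one list."""
        found = list(self.items)
        for nested in self.templates:
            found.extend(nested.all_equipment())
        return found


def light_distance_ok(light_radius, longest_subject_diameter):
    """Rule stated in the text: radius must be at least twice the diameter."""
    return light_radius >= 2 * longest_subject_diameter


camera_kit = Template(
    "camera kit",
    [Equipment("Example camera", "camera"), Equipment("Example tripod", "tripod")],
)
lighting_kit = Template("lighting kit", [Equipment("Example flash", "light")])
session_setup = Template("tabletop session", templates=[camera_kit, lighting_kit])

print([e.name for e in session_setup.all_equipment()])
print(light_distance_ok(light_radius=60.0, longest_subject_diameter=25.0))
print(light_distance_ok(light_radius=40.0, longest_subject_diameter=25.0))
```

Selecting `session_setup` at capture time is enough to recover the full equipment list. A tool built along these lines could then write that list to the lab notebook and warn the user whenever the distance check returns `False`.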
» CHI recommends always capturing photographs using the camera's RAW file format. After determining the color per-pixel from the camera sensor’s red, green, and blue sensor array (known as demosaicing), RAW photographs depict the information from the camera sensor without additional in-camera processing. RAW files are proprietary formats, owned by the camera manufacturers and not publicly released. This proprietary nature seriously reduces their chances for long-term preservation and usefulness.
A new image sequence validation tool (under construction) will follow current CHI practice. This practice recommends developing the original RAW photographs into an open file format, the Digital Negative (DNG). During this conversion, the DNG’s white balance, exposure compensation, and tone curve information are captured. The camera-generated Exchangeable Image File Format (EXIF) image information and optional, user-entered International Press Telecommunications Council (IPTC) data are automatically saved into the open, widely used metadata carrier file format, the Extensible Metadata Platform (XMP). These XMP files are embedded within each DNG photograph.
The validation tool will possess an image sequence alignment tool that can align the positions of each image in the sequence to sub-pixel accuracy. Accurate alignment of the images in the capture sequence is important to producing good RTIs and CARE Tool Renderings.
The validation tool will generate Linked Open Data RDF during its processing events and from the “harvested” XMP metadata previously stored in the DNG. It will then store this information, along with the capture tool’s RDF. This structure organizes the information in a way that automatically clarifies the relationships between all of the information elements within the image capture sequence.
These metadata-rich DNG images will serve as the archival records of the original capture data and the events in this stage of processing. They can then be used to generate images in the user’s desired file format, including the Joint Photographic Experts Group (JPEG) file format currently used by the RTI processing tool, RTIBuilder, and by the CARE Tool.
The validation tool will perform a series of checks on the image metadata to ensure that the data for the entire imaging sequence conforms to the requirements for RTI and CARE Tool processing. For example, the tool will verify that the camera aperture, ISO, and shutter speed are the same throughout the sequence. Any problems with the data will be communicated to the user. These tools may be used during the image capture session to ensure that a valid image set has been collected.
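The consistency check on aperture, ISO, and shutter speed can be illustrated with a short sketch. It assumes the per-image settings have already been read from the metadata into plain dictionaries; the key names and file names are hypothetical, and the validation tool itself may work quite differently.

```python
def find_inconsistent_settings(frames, keys=("aperture", "iso", "shutter_speed")):
    """Return {setting: distinct values} for every setting that varies.

    `frames` is a list of dicts, one per image in the capture sequence.
    An empty result means the checked settings are identical throughout.
    """
    problems = {}
    for key in keys:
        values = {frame[key] for frame in frames}
        if len(values) > 1:
            # Sort by string form so the report is stable and readable.
            problems[key] = sorted(values, key=str)
    return problems


# Hypothetical sequence in which the third frame has a different shutter speed.
sequence = [
    {"file": "img_001.dng", "aperture": "f/8", "iso": 100, "shutter_speed": "1/60"},
    {"file": "img_002.dng", "aperture": "f/8", "iso": 100, "shutter_speed": "1/60"},
    {"file": "img_003.dng", "aperture": "f/8", "iso": 100, "shutter_speed": "1/125"},
]

problems = find_inconsistent_settings(sequence)
if problems:
    for setting, values in problems.items():
        print(f"Setting '{setting}' varies across the sequence: {values}")
else:
    print("Sequence settings are consistent.")
```

Running a check like this during the capture session, rather than afterwards, lets the team reshoot a faulty frame while the subject and equipment are still in place.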
» For RTI image generation, the JPEG image set produced from the archival DNG image sequence will be processed by the open source software RTIBuilder. RTIBuilder was originally funded and developed by CHI and the University of Minho in Portugal.
Currently, when RTI images are made from these JPEG photographs, RTIBuilder produces an Extensible Markup Language (XML) log file documenting each step of the RTI image generation process. When a finished RTI is produced, the RTI image is stored in a directory structure along with its original DNG data, the XML processing log, and information used during the RTI’s assembly, such as images of color and white balance charts and images of the distribution of directional light samples. All of the information in this file structure can be reused by RTIBuilder, following the XML log file, to produce additional RTIs.
The digital lab notebook architecture (under construction) will specify how to take the information from the XML log and generate CRM-based RDF.
» For the CARE Tool, the user decisions and the subsequent processing events will be recorded as RDF, as specified by the digital lab notebook architecture. The architecture and accompanying software are under construction.
» The complete set of information assets, the original image data, the RDF subject and process history metadata, the completed digital representation, along with associated documents, and links to the relevant open source software used to build it, would then be combined into one package.
» As discussed in the next section, an archival asset management “wrapper” could then enclose this package.
“Born Archival” Imaging
For each completed RTI or CARE Tool generated digital representation, the complete set of information assets, the original image data, process history metadata stored as RDF, the completed digital representation, along with associated documents, and links to the relevant open source software used to build it, will then be combined into one package.
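One way to picture such a package is as a directory of files plus a manifest that records each file's role. The sketch below is an assumption-laden illustration, not CHI's packaging format: the manifest layout, the role labels, the file names, and the use of SHA-256 checksums for later integrity checks are all choices made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir, roles, software_links=()):
    """List every packaged file with its role and checksum.

    `roles` maps a path relative to `package_dir` to a role label such as
    "original image data" or "process history (RDF)".
    """
    package_dir = Path(package_dir)
    entries = []
    for relative_path, role in sorted(roles.items()):
        entries.append({
            "path": relative_path,
            "role": role,
            "sha256": sha256_of(package_dir / relative_path),
        })
    return {"files": entries, "software": list(software_links)}


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "capture_001.dng").write_bytes(b"placeholder image data")
    (root / "process_history.nt").write_text("placeholder RDF", encoding="utf-8")
    manifest = build_manifest(
        root,
        {
            "capture_001.dng": "original image data",
            "process_history.nt": "process history (RDF)",
        },
        software_links=["https://example.org/hypothetical-software-link"],
    )
    print(json.dumps(manifest, indent=2))
```

Keeping the manifest alongside the files gives a later wrapping step a single place to look for what the package contains.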
A wrapping tool (still unfunded) will be able to “wrap” this archival scientific package in a submission format compatible with the desired host repository. This compatible archival format makes both the submission of the archival scientific package to the desired archive as well as the receipt and ingest of the submitted scientific package by the archive as easy as possible.
For the last several years, CHI has been working on the implementation of born archival imaging with museum and library archiving experts Stephen Stead and Martin Doerr. CHI commissioned a research study of archive-compatible submission wrappers that was carried out by Martin Doerr. The study demonstrated that CHI’s design plan for the knowledge-managed digital lab notebook was sufficient to support the creation of a wrapper for scientific submission packages compatible with each of the major archival format structures examined. This result is important since there are many different archival formats in use around the world. These different archival formats manage knowledge and its associated metadata using different and mutually incompatible structures. It will be a major accomplishment to enable born archival imaging for an increasing number of archival formats.
CRM-Based Linked Open Data and the METS Standard
The CHI-commissioned study examined the Metadata Encoding and Transmission Standard (METS) in depth. This is the archival standard used by the Library of Congress, the California Digital Library, the University of California system, and many other leading archives. Martin Doerr mapped the CRM-based knowledge management structures from the planned digital lab notebook architecture to the knowledge management structures of METS. When combined, their structures are synergistic: the archival advantage of the integrated whole is greater than the sum of the two independent parts. Now that the structures are mapped to each other, the wrapping tool could automatically create the METS wrapper for each scientific package submission. A similar process of mapping the CRM-based digital lab notebook knowledge management structures to other archival format structures can yield similar results.
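As a rough picture of what an automatically generated METS wrapper involves, the sketch below builds a bare METS skeleton that lists packaged files in a file section and references them from a structural map. It does not reproduce the CRM-to-METS mapping from the study, and it is not validated against the METS schema; the function name, file identifiers, file names, and USE labels are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(files):
    """Build a minimal METS skeleton.

    `files` is a list of (file_id, relative_path, use) tuples. Files sharing
    a `use` label are grouped together, and every file is referenced from a
    single top-level division of the structural map.
    """
    root = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    struct_map = ET.SubElement(root, f"{{{METS_NS}}}structMap")
    package_div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "package"})

    groups = {}
    for file_id, path, use in files:
        if use not in groups:
            groups[use] = ET.SubElement(
                file_sec, f"{{{METS_NS}}}fileGrp", {"USE": use}
            )
        file_el = ET.SubElement(groups[use], f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path},
        )
        ET.SubElement(package_div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return root


# Hypothetical package contents.
mets_root = build_mets([
    ("FILE001", "capture_001.dng", "original image data"),
    ("FILE002", "capture_002.dng", "original image data"),
    ("FILE003", "process_history.nt", "process history"),
])
print(ET.tostring(mets_root, encoding="unicode"))
```

A production wrapper would also need the descriptive and administrative metadata sections that carry the mapped lab notebook information, which this skeleton leaves out.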
The Digital Lab Notebook in the Long-term Preservation of Digital Information
For digital scientific imaging to have widespread usefulness, it must be archived and accessible. For investments in digital scientific imaging to grow in value over time, the imaging packets must be designed to enhance their chances of long-term preservation. The digital lab notebook will manage the digital archive submission and contain crucial information to improve the chance for the information’s long-term survival. As discussed above, this archival package will contain the digital representations, their original empirical data, along with all the metadata information needed to recreate them. The open source software and source code can also be archived with and/or linked to the package. Once archived, the metadata knowledge needed to recreate the digital representations, found within the archival package, will be available over time to the consecutive digital conservators who will care for the digital representation’s long-term digital preservation.
One advantage of knowledge-managed information contained in a digital lab notebook is found in the archival preservation of digital information. The need for long-term preservation of digital information is both clear and urgent. There are a number of curatorial advantages to knowledge management generally and archived lab notebook information specifically. The Library of Congress has studied the sustainability factors involved in digital preservation. The more a digital curator knows about the files and the methods used to generate them the better they can perform preservation activities. If a file and the open-source software that created it are available to the curator in the future, the file will more likely be preserved to new media standards than a file produced by proprietary, copyright-protected software that is no longer published or maintained.
Advantages of the Digital Lab Notebook
The digital lab notebook is key to successfully implementing many of the guiding principles of CHI’s technologies.
- It helps keep effective metadata management “under the hood” so scientists and cultural heritage practitioners can focus on their primary objectives.
- It enables ease of adoption and compatibility with existing working cultures, in combination with the widely familiar off‐the‐shelf tools of digital photography and CHI’s highly automated open source computational photography software.
- It provides the means to evaluate a digital surrogate’s scientific reliability and suitability for reuse.
- It enhances the long-term preservation chances of digital surrogates through these inherent qualities in its knowledge management model: transparency; self-evident nature; non-proprietary nature; and born-archival ease of archival submission and intake.
- It drives the democratization of technology by enabling information acquired by people from all over the world to gain the same level of scientific acceptance as data collected by the most respected institutions.
CHI hopes that the availability of this workflow and toolset centered on the digital lab notebook will start a revolutionary democratization of digital data acquisition, preservation, and use throughout the world.