Carnegie Mellon University (CMU) Libraries has developed a new web application that uses computer vision to assist librarians processing digital photograph collections. In summer 2020, a team of CMU librarians and staff created and tested the Computer-Aided Metadata generation for Photo archives Initiative (CAMPI), a visual search prototype that successfully found and tagged duplicate images and photos depicting similar scenes in the Carnegie Mellon University Archives’ General Photograph Collection (GPC).
Commercial computer vision solutions—such as Google’s AutoML Vision or Vision API models used in conjunction with Google’s enterprise cloud storage platform—are currently being used in applications ranging from identifying animals in camera trap datasets to detecting defects in products on assembly lines. Notably, the New York Times uses Google Cloud and Vision API to enable searches of its huge archive of digitized photos (many of which had detailed date, location, and other relevant information written on the backs of the originals, which enabled metadata harvesting via optical character recognition).
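To give a concrete picture of that kind of OCR-driven metadata harvesting (which is not CAMPI’s workflow), here is a minimal sketch using Google’s google-cloud-vision Python client; it assumes credentials are already configured, and the file name is hypothetical:

```python
# Minimal sketch: OCR the back of a scanned photo with Google Cloud
# Vision to harvest caption text as metadata. Assumes the
# GOOGLE_APPLICATION_CREDENTIALS environment variable is set; the
# file name is hypothetical.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("photo_back_scan.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.text_detection(image=image)  # runs OCR
if response.text_annotations:
    # The first annotation contains the full detected text block.
    print(response.text_annotations[0].description)
```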
Some libraries or collections use computer vision to automatically add metadata, “because that use fits their collection,” CMU Digital Humanities Developer Matt Lincoln told LJ. “A really good example is at the Library of Congress with their Chronicling America historical newspapers project. They used a system to classify images in newspapers—is this a photograph, a map, or an illustration? We had a different problem. We didn’t have tags on anything [in the GPC] yet, and many of the tags that we did want to be able to add—the things that would be useful to the collection—were idiosyncratic.”
One example Lincoln cited was “The Fence,” a famous Carnegie Mellon landmark that is considered “the unofficial university billboard.” Students paint it and then stand guard for as long as they want their messages to be seen. “We actually don’t have an enormous number of pictures of it, but it’s a really important tag to be able to have. And if we tried to train a computer algorithm and could only give it 20 images of The Fence, it wasn’t going to work out that way.”
So, CAMPI is not intended to scan through digitized photos and add tags automatically. Instead, the prototype demonstrates the benefits of a hybrid approach, with a computer vision interface assisting librarians and archivists as they work with large visual collections by enabling them to find groups of related images and then add descriptive metadata tags in bulk.
“We found that computer vision by itself, or even with non-expert human guidance, would actively impede the public use of a photo archive,” Lincoln, CMU archivists Julia Corrin and Emily Davis, and Digital Humanities Program Director Scott Weingart wrote in a white paper published in October 2020. “Mistakes, failures to understand and surface important or salient features of a collection, and a lack of moral judgment would ultimately cause more harm than good. When computer vision is tightly integrated with a digital archival collection management system, however, expert archivists and editors can strategically leverage machine learning models without making the collection beholden to them. The result yields a faster, more extensive, and more integrative approach to photo processing than is commonly available.”
In addition to white paper authors Corrin, Davis, Lincoln, and Weingart, metadata specialist Angelina Spotts and scanner operator Jon McIntire also worked on the project.
The GPC is a collection of about one million photographic prints, negatives, glass plate negatives, digital images on CDs, and 172GB of born-digital photos, chronicling campus life since the university was founded in 1900. It is one of the most frequently used collections in the archives, and to date about 20,000 of the historic images have been digitized, with digitization driven primarily by requests.
“One of the biggest problems [archivists] face is scale,” Lincoln told LJ. GPC “has accrued a million photographs so far, and since it’s an institutional archive, they’re getting new ones every single year.”
In addition to managing large historical collections, tagging an influx of thousands of new images every year—many of which may be repetitive or redundant—is unsustainable for most archives, including CMU’s. “Appraisal of these transfers must either be done blindly, with no actual assessment of the quality of the photographs maintained, done manually, which is labor-intensive and likely not sustainable, or not done at all,” the white paper states.
And without labor-intensive tagging, image collections are difficult to search. “If you or I want to find a picture, we want to search and find one picture or five pictures,” Lincoln said. “Archivists tend to work on collections of things—here’s a roll of photos, or here’s all of the photos that the marketing department took this month. So, they face challenges trying to make those collections accessible in ways that researchers or users would want...to search them” for specific images.
A request from CMU’s marketing and communications team helped inspire the development of CAMPI. In June 2020, a staff member contacted the archives looking for early images of CMU’s Computation Center, which became home to the university’s first computer in 1956. But an initial search only turned up photos from the late 1960s.
“We know, from experience, that our photo collection is not fully inventoried, and there are images with incorrect descriptions,” Corrin explained in an article on CMU’s site. “I was interested in a tool that would let me see if I could identify any earlier photos of the Computation Center space or find other images that were improperly labeled.”
A computer vision tool identifying similar images was a logical answer, but commercial computer vision solutions “aren’t very well customized for working with historic photo collections or collections that aren’t pictures of products that are going to go on Amazon,” for example, Lincoln told LJ. “The built-in tags that you might get if you put a photo in Google—for these, we’d put them in and get ‘black-and-white image,’” or other tags that would not be helpful in facilitating searches for this collection. “There are a lot of cases where those pre-existing services are going to be useful, if the kind of tags that you need are very generic data…. If you need something collection-specific…or if you want to do complex search, like faceted search with the other metadata you have, it’s difficult to get that working in a system that’s completely run by a third party.”
Ultimately, according to the white paper, the team decided to focus on three interlinked challenges: visual similarity search, duplicate image detection, and tagging.
Computer image search systems rely on neural network models that are “pre-trained” using large visual databases such as ImageNet, which contains more than 14 million images that have been classified and hand-annotated, primarily through crowdsourcing. In a 2017 project, CMU researchers partnered with the Carnegie Museum of Art to use one such system—the third edition of Google’s Inception Convolutional Neural Network (InceptionV3)—to cluster and sort images from the museum’s Teenie Harris Archive, a collection of over 70,000 historical photos, 60,000 of which have been digitized. Noting this earlier project’s success, the GPC project team chose to use InceptionV3 as the feature source for its system.
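The white paper does not specify which software framework the team used, but a minimal sketch of extracting per-image feature vectors from a pre-trained InceptionV3, using the Keras implementation as an assumed stand-in, looks like this:

```python
# Sketch: turn each image into a fixed-length feature vector using a
# pre-trained InceptionV3 (ImageNet weights). Dropping the classifier
# head (include_top=False) and applying global average pooling yields
# one 2048-dimensional vector per image.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def image_features(path: str) -> np.ndarray:
    img = image.load_img(path, target_size=(299, 299))  # InceptionV3 input size
    x = np.expand_dims(image.img_to_array(img), axis=0)
    return model.predict(preprocess_input(x))[0]  # shape: (2048,)
```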
“Generating the image features from a pre-trained model is computationally inexpensive compared to having to train/fine-tune the model itself, so we could create the features for all of our collection without needing specialized hardware,” the white paper explains. “Feature generation only needs to be run once per image per model. It took under 1.5 hours to compute all the features for our ~21,000 images using a standard server with 8 conventional CPU cores…. We then used a fast implementation of an approximate nearest neighbor search algorithm to create an index of this feature space that we could efficiently query.”
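The white paper does not name the specific approximate nearest neighbor implementation, but as one illustration, an index over those feature vectors could be built with Spotify’s open-source annoy library (an assumed choice, picked because it is a widely used fast ANN implementation):

```python
# Sketch: index the 2048-dimensional feature vectors for fast
# approximate nearest neighbor queries, using the annoy library.
from annoy import AnnoyIndex

DIM = 2048  # length of each InceptionV3 feature vector

index = AnnoyIndex(DIM, "angular")  # angular distance ~ cosine similarity
for i, vec in enumerate(all_feature_vectors):  # all_feature_vectors: hypothetical
    index.add_item(i, vec)

index.build(50)  # 50 trees: more trees improve recall at the cost of index size
index.save("gpc_features.ann")  # hypothetical file name; persist for reuse
```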
They then created a browsing interface that enables users to choose a seed photo and conduct a visual similarity search. “We went for a strategy where we use a very simple computer vision system for visual similarity searching, so that archivists could more rapidly go through the collection and jump from similar picture to similar picture…navigating more quickly than they could by manually going through all of these digitized photos and tagging each one, one at a time,” Lincoln said.
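Continuing the same illustrative sketch, a seed-photo query against that index might look like the following, with all names hypothetical:

```python
# Sketch (continued): given a seed photo, fetch its visual neighbors
# so an archivist can browse from similar picture to similar picture.
seed_vec = image_features("seed_photo.jpg")  # hypothetical path
neighbor_ids = index.get_nns_by_vector(seed_vec, 20)  # 20 most similar images
```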
Eventually, the goal is to use the interface for duplicate detection and tagging tasks, but even at the earliest stages, CAMPI’s prototype search interface proved immediately useful. For example, last summer, archivists were able to locate the early images of CMU’s Computation Center requested by CMU’s marketing and communications team by conducting a visual similarity search using the images from the 1960s that had already been discovered. Like much of the collection, the oldest images had not yet had textual metadata assigned to them.
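The article does not detail how CAMPI will flag duplicates, but a common approach with such an index is to surface neighbors whose feature-space distance to the seed falls below a tuned threshold; a sketch under that assumption:

```python
# Sketch (continued): treat neighbors whose distance to the seed falls
# below a threshold as likely duplicates. The 0.15 cutoff is purely
# illustrative and would need tuning against the actual collection.
ids, dists = index.get_nns_by_vector(seed_vec, 20, include_distances=True)
likely_duplicates = [i for i, d in zip(ids, dists) if d < 0.15]
```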
CMU is also in the process of migrating its photo archives from PTFS Knowvation (formerly ArchivalWare) to the open-source Islandora platform, and one of the goals with Islandora will be to make collections such as GPC available for students, faculty, and other departments to access online. “Now that we’re getting ready to move them into a new platform where they can be accessible online, we need metadata to make them text searchable,” Lincoln said. CAMPI will likely make this a much simpler project.