The Linked Data Myth

Linked Data is only as useful as the metadata on which it depends, and poor quality metadata ultimately causes the challenges many librarians hope to address with Linked Data.

Kyle Banerjee

Used successfully for many years in industry, Linked Data appeals to librarians for its potential to improve services. It allows libraries to describe resources more richly than before, leverage expertise and data across the Web, expose local resources, and add new capabilities to the discovery process. It’s therefore not surprising that librarians have increasingly been demanding support for Linked Data in integrated library systems, repository software, and library standards.

However, Linked Data is only as useful as the metadata on which it depends, and poor quality metadata ultimately causes the challenges many librarians hope to address with Linked Data. Given that the resources necessary to create and maintain the access points, vocabularies, and relationships Linked Data needs to function are unlikely to emerge, the potential for Linked Data to benefit library services is limited.

 

WHAT IS LINKED DATA?

Needlessly abstruse jargon makes Linked Data appear more complicated than it is. According to linkeddata.org, Linked Data is "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF." This definition is virtually useless because Linked Data, the Semantic Web, URIs, and RDF are so interrelated that understanding any of the terms requires understanding the other three. Other authoritative sites also offer nebulous definitions laden with technical jargon that can only be understood by those who already know what Linked Data is.

At its core, Linked Data is a way to describe things and relationships between things using Web addresses that serve as identifiers. This means that instead of storing a name, subject heading, location, or other data point, a Linked Data based system stores a Web address where information about those things can be retrieved. Different data points can be maintained by different entities. For example, names and geographic locations might be maintained by different organizations, and the data retrieved could itself contain other identifiers. For instance, the place where a person works can be stored as a Web-based identifier from which information about that workplace can be retrieved.

Linked Data’s system of distributed identifiers allows exploration and expression of much more complex relationships than can be achieved using other methods. It also simplifies certain maintenance and user functions. Whenever information associated with an identifier changes, systems that store that identifier are updated automatically. Likewise, all attributes and relationships associated with an identifier are immediately accessible.
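The mechanics described above can be illustrated with a minimal sketch. It is not a real Linked Data stack: a Python dictionary stands in for the Web, and every address, name, and property in it (the example.org URIs, "worksAt", "locatedIn") is a hypothetical placeholder rather than part of any actual vocabulary. The point is only that the catalog record stores an identifier instead of a name, that one identifier can lead to another, and that a change made by whoever maintains an identifier shows up without the record being touched.

```python
# A dictionary stands in for the Web: each key is a Web address (identifier),
# each value is the description published at that address.
# All addresses, names, and property labels here are hypothetical.
WEB = {
    "https://example.org/person/1": {
        "name": "Jane Doe",
        "worksAt": "https://example.org/org/7",
    },
    "https://example.org/org/7": {
        "name": "Example University Library",
        "locatedIn": "https://example.org/place/42",
    },
    "https://example.org/place/42": {"name": "Portland, Oregon"},
}


def dereference(uri):
    """Return the description published at uri (here, a dictionary lookup)."""
    return WEB[uri]


def follow(uri, *properties):
    """Follow a chain of properties starting at uri; return the last description."""
    description = dereference(uri)
    for prop in properties:
        description = dereference(description[prop])
    return description


# The catalog record stores an identifier for the creator, not the name itself.
record = {"title": "Sample Report", "creator": "https://example.org/person/1"}


def creator_name(rec):
    """Look up the creator's current name at display time."""
    return dereference(rec["creator"])["name"]


def creator_workplace_city(rec):
    """Traverse person -> workplace -> place, each possibly kept by a different party."""
    return follow(rec["creator"], "worksAt", "locatedIn")["name"]


print(creator_name(record))
print(creator_workplace_city(record))

# Whoever maintains the person identifier corrects the name at the source.
# The catalog record is not edited, yet the next lookup reflects the change.
WEB["https://example.org/person/1"]["name"] = "Jane Q. Doe"
print(creator_name(record))
```

The sketch also shows the dependency the rest of this piece is concerned with: every lookup assumes someone chose the right identifier for the record and that someone keeps the description behind it accurate.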

 

IT DOESN'T SOLVE HUMAN PROBLEMS

Linked Data is a powerful tool, but only for problems that have technical origins. As the term implies, Linked Data depends on data. Metadata needs consistent and complete access points. Ontologies and vocabularies need to be comprehensive and well-maintained. Systems need to know what to do with the data they retrieve.

None of those requirements is met for general library use, nor is there reason to expect they will be. For years, libraries have reduced the number of staff dedicated to creating metadata while increasing their dependence on metadata supplied by publishers or downloaded from bibliographic utilities. The resources available to maintain the necessary vocabularies and ontologies are a small fraction of what Linked Data needs. No system can interpret the meaning of all the MARC fields, and the trend has been toward continuing normalization (i.e., simplification) of MARC data because patrons and staff alike demand simplicity and configuration is already too complex. Linked Data is orders of magnitude more complex than MARC.

Linked Data is appropriate for limited domains that can be described using well-maintained vocabularies and ontologies. For example, drugs, interactions, and evidence supporting observations can all be classed and related in multiple dimensions, and the Micromedex drug interaction database uses Linked Data to identify potential interactions and side effects. Without Linked Data, this would not be possible.

At its core, we don’t know what we want to do with Linked Data, and our vision of success often revolves more around the mere act of using it than around doing anything useful with it. The excitement surrounding Linked Data is reminiscent of what often happens when new technologies are introduced: people redefine their needs to accommodate a tool. When microwave ovens first became mainstream, people took classes where they learned to bake cakes and whole turkeys in microwaves. When new pharmaceuticals are announced, many people pressure their doctors for prescriptions even if existing drugs meet their needs.

Linked Data is often presented as a general solution for metadata problems, but it’s only truly useful in certain situations. Like an antibiotic, it’s a powerful tool when used appropriately but ineffective or even detrimental when misused. And just as patients who pressure doctors for prescriptions without understanding the implications often receive harmful or useless treatments, the pressure to incorporate Linked Data in systems where it provides questionable benefits works against core library objectives such as supporting discovery and preservation of materials.


Kyle Banerjee is Collections and Services Technology Librarian and Associate Professor, Oregon Health and Science University.



Bughdana (Ms.) Hajjar

Dear Sir,
I liked the article very much. I was looking for something simple, in plain English for a change, that focuses on my work (I am Head of Cataloguing and Metadata at the Lebanese American University). I wrote to the Journal because I could not find an email address for you. I would like to ask for your permission to translate your article into Arabic. Thank you.

Posted : Oct 10, 2020 09:45


Nicolas Prongué

I agree with many of your points. But Linked Data is not totally useless for libraries. It notably allows, or at least stimulates, the distributed creation of metadata, for example by sharing a common authority file for all German regions. In that sense, imagine all German-speaking librarians contributing to a single record describing a person: there are significant mutualisation benefits, and the descriptions are going to be more complete and detailed. Connecting them through VIAF, you can then have a multilingual service offering authority data once in German, once in French, once in English...
This use case might correspond more to Europe than to the USA, but still it is an end user use case.

Posted : Aug 20, 2020 09:27


Judy Sturntocry

A good introduction to an article on The Myth of Linked Data. When might we see the body?

Posted : Aug 17, 2020 12:32

LJ User

This was meant as a short blip aimed at a broad audience to stimulate conversation.

A longer article needs to be tuned to the readers and their background -- what managers, public services, technical services, and systems staff relate to is very different. Also, it needs to examine a much more specific problem such as implementing it in shared environments where records come from many sources, digital collections, etc.

One thing I didn't say in this one because I didn't want it to sound like a technical services article is that libraries have used Linked Data for a long time. At its essence, Linked Data is authority control. Someone has to figure out which identifier to use for an access point (i.e., authorized heading), and the records those identifiers represent need to be maintained and related to each other (i.e., vocabulary). Authorized fields in MARC already contain unique identifiers/entries as well as indicators showing which vocabulary is used.

Unfortunately, libraries that don't have time/resources to verify access points and maintain vocabularies necessary for authority control still won't when we rebrand that process as Linked Data.

Posted : Aug 17, 2020 12:32


Jeff Edmunds

Thank you Kyle. I've been making the same case for years, most notably in three widely shared papers from 2017 ("BIBFRAME as Empty Vessel," "Roadmap to Nowhere: BIBFLOW, BIBFRAME, and Linked Data for Libraries," and "Zombrary Apocalypse! : RDA, LRM, and the Death of Cataloging"). See also my video, "Life after MARC?" (https://youtu.be/CqmlSRSGDdo).

Posted : Aug 17, 2020 12:04

Bughdana (Ms.) Hajjar

Thank you Jeff. I read all of those. I will reread them with another perspective again. Bughdana

Posted : Aug 17, 2020 12:04


Paula Abisognio

Right on and to the point. I couldn't agree more.

Posted : Aug 13, 2020 10:25

