Building Bibliographic RDF Applications and Microservices

Technology - Catalog Pull Platform

The two technologies used in this preconference are loosely linked components of the Catalog Pull Platform, an approach to developing bibliographic digital services based on pulling requirements and features from the communities served by libraries and other cultural heritage institutions. Inspired by both the Lean Startup management philosophy and Toyota's Lean Manufacturing, the Catalog Pull Platform moves away from trying to anticipate needs and then "pushing" services and technology to patrons and staff. Instead, it identifies and responds to needs and demands for library technology by "pulling" directly from the various constituencies these institutions serve.

BIBCAT

In 2014 the Library of Congress issued an RFQ for a "search and display" system to highlight BIBFRAME. Aaron Schmidt of Influx Library User Experience and Jeremy Nelson were awarded the contract. Their work, with additional help from Mike Stabile, resulted in BIBCAT (a BIBFRAME Catalog), a lightweight catalog web application that indexes BIBFRAME 1.0 Linked Data into Elasticsearch.

When the Library of Congress released BIBFRAME 2.0 in April of 2016, the tools for converting MARC records to these modified or new BIBFRAME classes and properties were missing.

RDF Framework

The RDF Framework is the future of the Catalog Pull Platform. The vision of the RDF Framework is to be able to define a microservice or application in RDF for entities such as those in BIBFRAME or Schema.org. Using the class and property semantics defined in an OWL or other ontology, the RDF Framework provides a single Python class that offers CRUD operations through an automated HTML5 form interface, a REST API, or a BIBCAT ingester.

The RDF Framework is currently in alpha status. Utilities defined in it are already being used in BIBCAT, with full integration as the goal for version 1.0.0.
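
The idea of one generic class whose behavior is driven by ontology definitions can be illustrated with a minimal sketch. This is not the RDF Framework's actual API: the `RdfResource` class, the `ONTOLOGY` dictionary, and the in-memory store are all hypothetical stand-ins, and a real implementation would read the allowed properties from an OWL file and persist to a triplestore.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"

# Hypothetical stand-in for definitions that would be read from an ontology:
# each class IRI maps to the property IRIs allowed on that class.
ONTOLOGY = {
    BF + "Instance": {BF + "title", BF + "instanceOf"},
    BF + "Work": {BF + "title", BF + "subject"},
}


class RdfResource:
    """One generic CRUD class; the ontology description decides what is valid."""

    def __init__(self, class_iri, ontology=ONTOLOGY):
        self.class_iri = class_iri
        self.properties = ontology[class_iri]
        self.store = {}  # subject IRI -> {property IRI: value}

    def _check(self, values):
        # Reject any property the ontology does not define for this class.
        unknown = set(values) - self.properties
        if unknown:
            raise ValueError(
                f"not defined for {self.class_iri}: {sorted(unknown)}"
            )

    def create(self, iri, values):
        self._check(values)
        self.store[iri] = dict(values)

    def read(self, iri):
        return dict(self.store[iri])

    def update(self, iri, values):
        self._check(values)
        self.store[iri].update(values)

    def delete(self, iri):
        del self.store[iri]
```

The same class serves both `bf:Instance` and `bf:Work` records; only the ontology entry passed to the constructor changes, which is the property the paragraph above describes.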

BIBCAT - Ingesters

Warning! The first iteration of the BIBCAT ingesters is painfully slow, partly because of all the look-ups and SPARQL queries involved. We are actively working to speed up this process.

This preconference will focus on the most developed part of BIBCAT: the metadata ingester classes. These take different metadata formats (MARC21, MODS XML, Dublin Core RDF XML, and CSV files) as input, transform them into BIBFRAME 2.0 and Schema.org RDF triples, and then ingest the triples into a Blazegraph triplestore.
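
To make the transformation step concrete, here is a minimal sketch that turns CSV rows into BIBFRAME 2.0 triples serialized as N-Triples. It does not use BIBCAT's ingester classes; the function names, the `http://example.org/` base URL, and the `id` and `title` column names are assumptions made for illustration. Only the BIBFRAME and RDF namespace IRIs are taken from the published vocabularies.

```python
import csv
import io

BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE_URL = "http://example.org/"  # hypothetical base for minted IRIs

SAMPLE_CSV = "id,title\n1,Pride and Prejudice\n2,Moby-Dick\n"


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )


def row_to_ntriples(row):
    """Map one CSV row to a bf:Instance with a bf:Title blank node."""
    instance = f"<{BASE_URL}instance/{row['id']}>"
    title = f"_:title{row['id']}"
    return [
        f"{instance} <{RDF_TYPE}> <{BF}Instance> .",
        f"{instance} <{BF}title> {title} .",
        f"{title} <{RDF_TYPE}> <{BF}Title> .",
        f'{title} <{BF}mainTitle> "{escape_literal(row["title"])}" .',
    ]


def csv_to_ntriples(csv_text):
    """Transform every row of a CSV document into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_ntriples(row))
    return lines
```

Calling `csv_to_ntriples(SAMPLE_CSV)` yields four triples per row. The resulting lines could then be loaded into a triplestore; that final ingest step is omitted here.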

Two of the three projects we will examine in depth use different Python ingester classes. Coupled with both default and custom RDF rules that map the incoming data to RDF triples, the ingesters query the triplestore to de-duplicate RDF subjects.
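
The de-duplication step amounts to a "look up before minting" pattern, sketched below. An in-memory set of triples stands in for the triplestore, where BIBCAT would issue a SPARQL query instead. The `InMemoryStore` class, the `get_or_create` helper, the UUID-based IRIs, and the choice of matching on a single property value are all assumptions for illustration, not BIBCAT's actual rules.

```python
import uuid

BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class InMemoryStore:
    """Hypothetical stand-in for a triplestore: a set of (s, p, o) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def find_subject(self, rdf_class, predicate, value):
        """Return a subject of rdf_class with the given value, or None."""
        typed = {
            s for s, p, o in self.triples if p == RDF_TYPE and o == rdf_class
        }
        for s, p, o in self.triples:
            if s in typed and p == predicate and o == value:
                return s
        return None


def get_or_create(store, rdf_class, predicate, value,
                  base_url="http://example.org/"):
    """Reuse an existing subject when one matches; otherwise mint a new IRI."""
    existing = store.find_subject(rdf_class, predicate, value)
    if existing is not None:
        return existing
    iri = f"{base_url}{uuid.uuid4()}"
    store.add(iri, RDF_TYPE, rdf_class)
    store.add(iri, predicate, value)
    return iri
```

Ingesting the same `bf:Person` label twice returns the same IRI both times, so the store never accumulates duplicate subjects. Each call costs one lookup, which is also why this kind of check slows ingestion down.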

BIBCAT - Linkers and Generators

Besides ingesters, BIBCAT has two other types of classes, called Linkers and Generators. Linkers provide ways to enhance existing BIBFRAME entities by linking them to external sources of RDF linked data such as DBpedia and the Library of Congress.
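
A rough sketch of the linking idea follows. It assumes that links are expressed as `owl:sameAs` triples and that entities are matched by their `rdfs:label`; neither detail is confirmed by BIBCAT itself. The `EXTERNAL_IRIS` table is a hypothetical stand-in for a live query against an external source.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical lookup table standing in for a query to an external source.
EXTERNAL_IRIS = {
    "Jane Austen": "http://dbpedia.org/resource/Jane_Austen",
}


def link_entities(triples, lookup=EXTERNAL_IRIS):
    """Return the input triples plus a sameAs link for every matched label."""
    new_links = set()
    for subject, predicate, obj in triples:
        if predicate == RDFS_LABEL and obj in lookup:
            new_links.add((subject, OWL_SAME_AS, lookup[obj]))
    return set(triples) | new_links
```

Entities whose label has no match are left unchanged, so the function can be run repeatedly as the lookup source grows.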

The BIBFRAME Work Generator, located in bibcat.generators.work, queries the triplestore for Instances with missing Works or with Works that are blank nodes. It first attempts to resolve the Instance to an existing Work and, if that fails, creates a new BIBFRAME Work IRI with properties based on the values in the BF Instance.
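
The generator's two steps, finding Instances that need a Work and then resolving or creating one, can be sketched as follows. This is not the code in bibcat.generators.work. The SPARQL string shows one plausible form of the query, and the Python functions apply the same logic to an in-memory set of triples. Matching Works by a shared `rdfs:label`, the UUID-based Work IRIs, and all function names are assumptions for illustration.

```python
import uuid

BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# One plausible query for Instances with no Work or a blank-node Work.
MISSING_WORK_SPARQL = """
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
SELECT DISTINCT ?instance WHERE {
  ?instance a bf:Instance .
  OPTIONAL { ?instance bf:instanceOf ?work }
  FILTER (!bound(?work) || isBlank(?work))
}
"""


def objects(triples, subject, predicate):
    """All objects for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def instances_needing_work(triples):
    """In-memory equivalent of MISSING_WORK_SPARQL (blank nodes start '_:')."""
    instances = sorted(
        s for s, p, o in triples if p == RDF_TYPE and o == BF + "Instance"
    )
    needing = []
    for instance in instances:
        works = objects(triples, instance, BF + "instanceOf")
        if not works or any(w.startswith("_:") for w in works):
            needing.append(instance)
    return needing


def generate_work(triples, instance, base_url="http://example.org/"):
    """Resolve the Instance to an existing Work by label, else mint one."""
    labels = set(objects(triples, instance, RDFS_LABEL))
    for s, p, o in sorted(triples):
        is_work = p == RDF_TYPE and o == BF + "Work"
        if is_work and not s.startswith("_:"):
            if labels & set(objects(triples, s, RDFS_LABEL)):
                work = s  # resolved to an existing Work
                break
    else:
        # No match: create a new Work and copy the Instance's labels to it.
        work = f"{base_url}{uuid.uuid4()}"
        triples.add((work, RDF_TYPE, BF + "Work"))
        for label in labels:
            triples.add((work, RDFS_LABEL, label))
    # Replace any blank-node Work links with the resolved or new IRI.
    for old in objects(triples, instance, BF + "instanceOf"):
        if old.startswith("_:"):
            triples.discard((instance, BF + "instanceOf", old))
    triples.add((instance, BF + "instanceOf", work))
    return work
```

Running `generate_work` over every result of `instances_needing_work` leaves each Instance linked to a Work IRI, and two Instances that share a label end up pointing at the same Work.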