Building Bibliographic RDF Applications and Microservices

Colorado Alliance BIBCAT Pilot

Colorado Alliance of Research Libraries

Blazegraph Triplestore, RDF Framework, BIBCAT

Using selected MARC records from Colorado College and the University of Colorado Boulder that were generated from the Alliance's Gold Rush comparison service, this project uses BIBCAT to transform the records into BIBFRAME Linked Data. The RDF data is published to the web as Schema.org JSON-LD for indexing by Google, Bing, and other search engines. BIBCAT uses RDF rules that map MARC fields and subfields to BIBFRAME 2.0 entities and properties.
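As a rough illustration of rule-driven mapping, the sketch below applies a small lookup table to MARC field and subfield values and emits triples for a BIBFRAME Instance. The RULES table, the marc_to_triples function, and the sample record are hypothetical. BIBCAT's actual rules are expressed in RDF, and the mapping here is flattened for brevity.

```python
# Hypothetical, simplified rule table: (MARC tag, subfield) -> BIBFRAME 2.0 property.
# BIBCAT's real rules are written in RDF; this dict only illustrates the idea.
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

RULES = {
    # Flattened: in BIBFRAME 2.0 the title literal normally hangs off an
    # intermediate bf:Title node rather than directly off the Instance.
    ("245", "a"): BF + "mainTitle",
    ("245", "c"): BF + "responsibilityStatement",
    ("250", "a"): BF + "editionStatement",
}


def marc_to_triples(instance_iri, fields):
    """Apply RULES to (tag, subfield, value) tuples and return a list of triples."""
    triples = [(instance_iri, RDF_TYPE, BF + "Instance")]
    for tag, code, value in fields:
        predicate = RULES.get((tag, code))
        if predicate is not None:
            # Trim trailing ISBD punctuation such as " /" or " :".
            triples.append((instance_iri, predicate, value.strip(" /:;,")))
    return triples


# Made-up sample record; field 300 has no rule and is skipped.
sample = [
    ("245", "a", "Ramona /"),
    ("245", "c", "Helen Hunt Jackson."),
    ("300", "a", "490 p. ;"),
]
for triple in marc_to_triples("http://example.org/instance/1", sample):
    print(triple)
```

Keeping the mapping in data rather than in code means new fields can be covered by adding rules instead of changing the transformer, which is the property a rule-based design is meant to provide.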

The primary goal of this project is to increase the exposure of a library's catalog collections when patrons search with web search engines such as Google or Microsoft Bing.

Live site:

http://bibcat.coalliance.org/


Project Iterations

BML Iteration One

Build

In the first Build phase of iteration one, we created a Flask project with the latest versions of BIBCAT and RDF Framework as git submodules that are imported as Python modules into the application.

The minimum viable product features for this iteration are:

  1. Transform and ingest 171,559 MARC 21 records sampled from Colorado College and the University of Colorado at Boulder into 7,056,697 RDF triples
  2. All BIBFRAME Instances are created using a URL pattern of http://bibcat.coalliance.org/ followed by a random UUID. The BIBFRAME Item attempts to use the direct URL of the catalog record in the respective library's ILS as its IRI.
  3. BIBFRAME Works are blank nodes, and neither de-duplication of BIBFRAME Instances nor linking to external resources was attempted in this iteration.
  4. Create four views (or routes) in the application: an HTML front page, an HTML detail page, an XML sitemap, and an XML sitemap index.
  5. The detail page for each BIBFRAME Instance embeds Schema.org JSON-LD metadata for indexing by Google. The JSON-LD also includes the latitude, longitude, and address of the library holding that Instance.
  6. Construct linked Docker containers with Docker Compose from three Docker images: a bibcat application image, a custom nginx webserver image that routes traffic to bibcat, and a jermnelson/semantic-server-core image that provides a Blazegraph triplestore and a Fedora Commons 4 digital repository for use by bibcat.
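Features 2 and 5 above can be sketched together as follows. This is a minimal illustration, not the application's code: the function names are hypothetical, the choice of schema:offers and schema:seller to attach the holding library is an assumption about how the holding could be modeled, and the coordinates are approximate placeholder values.

```python
import json
import uuid

BASE_URL = "http://bibcat.coalliance.org/"


def mint_instance_iri():
    """Mint an Instance IRI from the base URL plus a random UUID (feature 2)."""
    return BASE_URL + str(uuid.uuid4())


def instance_jsonld(iri, title, library):
    """Build a Schema.org JSON-LD document for one Instance (feature 5).

    Assumption: the holding library is attached through offers/seller.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Book",
        "@id": iri,
        "name": title,
        "offers": {
            "@type": "Offer",
            "seller": {
                "@type": "Library",
                "name": library["name"],
                "address": {
                    "@type": "PostalAddress",
                    "addressLocality": library["locality"],
                    "addressRegion": library["region"],
                },
                "geo": {
                    "@type": "GeoCoordinates",
                    "latitude": library["latitude"],
                    "longitude": library["longitude"],
                },
            },
        },
    }


def to_script_tag(document):
    """Wrap the JSON-LD in the script element that a detail page would embed."""
    body = json.dumps(document, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder holding library; the coordinates are approximate, illustrative values.
holding_library = {
    "name": "Colorado College",
    "locality": "Colorado Springs",
    "region": "CO",
    "latitude": 38.85,
    "longitude": -104.82,
}
print(to_script_tag(instance_jsonld(mint_instance_iri(), "Ramona", holding_library)))
```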

Measure

  • Number of BF Instances available for indexing: 157,804
  • Ad Hoc General Google Searches:
    1. Helen Hunt Jackson Ramona:
      Total results: 383,000 (as of 3/1/2017)
      BIBCAT coalliance result? Not in the first 700 hits
    2. James Dean transfigured :the many faces of rebel iconography:
      Total results: 2,070 (as of 3/1/2017)
      BIBCAT coalliance result? 6th hit on first Google result page
  • Site Specific Search on http://bibcat.coalliance.org
    1. Helen Hunt Jackson Ramona:
      Total results: 12 (as of 3/1/2017)
    2. James Dean transfigured :the many faces of rebel iconography:
      Total results: 1 (as of 3/1/2017)
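The Instance count above also shows why the Build phase needed a sitemap index as well as a sitemap: the sitemap protocol limits a single sitemap file to 50,000 URLs, so 157,804 Instances must be split across several files. The sketch below shows that arithmetic and one way to generate the XML with the Python standard library. The sitemapN.xml naming pattern and the function names are assumptions, not the application's actual routes.

```python
import math
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # limit set by the sitemap protocol


def sitemap_count(total_urls):
    """Number of sitemap files needed to list total_urls URLs."""
    return math.ceil(total_urls / MAX_URLS_PER_SITEMAP)


def build_sitemap(urls):
    """Serialize one sitemap file listing the given URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_sitemap_index(base_url, total_urls):
    """Serialize a sitemap index that points at each sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for page in range(1, sitemap_count(total_urls) + 1):
        entry = ET.SubElement(root, "sitemap")
        # Hypothetical naming pattern for the individual sitemap files.
        ET.SubElement(entry, "loc").text = f"{base_url}sitemap{page}.xml"
    return ET.tostring(root, encoding="unicode")


print(sitemap_count(157804))
print(build_sitemap_index("http://bibcat.coalliance.org/", 157804))
```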

Learn

Ad hoc SEO is more difficult than it initially appears. Over the course of the first iteration, we changed the appearance of the detail page because it was showing up in the Google search results, and we wanted to emphasize the respective library's catalog by embedding the catalog page in an iframe.

Check robots.txt! While testing on search engines, we were puzzled that the Alliance URLs were showing up but not the Instance IRIs that point to the respective institutions' catalogs. We discovered that the catalog vendor (both Colorado College and the University of Colorado Boulder have the same ILS vendor) disallows search engines from indexing the records in the catalog.
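A check like this can be automated before relying on a third-party URL as an IRI. The sketch below uses Python's standard urllib.robotparser against an inline robots.txt so that it runs without network access. The sample rules and the catalog.example.edu host are made up for illustration and are not the vendor's actual file.

```python
from urllib.robotparser import RobotFileParser


def is_indexable(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt allows user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt that blocks every crawler from record pages.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /record
"""

print(is_indexable(SAMPLE_ROBOTS, "https://catalog.example.edu/record/b1234567"))
print(is_indexable(SAMPLE_ROBOTS, "https://catalog.example.edu/help"))
```

Against a live catalog, the same parser can load the real file with set_url() and read() instead of parse().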

BML Iteration Two

Build

We are currently defining what features to add to the Alliance BIBCAT project for the second development iteration.

  • Change how URLs for BF Instances are constructed: instead of using a random UUID, we plan to create more SEO-friendly URLs by "slugifying" the bf:InstanceTitle.
  • Instead of depending on the vendors to open up their catalogs for indexing, we are going to mint new bf:Item IRIs using the slugified approach and include a notation in the slug of which institution holds the item.
  • A new bf:Instance view will link out to all of its bf:Item entities, and each bf:Item may include an embedded Google Map of the institution holding the bf:Item.
  • De-duplication of the bf:Instance entities will occur through the use of an existing match key that the Alliance uses for its Gold Rush product.
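The slug-based IRIs planned in the first two bullets might look like the sketch below. It is only one possible approach: the slugify implementation, the function names, and the convention of appending the institution to the title slug are assumptions, since the iteration's exact URL format is still being defined.

```python
import re
import unicodedata

BASE_URL = "http://bibcat.coalliance.org/"


def slugify(text):
    """Lowercase text, drop non-ASCII marks, and join word runs with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def instance_iri(title):
    """Hypothetical bf:Instance IRI built from the slugified title."""
    return BASE_URL + slugify(title)


def item_iri(title, institution):
    """Hypothetical bf:Item IRI: title slug plus a notation for the holder."""
    return f"{BASE_URL}{slugify(title)}-{slugify(institution)}"


print(instance_iri("James Dean transfigured :the many faces of rebel iconography"))
print(item_iri("Ramona", "Colorado College"))
```

Unlike random UUIDs, title slugs are not guaranteed to be unique. Two Instances with the same title would collide, so a real implementation would need a disambiguating suffix or a lookup before minting.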

Thanks & Acknowledgments

George Machovec

George Machovec is the Executive Director of the Colorado Alliance of Research Libraries and was instrumental in initiating this project, starting with two members of the consortium, with plans to extend it to the entire membership.

Steve Walker

Steve Walker is the Systems and Network Administrator.
