GoldRush® BIBCAT

October 5th, 2017 Status update on the GoldRush BIBCAT project.

Introduction

In its second development iteration, the GoldRush® BIBCAT project is publishing Library Linked Data from three of the Alliance's partner libraries for better exposure and discovery by commercial search engines like Google and Microsoft's Bing.


Build-Measure-Learn Project Basics

The Build-Measure-Learn (BML) loop, part of the Lean Startup project management approach (and the topic of my first book :-), breaks a project down into three parts.

Build

A Lean project starts by gathering requirements from the core constituencies and then developing an initial Minimum Viable Product (MVP). The goal of the MVP is to build only the least amount of functionality and user interface design needed to meet, but not exceed, these requirements.

Measure

The next step in the loop is Measure, identifying key metrics and then releasing the MVP for use by your patrons, customers, or other end users. Sometimes after an initial release, minor adjustments can be made to the MVP based on the measurements and feedback from the core user constituencies.

Learn

After the Measure phase is complete, the Learn phase involves analyzing the metrics, which results in one of three decisions:

  • Continue the project and start preparing for the next BML iteration's Build phase.

  • Pivot the focus of the project to a new area or service based on the actionable metrics.

  • Terminate the project based on negative metrics or administrative decision.


Iteration 1 - Building

The initial MVP for the GoldRush BIBCAT project is to first generate a collection of MARC records from the Alliance GoldRush Comparison service and then, using a custom BIBCAT class, transform the MARC21 records to BIBFRAME 2.0 RDF Linked Data. This three-step process is illustrated below:


Iteration 1 - Measuring

In the Measure phase of this iteration, we tested the result of Google's indexing by running a number of Google queries to see whether the Instances were being picked up by Google.

BF Instances available for indexing: 157,804

Ad Hoc General Google Searches:

  • Helen Hunt Jackson Ramona:
    • Total results: 383,000 (as of 3/1/2017)
    • BIBCAT coalliance result? Not in the first 700 hits
  • James Dean transfigured :the many faces of rebel iconography:
    • Total results: 2,070 (as of 3/1/2017)
    • BIBCAT coalliance result? 6th hit on first Google result page

Site Specific Search on http://bibcat.coalliance.org

  • Helen Hunt Jackson Ramona:
    • Total results: 12 (as of 3/1/2017)
  • James Dean transfigured :the many faces of rebel iconography:
    • Total results: 1 (as of 3/1/2017)
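
The site-specific searches above rely on Google's `site:` operator. As a minimal sketch, the helper below builds such a query URL so the same check can be repeated by hand in a browser. The function name `site_search_url` is hypothetical, and this is an illustration rather than the tooling used for the measurements reported here.

```python
from urllib.parse import urlencode

# Host taken from the site-specific search heading above.
SITE = "bibcat.coalliance.org"


def site_search_url(title: str, site: str = SITE) -> str:
    """Return a Google search URL restricted to one site via the site: operator."""
    query = f"site:{site} {title}"
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    titles = (
        "Helen Hunt Jackson Ramona",
        "James Dean transfigured :the many faces of rebel iconography",
    )
    for title in titles:
        print(site_search_url(title))
```

Opening each printed URL in a browser shows only hits from the BIBCAT host, which makes it easier to compare result counts between dates.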


Iteration 1 - Learning

From the actionable metrics from the Measure Phase, we came to the following conclusions:

  • Search results for more unique items started showing up on the first page of Google results, while more common items were buried deep in the results pages.
  • ILS restrictions on indexing catalogs: both the University of Colorado's and Colorado College's robots.txt files restricted indexing of individual items, which explains the missing pages in the Google search.
  • Publishing landing pages as HTML: as the Alliance cannot control what individual vendors permit in terms of search engine indexing, the initial BF Instance stub or placeholder view should be expanded to offer
  • GIS information embedded in the Schema.org metadata doesn't seem to impact result order in a Google search.
  • SEO-friendly URLs: the UUID-based URLs for the BF Instances and Items should be modified for easier human consumption.
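
To illustrate how a robots.txt file can block indexing of individual catalog items, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `/record/` path, and the `catalog.example.edu` host are all hypothetical and are not copied from either library's catalog; only the general mechanism is the point.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT taken from any real catalog.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /record/
"""


def can_index(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "https://catalog.example.edu"  # placeholder host
    # An individual item page under the disallowed path.
    print(can_index(EXAMPLE_ROBOTS_TXT, base + "/record/b1234567"))
    # A page outside the disallowed path.
    print(can_index(EXAMPLE_ROBOTS_TXT, base + "/search"))
```

Under rules like these, a crawler that honors robots.txt never fetches the item pages, so links from BIBCAT to those pages cannot make them appear in search results.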

BML Loop Two

After we completed the first Build-Measure-Learn loop and decided to continue the project, two important changes to the project occurred. The first was that the Library of Congress released a new MARC-to-BIBFRAME 2 tool at https://github.com/lcnetdev/marc2bibframe2. The second was the addition of a third partner institution, SUNY Buffalo.


Iteration 2 - Building

The second development cycle for GoldRush BIBCAT began earlier in 2017 and implemented a number of changes to how the BIBFRAME Linked Data is structured and published to the web for search engine crawling. Bugs and enhancements are entered and tracked throughout the Build phase using this repository's GitHub issue tracker.

  1. In the initial step, MARC XML that includes a 997 field containing the Alliance Match Key is exported from the GoldRush® Comparison Tool.

  2. The next step is to iterate through the XML export file and run each MARC XML record through the Library of Congress marc2bibframe2 tool, which uses multiple XSLT files to transform the MARC XML into BIBFRAME 2.0 RDF XML.

  3. With the BIBFRAME RDF XML, we then run a process to replace the generated URIs of the largest BIBFRAME Instance and associated BIBFRAME Items so that they use the following patterns with the "slugified" title:

    BIBFRAME Instance: https://bibcat.coalliance.org/{title-of-instance-in-lowercase-spaces-replaces}
    BIBFRAME Item: https://bibcat.coalliance.org/{title-of-instance-in-lowercase-spaces-replaces}/{institution-name}

  4. Because of the large number of triples in the Library of Congress output RDF, we then run our first RDF-Map to shrink the total number of triples to a more manageable size that is still valid BIBFRAME 2.0.

  5. The final step then runs a SPARQL query and transforms the resulting RDF graph to Schema.org RDF which is then serialized as JSON-LD along with a new landing page.
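
Steps 1 and 3 above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the project's actual code: the sample record, the use of subfield `a` for both the 245 title and the 997 Match Key, replacing spaces with hyphens, and slugifying the institution name are all assumptions, and the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}  # standard MARCXML namespace
BASE_URL = "https://bibcat.coalliance.org"

# Hypothetical one-record export; the 997 value is a made-up placeholder.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Ramona</subfield>
    </datafield>
    <datafield tag="997" ind1=" " ind2=" ">
      <subfield code="a">hypothetical-match-key</subfield>
    </datafield>
  </record>
</collection>"""


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into hyphens.

    Assumption: hyphens replace spaces. Non-ASCII letters are dropped here,
    which a production slugifier would handle more carefully.
    """
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def first_subfield(record, tag, code="a"):
    """Return the first matching subfield's text, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text.strip() if node is not None and node.text else None


def instance_uri(title: str) -> str:
    """Build the BIBFRAME Instance URI from the slugified title."""
    return f"{BASE_URL}/{slugify(title)}"


def item_uri(title: str, institution: str) -> str:
    """Build the BIBFRAME Item URI under its Instance URI."""
    return f"{instance_uri(title)}/{slugify(institution)}"


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for record in root.findall("marc:record", MARC_NS):
        title = first_subfield(record, "245")
        match_key = first_subfield(record, "997")
        print(match_key, instance_uri(title), item_uri(title, "Colorado College"))
```

The XSLT transformation in step 2, the RDF-Map in step 4, and the SPARQL-to-Schema.org conversion in step 5 are left out of this sketch because they depend on tools outside the standard library.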


Iteration 2 - Measuring

Search Engine Indexing

With the release of the second iteration of the GoldRush BIBCAT project, measuring the impact of publishing the linked data has been difficult because of problems getting Google or Microsoft to fully index all of the BIBFRAME Instances, and later Items, into their search indexes.

Speed of Workflow

We are also optimizing the speed of the workflow as a key metric, in anticipation of the next BML Build phase, where we move to 500,000-1,000,000 source MARC XML records.
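
One way to turn workflow speed into a concrete metric is to time a sample batch and extrapolate to the target record counts. The sketch below assumes a per-record `process` callable standing in for the real pipeline; the function names and the example rate of 100 records per second are hypothetical and are not measurements from this project.

```python
import time


def records_per_second(process, records) -> float:
    """Time a batch and return the observed throughput in records per second."""
    start = time.perf_counter()
    count = 0
    for record in records:
        process(record)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def projected_hours(total_records: int, rate: float) -> float:
    """Estimate wall-clock hours to process total_records at a steady rate."""
    return total_records / rate / 3600


if __name__ == "__main__":
    # Placeholder workload: a no-op instead of the real transformation.
    rate = records_per_second(lambda record: None, range(10_000))
    print(f"observed rate: {rate:.0f} records/s")
    # Hypothetical steady rate of 100 records/s for both ends of the range.
    for total in (500_000, 1_000_000):
        print(total, f"{projected_hours(total, 100.0):.2f} hours")
```

Tracking the observed rate after each optimization gives a simple before-and-after comparison for the next Build phase.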

Google Analytics

To provide a different view of the web traffic of the BIBFRAME Instances and Items, we added Google Analytics tracking code to each of these views.

Thank you!