Introducing BIBCAT - a BIBFRAME Catalog

Jeremy Nelson
Metadata and Systems Librarian
Colorado College

This presentation's abstract and source code (including content) are licensed under the GPLv3, available on GitHub, and built using Flask and Skeleton.

Annotate and share your thoughts on this presentation by commenting using service.


How can we improve our catalogs? The traditional ILS (integrated library system) is expensive and complicated, and our patrons care less about the technology than about finding and retrieving information and resources from their local library. The library's ILS, now being marketed as a "library services platform" (Breeding 2012), touches most areas of the library's operations, collections, and public services. The functionality needed to support acquisition and circulation workflows - not to mention the increasingly complex electronic resource management for journals and e-books - has traditionally been bundled into large, monolithic enterprise "black boxes".

We can only expect that the recent announcement that ProQuest is acquiring Ex Libris will continue the pattern of library technology consolidating into larger corporate entities that consume larger and larger portions of a library's budget.

How often does our patrons' research start here?

Real question: how often do YOU start here?

Instead of library systems that are…

  • Expensive & opaque
  • Top-down
  • Surface customizations

Can an alternative exist?



But how?

Catalog Pull Platform

The Catalog Pull Platform is a research effort by Jeremy Nelson to develop a flexible, loosely coupled suite of bibliographic and semantic web technology that lets libraries, museums, and other cultural heritage organizations build small and efficient solutions for bibliographic and digital asset management.

Push vs. Pull

Inspired by both the Lean Startup and Toyota's Lean Manufacturing, the Catalog Pull Platform moves away from trying to anticipate needs and then "pushing" services and technology to patrons and staff. Instead, it identifies and responds to needs and demands for library technology by "pulling" directly from the various constituencies served by libraries and cultural heritage organizations.

Catalog Pull Platform


People - primary source of pull

People are the first and most important source of library systems demand in the Catalog Pull Platform. By listening to patrons, staff, and other vested individuals, we can build lightweight utilities to meet their needs.


Institutions - Responsibility and Accountability

Institutions are the second source of pull for the Catalog Pull Platform as both internal and external groups hold libraries accountable for implicit and explicit outcomes.


Algorithms - Connecting Networked Library Systems

To function as first-class citizens in the networked information environment, the library's primary website and catalog must be easily found by Google, Facebook, and other third parties.

By publishing their institution's organizational knowledge and collections as Linked Data and providing open APIs for use by other people and institutions, library systems function as a critical asset.

Group Exercise

  1. Please divide yourselves into groups of three, four, or five people.
  2. For each source of pull, come up with two or three examples of pull from your own experience.
  3. Have one group member enter your responses by highlighting and annotating each pull source - people, institutions, and algorithms - with

Interested in Learning more?

Next month, my book Becoming a Lean Library will be published by Chandos Publishing and goes into much more detail on how a Catalog Pull Platform can be used in improving Library operations and services.

BIBFRAME - Library of Congress's MARC21 Replacement

BIBFRAME - short for Bibliographic Framework Initiative - is the Library of Congress's effort to replace MARC21 with a linked-data vocabulary.

What is Linked Data?

Linked Data is about representing information in statements called RDF triples, i.e. subject-predicate-object. Library standards are shifting to linked-data vocabularies like BIBFRAME, RDF-VRA, MODS RDF, and

With modern linked-data-based vocabularies, a library system should be able to use any vocabulary that is needed to solve a problem or issue. We should be able to "mix-and-match" the elements we need to properly describe our resources for our own and our patrons' needs.

What is a triple?

A triple is a simple statement that links a subject, through a relationship called a predicate, to a third element, the object. Any of these three elements - subject, predicate, or object - can be an IRI (Internationalized Resource Identifier), most commonly a Uniform Resource Locator or URL. This flexibility of allowing an IRI to have any of these three roles allows common data to be shared with others (the "linking" of linked data) while also offering institutions the opportunity to customize and capture information and data important to them.


A subject is either an IRI or a Blank Node. A Blank Node is just a unique identifier local to a particular graph; it is not a universal ID.



Predicates must be an IRI; blank nodes or literal values are not acceptable.


An object can be an IRI, a Blank Node, or a Literal value, for example:
"Mark Twain"
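The three role rules above can be sketched in a few lines of plain Python. This is only an illustration, not part of BIBCAT: the class names, the make_triple helper, and the example.org IRIs are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # only meaningful inside a single graph


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[IRI, BlankNode, Literal]


def make_triple(subject: Term, predicate: Term, obj: Term) -> tuple:
    """Return (subject, predicate, object) after checking each role."""
    if not isinstance(subject, (IRI, BlankNode)):
        raise TypeError("subject must be an IRI or a Blank Node")
    if not isinstance(predicate, IRI):
        raise TypeError("predicate must be an IRI")
    if not isinstance(obj, (IRI, BlankNode, Literal)):
        raise TypeError("object must be an IRI, Blank Node, or Literal")
    return (subject, predicate, obj)


# Hypothetical IRIs, for illustration only.
author = IRI("http://example.org/person/mark-twain")
name = IRI("http://example.org/vocab/name")

triple = make_triple(author, name, Literal("Mark Twain"))
```

Trying to use a Literal as a subject, or a Blank Node as a predicate, raises a TypeError in this sketch.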

RDF Graphs

An RDF Graph is a collection of subject-predicate-object triples. Below is an example of a simple RDF graph that you may have in your next library catalog for the author Jane Austen and her book Pride and Prejudice.

Subject Predicate Object "Pride and Prejudice"
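As a rough sketch of the idea, a graph can be modeled as a plain set of triples and queried by pattern. The IRIs below are hypothetical placeholders (the identifiers from the original slide are not reproduced here), and the literal and match helpers are invented for this example.

```python
# Hypothetical IRIs; a real catalog would use its own identifiers.
EX = "http://example.org/"
AUSTEN = EX + "person/jane-austen"
BOOK = EX + "work/pride-and-prejudice"
TITLE = EX + "vocab/title"
CREATOR = EX + "vocab/creator"
NAME = EX + "vocab/name"


def literal(text):
    """Tag a plain string so it is not confused with an IRI."""
    return ("literal", text)


# Each triple is (subject, predicate, object).
graph = {
    (BOOK, TITLE, literal("Pride and Prejudice")),
    (BOOK, CREATOR, AUSTEN),
    (AUSTEN, NAME, literal("Jane Austen")),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


titles = match(graph, subject=BOOK, predicate=TITLE)
works_by_austen = match(graph, predicate=CREATOR, obj=AUSTEN)
```

Because the book's creator is an IRI rather than a string, the same AUSTEN identifier can be the object of one triple and the subject of another, which is what links the two statements together.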

RDF Graph Displays



    [
        {
            "@id": "",
            "": [
                { "@value": "Pride and Prejudice" }
            ],
            "": [
                { "@id": "" }
            ]
        },
        {
            "@id": "",
            "@type": [
            ],
            "": [
                { "@id": "" }
            ]
        }
    ]
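One way to produce a display like this is to group triples by subject into an expanded JSON-LD style list of nodes. The sketch below assumes made-up example.org IRIs and a hypothetical to_expanded_jsonld helper; only the rdf:type IRI is a real, standard identifier.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical IRIs, for illustration only.
BOOK = "http://example.org/work/pride-and-prejudice"
AUSTEN = "http://example.org/person/jane-austen"
TITLE = "http://example.org/vocab/title"
CREATOR = "http://example.org/vocab/creator"
PERSON = "http://example.org/vocab/Person"

# Rows are (subject, predicate, object, object_is_literal).
triples = [
    (BOOK, TITLE, "Pride and Prejudice", True),
    (BOOK, CREATOR, AUSTEN, False),
    (AUSTEN, RDF_TYPE, PERSON, False),
]


def to_expanded_jsonld(rows):
    """Group rows by subject into a list of JSON-LD node objects."""
    nodes = {}
    for subject, predicate, obj, is_literal in rows:
        node = nodes.setdefault(subject, {"@id": subject})
        if predicate == RDF_TYPE:
            # rdf:type is written with the @type keyword.
            node.setdefault("@type", []).append(obj)
        elif is_literal:
            node.setdefault(predicate, []).append({"@value": obj})
        else:
            node.setdefault(predicate, []).append({"@id": obj})
    return list(nodes.values())


document = to_expanded_jsonld(triples)
print(json.dumps(document, indent=4))
```

Literal objects become "@value" entries and IRI objects become "@id" entries, which matches the shape of the display above.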


In 2014 the Library of Congress issued an RFQ for a BIBFRAME "search and display" system to highlight BIBFRAME. Aaron Schmidt of Influx Library User Interface Design and Jeremy Nelson were awarded the contract with the Library of Congress that resulted in BIBCAT - a BIBFRAME Catalog - a lightweight catalog web application that uses the backend BIBFRAME Datastore. The current pilot is loaded with sample datasets from the Library of Congress (the original records were generated from MARC records related to the subjects "Mark Twain" and "Bible").

BIBCAT provides a modern web interface to collections of BIBFRAME RDF graphs.


Librarians are guardians of books. They help others along their paths, offering keys to help unlock the doors of knowledge.
-- In the House of the Seven Librarians by Ellen Klages

Colorado College TIGER Catalog & Website

For the past five years, Tutt Library at Colorado College has been experimenting with new approaches to bibliographic systems based on different NoSQL technologies and emerging linked data standards and vocabularies.


Aristotle Discovery Layer

The first public bibliographic system developed at Colorado College is Aristotle, a bibliographic Django project for discovery and management of born-digital and physical artifacts. Aristotle uses a number of other open-source toolsets, including EULFedora, EULXML, Sunburnt, and PyMARC.

For the Discovery interface, Aristotle uses a forked version of the Kochief Django application.


Prospector Discovery App


For a 2013 Code4Lib presentation, a new Django app was created in the Aristotle Library Apps that used a random selection of records from the union catalog of the Colorado Alliance of Research Libraries' member libraries. A custom mapping from MARC to the then-current BIBFRAME model was developed that converted MARC21 records to BIBFRAME Redis data structures.
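The details of that mapping are not given here, so the following is only a loose sketch of the general idea under stated assumptions: a plain dict stands in for Redis hashes, and the key pattern, the MARC_TO_PROPERTY table, the property names, and the sample values are all hypothetical.

```python
from itertools import count

# A stand-in for Redis: each key maps to a hash (a dict). A real
# deployment would call a Redis client instead of writing to this dict.
store = {}
_ids = count(1)

# Hypothetical mapping from (MARC tag, subfield code) to a property name.
MARC_TO_PROPERTY = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
}


def ingest(marc_fields):
    """Save the mapped values of one record as a single hash; return its key."""
    key = "bf:Work:{}".format(next(_ids))  # hypothetical key pattern
    store[key] = {
        prop: marc_fields[field]
        for field, prop in MARC_TO_PROPERTY.items()
        if field in marc_fields
    }
    return key


# Made-up sample values in a simplified {(tag, subfield): value} form.
work_key = ingest({
    ("245", "a"): "Adventures of Huckleberry Finn",
    ("100", "a"): "Twain, Mark",
    ("300", "a"): "366 p.",  # no mapping defined, so it is skipped
})
```

Keeping the mapping in a table rather than in code makes it easier to revise as the target vocabulary changes.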


Aristotle Library Apps

Prior to the Aristotle Discovery Layer, initial development of a web application was started in 2011 to allow Colorado College seniors to self-submit their theses and supporting datasets to the Library's Digital Repository. This approach resulted in a couple of other internal applications, including a Fedora Repository PID mover and a metadata batch application for adding multiple Fedora Objects using a template.

TIGER Catalog (1st & 2nd iteration)


The first iteration of the TIGER Catalog uses Solr and MongoDB for backend storage and search, and Flask for the web front end. This iteration uses JSON-serialized MARC21 records that are saved into MongoDB, with Solr used for searching.
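As a minimal sketch of this split, the snippet below takes a record in the MARC-in-JSON layout (the full document that would be stored) and derives a flat document for the search index. It uses only the standard library; the sample record, the helper names, and the search field names are assumptions, and no actual MongoDB or Solr calls are made.

```python
import json

# A hand-written sample in the MARC-in-JSON layout; values are illustrative.
marc_json = json.dumps({
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "example-0001"},
        {"100": {"ind1": "1", "ind2": " ",
                 "subfields": [{"a": "Austen, Jane"}]}},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "Pride and prejudice"}]}},
    ],
})


def subfield_values(record, tag, code):
    """Collect every value of one subfield code for one MARC tag."""
    values = []
    for field in record["fields"]:
        data = field.get(tag)
        if isinstance(data, dict):  # control fields are plain strings
            for subfield in data["subfields"]:
                if code in subfield:
                    values.append(subfield[code])
    return values


def to_search_doc(record):
    """Build a flat search document; the field names are hypothetical."""
    return {
        "id": next(f["001"] for f in record["fields"] if "001" in f),
        "title": subfield_values(record, "245", "a"),
        "author": subfield_values(record, "100", "a"),
    }


record = json.loads(marc_json)      # the full record, as it would be stored
search_doc = to_search_doc(record)  # the subset that would be indexed
```

The full record stays intact in the document store, while the index only needs the handful of fields that patrons actually search on.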

Extends BIBCAT by pulling directly from the requirements of Colorado College's People and Institutions.

Tutt Library Website


With all of the library's operational and bibliographic information stored in the Catalog Pull Platform, the library and Colorado College will be able to embed facts about the library's collections and the college into the website.

The big shift for Colorado College is that we're exploring merging the Tutt Library's catalog with its website.

Where are we at now?