Building Bibliographic RDF Applications and Microservices

Ingesting MARC

For this activity we will use the Python command line to import the default MARCIngester class from the bibcat/ingesters/marc.py module.

Ingesting a Single MARC21 Record

  1. From your project directory, launch Python from the command line:

    (py3-env)>python
    >>>             

    or launch Python IDLE:

    (py3-env)>python -m idlelib

  2. Import the pymarc module

    >>> import pymarc

  3. Create an instance of the MARCReader class using your own MARC21 file or the 150-record MARC sample from Colorado College (saved in this example as /tmp/rdf-app/cc-marc-sample.mrc).

    
    >>> reader = pymarc.MARCReader(
            open("/tmp/rdf-app/cc-marc-sample.mrc", "rb"), 
            to_unicode=True)
            
  4. Retrieve the first MARC21 record from the reader
    >>> first_record = next(reader)
    And print the record
    >>> print(first_record)
    =LDR  00947cam a2200313 a 4500
    =001  40163506
    =003  OCoLC
    =005  19990428161357.0
    =008  981009s1999\\\\mau\\\\\\b\\\\001\0\eng\\
    =010  \\$a98047634
    =020  \\$a0395691303
    =040  \\$aDLC$cDLC$dC#P
    =049  \\$aCOCA
    =050  00$aQP38$b.A54 1999
    =090  \\$aQP38$b.A54 1999
    =100  1\$aAngier, Natalie.
    =245  10$aWoman :$ban intimate geography /$cNatalie Angier.
    =260  \\$aBoston :$bHoughton Mifflin,$c1999.
    =300  \\$axvi, 398 p. ;$c24 cm.
    =500  \\$a"A Peter Davison book."
    =504  \\$aIncludes bibliographical references (p. 369-382) and index.
    =650  \0$aWomen$xPhysiology.
    =650  \0$aWomen$xPsychology.
    =650  \0$aSex differences.
    =902  \\$a150104
    =907  \\$a.b13627557
    =945  \\$aQP38$b.A54 1999$g1$i33027003963844$j0$ltbp  $h0$oc$p$0.00$q $r-$s-$t1$u7$v0$w0$x0$y.i14279873$z990428
    =994  \\$atbp
    =999  \\$b1$c990428$dm$ea$fc$g0
            
  5. Import the MARCIngester class. By default it uses the RDF Framework's BIBCAT MARC ingestion rules, stored in Turtle RDF format in kds-bibcat-marc-ingestion.ttl.
    >>> from bibcat.ingesters.marc import MARCIngester
  6. Create an instance of the MARCIngester Class using the default RDF Rules
    >>> marc_ingester = MARCIngester()
  7. Display the number of triples in the RDF Rules Graph loaded from kds-bibcat-marc-ingestion.ttl
    >>> len(marc_ingester.rules_graph)
    206
    Confirm that the BIBFRAME RDF Graph is empty.
    >>> len(marc_ingester.graph)
    0
  8. Run the marc_ingester.transform method on the first_record
    >>> marc_ingester.transform(record=first_record)
  9. Print the BIBFRAME graph serialized as RDF Turtle
    >>> print(marc_ingester.graph.serialize(format='turtle').decode())
    @prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
    @prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
    @prefix dbo: <http://dbpedia.org/ontology/> .
    @prefix dbp: <http://dbpedia.org/property/> .
    @prefix dbr: <http://dbpedia.org/resource/> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix dcterm: <http://purl.org/dc/terms/> .
    @prefix dpla: <http://dp.la/about/map/> .
    @prefix edm: <http://www.europeana.eu/schemas/edm/> .
    @prefix es: <http://knowledgelinks.io/ns/elasticsearch/> .
    @prefix kdr: <http://knowledgelinks.io/ns/data-resources/> .
    @prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
    @prefix loc: <http://id.loc.gov/authorities/> .
    @prefix m21: <http://knowledgelinks.io/ns/marc21/> .
    @prefix mods: <http://www.loc.gov/mods/v3> .
    @prefix ore: <http://www.openarchives.org/ore/terms/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix relators: <http://id.loc.gov/vocabulary/relators/> .
    @prefix schema: <http://schema.org/> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    
    <http://bibcat.org/9e054b36-0097-11e7-b2b0-a8667f19014b> a bf:Item ;
        bf:generationProcess [ a bf:GenerationProcess ;
                bf:generationDate "2017-03-04T05:01:24.971684" ;
                rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
        bf:itemOf <http://dpla.coloradovirtuallibrary.org/9d817a06-0097-11e7-897a-a8667f19014b> .
    
    <http://bibcat.org/9d817a06-0097-11e7-897a-a8667f19014b> a bf:Instance ;
        bf:classification [ a bf:ClassificationLcc ;
                rdf:value "QP38 .A54 1999" ] ;
        bf:copyrightDate "1999." ;
        bf:dimensions "24 cm." ;
        bf:extent [ a bf:Extent ;
                rdf:value "xvi, 398 p. ;" ] ;
        bf:generationProcess [ a bf:GenerationProcess ;
                bf:generationDate "2017-03-04T05:01:24.917334" ;
                rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
        bf:identifiedBy [ a bf:Isbn ;
                rdf:value "0395691303" ] ;
        bf:instanceOf [ a bf:Work ;
                bf:originDate "1999" ] ;
        bf:provisionActivity [ a bf:Publication ;
                relators:pbl "Houghton Mifflin," ] ;
        bf:subject [ a bf:Topic ;
                rdf:value "Women" ],
            [ a bf:Topic ;
                rdf:value "Sex differences." ],
            [ a bf:Topic ;
                rdf:value "Women" ] ;
        bf:supplementaryContent [ a bf:SupplementaryContent ;
                rdf:value "Includes bibliographical references (p. 369-382) and index." ] ;
        bf:title [ a bf:InstanceTitle ;
                bf:mainTitle "Woman :" ;
                bf:subtitle "an intimate geography /" ] ;
        relators:aut [ a bf:Person ;
                schema:name "Angier, Natalie." ] .
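
Each line of the printed record in step 4 packs a tag, two indicators, and a run of subfields. As a reading aid only, the sketch below splits one such line using the standard library. parse_mnemonic_line is a hypothetical helper, not part of pymarc, and it assumes the layout shown in the printed output above: "=TAG", two spaces, then the data, with a backslash standing for a blank indicator.

```python
def parse_mnemonic_line(line):
    """Split one line of a printed MARC record into tag, indicators, and subfields.

    Hypothetical helper for illustration; assumes the "=TAG  data" layout
    shown in the printed records above.
    """
    tag = line[1:4]
    body = line[6:]
    if tag == "LDR" or tag < "010":
        # The leader and control fields (001-009) have no indicators or subfields.
        return {"tag": tag, "data": body}
    # A backslash in an indicator position stands for a blank.
    indicators = [" " if ch == "\\" else ch for ch in body[:2]]
    subfields = []
    # Naive split: a literal "$" inside a value (as in "$p$0.00" in the 945
    # field) cannot be told apart from a subfield delimiter in this form.
    for chunk in body[2:].split("$")[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:]))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


title = parse_mnemonic_line("=245  10$aWoman :$ban intimate geography /$cNatalie Angier.")
author = parse_mnemonic_line("=100  1\\$aAngier, Natalie.")
print(title["indicators"], title["subfields"][0])
print(author["indicators"], author["subfields"])
```

In a live session there is no need to parse text: first_record['245']['a'] reads the same subfield directly from the pymarc record.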
    
    

Creating Custom Turtle RDF Rules File

  1. Start with a new terminal window and create a new custom directory
    mkdir custom
  2. Open a text editor and copy these RDF Namespaces into a new file:
    
    @prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
    @prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
    @prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
    @prefix kdr: <http://knowledgelinks.io/ns/data-resources/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix relators: <http://id.loc.gov/vocabulary/relators/> .
    @prefix m21: <http://knowledgelinks.io/ns/marc21/> .
    @prefix schema: <http://schema.org/> .
    @prefix loc: <http://id.loc.gov/authorities/> .
        
  3. The first rule will be to associate a new bf:Item with Colorado College's Tutt Library using the IRI of https://www.coloradocollege.edu/library/ through the bf:heldBy predicate (substitute your own institutional IRI defined in the Knowledge Graph activity)

    
    bc:bf-Organization a kds:PropertyLinker;
        kds:destPropUri [ bf:heldBy <https://www.coloradocollege.edu/library/> ] ;
        kds:destClassUri bf:Item .
        
  4. The second rule will extract the barcode from the MARC 945 field, subfield i, and create a blank-node bf:Barcode that is linked to the bf:Item through the bf:barcode predicate
    bc:mrc-barcode a kds:PropertyLinker ;
        kds:srcPropUri m21:M945__i;
        kds:destClassUri bf:Barcode ;
        kds:destPropUri rdf:value ;
        kds:linkedRange bf:barcode ;
        kds:linkedClass bf:Item .
    
        
  5. Now, save your MARC-to-BIBFRAME RDF Turtle file in the custom directory as custom/cc-marc.ttl.
  6. Going back to the running Python session, we will read and use the second MARC record from the MARCReader instance
    >>> second_record = next(reader)
    And print the record
    >>> print(second_record)
    =LDR  00921pam a2200277 a 4500
    =001  38144340
    =003  OCoLC
    =005  19991207162048.0
    =008  971205s1999\\\\njua\\\\\b\\\\001\0\eng\\
    =010  \\$a97049002
    =020  \\$a0134905172
    =040  \\$aDLC$cDLC$dUKM
    =049  \\$aCOCA
    =050  00$aQC806$b.L48 1999
    =090  \\$aQC806$b.L48 1999
    =100  1\$aLillie, Robert J.,$d1952-
    =245  10$aWhole earth geophysics :$ban introductory textbook for geologists and geophysicists /$cRobert J. Lillie.
    =260  \\$aUpper Saddle River, N.J. :$bPrentice Hall,$cc1999.
    =300  \\$ax, 361 p. :$bill. (some col.) ;$c26 cm.
    =504  \\$aIncludes bibliographical references and index.
    =650  \0$aGeophysics.
    =902  \\$a160511
    =907  \\$a.b13756497
    =945  \\$aQC806$b.L48 1999$g1$i33027004066753$j0$ltbp  $h0$oc$p$0.00$q $r-$s-$t1$u12$v0$w2$x3$y.i14450355$z991207
    =994  \\$atbp
    =999  \\$b1$c991207$dm$ea$fc$g0
            
  7. Create a new MARCIngester instance with our cc-marc.ttl RDF Turtle rules file.
    >>> marc_ingester = MARCIngester(custom='cc-marc.ttl')
  8. With the custom rules added to the default rules, we will run the marc_ingester.transform method with the second_record MARC21 record
    >>> marc_ingester.transform(record=second_record)
  9. Finally, we will print the serialized form of the BIBFRAME output marc_ingester.graph
    >>> print(marc_ingester.graph.serialize(format='turtle').decode())
    @prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
    @prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
    @prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
    @prefix loc: <http://id.loc.gov/authorities/> .
    @prefix m21: <http://knowledgelinks.io/ns/marc21/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix relators: <http://id.loc.gov/vocabulary/relators/> .
    @prefix schema: <http://schema.org/> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    
    <http://bibcat.org/bb1ac782-0123-11e7-9986-a8667f19014b> a bf:Item ;
        bf:barcode [ a bf:Barcode ;
                rdf:value "33027004066753" ] ;
        bf:generationProcess [ a bf:GenerationProcess ;
                bf:generationDate "2017-03-04T21:44:23.368616" ;
                rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
        bf:heldBy <https://www.coloradocollege.edu/library/> ;
        bf:itemOf <http://bibcat.org/ba977364-0123-11e7-871c-a8667f19014b> .
    
    <http://bibcat.org/ba977364-0123-11e7-871c-a8667f19014b> a bf:Instance ;
        bf:classification [ a bf:ClassificationLcc ;
                rdf:value "QC806 .L48 1999" ] ;
        bf:copyrightDate "c1999." ;
        bf:dimensions "26 cm." ;
        bf:extent [ a bf:Extent ;
                rdf:value "x, 361 p. :" ] ;
        bf:generationProcess [ a bf:GenerationProcess ;
                bf:generationDate "2017-03-04T21:44:23.250965" ;
                rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
        bf:identifiedBy [ a bf:Isbn ;
                rdf:value "0134905172" ] ;
        bf:instanceOf [ a bf:Work ;
                bf:originDate "1999" ] ;
        bf:provisionActivity [ a bf:Publication ;
                relators:pbl "Prentice Hall," ] ;
        bf:subject [ a bf:Topic ;
                rdf:value "Geophysics." ] ;
        bf:supplementaryContent [ a bf:SupplementaryContent ;
                rdf:value "Includes bibliographical references and index." ] ;
        bf:title [ a bf:InstanceTitle ;
                bf:mainTitle "Whole earth geophysics :" ;
                bf:subtitle "an introductory textbook for geologists and geophysicists /" ] ;
        relators:aut [ a bf:Person ;
                schema:name "Lillie, Robert J.," ] .
    
            
    From this transformation, we see that our custom rules have populated the objects for bf:barcode and the bf:heldBy predicates.
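
The barcode rule in step 4 names its MARC source as m21:M945__i. Judging only from that one example, the local name appears to combine the letter M, the three-character field tag, two indicator positions (underscores in the example), and the subfield code. The sketch below decodes a name under that assumption. split_marc_local_name is a hypothetical helper, and the pattern is inferred from this tutorial rather than taken from BIBCAT's documentation, so check it against the default rules file before relying on it.

```python
import re

# Assumed shape, inferred from the single example m21:M945__i in the rule above:
# "M" + 3-digit tag + 2 indicator positions + subfield code.
MARC_LOCAL_NAME = re.compile(
    r"^M(?P<tag>\d{3})(?P<ind1>.)(?P<ind2>.)(?P<subfield>[a-z0-9])$"
)


def split_marc_local_name(local_name):
    """Return (tag, indicator1, indicator2, subfield), or None if the name does not fit."""
    match = MARC_LOCAL_NAME.match(local_name)
    if match is None:
        return None
    return (match["tag"], match["ind1"], match["ind2"], match["subfield"])


print(split_marc_local_name("M945__i"))
```

Read this way, the rule's source points at field 945, any indicators, subfield i, which is where the barcode 33027004066753 sits in the second record.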

Using a Preexisting bf:Item IRI

  1. We can construct an IRI that links directly to Colorado College's legacy ILS from the bib number located in the MARC 907 field, subfield a. To do this, we will first create a function, generate_item_iri, that takes a MARC 21 record and returns an rdflib.URIRef that links directly to the library's catalog.
    >>> import rdflib
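    >>> def generate_item_iri(record):
            # Records without a 907 field have no bib number to link to
            if '907' not in record:
                return None
            # ".b16842455" becomes "b1684245": drop the leading "." and the final character
            bib_number = record['907']['a'][1:-1]
            return rdflib.URIRef("http://tiger.coloradocollege.edu/record={}".format(bib_number))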
    
                
  2. Next, we'll retrieve and use the third record in the reader
    >>> third_record = next(reader)
    And print the third_record
    >>> print(third_record)
    =LDR  01469cam a22003614a 4500
    =001  61109349
    =003  OCoLC
    =005  20070130035705.0
    =008  050714s2006\\\\caua\\\\\b\\\\001\0\eng\\
    =010  \\$a2005019975
    =020  \\$a1412916186 (cloth)
    =020  \\$a9781412916189 (cloth)
    =020  \\$a1412916194 (pbk.)
    =020  \\$a9781412916196 (pbk.)
    =040  \\$aDLC$cDLC$dYDXCP$dBAKER$dUKM$dYBM$dIG#$dOCLCQ$dBTCTA
    =042  \\$apcc
    =043  \\$an-us---
    =049  \\$aCOCA
    =050  00$aQA13$b.P67 2006
    =050  00$aQA13$b.P67 2006
    =100  1\$aPosamentier, Alfred S.
    =245  10$aWhat successful math teachers do, grades 6-12 :$b79 research-based strategies for the standards-based classroom /$cAlfred S. Posamentier, Daniel Jaye.
    =260  \\$aThousand Oaks, Calif. :$bCorwin Press,$cc2006.
    =300  \\$axix, 197 p. :$bill. ;$c26 cm.
    =504  \\$aIncludes bibliographical references (p. 183-191) and index.
    =505  0\$aManaging your classroom -- Enhancing teaching techniques -- Facilitating student learning -- Assessing student progress -- Teaching problem solving -- Considering social aspects in teaching mathematics.
    =650  \0$aMathematics$xStudy and teaching (Secondary)$xStandards$zUnited States.
    =700  1\$aJaye, Daniel.
    =902  \\$a160104
    =907  \\$a.b16842455
    =945  \\$aQA13$b.P67 2006$g1$i33027005249309$j0$ltbp  $h0$oc$p$0.00$q $r-$s-$t1$u3$v26$w1$x0$y.i17378928$z070130
    =994  \\$atbp
    =999  \\$b1$c070130$dm$ea$fc$g0
    
            
  3. Using the new generate_item_iri function on third_record results in an item IRI of http://tiger.coloradocollege.edu/record=b1684245
    >>> item_iri = generate_item_iri(third_record)
    >>> print(item_iri)
    http://tiger.coloradocollege.edu/record=b1684245
            
  4. With the bf:Item IRI generated, we will run marc_ingester.transform on third_record and pass in item_iri through the item_uri keyword parameter.
    >>> marc_ingester.transform(record=third_record, item_uri=item_iri)
    Finally print the serialized marc_ingester.graph in Turtle:
    >>> print(marc_ingester.graph.serialize(format='turtle').decode())
    @prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
    @prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
    @prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
    @prefix loc: <http://id.loc.gov/authorities/> .
    @prefix m21: <http://knowledgelinks.io/ns/marc21/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix relators: <http://id.loc.gov/vocabulary/relators/> .
    @prefix schema: <http://schema.org/> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    
    <http://tiger.coloradocollege.edu/record=b1684245> a bf:Item ;
        bf:barcode [ a bf:Barcode ;
                rdf:value "33027005249309" ] ;
        bf:generationProcess [ a bf:GenerationProcess ;
                bf:generationDate "2017-03-05T15:10:34.526978" ;
                rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
        bf:heldBy <https://www.coloradocollege.edu/library/> ;
        bf:itemOf <http://bibcat.org/e133d028-01b5-11e7-94fe-ac87a3129ce6> .
    
    <http://bibcat.org/e133d028-01b5-11e7-94fe-ac87a3129ce6> a bf:Instance ;
        bf:classification [ a bf:ClassificationLcc ;
                rdf:value "QA13 .P67 2006" ] ;
        bf:copyrightDate "c2006." ;
        bf:dimensions "26 cm." ;
        bf:extent [ a bf:Extent ;
                rdf:value "xix, 197 p. :" ] ;
        bf:generationProcess [ a bf:GenerationProcess ;
                bf:generationDate "2017-03-05T15:10:34.453103" ;
                rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
        bf:identifiedBy [ a bf:Isbn ;
                rdf:value "1412916186 (cloth)",
                    "1412916194 (pbk.)",
                    "9781412916189 (cloth)",
                    "9781412916196 (pbk.)" ] ;
        bf:instanceOf [ a bf:Work ;
                bf:originDate "2006" ] ;
        bf:provisionActivity [ a bf:Publication ;
                relators:pbl "Corwin Press," ] ;
        bf:subject [ a bf:Topic ;
                rdf:value "Mathematics United States." ] ;
        bf:supplementaryContent [ a bf:SupplementaryContent ;
                rdf:value "Includes bibliographical references (p. 183-191) and index." ] ;
        bf:tableOfContents [ a bf:TableOfContents ;
                rdf:value "Managing your classroom -- Enhancing teaching techniques -- Facilitating student learning -- Assessing student progress -- Teaching problem solving -- Considering social aspects in teaching mathematics." ] ;
        bf:title [ a bf:InstanceTitle ;
                bf:mainTitle "What successful math teachers do, grades 6-12 :" ;
                bf:subtitle "79 research-based strategies for the standards-based classroom /" ] ;
        relators:aut [ a bf:Person ;
                schema:name "Posamentier, Alfred S." ] .
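
The slice [1:-1] in generate_item_iri assumes that every 907 subfield a looks like .b16842455: a leading period, the record number, and one trailing character that the catalog URL omits. As a sketch under that assumption, the hypothetical helper below checks the shape before building the URL, so a malformed value yields None rather than a misleading IRI. It works on plain strings; wrapping the result in rdflib.URIRef is left to the caller.

```python
import re

CATALOG_URL = "http://tiger.coloradocollege.edu/record={}"

# Assumed shape of 907 $a, based on the three sample records shown above:
# "." + "b" + digits, where the final character is left out of the catalog URL.
BIB_PATTERN = re.compile(r"^\.(b\d+)\w$")


def catalog_url_from_bib(raw_value):
    """Return the catalog URL for a raw 907 $a value, or None if it does not fit."""
    if raw_value is None:
        return None
    match = BIB_PATTERN.match(raw_value.strip())
    if match is None:
        return None
    return CATALOG_URL.format(match.group(1))


print(catalog_url_from_bib(".b16842455"))
print(catalog_url_from_bib("b16842455"))
```

The first call reproduces the IRI from step 3; the second returns None because the value lacks the leading period.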
    
    
        

Processing Multiple Records

In the final exercise, we will process the remaining MARC records, adding each output graph to a master graph that we will then save.

  1. First create an empty RDF graph
    >>> master_graph = rdflib.Graph()
  2. Use a for loop to iterate through the remaining MARC records in the reader
    >>> for record in reader:
    	item_iri = generate_item_iri(record)
    	marc_ingester.transform(record=record, item_uri=item_iri)
    	master_graph += marc_ingester.graph
    	print(".", end="")
    ..................................................................................
    ..................................................................
            
  3. Save the master_graph in a new output directory in your RDF application directory
    >>> import os
    >>> os.makedirs("/tmp/rdf-app/output", exist_ok=True)
    >>> with open("/tmp/rdf-app/output/cc-150-sample.ttl", 'wb+') as fo:
    	fo.write(master_graph.serialize(format='turtle'))
    
    	
    266918
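
The loop in step 2 passes whatever generate_item_iri returns straight to transform, including None when a record has no 907 field. A more defensive variant is sketched below. ingest_all is a hypothetical helper; it relies only on the two call forms used earlier in this tutorial, transform(record=...) and transform(record=..., item_uri=...), and on the ingester exposing its result as .graph. It also skips None entries, which some pymarc versions yield for records they cannot decode.

```python
def ingest_all(records, ingester, make_item_iri, master_graph):
    """Transform every record and merge each result into master_graph.

    Returns (processed, skipped). `records` is any iterable of MARC records,
    `ingester` is an object with transform() and .graph, and `make_item_iri`
    is a function such as generate_item_iri.
    """
    processed = skipped = 0
    for record in records:
        if record is None:
            # Nothing to transform for an unreadable record.
            skipped += 1
            continue
        item_iri = make_item_iri(record)
        if item_iri is None:
            # No preexisting IRI: fall back to the plain call form.
            ingester.transform(record=record)
        else:
            ingester.transform(record=record, item_uri=item_iri)
        # In-place merge, as in the loop above.
        master_graph += ingester.graph
        processed += 1
    return processed, skipped
```

In the session above this would be called as processed, skipped = ingest_all(reader, marc_ingester, generate_item_iri, master_graph), and the two counts give a quick check that the whole sample file was handled.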
            

Used in …

Colorado Alliance of Research Libraries

Colorado Alliance BIBCAT Pilot

Blazegraph Triplestore, RDF Framework, BIBCAT

Using selected MARC records from Colorado College and the University of Colorado Boulder that were generated from the Alliance's Gold Rush comparison service, this project uses BIBCAT to transform MARC records into BIBFRAME Linked Data. The RDF data is published to the web as Schema.org JSON-LD for indexing by Google, Bing, and other search engines. BIBCAT uses RDF rules that map MARC fields and subfields to BIBFRAME 2.0 entities and properties.
