SEO Aware URIs

Creating URLs that people can read and remember has become more important, particularly as requirements are pulled from the Colorado Alliance of Research Libraries BIBCAT project. Two references inform this work: Cool URIs don't change by Sir Tim Berners-Lee (2001), and Google's recommendations for structuring URLs, which emphasize simple URLs that are constructed in a logical fashion, with readable words, and using hyphens for punctuation.

Slugify Pattern

In building the first iterations of the Aristotle Discovery Layer (the latest iteration drives Colorado College's Digital Repository interface at https://digitalccbeta.coloradocollege.edu) with the Django web publishing platform, I liked that Django provides a function called slugify for creating an SEO-friendly URL. While I no longer actively develop in Django, I still use this basic regular expression function in many of my projects.

>>> import re
>>> def slugify(value):
...     """
...     Converts to lowercase, removes characters that are not alphanumerics,
...     underscores, whitespace, or hyphens, and converts runs of spaces and
...     hyphens to a single hyphen. Also strips leading and trailing whitespace.
...     """
...     # Raw strings keep \w and \s from being read as string escapes.
...     value = re.sub(r'[^\w\s-]', '', value).strip().lower()
...     return re.sub(r'[-\s]+', '-', value)
...
>>> print(slugify("A Midsummer Night's Dream"))
a-midsummer-nights-dream

Wikipedia Pattern

Wikipedia uses a different URL pattern where spaces in the title are replaced with underscores (_) while keeping other punctuation like single and double quotes. Here is an example: https://en.wikipedia.org/wiki/A_Midsummer_Night's_Dream
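
The Wikipedia convention can be approximated in a few lines. This is only a minimal sketch and not MediaWiki's actual title normalization; the function name wikipedia_style_path is hypothetical. It assumes that whitespace becomes underscores and that only the apostrophe is left unencoded, so other reserved characters, including double quotes, are percent-encoded.

```python
from urllib.parse import quote


def wikipedia_style_path(title):
    """Approximate a Wikipedia-style path segment (hypothetical helper)."""
    # Collapse runs of whitespace and join the words with underscores.
    underscored = "_".join(title.split())
    # Percent-encode unsafe characters but keep the apostrophe readable.
    # Assumption: "'" is the only character added to quote()'s safe set.
    return quote(underscored, safe="'")


print(wikipedia_style_path("A Midsummer Night's Dream"))
```

Unlike slugify, this keeps the original capitalization and the apostrophe, so the title stays closer to how a person would write it.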

Colorado Alliance of Research Libraries BIBCAT Pilot - BML Iteration Two

Based upon what we learned from last year's BML iteration one for this project with the Colorado Alliance of Research Libraries, I'm currently building a second iteration that mints BIBFRAME Instance URIs from the slug of the BIBFRAME Work's RDF label. We also started to deduplicate the BIBFRAME Instances using a match key that the Alliance staff use for their Gold Rush comparison service; the key is extracted from subfield a of the MARC 997 field. Instead of using a pure RDF-based MARC-to-BIBFRAME RDF Map, we started with the RDF XML output from the https://github.com/lcnetdev/marc2bibframe2 XSLT transformation, ran a process that replaces the URIs with the SEO-friendly URIs, and added some additional triples to the primary BIBFRAME Instance and Item.
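
The URI minting step might look roughly like the following. This is a sketch under assumptions rather than the BIBCAT code: mint_instance_uri and BASE_URL are hypothetical names, and the base URL is simply taken from the example URIs in this post.

```python
import re

# Assumption: the base is inferred from the example URIs in this post.
BASE_URL = "https://bibcat.coalliance.org/"


def slugify(value):
    """Same slug pattern as above: lowercase, hyphen-separated words."""
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def mint_instance_uri(work_label, base_url=BASE_URL):
    """Hypothetical helper: build an Instance URI from a Work's RDF label."""
    return base_url + slugify(work_label)


print(mint_instance_uri("Signing and belonging in Nepal /"))
```

The trailing " /" from the MARC title is dropped by the slug pattern. Note that two Works with identical labels would produce the same URI; this sketch does not try to resolve that case.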

Below is an example of a BIBFRAME 2.0 Instance with the URI for the Instance replaced with https://bibcat.coalliance.org/signing-and-belonging-in-nepal and the Alliance match key added as an additional bf:Local identifier:

<https://bibcat.coalliance.org/signing-and-belonging-in-nepal> a bf:Instance ;
    rdfs:label "Signing and belonging in Nepal /" ;
    bf:carrier <http://id.loc.gov/vocabulary/carriers/nc> ;
    bf:copyrightDate "2016"^^<http://id.loc.gov/datatypes/edtf>,
        "©2016." ;
    bf:dimensions "24 cm." ;
    bf:extent [ a bf:Extent ;
            rdfs:label "xi, 135 pages" ] ;
    bf:generation [ a bf:GenerationProcess ;
            bf:generationDate "2017-04-22T06:16:43.828143" ;
            rdf:value "Generated by BIBCAT version 1.8.0 from KnowledgeLinks.io"^^xsd:string ] ;
    bf:hasItem <https://bibcat.coalliance.org/signing-and-belonging-in-nepal/university-of-colorado-boulder> ;
    bf:identifiedBy [ a bf:Local ;
            bf:source [ a bf:Source ;
                    rdfs:label "OCoLC" ] ;
            rdf:value "930257243" ],
        [ a bf:Isbn ;
            bf:qualifier "(e-book)" ;
            rdf:value "9781563686658" ],
        [ a bf:Isbn ;
            bf:qualifier "(hardcover",
                "alk. paper)" ;
            rdf:value "9781563686641" ],
        [ a bf:Isbn ;
            bf:qualifier "(hardcover",
                "alk. paper)" ;
            rdf:value "1563686643" ],
        [ a bf:Lccn ;
            rdf:value "2015045471" ],
        [ a bf:Isbn ;
            bf:qualifier "(e-book)" ;
            rdf:value "1563686651" ],
        [ a bf:Local ;
            bf:source <https://www.coalliance.org/> ;
            rdf:value "signing_and_belonging_in_nepal___________________________________2016_______gallaudet_a________________________________________hoffmann_dilloway__e_______________" ] ;
    bf:illustrativeContent <http://id.loc.gov/vocabulary/millus/ill> ;
    bf:instanceOf <https://bibcat.coalliance.org/ocn930257243#Work> ;
    bf:issuance <http://id.loc.gov/vocabulary/issuance/mono> ;
    bf:media <http://id.loc.gov/vocabulary/mediaType/n> ;
    bf:note [ a bf:Note ;
            rdfs:label "illustrations" ;
            bf:noteType "Physical details" ],
        [ a bf:Note ;
            rdfs:label "Includes bibliographical references and index." ;
            bf:noteType "bibliography" ] ;
    bf:provisionActivity [ a bf:ProvisionActivity,
                bf:Publication ;
            bf:date "2016"^^<http://id.loc.gov/datatypes/edtf> ;
            bf:place <http://id.loc.gov/vocabulary/countries/dcu> ],
        [ a bf:ProvisionActivity,
                bf:Publication ;
            bf:agent [ a bf:Agent ;
                    rdfs:label "Gallaudet University Press" ] ;
            bf:date "2016" ;
            bf:place [ a bf:Place ;
                    rdfs:label "Washington, DC" ] ] ;
    bf:provisionActivityStatement "Washington, DC : Gallaudet University Press, [2016]" ;
    bf:responsibilityStatement "Erika Hoffmann-Dilloway" ;
    bf:supplementaryContent [ a bf:SupplementaryContent ;
            rdfs:label "Index present" ] ;
    bf:title [ a bf:Title ;
            rdfs:label "Signing and belonging in Nepal /" ;
            bflc:titleSortKey "Signing and belonging in Nepal /" ;
            bf:mainTitle "Signing and belonging in Nepal" ] .

Instead of using the institution's OPAC or Discovery Layer URL for the BIBFRAME Item URI, we decided to construct an Item URI that appends a forward slash "/" followed by the slug of the institution's name to the Instance URI. For the Instance above, its primary bf:Item URI is https://bibcat.coalliance.org/signing-and-belonging-in-nepal/university-of-colorado-boulder with the following triples, which include an rdfs:seeAlso that links back to the institution's OPAC or Discovery Layer:

<https://bibcat.coalliance.org/signing-and-belonging-in-nepal/university-of-colorado-boulder> a bf:Item ;
    bf:heldBy <http://www.colorado.edu/> ;
    bf:itemOf <https://bibcat.coalliance.org/signing-and-belonging-in-nepal> ;
    bf:shelfMark [ a bf:ShelfMarkLcc ;
            rdfs:label "HV2855.9.H64 2016" ;
            bf:source <http://id.loc.gov/vocabulary/organizations/dlc> ] ;
    rdfs:seeAlso <http://libraries.colorado.edu/record=b8714640> .
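
The Item URI construction described above can be sketched the same way. The helper name mint_item_uri is hypothetical, and the sketch assumes the institution's name is available as a plain string.

```python
import re


def slugify(value):
    """Same slug pattern as above: lowercase, hyphen-separated words."""
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def mint_item_uri(instance_uri, institution_name):
    """Hypothetical helper: Instance URI + "/" + slug of the institution name."""
    # Strip any trailing slash so the result never contains "//".
    return instance_uri.rstrip("/") + "/" + slugify(institution_name)


print(mint_item_uri(
    "https://bibcat.coalliance.org/signing-and-belonging-in-nepal",
    "University of Colorado Boulder",
))
```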

This BML iteration is not live yet but will replace the links of the current pilot with MARC XML records from University of Colorado Boulder, Colorado College, and SUNY Buffalo. Because we wanted to add geographic information to improve search engine results, we needed to use the schema.org ontology to further describe the institutions.
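
As an illustration of what a schema.org description of an institution could look like, here is a small JSON-LD sketch. It is an assumption about the shape of the data, not the project's actual output: describe_institution is a hypothetical helper, the choice of the schema.org Library type is a guess, and the coordinates are placeholders rather than a real location.

```python
import json


def describe_institution(uri, name, latitude, longitude, schema_type="Library"):
    """Hypothetical helper: schema.org JSON-LD for a holding institution."""
    return {
        "@context": "https://schema.org",
        "@id": uri,
        "@type": schema_type,  # assumption: Library; other types may fit better
        "name": name,
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
    }


# Placeholder coordinates for illustration only, not the real location.
doc = describe_institution(
    "http://www.colorado.edu/", "University of Colorado Boulder", 0.0, 0.0
)
print(json.dumps(doc, indent=2))
```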

Conclusions

  • Starting with human-recognizable, canonical URLs has trade-offs but is more likely to support search engine optimization, especially for public-facing applications
  • Mixing UUID, MARC-based, and SEO URLs becomes more typical as we build the relationships linking our resources to external bodies and (hopefully) other semantic web applications link to our published resources

Published on 2017-04-26

Presentation ©2017 by Jeremy Nelson.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.