Introduction to Redis

What is Redis?

Redis is an open-source NoSQL key-value datastore that, instead of storing only a single type of value such as a string, can store more complex types of values. Redis, short for REmote DIctionary Server, was created and is actively maintained and developed by Salvatore Sanfilippo.

By thinking about how data can be represented and managed as basic computing data structures like lists, hashes, and sets, Redis allows you to grasp both the positive and negative characteristics of your data and its structures in a more fundamental, mathematical fashion than by going through an intermediate structuring process.
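As a quick illustration of those data structures, here is a minimal sketch using the redis-py Python client (an assumption on my part, and it assumes a recent redis-py release; any Redis client exposes the same commands) that stores a string, a list, a hash, and a set and reads each one back:

    import redis

    # Connect to a local Redis instance (default host and port assumed).
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # A plain string value
    r.set("greeting", "hello")

    # A list: ordered, duplicates allowed
    r.rpush("recent:visitors", "alice", "bob", "alice")

    # A hash: field/value pairs stored under a single key
    r.hset("book:1", mapping={"title": "Example Title", "format": "print"})

    # A set: unordered, unique members only
    r.sadd("tags", "nosql", "key-value", "in-memory")

    print(r.get("greeting"))                   # hello
    print(r.lrange("recent:visitors", 0, -1))  # ['alice', 'bob', 'alice']
    print(r.hgetall("book:1"))                 # {'title': ..., 'format': ...}
    print(r.smembers("tags"))                  # {'nosql', 'key-value', 'in-memory'}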


About the Instructor

Jeremy Nelson is the Metadata and Systems Librarian at Colorado College, a private liberal arts college located in Colorado Springs. Prior to becoming a librarian, Nelson worked at a number of different software and financial services companies as a project manager and programmer. After receiving his Master's in Library and Information Science from the University of Illinois in 2003, Nelson worked as a professional librarian at the University of Utah and Western State University of Colorado before joining Colorado College in 2010.

Nelson's experience with Redis started in 2011, when he began experimenting with Redis as a bibliographic datastore, representing a library's physical and digital material using both legacy formats like MARC21 and newer, more linked-data-based vocabularies like Resource Description and Access (RDA), schema.org, and BIBFRAME. He has published and presented at national library conferences in 2012 and 2013. Currently, Nelson is working on a major project with the Library of Congress to implement a new BIBFRAME catalog using the Fedora Commons digital repository, Elasticsearch, and, of course, Redis.

Exercises and Lab Assignments

Each topic will have hands-on exercises using Redis as well as lab assignments that build upon previous lab work in a functioning application.

Exercise: Setup of Training Environment

We will be using an Ubuntu VM as well as the local workstation for the class exercises and labs.
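To confirm the environment is ready, a minimal connectivity check might look like the following sketch, assuming Redis is running on the default port and the redis-py client is installed (both are assumptions about this particular setup):

    import redis

    # Connect to the Redis server on the training VM (default host and port assumed).
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # PING returns True when the server is reachable and responding.
    print(r.ping())

    # INFO reports the running server's version among other runtime details.
    print(r.info()["redis_version"])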

History of Redis

In the mid-2000s, Salvatore Sanfilippo - Redis's principal developer and maintainer - together with a partner started a free web service called LLOOGG. LLOOGG displayed, in near real-time, visitors' access and usage patterns on a website for viewing by administrators and other interested individuals. Sanfilippo and his startup developed and released LLOOGG as a PHP application that parsed web server log files into a MySQL database back-end, with a dynamically generated HTML view of the web server's activity.

LLOOGG Screenshot of Redis.io traffic

In working on LLOOGG, Sanfilippo increasingly ran into scalability issues with MySQL: the database was being swamped by writes even though it performed fine when reading tabular data. As Sanfilippo says in a 2013 interview,

I started to realize that what I needed was an in-memory database with support for complex data structures.

As he explored how to achieve better performance, the first inklings of a new type of key-value data storage emerged, and what would eventually become the open-source project Redis officially started on April 10th, 2009.

Early in the development of Redis, Sanfilippo refactored LLOOGG by replacing its relational data model in MySQL with Redis. His intention in swapping out the data storage was to test the capabilities of Redis against the traffic generated by a running instance of LLOOGG, and LLOOGG's sizable web traffic helped in the testing of early, unstable versions of Redis. Although Sanfilippo claims that switching to Redis for the LLOOGG project did not result in the discovery of new bugs, the switch did allow Sanfilippo to design and architect Redis while other individuals started using and testing it with much larger datasets and at larger scales. This work was the beginning of the lean, small, and fast code-base that is today's version of Redis.

The first stable version of Redis 3.0.0 was released for production use on April 1st, 2015, almost six years after the start of the Redis project. While the number of commands and data structures has grown over those six years, the overall goals and performance expectations of Redis have not changed much from Sanfilippo's initial ideas about replacing MySQL with Redis for the LLOOGG project. Redis is still an in-memory database, although it now has more options for persisting that data to disk, along with a number of powerful and sophisticated commands for storing and manipulating data in its different data structures. The biggest change in Redis 3.0 is the inclusion of Redis Cluster, a mode of operating Redis across multiple instances in which data is automatically sharded across those nodes.
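As a brief, hedged illustration of those persistence options, the CONFIG GET command reports how a running server is set up for RDB snapshots and the append-only file; the values shown in the comments are just one possible configuration, not a recommendation:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # RDB snapshotting: 'save' lists the seconds/changes thresholds that trigger a dump.
    print(r.config_get("save"))        # e.g. {'save': '900 1 300 10 60 10000'}

    # AOF persistence: 'appendonly' is 'yes' when every write is also logged to disk.
    print(r.config_get("appendonly"))  # e.g. {'appendonly': 'no'}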


Lab Applications & Use Cases

About

After each topic, we will have a short break and then come back for a lab section where we'll meet either as a full group or in smaller groups to discuss and experiment with the topics from our Redis training.

Your choice …

To make this Redis training more relevant, we can focus during the lab application time on a problem or use case of your own that you are interested in solving with Redis. At least for the first three or four topics, it may be easier to use my own use cases, but we'll see how the course flows over the three days.

Current Use Case

Background

At a recent digital library conference, a group made up of representatives from a variety of academic, public, and non-profit organizations - including the Digital Public Library of America, Amherst College, Boston Public Library, and Colorado College - came together and organized what eventually became a Python-based implementation of an open-source Linked Data Fragments server. The server uses the new Python 3.4 asyncio module to build a fast, lightweight network server and Redis, a NoSQL datastore technology, for caching results. This presentation covers the preliminary results from testing the Linked Data Fragments server with large datasets from college library catalogs as well as datasets from the Library of Congress and the DPLA.

Linked Data Fragments and SPARQL

Problem

  • Using RDF vocabularies for metadata creation and enhancement often relies on the availability and performance of external services
  • These services, unfortunately, are not always reliable (e.g. SPARQL endpoints)
  • Any implementation should be reusable across applications and ideally software independent

Linked Data Fragments

  • Implement a mechanism that allows you to use selector patterns to specify the subset of triples you care about, e.g. skos:prefLabel for LCSH
  • Provide a configurable cache layer for remote resources to speed up lookups on often-requested subjects (see the sketch after this list)
  • Linked Data Fragments ( http://linkeddatafragments.org/)
  • Simple standardized requests that are easily cacheable (less unique than SPARQL queries) with most query processing done client side
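Below is a rough sketch of the cache layer idea from the list above: each triple-pattern request is keyed by a hash of the pattern, and the serialized fragment is stored with a time-to-live, so often-requested subjects are served from Redis instead of the slower or less reliable remote endpoint. The key prefix, function names, and fetch_remote callback are illustrative assumptions, not the actual Linked Data Fragments server code:

    import hashlib
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def cache_key(subject, predicate, obj):
        # Derive a stable key from the triple pattern; None acts as a wildcard.
        pattern = "|".join(str(part) for part in (subject, predicate, obj))
        return "ldf:cache:" + hashlib.sha1(pattern.encode("utf-8")).hexdigest()

    def get_fragment(subject, predicate, obj, fetch_remote, ttl=3600):
        # Return a cached fragment, falling back to the remote service on a miss.
        key = cache_key(subject, predicate, obj)
        cached = r.get(key)
        if cached is not None:
            return cached
        # Cache miss: call the remote endpoint, then store the serialized
        # result with a time-to-live so stale fragments expire on their own.
        fragment = fetch_remote(subject, predicate, obj)
        r.setex(key, ttl, fragment)
        return fragment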

Past Use Cases

Use Case: Library Consortium Catalog

In my 2013 article, Building a Library App Portfolio with Redis and Django, I sketched out a design for a shared bibliographic Redis datastore among different types of libraries.


(Source: Code4Lib)

I'll be using this design as an illustrative use case in the Topic examples throughout this course. Two Linked Data vocabularies will be used: Schema.org, a joint effort of Google, Microsoft, Yahoo, and Yandex for structuring data on the web, and BIBFRAME, from the Library of Congress, for representing the physical and digital assets of a library.
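As a preview of how those two vocabularies might map onto Redis in the Topic examples, here is a hypothetical sketch (the key-naming convention and property choices are assumptions for illustration, not the design from the article):

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # A Schema.org CreativeWork stored as a Redis hash, one field per property.
    r.hset("schema:CreativeWork:1", mapping={
        "name": "Pride and Prejudice",
        "author": "Jane Austen",
        "inLanguage": "en",
    })

    # A BIBFRAME Instance for a physical copy, linked back to the work by key.
    r.hset("bf:Instance:1", mapping={
        "instanceOf": "schema:CreativeWork:1",
        "carrier": "print",
    })

    # A set indexes every instance of the work for fast membership checks.
    r.sadd("schema:CreativeWork:1:instances", "bf:Instance:1")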