Linking Outside the Box (Linked Data for the Uninitiated, Part 2)

My last post, “Linked Data for the Uninitiated,” was designed to introduce the concept of Linked Data, a philosophy that constructs information to be readable and useful for both machines and humans on the internet.* As a follow-up, I had intended to write a review of good Linked Data and RDF-related plugins but quickly realized the futility of such an exercise: plugins are constantly being updated (or becoming outdated); the needs of individual projects and users differ too greatly for me to have concrete advice; and my own expertise lies outside that area. I’ve touched on plugins here, but am focusing instead on openness and general suggestions for incorporating Linked Data principles into projects using an out-of-the-box CMS (content management system). At the end, you’ll find a link to a Zotero bibliography collected in the GC Digital Fellows group geared toward a more technical introduction to Linked Data. (If you have favorite plugins, tools, or CMS advice, please weigh in with a comment.)

Collecting, sharing, and attributing data
The biggest stumbling block to widespread Linked Data is insufficient digital literacy. W3C’s recent Working Group Document, “Publishing and Linking on the Web,” is evidence both of the disconnect between internet use and digital literacy and of ongoing efforts to educate the public. Developers and large institutions will have more complex methods, but I want to emphasize the importance of small-scale or entry-level applications of Linked Data.

As the five-star data model indicates, putting data on the web in an open, machine-readable, non-proprietary format can go a long way toward being a part of the Linked Data cloud. Data presented as a table or text (rather than in a flat image or PDF) is machine readable, searchable, and more useful. Likewise, even in a printed or PDF version, including a permanent URL for the online version is helpful. It’s best to explicitly note how your data was collected, what your policy is for letting others use the data, and a suggested citation, so that users will be able to assess your collection methods and the reliability of the data provided. A Creative Commons license can be specified for an entire website or a specific kind of data. Leigh Dodds’ Lost Boy blog has some information about attribution and citation of data. And to cover all sides of the issue, Melanie Conroy’s “Linked Data for Individual Use” series on the HASTAC blog touches on the issue of data quality and when you might not want to use Linked Data.

Creating manual hyperlinks to stable URLs is another simple way to promote Linked Data on any kind of site. Classics librarian Phoebe Acheson has created a practical primer on best practices for linking bibliographic data. She is also the collector of hundreds of bibliographic entries in her Ancient World Open Bibliographies project, which includes a blog, wiki site, and public Zotero library. The project pulls together bibliographic data about the ancient world that is openly available on the web and organizes it. Most importantly, entries link to a permanent URI. WorldCat provides a permanent URL for almost any book or publication. Zotero data can easily be shared and offers lots of categories for recording metadata about each item.

Metadata
Metadata is a set of data that gives information about a particular item (a photo or document, for example). A digital camera may record the date and time a photo is taken or the GPS coordinates of the photo’s location. The geographic information could be mapped or contributed to Open Street Map. These pieces of information associated with each photo are the item’s metadata. EXIF data is a kind of metadata recorded by digital cameras. You can view it when you open a photo in certain sites (like EXIFdata.com) or software (such as Picasa or Adobe Lightroom).
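To make the mapping step concrete: EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere letter, while most web maps expect signed decimal degrees. The sketch below shows that conversion only. It assumes the values have already been read out of the photo with an EXIF viewer, and the coordinates and the function name dms_to_decimal are made up for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere reference: {ref!r}")
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are written as negative numbers in decimal notation.
    return -decimal if ref in ("S", "W") else decimal


# Hypothetical values, not taken from a real photo.
latitude = dms_to_decimal(38, 37, 30, "N")
longitude = dms_to_decimal(34, 51, 0, "E")
print(latitude, longitude)
```

The resulting pair of numbers is the form that mapping tools generally accept for plotting a point.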

Using keywords in the “tags” feature of a blog post, photograph, or document is another way to add metadata to files to make them more easily searchable. Some sites, including Flickr, also allow machine-readable tags. Pleiades is an example of a site that utilizes machine tags from Flickr users who would like to participate. Participants can create a machine tag on one of their Flickr photos that corresponds to an ancient site that is mapped in Pleiades. The Pleiades entry for that ancient location can automatically call up the machine-tagged photo to be displayed alongside the rest of the location data. Tom Elliott has written a post explaining how users can link their Flickr photos to ancient world sites. This is a model that could be used in many other kinds of projects, as explained in the Flickr API forum.
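For readers curious about what makes a tag “machine-readable,” here is a rough sketch. Flickr machine tags follow a namespace:predicate=value pattern, so a program can split them apart reliably. The code below parses such a tag and, for tags in the pleiades namespace, builds a link to the matching place page. The place ID in the example and the URL pattern are assumptions for illustration, and the function names are hypothetical; this is not how Pleiades or Flickr implement the feature.

```python
import re

# General shape of a machine tag: namespace:predicate=value
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[a-z][a-z0-9_]*):(?P<predicate>[a-z][a-z0-9_]*)=(?P<value>.+)$",
    re.IGNORECASE,
)


def parse_machine_tag(tag):
    """Split a machine tag into its three parts, or return None for ordinary tags."""
    match = MACHINE_TAG.match(tag.strip())
    return match.groupdict() if match else None


def pleiades_uri(tag):
    """Build a place link from a pleiades machine tag (URL pattern assumed)."""
    parsed = parse_machine_tag(tag)
    if parsed is None or parsed["namespace"].lower() != "pleiades":
        return None
    if not parsed["value"].isdigit():
        return None
    return f"https://pleiades.stoa.org/places/{parsed['value']}"


# One machine tag with an example place ID, one plain keyword, one unrelated machine tag.
tags = ["pleiades:depicts=579885", "sunset", "geo:lat=37.97"]
links = [pleiades_uri(tag) for tag in tags]
print(links)
```

Only the first tag produces a link; the plain keyword and the tag from another namespace are skipped, which is the point of giving tags a predictable structure.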

Libraries are often keepers and innovators of metadata. For instance, the Library of Congress records metadata for every object in its collection, displayed in a list, as it does for this photo of Ortahisar, Cappadocia. It also keeps a MARC (machine-readable) record, viewable through a link at the bottom of the record. Museums keep similar records, and any researcher can record similar data. The Dublin Core element set is a list of categories that can be used in conjunction with any item (e.g. a book in a library, or a sculpture in a museum). It is important to denote clearly what you are describing: e.g. the Pantheon in Rome or a photograph of the Pantheon taken last summer. (There can be separate metadata for each.) Ideally, each metadata element (in Dublin Core or any other element set) will be completed with a description from a controlled vocabulary (a pre-determined, widely accepted set of terms). Art historians often look to Getty Vocabularies, and Schema.org is a hub for exploring other options. Useful reflections on interpreting these categories can be found on Lee Ann Ghajar’s blog.
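As a small illustration of the Pantheon example, the sketch below keeps two separate Dublin Core records, one for the monument and one for a photograph of it, and prints each as HTML meta tags using the common DC.element naming convention. All of the values (the creator name, the date) are invented, and the function name is hypothetical. The type values are taken from the DCMI Type Vocabulary as one possible controlled vocabulary.

```python
from html import escape

# The fifteen elements of the simple Dublin Core element set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Two separate records: the building itself and a photograph of it.
# All values are invented for illustration.
monument = {"title": "Pantheon", "type": "PhysicalObject", "coverage": "Rome"}
photograph = {
    "title": "Photograph of the Pantheon",
    "type": "StillImage",
    "creator": "A. Researcher",
    "date": "2013-07-15",
    "subject": "Pantheon (Rome)",
}


def dublin_core_meta(record):
    """Render a record as one HTML meta tag per Dublin Core element."""
    lines = []
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


print(dublin_core_meta(monument))
print(dublin_core_meta(photograph))
```

Keeping the two records apart avoids the ambiguity described above: the photograph has a creator and a date of its own, which should not be mistaken for facts about the building.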

Content Management Systems
Although there’s no simple Linked Data solution for using an out-of-the-box Content Management System (CMS), there are some tweaks and hacks available. Here are a few examples:

  • Omeka, designed with the organization and display of collections in mind, uses Dublin Core to describe items in collections. (There’s hope for an Omeka Linked Open Data plugin as well.)
  • Drupal developers are working on ways to integrate Linked Open Data into a site. (See also Joachim Neubert’s work on Linked Data for special collections.)
  • WordPress has a number of linked data plugins available, including ISAW’s Ancient World Linked Data for WordPress, which creates a JavaScript pop-up when you roll over a link. (Click here for examples.) There are also RDF and RDFa plugins available. You can also tweak Permalink Settings so that posts and pages have a cleaner, custom URL (without the default question marks and numbers).
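To show what a “cleaner” URL means in practice, here is a minimal sketch that turns a post title into a readable address. It is not the algorithm WordPress itself uses; the function names and the example.org domain are placeholders for illustration.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def clean_url(base, title):
    """Join a site address and a slug into a readable permalink."""
    return f"{base.rstrip('/')}/{slugify(title)}/"


print(clean_url("https://example.org", "Linking Outside the Box (Part 2)"))
```

A URL built this way tells both people and machines something about the page, and it can stay the same even if the site’s internal post numbering changes.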

Considering longevity
Longevity is a crucial aspect of Linked Data that should be considered from the outset of any project. Encouraging others to use your data is a signal that you intend for it to remain available. It is important to decide who will be paying for server access and whether those funds will be available for the foreseeable future. Many academic institutions have repositories that can archive work indefinitely, often hosted by a library or media lab.

“Link rot,” the nickname for hyperlinks that call up an error message because the original resource has disappeared, is a persistent problem. Last week, thousands of researchers temporarily lost access to their data while the US Federal Government and its affiliated websites were shut down. While this is not entirely avoidable, it underscores the importance of permanent URIs that allow us to link to reliable data sets that are likely to remain relevant.
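One modest way to keep an eye on link rot is to check a project’s outgoing links from time to time. The sketch below is one possible approach, using only Python’s standard library: it asks each address for its HTTP status and reports the ones that return an error or no response at all. The function names and the example.org addresses are placeholders, and the usage example runs on made-up statuses so that it works without a network connection.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error code such as 404.
        return error.code
    except (urllib.error.URLError, OSError):
        # No answer at all: unknown host, refused connection, or timeout.
        return None


def find_broken_links(urls, get_status=fetch_status):
    """Map each failing URL to its status code (None means no response)."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken[url] = status
    return broken


# Made-up statuses stand in for real requests in this example.
fake_statuses = {
    "https://example.org/ok": 200,
    "https://example.org/gone": 404,
    "https://example.org/down": None,
}
report = find_broken_links(fake_statuses, get_status=fake_statuses.get)
print(report)
```

To check real links, call find_broken_links with a list of addresses and leave get_status at its default. A link that fails once may only be temporarily unavailable, as during the shutdown mentioned above, so repeated failures are the more meaningful signal.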

Conclusions
Linked Data is accomplished through the structure of data as it is put on the web: through constructing clean URIs when building websites; using tools like plugins; and even manually adding hyperlinks to a permanent reference. It helps to incorporate Linked Data principles into the planning of a project from the beginning. Upload data thoughtfully to make sure it can be exported in useful formats. But I’d like to stress that attempting perfection can cripple a project, particularly one with a limited budget and small group. Barriers to entry can be overcome through collaboration, through institutions like LAWDI or THATCamp, in graduate education and supplementary workshops, and by promoting open access whenever possible.

Further reading
More resources for learning about Linked Data are in the GC Digital Fellows Linked Data folder in the new, public Zotero group.

*This essay is cross-posted on Documenting Cappadocia, a project in the GC New Media Lab, with thanks for the time I was able to spend researching Linked Data.