Transforming TEI for the Web

Last month, I led a workshop for the GC Digital Initiatives on “Getting Started with TEI.” For those who don’t know, TEI (short for Text Encoding Initiative) is a method for encoding, or “tagging,” texts in such a way that both humans and computers can make sense of them. It is a set of guidelines used for electronic editing and working with textual data in the humanities, social sciences and linguistics, which is based on XML (the eXtensible Markup Language).  With TEI, editors and scholars can “tag” a text for various features such as structure, typography, or references. My workshop specifically focused on how TEI facilitates the digital transcription of hand-written manuscripts. We practiced encoding a couple of pages from Oscar Wilde’s manuscript of “The Picture of Dorian Gray,” paying particular attention to how Wilde edited out the homosexual elements and innuendos as he revised his draft.

In the image below, you can see how the workshop participants used TEI to mark up the revisions that Wilde made on the passage. Here, the participants went through the manuscript line by line to indicate what is written and where things are struck out, added, or cannot be read. Among the available TEI elements, we focused on <del>, <add>, and <gap>, which indicate deletions, additions, and undecipherable script, as well as <rend>, which describes how or where a piece of text is rendered, such as with a strikethrough, or above the current line. For example, in the below excerpt from our encoding, you can see the <rend> elements in orange.

At the end of the workshop, we were able to see our progress by transforming the TEI file into HTML, which we could then view in the browser. The browser rendition shows how the typographic elements appear once applied to the source text. As a result, it’s useful for presenting the manuscript details in an easy to read format. For example, an excerpt tagged with “strikethrough” would present the text with a line running across it. You can see this presentation in the image at the top of this article, displays our transformed TEI side by side with the original manuscript page. In textual editing terms, this kind of transcription is known as a “diplomatic transcription” of a manuscript.

So how did we get this TEI document to render in a diplomatic transcription? The answer has to do with the XSLT stylesheets. XSLT stands for Extensible Stylesheet Language for Transformations. It is a method (or a series of rules) for converting data from one XML language to another, such as TEI to HTML. With an XSLT stylesheet, you can “transform” a TEI document into HTML, which is then visible on a web browser.

In most basic terms, XSLT works by mapping one set of standards into another. In this case, we map TEI into HTML by turning each TEI element into its equivalent in HTML. For example, we might take a <head> element in TEI (which encodes headings) and turn it into a <h1> element in HTML (which indicates a first-level heading). Or, we might take <line> (for a line of text) in TEI, and turn it into <p> (for paragraph) in HTML. The code for these instructions in the XSLT file would appear like so:

<xsl:template match=”head”>

            <h1><xsl:apply-templates/></h1>

</xsl:template>

<xsl:template match=”line”>

            <p><xsl:apply-templates/></p>

</xsl:template>

Don’t worry if this code looks unfamiliar. All it does is take everything within the <head> and <line> tags in TEI and places it inside <h1> and <p> tags, respectively. All XSLT files contains a list of such instructions, which transfer information from a TEI format into an HTML format.

Below, I will demonstrate how you might run your own transformations on your TEI files. Since the learning curve for working with TEI transformations (and XSLT in particular) is quite steep, I will be focusing on two relatively easy ways to transform TEI: first, with an application called oXygen, and second, with a package called the TEI Boilerplate. Both methods facilitate the transformation of TEI in a way that doesn’t require to you to write any code or create your own customization.

First, let’s look at oXygen, which is an XML editor. oXygen is not free, but it’s by far the best XML editor out there. Running a transformation with oXygen is super easy because the application allows you to run the “Transformation Scenario” right then and there. To get oXygen, you have to download the program (you can get a free trial version for 30 days, and/or obtain a yearly academic license for about $100). To run the transformation, first open your TEI files in oXygen. Then, on the right side of the application window, you’ll see a panel that reads “Transformation Scenario.” You’ll want to check off the appropriate scenario, which is “TEI P5 XHTML.” This option indicates that you want to process your TEI for an HTML output. Then, you’ll press the red play button at the top of the window. A few seconds later, your browser will open the transformed TEI file. It should look something like the image of the diplomatic transcription above.

Now, let’s look at TEI Boilerplate. This is by far the easiest, free way to transform your TEI for the web. Developed by a team at Indiana University, TEI Boilerplate is meant to be lightweight and relatively quick to implement. To use the editor, first download the TEI Boilerplate files to your computer. Then, add your TEI files to the teibp/content directory of TEI Boilerplate. Then, open your TEI files in a text editor and add a piece of processing instructions to the top of each TEI file. It should go after the XML declaration and before the first <TEI> element:

<?xml-stylesheet type=”text/xsl” href=”teibp.xsl”?>

This piece of code indicates that the XML should be processed by the TEI Boilerplate XSLT stylesheet. After adding it to your TEI file, you can open that file in the browser to see the transformation. With my Dorian Gray passage, the transformation looks like this:

You might notice some key differences from the first transformation with oXygen. There aren’t any line breaks, for one. And this document doesn’t render the “gaps” (or indecipherable text) in the manuscript. With everything jumbled together and no indication of gaps, it’s a bit more difficult to follow than the previous version. As you may know, this is what sometimes happens when you opt for a lightweight version of a tool. TEI Boilerplate isn’t as robust as oXygen, which can catch all of the TEI elements in the transformation, including the gaps and the line breaks in the encoding. The good news is that both methods are customizable, if you’re willing to learn a little bit more about how they work with XSLT.

To learn more about both of these methods, and about XSLT in general, refer to the following resources. These excellent tutorials, articles, and slideshows are written by giants like the Women Writers Project and Digital.Humanities@Oxford, as well as influential TEI developers like Lou Burnard and Sebastian Rahtz. These resources are only a starting point, but they should get you well on your way to running (and customizing!) your own TEI transformations.

General TEI Transformation and Publication Resources

Transformation and Publication Primer, Women Writers Project

Transforming TEI slideshow, Sebastian Rahtz

TEI Boilerplate resources

TEI Boilerplate Homepage

TEI Boilerplate tutorial, John Walsh, Indiana University

TEI Boilerplate tutorial, Lou Burnard Consulting

XSLT Resources

XSLT Tutorial, Women Writers Project

TEI XSLT Stylesheets slideshow, TEI@Oxford

Working with XSLT for TEI XML, Digital.Humanities@Oxford

Online XSLT transformer