Where is the meaning of a literary text to be found?
There are many facets of textual meaning. Some are expressed through layout, structure, and content. Others are interpreted meanings. Textual markup makes these meanings explicit so that they can be processed reliably, either by computer algorithms or by scholars working with the text. This document provides a gentle (and mostly theoretical) introduction to the methods of using textual markup for the exploration of literary meaning.
Descriptive markup allows us to make explicit distinctions in the text in a formal way. It helps identify what aspects of the text are, rather than what they look like.
Encoding textual markup requires editorial and interpretive decisions. Markup can help answer research questions, and deciding what markup is needed can be a research activity in itself. Detailed document analysis is needed before encoding if the resulting markup is to be useful. You must ask which features to mark up, why you are choosing to mark up these features, and how consistently you will be able to do so.
XML is a language for document markup that was designed specifically for the web. A document’s content is divided up into descriptive elements which form a hierarchical tree (a single root and many nodes). XML looks very similar to html, except that it must be well-formed (follow strict coding rules) and it is extensible (not limited to a small set of elements). A basic xml file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<message priority="high">Hello!</message>
</root>
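The "priority" part of the <message> element is an attribute, and "high" is its value.
Notice that the <message> element is nested inside the <root> element and that each opening tag has a matching closing tag.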
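One way to see the tree structure of the sample file is to load it with a parser. The following is a minimal sketch in Python, using only the standard library; the function names describe and is_well_formed are invented for this illustration and are not part of any xml toolkit. It lists each element with its attributes and text, and shows that a parser refuses a document that is not well-formed.

```python
import xml.etree.ElementTree as ET

# The basic xml file from the text above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<root>
<message priority="high">Hello!</message>
</root>"""


def describe(xml_text):
    """Return (tag, attributes, text) for every element, root first."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (el.tag, dict(el.attrib), (el.text or "").strip())
        for el in root.iter()
    ]


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed xml."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for tag, attributes, text in describe(SAMPLE):
        print(tag, attributes, repr(text))
    # The <message> element below is never closed, so this is rejected.
    print(is_well_formed("<root><message>Hello!</root>"))
```

The parser reports the root element first and then the message element with its priority attribute, which mirrors the "single root and many nodes" description of the tree.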
Web pages are marked up in html (or xhtml, the version that follows the rules of xml). Browsers are programmed to display (x)html in specific ways. That is, they transform the marked-up content so that it will have a pre-determined appearance. Since (x)html was not originally intended for this purpose, it is not very good for supplying style. So it may be supplemented by a more powerful styling language such as css (Cascading Style Sheets). Stylesheets written in css provide the browser with further styling instructions.
An xml document has no inherent style properties. Browsers can apply css to xml directly, but the more common approach is to transform the xml elements into (x)html elements so that they can appear in a web page. Doing so requires fairly complex instructions. One of the most common methods is xslt (Extensible Stylesheet Language Transformations), a language for selecting xml elements and writing them into other documents, which may include (x)html and css code. A document in this language is called, perhaps deceptively, an xslt stylesheet. The language is complicated, but here is what an xslt stylesheet might do.
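It might select the <message> elements in the xml file and write them out as (x)html <p> elements. It might also turn the priority attribute into a class, giving <p class="high">. This will produce a web page that says: Hello!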
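The kind of transformation described above can be imitated in a few lines of Python. This is only a sketch of the idea, not xslt itself: it uses the standard library's ElementTree module rather than an xslt processor, and the function name messages_to_html is invented for the illustration. It assumes that every message element should become a p element and that the priority attribute should become a class.

```python
import xml.etree.ElementTree as ET

# The sample document from earlier, without the xml declaration.
SAMPLE = """<root>
<message priority="high">Hello!</message>
</root>"""


def messages_to_html(xml_text):
    """Rewrite each <message> as a <p>, carrying priority over as a class."""
    source = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for message in source.iter("message"):
        paragraph = ET.SubElement(body, "p")
        priority = message.get("priority")
        if priority is not None:
            # priority="high" in the xml becomes class="high" in the html.
            paragraph.set("class", priority)
        paragraph.text = message.text
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(messages_to_html(SAMPLE))
```

An xslt stylesheet expresses the same selection and rewriting as a set of templates instead of a loop, but the input and the output are of the same kind.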
So why use xml? Why not just directly create a web page using (x)html and css? There are a large number of reasons, and the ones given here are only examples.
In short, an xml document can be put to more uses than an xhtml document. It is not restricted to the single platform of a web browser. Even in the web environment, it is a great deal more flexible for conveying and manipulating textual meaning.
XML documents usually follow a schema, a pre-determined set of elements and attributes. This is the case for (x)html as well; however, with an xml document, you produce your own schema, one that contains the elements and attributes relevant for structuring and analyzing your document. You are not bound to use the ones devised for displaying documents in web browsers.
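To give a rough sense of what a schema does, the sketch below checks a document against a hand-made table of permitted elements and attributes. This is a toy stand-in, not a real schema language: actual schemas are written in formats such as DTD, RELAX NG, or W3C XML Schema and checked by a validator. The table TOY_SCHEMA and the function find_violations are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-made "schema":
# element name -> (permitted child elements, permitted attributes)
TOY_SCHEMA = {
    "root": ({"message"}, set()),
    "message": (set(), {"priority"}),
}


def find_violations(xml_text, schema=TOY_SCHEMA):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag not in schema:
            problems.append(f"unknown element <{element.tag}>")
            continue
        children, attributes = schema[element.tag]
        for name in element.attrib:
            if name not in attributes:
                problems.append(
                    f"<{element.tag}> may not have attribute '{name}'"
                )
        for child in element:
            # Unknown children are reported when the loop reaches them.
            if child.tag in schema and child.tag not in children:
                problems.append(
                    f"<{child.tag}> may not appear inside <{element.tag}>"
                )
    return problems


if __name__ == "__main__":
    print(find_violations('<root><message priority="high">Hello!</message></root>'))
    print(find_violations('<root><note>Hi</note></root>'))
```

Changing the table changes which documents count as valid, which is the sense in which you decide on the elements and attributes relevant to your own analysis.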
The rules for creating schemas are complex. One shortcut is to use a pre-made schema and then modify it. The one most suitable for the study of literary texts is the TEI (Text Encoding Initiative) schema. This schema was created specifically for the study of literary documents, especially by scholars working in the humanities. The benefit of using the TEI schema is that your document will be readable by a wide variety of applications which can process TEI-encoded documents.
Using the TEI schema is as simple as stating that you are using it in your xml document and then following the coding guidelines in the TEI documentation. Unfortunately, the guidelines are very complex (there are over 300 TEI elements). That said, a simple TEI document is not too hard to produce.
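As an illustration of how small a TEI document can be, the sketch below embeds one in a Python string and reads the verse lines back out. The namespace is the TEI namespace, and the elements used (teiHeader, fileDesc, titleStmt, publicationStmt, sourceDesc, text, body, lg, l) are standard TEI elements; the title, the two lines of verse, and the function name verse_lines are invented for the example. The code only parses the document and does not validate it against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A small TEI document: a header describing the file, then the text itself.
MINIMAL_TEI = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample poem</title></titleStmt>
      <publicationStmt><p>Unpublished teaching example.</p></publicationStmt>
      <sourceDesc><p>Written for this sketch; there is no source text.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l>First line of verse</l>
        <l>Second line of verse</l>
      </lg>
    </body>
  </text>
</TEI>"""


def verse_lines(tei_text):
    """Return the text of every <l> (verse line) element in the document."""
    root = ET.fromstring(tei_text)
    namespaces = {"tei": TEI_NS}
    return [line.text for line in root.findall(".//tei:l", namespaces)]


if __name__ == "__main__":
    for line in verse_lines(MINIMAL_TEI):
        print(line)
```

Because the lines are marked as lines rather than formatted to look like them, a program can retrieve them without guessing from the layout.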
Applying an XSLT stylesheet to your document is also simple. Underneath the xml declaration, you include a link to your stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="path_to_stylesheet.xsl" type="text/xsl"?>
The suffix .xsl is used for xslt stylesheets.
Writing the stylesheet is another matter. XSLT is a complex language which is much harder to learn than other markup languages. Therefore, it is recommended that considerations of how the document will appear in output be left to the end of a project.
We thus return to the questions asked at the beginning of this document. How do we find meaning in a literary text, and how can we use textual markup to encode this meaning? The TEI schema provides some useful guidance, and the next portion of this document will explore what its guidelines have to offer. We can also, of course, supplement the TEI schema if necessary, and we should be thinking about elements or attributes we might need but which are not specified by the TEI.
This introduction is heavily indebted to the series of tutorials put together by James Cummings for the Man of Law's Tale Project workshop at Adam Mickiewicz University, Poznan, Poland. Following this link will lead you to other resources from the TEI @ Oxford. More complete information on the TEI can be found on the main Text Encoding Initiative web site.