Chapter 3. Levels of text representation

Version 1.1 (5 May 2004)

3.1 Introduction
3.2 Levels of text representation
3.3 Bringing it all together

3.1 Introduction

A transcription is basically a representation of a primary source in another format, such as paper or the electronic medium. Some transcriptions aim to reproduce the source text as closely as possible, others allow for a certain amount of generalisation. In transcriptions of speech, a distinction is usually drawn between narrow and broad transcriptions, depending on the amount of phonetic detail. The same perspective applies to transcriptions of manuscript texts. Close (or narrow) transcriptions are usually referred to as diplomatic, while regularised transcriptions are often referred to as normalised. This is the basic distinction drawn in e.g. Wittgenstein's Nachlass: The Bergen Electronic Edition (1998-2000). Here, all texts are available in two versions, a diplomatic transcription and a normalised one. For examples, please refer to this page.

We suggest that medieval Nordic texts may be transcribed on up to three levels. In addition to the normalised level, we identify two closer levels. We shall refer to the narrowest level as the facsimile level, while the "medium" level is designated as diplomatic. The three levels are exemplified in ch. 3.2 below.

The distinction between three levels of text representation does not mean that a Menota transcription must contain all three levels. Many transcribers will probably choose a single level for their transcription. Our recommendation is to use these levels as a guide, so that a transcription can be described as following one of them. This information should be given in the header, and can optionally be given by use of specific elements in the transcription itself, as discussed in ch. 3.3 below. If a transcriber wishes to deviate from any of these levels, and there may be good reasons to do so, we recommend that the deviations be specified in the header.

It is convenient to begin by looking at a Latin text example, Passio et Miracula Beati Olavi. An important source for this work is Corpus Christi College, Oxford MS 209, a vellum manuscript from the late 12th century. Below is a low-resolution facsimile of the very beginning of the Passio, followed by an XML-conformant transcription. For a facsimile of the whole manuscript in high resolution, please refer to Early Manuscripts at Oxford University.

Fig. 3.1 CCC 209, fol. 57r., l. 1-15.
© Corpus Christi College, Oxford

<head>Passio et miracula beati Olavi</head>

<div type="section" n="1">
<p><hi rend="blue">R</hi>egnante illustrissimo rege Olauo apud Noruuegiam, que est terra pregrandis uersus aquilonem locata, a meridie Daciam habens, eandem ingressi sunt terram pedes euuangelizancium pacem, euuangelizancium bona.</p>
</div>

<div type="section" n="2">
<p>Hactenus sacrilegis ydolorum mancipate ritibus et supersticiosis erroribus deluse nationes ille ueri Dei cultum et fidem audierant; audierant quidem, set multi suscipere contempserant.</p>
</div>

<div type="section" n="3">
<p>Sicut enim loca aquiloni proxima inhabitabant, ita familiarius eas possederat et tenaciori glacie infidelitatis astrinxerat aquilo ille, a quo panditur omne malum super uniuersam faciem terre, et a cuius facie ollam succensam uidet Ieremias, et qui in Ysaia iactanter profert:</p>
</div>

<div type="section" n="4">
<p><quote>Super astra celi exaltabo solium meum, sedebo in monte testamenti in lateribus Aquilonis.</quote></p>
</div>

(Adapted from an edition by Lars Boje Mortensen 2003, University of Bergen. Cf. also the edition by Frederick Metcalfe 1881.)

(When the illustrious King Óláfr ruled in Norway, a vast country located towards the north and having Denmark to the south, there entered into that land the feet of them that preach the gospel of peace and bring glad tidings of good things. The peoples of that country, previously subject to the ungodly rites of idolatry and deluded by superstitious error, now heard of the worship and faith of the true God - heard indeed, but many scorned to accept. Living in a region close to the north, it was the same north, from which comes every evil over the whole face of earth, that had possessed them all the more inwardly and gripped them all the more firmly in the ice of unbelief. From its face Jeremiah saw a seething pot; and in Isaiah there is the boaster who says, "I will exalt my throne above the stars of God: I will sit also upon the mount of the congregation, in the sides of the north.") [Translated by Devra Kunin 2001.]

The transcription above is easily readable, even in its "raw" XML format. In fact, if all elements were stripped, it would look like plain ASCII text from any word processor:

Passio et miracula beati Olavi Regnante illustrissimo rege Olauo apud Noruuegiam, que est terra pregrandis uersus aquilonem locata, a meridie Daciam habens, eandem ingressi sunt terram pedes euuangelizancium pacem, euuangelizancium bona. Hactenus sacrilegis ydolorum mancipate ritibus et supersticiosis erroribus deluse nationes ille ueri Dei cultum et fidem audierant; audierant quidem, set multi suscipere contempserant. Sicut enim loca aquiloni proxima inhabitabant, ita familiarius eas possederat et tenaciori glacie infidelitatis astrinxerat aquilo ille, a quo panditur omne malum super uniuersam faciem terre, et a cuius facie ollam succensam uidet Ieremias, et qui in Ysaia iactanter profert: Super astra celi exaltabo solium meum, sedebo in monte testamenti in lateribus Aquilonis.
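The stripping described above can be reproduced with Python's standard xml.etree.ElementTree module. The following is a minimal sketch, assuming that the transcription has been wrapped in a single <body> element so that it parses as one XML fragment; only sections 1 and 4 are included for brevity, and the function name strip_markup is illustrative.

```python
import xml.etree.ElementTree as ET

# Sections 1 and 4 of the Passio transcription, wrapped in <body> so the
# fragment has a single root element.
PASSIO = """<body>
<head>Passio et miracula beati Olavi</head>
<div type="section" n="1">
<p><hi rend="blue">R</hi>egnante illustrissimo rege Olauo apud Noruuegiam, que est terra pregrandis uersus aquilonem locata, a meridie Daciam habens, eandem ingressi sunt terram pedes euuangelizancium pacem, euuangelizancium bona.</p>
</div>
<div type="section" n="4">
<p><quote>Super astra celi exaltabo solium meum, sedebo in monte testamenti in lateribus Aquilonis.</quote></p>
</div>
</body>"""


def strip_markup(xml_text: str) -> str:
    """Return the character data of an XML fragment with every tag removed."""
    root = ET.fromstring(xml_text)
    # itertext() yields element text and tails in document order, so "R" and
    # "egnante" are reunited; collapsing whitespace turns the newlines
    # between elements into single spaces.
    return " ".join("".join(root.itertext()).split())


print(strip_markup(PASSIO))
```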

With the help of an XML style sheet, the text could be displayed with a certain amount of formatting on the basis of the mark-up. For example, the title (<head>) might be shown in bold type, the initial might be rendered with an enlarged capital in blue colour, sections might be set out in separate paragraphs and numbered in bold type, and the Biblical quotation could be given in italics:

Passio et miracula beati Olavi
1. Regnante illustrissimo rege Olauo apud Noruuegiam, que est terra pregrandis uersus aquilonem locata, a meridie Daciam habens, eandem ingressi sunt terram pedes euuangelizancium pacem, euuangelizancium bona.
2. Hactenus sacrilegis ydolorum mancipate ritibus et supersticiosis erroribus deluse nationes ille ueri Dei cultum et fidem audierant; audierant quidem, set multi suscipere contempserant.
3. Sicut enim loca aquiloni proxima inhabitabant, ita familiarius eas possederat et tenaciori glacie infidelitatis astrinxerat aquilo ille, a quo panditur omne malum super uniuersam faciem terre, et a cuius facie ollam succensam uidet Ieremias, et qui in Ysaia iactanter profert:
4. Super astra celi exaltabo solium meum, sedebo in monte testamenti in lateribus Aquilonis.
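In practice this formatting would be done by an XSL style sheet. As a rough sketch of the same idea, the Python function below walks the element tree and maps each element to HTML along the lines just described. The HTML choices (a larger blue span for the initial, <b> for the title and section numbers, <i> for the quotation) are assumptions for illustration, and the sample text is shortened.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened sample of the Passio transcription.
FRAGMENT = """<body>
<head>Passio et miracula beati Olavi</head>
<div type="section" n="1">
<p><hi rend="blue">R</hi>egnante illustrissimo rege Olauo apud Noruuegiam</p>
</div>
<div type="section" n="4">
<p><quote>Super astra celi exaltabo solium meum</quote></p>
</div>
</body>"""


def render(el: ET.Element) -> str:
    """Map one element (and its descendants) to HTML."""
    # Content = the element's own text plus each rendered child and its tail.
    inner = escape(el.text or "")
    for child in el:
        inner += render(child) + escape(child.tail or "")
    if el.tag == "head":
        return f"<p><b>{inner}</b></p>"  # title in bold
    if el.tag == "hi" and el.get("rend") == "blue":
        return f'<span style="color:blue;font-size:150%">{inner}</span>'  # initial
    if el.tag == "div":
        return f'<p><b>{el.get("n")}.</b> {inner.strip()}</p>'  # numbered section
    if el.tag == "quote":
        return f"<i>{inner}</i>"  # Biblical quotation
    return inner  # <body>, <p>: pass the content through


print(render(ET.fromstring(FRAGMENT)))
```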

A transcription of a medieval Nordic text need not contain any more mark-up than this example, and it will still be well-formed XML. However, in order to comply with the Menota standard, it should follow the TEI Guidelines. Some information must therefore be entered at the very beginning of the file, and there must be a header. For an example of a header, please refer to the Menota header. The basic structure of the file is thus quite simple:

<?xml version="1.0" encoding="UTF-8"?>
<TEI.2>
<teiHeader>
<!-- Structured information on the text and the transcription goes here. -->
</teiHeader>
<text>
<body>
<!-- The text goes here, as exemplified above. -->
</body>
</text>
</TEI.2>
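Because the skeleton is plain XML, its outline can be checked programmatically. The sketch below only tests the points named above (a <TEI.2> root whose first child is <teiHeader>, plus a <text><body>); the function check_skeleton is hypothetical and is not a substitute for validating against the Menota DTD.

```python
import xml.etree.ElementTree as ET

SKELETON = """<?xml version="1.0" encoding="UTF-8"?>
<TEI.2>
<teiHeader>
<!-- structured information on the text and the transcription -->
</teiHeader>
<text>
<body>
<!-- the text -->
</body>
</text>
</TEI.2>"""


def check_skeleton(doc: bytes) -> list[str]:
    """Return problems with the minimal TEI.2 file structure (empty if none)."""
    root = ET.fromstring(doc)  # comments are discarded by the default parser
    problems = []
    if root.tag != "TEI.2":
        problems.append(f"root element is <{root.tag}>, expected <TEI.2>")
    if [child.tag for child in root][:1] != ["teiHeader"]:
        problems.append("first child of <TEI.2> must be <teiHeader>")
    if root.find("text/body") is None:
        problems.append("missing <text><body>")
    return problems


print(check_skeleton(SKELETON.encode("utf-8")))
```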

It is important to keep in mind that a transcription may be as straightforward and readable as this, and it would be fully acceptable as a Menota text.

However, not all primary sources are equally straightforward to transcribe. For most vernacular sources, entities will be required to represent additional characters, and we might also like to transcribe the text at a more diplomatic level than in this example. For example, the last word on the very first line is transcribed as "apud". In the facsimile above, we see that it has been written with the letters "ap" and a superlinear abbreviation mark. Some transcribers might want to record the fact that the last two letters have been supplied by the transcriber, for example by using the <expan> tag:

ap<expan>ud</expan>
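The <expan> encoding keeps two pieces of information apart: the letters actually written and the letters supplied by the transcriber. A small sketch, using a regular expression on the fragment instead of a full XML parser (an assumption that only holds for flat, un-nested <expan> elements); the function names are illustrative:

```python
import re

EXPAN = re.compile(r"<expan>(.*?)</expan>")


def written(form: str) -> str:
    """Letters actually written by the scribe: drop the expanded part."""
    return EXPAN.sub("", form)


def reading(form: str) -> str:
    """Full reading: keep the expanded letters, drop only the tags."""
    return EXPAN.sub(r"\1", form)


form = "ap<expan>ud</expan>"
print(written(form), reading(form))  # ap apud
```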

Other transcribers would like to encode the actual abbreviation mark being used, in this case a superlinear bar. This might be encoded with the help of an entity such as "&bar;", meaning "a horizontal bar placed above the preceding character":

ap&bar;
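Resolving such an entity for display means replacing it with a character that a font can place over the preceding letter. The sketch below assumes that &bar; is shown as U+0304 COMBINING MACRON; the actual mapping in Menota's entity lists may differ, and unknown entities are left untouched.

```python
import re
import unicodedata

# Assumed display value; the official entity list may use another character.
ENTITIES = {"bar": "\u0304"}


def resolve(text: str) -> str:
    """Replace known entity references with characters, leaving others as they are."""
    return re.sub(r"&(\w+);", lambda m: ENTITIES.get(m.group(1), m.group(0)), text)


shown = resolve("ap&bar;")
# A combining character attaches to the letter before it, here "p".
print(shown, unicodedata.name(shown[-1]))
```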

Yet other transcribers may wish to encode both the fact that the word has been abbreviated with a superlinear bar AND the fact that this abbreviation should be expanded as "ud" in this particular context. Recording the expansion matters because the superlinear bar is highly ambiguous; in this short extract alone, it should be expanded as "m" in "terram" (l. 4), "ut" in "Sicut" (l. 9), "ni" in "enim" (l. 9) and "n" in "omne" (l. 12).
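One way to picture the combined encoding is as a record that stores the written letters, the mark and the contextual expansion side by side, so that each of the encodings above can be derived from it. The Abbreviation class below is hypothetical and only illustrates the idea; the expansions listed are the ones cited in the text.

```python
from dataclasses import dataclass


@dataclass
class Abbreviation:
    """Hypothetical record for one abbreviated word."""
    base: str       # letters written before the mark, e.g. "ap"
    mark: str       # entity for the mark, e.g. "&bar;"
    expansion: str  # letters the mark stands for in this context, e.g. "ud"

    def as_entity(self) -> str:
        """Encoding of the mark only."""
        return self.base + self.mark

    def as_expan(self) -> str:
        """Encoding of the expansion only."""
        return f"{self.base}<expan>{self.expansion}</expan>"

    def reading(self) -> str:
        return self.base + self.expansion


apud = Abbreviation("ap", "&bar;", "ud")
print(apud.as_entity(), apud.as_expan(), apud.reading())

# Expansions of the superlinear bar cited for CCC 209: one mark, many values,
# so the expansion cannot be derived from the mark alone.
BAR_EXPANSIONS = {"apud": "ud", "terram": "m", "Sicut": "ut", "enim": "ni", "omne": "n"}
print(sorted(set(BAR_EXPANSIONS.values())))
```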

The more information the transcriber wants to put into the text, the more complex the transcription becomes. The following chapters go into more detail.

3.2 Levels of text representation

We believe that there are three focal levels of text representation for medieval Nordic texts and suggest that a transcription should reflect at least one of these levels. Furthermore, a transcription should be easily expandable so as to accommodate one or two additional levels. This time, we shall use a short extract from an Old Icelandic manuscript, AM 645 4to (first quarter of the 13th century).


Fig. 3.2 Enlarged and slightly edited extract from AM 645 4to fol. 55v, l. 14-15 (cf. the photographic facsimile in Hreinn Benediktsson 1965, pl. 28)

3.2.1 Facsimile level

On this level, the text is transcribed character for character, line for line. Allographic variation is to a great extent reflected in the transcription, and abbreviation marks are copied without any expansion. Thus, the text in fig. 3.2 would be transcribed as

karin&us; &et; leunti&us; fun&dunc;o&slong;c e&bar;i ig&osup;<lb/>fo&bar; &slong;ino&bar; eft&er; vp&isup;ri&slong;o c&isup;&slong;z

and displayed (subject to appropriate fonts) as


Fig. 3.3 Facsimile rendering of the example text in fig. 3.2.

Note that the superlinear "i" above "p" in the second line is most likely a scribal error. At the facsimile level, the transcriber ought to encode the manuscript exactly as it reads, even if it contains obvious mistakes. Corrections can be made by inserting a note, or they can be left to the diplomatic or normalised level.
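Because nothing is expanded or regularised at this level, a facsimile transcription can be processed mechanically line by line. As a small sketch, the Python below splits the example at <lb/> and lists the entity references, each of which stands for a special letter form or an unexpanded abbreviation mark; the two helper functions are illustrative and not part of any Menota tool.

```python
import re

FACS = ("karin&us; &et; leunti&us; fun&dunc;o&slong;c e&bar;i ig&osup;<lb/>"
        "fo&bar; &slong;ino&bar; eft&er; vp&isup;ri&slong;o c&isup;&slong;z")


def manuscript_lines(text: str) -> list[str]:
    """Split a facsimile transcription at <lb/>, keeping the scribe's lines."""
    return text.split("<lb/>")


def entity_names(text: str) -> list[str]:
    """Entity names in reading order, e.g. 'slong' for long s."""
    return re.findall(r"&(\w+);", text)


for number, line in enumerate(manuscript_lines(FACS), start=1):
    print(number, line)
print(sorted(set(entity_names(FACS))))
```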

3.2.2 Diplomatic level

On this level, not all types of allographic variation are transcribed, and line divisions are usually not shown in the display of the transcription. Abbreviations are expanded, but the expanded part is clearly marked. Some obvious mistakes may be corrected, such as "Leuntius" > "Leutius" (accepting "ti" for "ci", though), and the superlinear "i" is suppressed. In the transcription, expansions are marked by the element <expan>, and in the display they are usually shown in italics. The text would then be transcribed as

karin<expan>us</expan> <expan>oc</expan> leuti<expan>us</expan> fun&dunc;o&slong;c e<expan>ig</expan>i i g<expan>ro</expan><lb/>fo<expan>m</expan> &slong;ino<expan>m</expan> eft<expan>ir</expan> vp<note>superlinear i suppressed</note>ri&slong;o c<expan>ri</expan>&slong;z

and displayed as (now disregarding the line break)


Fig. 3.4 Diplomatic rendering of the example text in fig. 3.2.
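The display conventions of the diplomatic level can be sketched in a few substitutions. The hypothetical function below, applied to the second manuscript line of the example, drops <lb/>, shows expansions in italics (HTML <i>) and resolves &slong; to U+017F LATIN SMALL LETTER LONG S. It also assumes that the content of <note> is kept out of the running text (for instance in an apparatus); the text above does not prescribe this.

```python
import re

# Assumed display value for the one entity occurring in the sample.
ENTITIES = {"slong": "\u017f"}  # long s


def diplomatic_display(text: str) -> str:
    text = text.replace("<lb/>", "")  # line division not shown
    text = re.sub(r"<note>.*?</note>", "", text)  # note kept out of running text (assumption)
    text = re.sub(r"<expan>(.*?)</expan>", r"<i>\1</i>", text)  # expansions in italics
    return re.sub(r"&(\w+);", lambda m: ENTITIES.get(m.group(1), m.group(0)), text)


LINE_2 = ("fo<expan>m</expan> &slong;ino<expan>m</expan> eft<expan>ir</expan> "
          "vp<note>superlinear i suppressed</note>ri&slong;o c<expan>ri</expan>&slong;z")
print(diplomatic_display(LINE_2))
```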

3.2.3 Normalised level

On this level, the orthography is regularised according to the norm found in grammars and dictionaries for the language in question. For Old Icelandic and Old Norwegian texts we recommend the normalisation rules in AMKO's dictionary (ONP). Abbreviations are expanded silently, and punctuation is regularised as well. Thus, the text in fig. 3.2 would be transcribed as

Karinus ok Levcius fundusk eigi &iacute; gr&oogon;fum s&iacute;num eptir upprisu Krists

and displayed as


Fig. 3.5 Normalised rendering of the example text in fig. 3.2.
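Only two entities remain in the normalised example. Resolving them is a plain lookup; the sketch below maps &iacute; to U+00ED and &oogon; to U+01EB LATIN SMALL LETTER O WITH OGONEK, and raises an error for any other entity instead of guessing. The function name resolve_strict is illustrative.

```python
import re

NORMALISED_ENTITIES = {"iacute": "\u00ed", "oogon": "\u01eb"}  # í, ǫ


def resolve_strict(text: str) -> str:
    """Replace entity references, failing loudly on unknown names."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in NORMALISED_ENTITIES:
            raise KeyError(f"no character defined for &{name};")
        return NORMALISED_ENTITIES[name]
    return re.sub(r"&(\w+);", lookup, text)


NORM = "Karinus ok Levcius fundusk eigi &iacute; gr&oogon;fum s&iacute;num eptir upprisu Krists"
print(resolve_strict(NORM))
```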

For a more detailed discussion of these levels, please refer to Haugen 1995.  

3.3 Bringing it all together: more than one level of text representation in a transcription

The transcriptions in ch. 3.2 each reflect a specific level of text representation. However, we believe that the transcription should be expandable to accommodate more than one level. Here, we shall suggest two ways of achieving this. We regard both solutions as equivalent, and believe that the choice between them will be dictated by practical considerations. In both cases, we recommend using the <w> element to group each lexical word in the transcription, as explained in ch. 2.3.

3.3.1 Representing textual levels by way of attributes

We recommend that texts be transcribed at the facsimile level, as exemplified in 3.2.1 above. This level will be the basic one in the transcription, forming the content of the <w> element. The diplomatic level can be supplied by the attribute rend and the normalised form by the attribute reg. Since attribute values cannot contain elements, the function of the <expan> element is taken over by curly brackets, e.g. "han{n}" for "han<expan>n</expan>". Thus, the text in fig. 3.2 could be encoded with all three levels as

<w rend="karin{us}" reg="Karinus">karin&us;</w>
<w rend="{oc}" reg="ok">&et;</w>
<w rend="leuti{us}" reg="Levcius">leunti&us;</w>
<w rend="fundosc" reg="fundusk">fun&dunc;o&slong;c</w>
<w rend="e{ig}i" reg="eigi">e&bar;i</w>
<w rend="i" reg="&iacute;">i</w>
<w rend="g{ro}fo{m}" reg="gr&oogon;fum">g&osup;<lb/>fo&bar;</w>
<w rend="sino{m}" reg="s&iacute;num">&slong;ino&bar;</w>
<w rend="eft{ir}" reg="eptir">eft&er;</w>
<w rend="vpriso" reg="upprisu">vp&isup;ri&slong;o</w>
<w rend="c{ri}sz" reg="Krists">c&isup;&slong;z</w>

With the help of style sheets in XML, the text may be displayed on each level:
(a) the facsimile level is the content of the <w> element, i.e. what the transcription reads when all elements are stripped
(b) the diplomatic level is the content of the rend attribute, in which "{" is interpreted as "start italics" and "}" as "end italics" in a typical display
(c) the normalised level is the content of the reg attribute
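The three displays listed above can be sketched in Python as follows. Since ElementTree rejects undeclared entities, the sketch first replaces each custom entity with a display character where one is assumed (í, ǫ) or with a visible [name] placeholder otherwise; a real set-up would declare the entities in a DTD instead. Only four words are included, wrapped in a <p> element so the fragment has one root, and all function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Assumed display characters; every other entity becomes a visible [name].
ENTITIES = {"iacute": "\u00ed", "oogon": "\u01eb"}
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def resolve_entities(xml_text: str) -> str:
    def sub(m: re.Match) -> str:
        name = m.group(1)
        return m.group(0) if name in PREDEFINED else ENTITIES.get(name, f"[{name}]")
    return re.sub(r"&(\w+);", sub, xml_text)


def braces_to_italics(rend: str) -> str:
    """Typical diplomatic display: text inside {...} is set in italics."""
    return re.sub(r"\{([^{}]*)\}", r"<i>\1</i>", escape(rend))


def levels(xml_text: str) -> dict[str, str]:
    """Facsimile, diplomatic and normalised readings of a run of <w> elements."""
    words = list(ET.fromstring(resolve_entities(xml_text)).iter("w"))
    return {
        # (a) element content with all tags (e.g. <lb/>) stripped
        "facsimile": " ".join("".join(w.itertext()) for w in words),
        # (b) the rend attribute, curly brackets shown as italics
        "diplomatic": " ".join(braces_to_italics(w.get("rend", "")) for w in words),
        # (c) the reg attribute
        "normalised": " ".join(w.get("reg", "") for w in words),
    }


SAMPLE = """<p>
<w rend="karin{us}" reg="Karinus">karin&us;</w>
<w rend="{oc}" reg="ok">&et;</w>
<w rend="i" reg="&iacute;">i</w>
<w rend="sino{m}" reg="s&iacute;num">&slong;ino&bar;</w>
</p>"""

for level, text in levels(SAMPLE).items():
    print(f"{level:>10}: {text}")
```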

3.3.2 Representing textual levels by way of elements

Here, we suggest that each level is identified by elements within <w>: <facs> for the facsimile rendering, <dipl> for the diplomatic rendering (in which the element <expan> is used), and <norm> for the normalised rendering. This makes for a parallel encoding, in which up to three text strings co-exist within the boundaries of the <w> elements. For the sake of clarity, we have set out each word in a paragraph of its own:

<w>
<facs>karin&us;</facs>
<dipl>karin<expan>us</expan></dipl>
<norm>Karinus</norm>
</w>

<w>
<facs>&et;</facs>
<dipl><expan>oc</expan></dipl>
<norm>ok</norm>
</w>

<w>
<facs>leunti&us;</facs>
<dipl>leuti<expan>us</expan></dipl>
<norm>Levcius</norm>
</w>

<w>
<facs>fun&dunc;o&slong;c</facs>
<dipl>fundosc</dipl>
<norm>fundusk</norm>
</w>

<w>
<facs>e&bar;i</facs>
<dipl>e<expan>ig</expan>i</dipl>
<norm>eigi</norm>
</w>

<w>
<facs>i</facs>
<dipl>i</dipl>
<norm>&iacute;</norm>
</w>

<w>
<facs>g&osup;<lb/>fo&bar;</facs>
<dipl>g<expan>ro</expan><lb/>fo<expan>m</expan></dipl>
<norm>gr&oogon;fum</norm>
</w>

<w>
<facs>&slong;ino&bar;</facs>
<dipl>sino<expan>m</expan></dipl>
<norm>s&iacute;num</norm>
</w>

<w>
<facs>eft&er;</facs>
<dipl>eft<expan>ir</expan></dipl>
<norm>eptir</norm>
</w>

<w>
<facs>vp&isup;ri&slong;o</facs>
<dipl>vp<note>superlinear i suppressed</note>riso</dipl>
<norm>upprisu</norm>
</w>

<w>
<facs>c&isup;&slong;z</facs>
<dipl>c<expan>ri</expan>sz</dipl>
<norm>Krists</norm>
</w>

The display of the transcription is made by style sheets in XML:

(a) the facsimile level is the content of the <facs> element
(b) the diplomatic level is the content of the <dipl> element, in which the <expan> element describes expanded abbreviations
(c) the normalised level is the content of the <norm> element
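With the element-based encoding, a display amounts to picking one child of each <w>. The sketch below does this with ElementTree for two of the words above. As before, it assumes that undeclared entities are replaced by a display character or a [name] placeholder before parsing, and it assumes that <note> content is left out of the running diplomatic text; expansions come out as plain letters here, whereas a display would set them in italics. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ENTITIES = {"slong": "\u017f"}  # assumed display value: long s
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def resolve_entities(xml_text: str) -> str:
    def sub(m: re.Match) -> str:
        name = m.group(1)
        return m.group(0) if name in PREDEFINED else ENTITIES.get(name, f"[{name}]")
    return re.sub(r"&(\w+);", sub, xml_text)


def running_text(el: ET.Element) -> str:
    """Text of an element and its descendants, skipping <note> content."""
    parts = [el.text or ""]
    for child in el:
        if child.tag != "note":
            parts.append(running_text(child))
        parts.append(child.tail or "")  # text after a child belongs to the parent
    return "".join(parts)


def level(xml_text: str, name: str) -> str:
    """One level ('facs', 'dipl' or 'norm') of every <w>; each <w> must have all three."""
    root = ET.fromstring(resolve_entities(xml_text))
    return " ".join(running_text(w.find(name)) for w in root.iter("w"))


WORDS = """<p>
<w>
<facs>eft&er;</facs>
<dipl>eft<expan>ir</expan></dipl>
<norm>eptir</norm>
</w>
<w>
<facs>vp&isup;ri&slong;o</facs>
<dipl>vp<note>superlinear i suppressed</note>riso</dipl>
<norm>upprisu</norm>
</w>
</p>"""

for name in ("facs", "dipl", "norm"):
    print(name, level(WORDS, name))
```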

The elements <facs>, <dipl> and <norm> are not defined in TEI, but are part of the DTD we have defined. Please see the DTD in the tool chest.
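Since the two encodings are regarded as equivalent, one can be converted into the other mechanically. The sketch below turns a <w> in the element-based form into the attribute-based form of 3.3.1: the <facs> content becomes the element content, <expan> in <dipl> becomes curly brackets in rend, and <norm> becomes reg. It works on the raw text with regular expressions so that entity references pass through untouched; dropping <lb/> and <note> from rend is an assumption, made because attribute values cannot contain elements. The function names are hypothetical.

```python
import re


def child(w_xml: str, name: str) -> str:
    """Raw content of the first <name>...</name> inside a <w> string."""
    match = re.search(rf"<{name}>(.*?)</{name}>", w_xml, re.S)
    if match is None:
        raise ValueError(f"<w> has no <{name}> element")
    return match.group(1)


def to_attribute_form(w_xml: str) -> str:
    facs = child(w_xml, "facs")
    dipl = child(w_xml, "dipl")
    dipl = re.sub(r"<note>.*?</note>", "", dipl).replace("<lb/>", "")
    rend = re.sub(r"<expan>(.*?)</expan>", r"{\1}", dipl)
    reg = child(w_xml, "norm")
    # A double quote would end the attribute value early, so escape it.
    rend, reg = (v.replace('"', "&quot;") for v in (rend, reg))
    return f'<w rend="{rend}" reg="{reg}">{facs}</w>'


W = """<w>
<facs>karin&us;</facs>
<dipl>karin<expan>us</expan></dipl>
<norm>Karinus</norm>
</w>"""
print(to_attribute_form(W))
```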

Version 1.0 published 20 May 2003. Version 1.1 published 5 May 2004.