Encoding questions - and answers

1. Transpositions
2. Corrections in the manuscript
3. Discontinuous sections
4. Fusion of words
5. Chapter headings inline
6. Chapter headings in the margin
7. Interlinear glosses
8. Punctuation at the beginning of a line
9. Superfluous hyphens
10. Split words
11. Names in the margin

By Tone Merete Bruvik (Aksis, Unifob) and Odd Einar Haugen (University of Bergen)

This is a list of encoding questions we have received since the publication of The Menota handbook v. 2.0. Please do not take these answers as the final word; there are many ways of analysing and encoding a text.

Note that we refer to the P5 version of the TEI Guidelines. For this reason, elements that are specific to Menota have been placed in a separate namespace and are prefixed with "me:", e.g. <me:facs>.
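As an illustration, the "me:" prefix might be declared on the root element alongside the default TEI namespace, roughly as in the minimal sketch below. The Menota namespace URI shown is our assumption and should be checked against the handbook; the header is omitted and the content is a placeholder modelled on the examples in this document.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:me="http://www.menota.org/ns/1.0">
  <!-- teiHeader omitted for brevity -->
  <text>
    <body>
      <p>
        <!-- <w> and <choice> are TEI elements in the default namespace;
             <me:facs> and <me:norm> are Menota elements (assumed URI above) -->
        <w>
          <choice>
            <me:facs>word</me:facs>
            <me:norm>word</me:norm>
          </choice>
        </w>
      </p>
    </body>
  </text>
</TEI>
```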

We would like to thank Haraldur Bernhardsson (Reykjavík) and Andrea de Leeuw van Weenen (Leiden) for comments on this page.

1. Transpositions

Question: Sometimes words, phrases, sentences or stanzas are transposed. How should this be encoded?

Answer: In a multi-level transcription like the one of the Eddic poems, we recommend that the text be encoded strictly as it stands, word by word and line by line, on the facs level and probably also on the dipl level. On the norm level, however, the encoder may transpose text passages. We believe this works well as long as the transposition is contiguous. For example, if the sequence of words is 1-2-3-4-5 in the manuscript, but the encoder believes it should be 1-2-4-3-5, this can be encoded with a <choice> element containing an <orig> and a <reg> element.

This is a simplified example, using only the facs and the norm levels. Note that the <choice> element is used to wrap up the <me:facs> and the <me:norm> elements:

      <w>
        <choice>
          <me:facs>word 1</me:facs>
          <me:norm>word 1</me:norm>
        </choice>
      </w>

      <w>
        <choice>
          <me:facs>word 2</me:facs>
          <me:norm>word 2</me:norm>
        </choice>
      </w>

      <choice>
        <orig>
          <w>
            <me:facs>word 3</me:facs>
          </w>
          <w>
            <me:facs>word 4</me:facs>
          </w>          
        </orig>
        <reg>
          <w>
            <me:norm>word 4</me:norm>
          </w>
          <w>
            <me:norm>word 3</me:norm>
          </w>          
        </reg>
      </choice>

      <w>
        <choice>
          <me:facs>word 5</me:facs>
          <me:norm>word 5</me:norm>
        </choice>
      </w>

We would not encourage overly complex transpositions. A Menota transcription typically stays close to the manuscript. The transcription on the norm level is above all a transcription in a normalised orthography; it is not the result of higher criticism.

For complex transpositions, we would recommend using the elements <interpGrp> and <interp>. See the TEI P5 Guidelines ch. 17.3.
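A minimal sketch of how <interpGrp> and <interp> might be used for a non-contiguous transposition: each transposed passage is marked with <seg>, and its @ana attribute points to an <interp> describing the transposition. The identifiers, the placeholder text and the type value "transposition" are our own illustrative choices, not Menota or TEI conventions.

```xml
<!-- Hypothetical example: two non-adjacent passages, A and B,
     both linked to the same interpretation via @ana -->
<seg xml:id="passageA" ana="#transp1">. . . passage A . . .</seg>
. . . intervening text . . .
<seg xml:id="passageB" ana="#transp1">. . . passage B . . .</seg>

<!-- The group may be placed elsewhere in the document,
     e.g. at the end of the text -->
<interpGrp type="transposition">
  <interp xml:id="transp1">Passage B should be read before passage A.</interp>
</interpGrp>
```

Unlike the <choice> encoding above, this leaves the text itself in manuscript order on all levels and records the transposition only as an interpretation.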

2. Corrections in the manuscript

Question: Sometimes words are corrected by the scribe, using signs like '/:', '/.' or '' ' ' '' (examples from GKS 2365 4to). How should these signs and the corrected text be encoded?

Answer: We assume that these signs are used for transpositions, as for example in Vsp. 38.3-4, which according to Bugge's edition reads (in a simplified orthography):

menn mord vargar
meins vara oc

However, the sign '/:' above "mord vargar" and the sign '/.' above "meins vara" indicate that the correct reading should be

menn meinsvara
oc mord vargar

In other words, this is a case of transposition, as discussed in section 1 above. For signs like '/:' etc., we would be inclined to use the <add> element with the @place and @type attributes. The correction sign may be regarded as a kind of addition, made by the scribe to indicate a transposition.

This is a possible encoding:

      <w>
        <choice>
          <me:facs>menn</me:facs>
          <me:norm>menn</me:norm>
        </choice>
      </w>

      <choice>
        <orig>
          <w>
            <me:facs>mord vargar</me:facs>
          </w>
          <add place="supralinear" type="transposition-sign">/:</add>
          <w>
            <me:facs>meins vara</me:facs>
          </w>
          <add place="supralinear" type="transposition-sign">/.</add>
          <w>
            <me:facs>oc</me:facs>
          </w>          
        </orig>        
        <reg>
          <w>
            <me:norm>meinsvara</me:norm>
          </w>
          <w>
            <me:norm>ok</me:norm>
          </w>
          <w>
            <me:norm>mordvargar</me:norm>
          </w>          
        </reg>
      </choice>

For a similar case, see ch. 7.2.1 of the Menota handbook.

3. Discontinuous sections

Question: A section of a text, e.g. a sermon, may be interrupted and continued later on. This is the case in the Old Icelandic book of homilies. How should this be encoded?

Answer: We recommend putting each part of a discontinuous sermon in its own <div> element, so that there will be no overlapping structures:

      <div type="sermon">
        <p>The first part of a sermon</p>
      </div>
      <div type="sermon">
        <p>Another sermon</p>
      </div>
      <div type="sermon">
        <p>The second part of a sermon</p>
      </div>

The next step is to link the two discontinuous sections. This can be done with the @xml:id, @next and @prev attributes, as discussed in the TEI P5 Guidelines ch. 16.7. Since the values of @next and @prev are pointers, a reference to an @xml:id in the same document is prefixed with '#':

      <div type="sermon" xml:id="sermon1part1" next="#sermon1part2">
        <p>The first part of a sermon</p>
      </div>
      <div type="sermon">
        <p>Another sermon</p>
      </div>
      <div type="sermon" xml:id="sermon1part2" prev="#sermon1part1">
        <p>The second part of a sermon</p>
      </div>

Note that the @xml:id values must be unique within the document. So, if there are three discontinuous sermons in the whole text, one might use the following values: 'sermon1part1', 'sermon1part2', 'sermon2part1', 'sermon2part2', 'sermon3part1' and 'sermon3part2'. An @xml:id value cannot contain spaces, and it must begin with a letter (or an underscore), not a digit.

4. Fusion of words

Question: The word "þaz" should be analysed as a fusion of "þat es" (þat es > þats > þaz). In this case, the character "z" reflects both the final part of one word and the beginning of the next, enclitic word. How should this be encoded in a lemmatised text?

Answer: We recommend using the <seg> element, with the value "enc" for the @type attribute, to wrap the two words and indicate that there should be no break between them (cf. ch. 8.3.2.11 of the handbook):

      <seg type="enc">
        <w lemma="s&aacute;">
          <choice>
            <me:facs>&thorn;az</me:facs>
            <me:dipl>&thorn;az</me:dipl>
            <me:norm>&thorn;at</me:norm>
          </choice>
        </w>
        <w lemma="er">
          <choice>
            <me:facs></me:facs>
            <me:dipl></me:dipl>
            <me:norm>s</me:norm>
          </choice>
        </w>
      </seg>

On the facs and dipl levels, the word will be displayed as “þaz”, and on the norm level as “þats”. The word form “þaz” will be linked to the lemma “sá”, and the enclitic form “s” to the lemma “er”.
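A stylesheet could produce the norm-level display roughly as in the sketch below. This is a hypothetical XSLT 1.0 fragment, not the Menota stylesheet, and it assumes the "me:" namespace URI shown. The <me:norm> forms of the words inside <seg type="enc"> are output with no space between them, so "þat" and "s" are joined as "þats".

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:me="http://www.menota.org/ns/1.0">

  <!-- Norm level: concatenate the me:norm forms of all <w> elements
       in an enclitic group, with no intervening space -->
  <xsl:template match="tei:seg[@type='enc']">
    <xsl:for-each select="tei:w">
      <xsl:value-of select="normalize-space(tei:choice/me:norm)"/>
    </xsl:for-each>
    <!-- one space after the whole group, before the next word -->
    <xsl:text> </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Selecting me:facs or me:dipl instead would give "þaz" on those levels, since the enclitic word is empty there.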

5. Chapter headings inline

Question: Chapter headings are often placed between chapters. For example, the heading may be placed towards the end of one or more lines that straddle two chapters. How should this be encoded?

Answer: If a chapter heading is placed inline, i.e. within the frame of the column, we recommend the encoding specified in the handbook, ch. 4.6. This is an example:

05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
06 . . . . . . . . . . . . . . these are the last words
07 Here begins the text of CAPITULUM of chapter 1.
08 chapter 2 . . . . . . . . . . . . . . . . . . . . . . .
09 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

We recommend that the text is encoded in the logical order:

      <div type="chapter">
        <p> . . . . . . . . . .
          <lb n="6"/> . . . . . . . these are the last words
          <lb n="7"/>of chapter 1.
        </p>
      </div>
      <div type ="chapter">
        <head><lb n="7"/>CAPITULUM</head>
        <p>
          <lb n="7"/>Here begins the text of
          <lb n="8"/>chapter 2 . . . . . . . .
        </p>
      </div>

Note that the position of each segment in lines 6 and 7 is indicated with the <lb/> element, and that the same line break may be indicated in several places. In this case, there are three segments in line 7 and, correspondingly, three <lb/> elements with the same value, 7, for the @n attribute.

Next, the order of the segments must be specified, so that the stylesheet can display the text in its actual order (on the facs level) or its logical order (on the norm level). This can be specified with the @rend attribute, here indicating that the heading is placed inline, in the middle of the line:

      <div type="chapter">
        <p> . . . . . . . . . .
          <lb n="6"/> . . . . . . . these are the last words
          <lb n="7"/>of chapter 1.
        </p>
      </div>
      <div type ="chapter">
        <head rend="inline middle"><lb n="7"/>CAPITULUM</head>
        <p>
          <lb n="7"/>Here begins the text of
          <lb n="8"/>chapter 2 . . . . . . . .
        </p>
      </div>

6. Chapter headings in the margin

Question: Chapter headings may also be placed in the margin, sometimes disconnected. In one example, the heading begins in the upper margin and continues in the left margin. How should this be encoded?

Answer: If the heading is placed outside the column, i.e. entirely in the margin, we recommend that it is encoded as an addition:

      <head>
        <add place="margin-top">This is a </add>
        <add place="margin-left">heading for a chapter</add>
      </head>

The encoding of additions is discussed in ch. 7.2.1 of the handbook.

7. Interlinear glosses

Question: Glosses may be added above words. For example, the Latin text of the credo has been glossed in the Old Icelandic homily book. How can glosses and glossed words be linked in the encoding?

Answer: We suggest that the words on the baseline and the interlinear glosses are regarded as two original readings, and that the relationship between them is one of choice: one may read either the one or the other. Thus, we recommend using the <choice> and <orig> elements. In this example, "ego" has been glossed as "ek":

      <choice>
        <orig rend="interlinear">
          <w>ek</w>
        </orig>
        <orig xml:lang="lat">
          <w>ego</w>
        </orig>
      </choice>

Assuming that the language of the whole text has been specified as Old Norse on the <text> element at the beginning of the encoded text, it is only necessary to mark the Latin words with the @xml:lang attribute. Do not forget that all languages found in the text must be declared in the header. See ch. 10.4 of the handbook.
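A minimal sketch of the header declaration and the default language on <text>. Here "non" (the ISO 639-2 code for Old Norse) is our assumption, and "lat" simply follows the gloss example above; the values actually recommended are given in ch. 10.4 of the handbook. The <fileDesc> and <body> are left as placeholders.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <!-- title, publication and source description omitted -->
    </fileDesc>
    <profileDesc>
      <langUsage>
        <!-- each @ident value is the code used in @xml:lang -->
        <language ident="non">Old Norse</language>
        <language ident="lat">Latin</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <!-- default language for the whole text -->
  <text xml:lang="non">
    <body>
      <!-- Old Norse text; only Latin words need xml:lang="lat" -->
    </body>
  </text>
</TEI>
```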

8. Punctuation at the beginning of a line

Question: Sometimes a line begins with a punctuation mark. How should this be encoded?

Answer: In a multi-level transcription, we recommend that all punctuation marks are placed in the <me:punct> element. An example:

      <p>
        <lb n="14"/> . . . . . . . .
          <w>final</w>
          <w>word</w>
        <lb n="15"/>
           <me:punct>.</me:punct>
           <w>New</w>
           <w>sentence</w>
           <w>begins</w>
            . . . . . . . .
      </p>

The stylesheet specifies that there should be a space after a <me:punct> element in the display.
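Such a display rule might look roughly like the following hypothetical XSLT 1.0 sketch for the facs level; it is not the Menota stylesheet, and the "me:" namespace URI is assumed. When <me:punct> contains a <choice>, as in section 9, the facs form is printed; otherwise the plain content is printed, as in the example above. Either way, one space follows.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:me="http://www.menota.org/ns/1.0">

  <!-- Facs level: print a punctuation mark followed by one space -->
  <xsl:template match="me:punct">
    <xsl:choose>
      <!-- multi-level punctuation: take the facs form -->
      <xsl:when test="tei:choice">
        <xsl:value-of select="normalize-space(tei:choice/me:facs)"/>
      </xsl:when>
      <!-- simple punctuation: take the content as it stands -->
      <xsl:otherwise>
        <xsl:value-of select="normalize-space(.)"/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text> </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```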

9. Superfluous hyphens

Question: A line sometimes ends with a superfluous hyphen. How should this be encoded?

Answer: We suggest that the hyphen is encoded as it stands on the facs level and possibly on the dipl level, but that it is left out on the norm level. Otherwise, the encoding will be as recommended in ch. 4.8.2 of the handbook. The overall structure will be like this:

      <p>
        <lb n="14"/> . . . . . . . .
          <w>final</w>
          <w>word</w>
          <me:punct>-</me:punct>
        <lb n="15"/>
           <w>New</w>
           <w>sentence</w>
           <w>begins</w>
            . . . . . . . .
      </p>

And the encoding of the superfluous punctuation mark on all three levels will be like this:

      <me:punct>
        <choice>
          <me:facs>-</me:facs>
          <me:dipl>-</me:dipl>
          <me:norm></me:norm>
        </choice>
      </me:punct>

10. Split words

Question: Sometimes the encoder believes that a word has been split, but there are no transposition signs indicating that the scribe (or a later hand) was aware of it. How should this be encoded in a lemmatised text?

Answer: There are several ways of dealing with this. Assuming that this only happens at the word level, we suggest handling it with the different levels of encoding, i.e. facs, dipl and norm. An example: the manuscript has "with house in", but the encoder believes the reading should be "within house":

      <w lemma="with">
        <choice>
          <me:facs>with</me:facs>
          <me:dipl>with</me:dipl>
          <me:norm></me:norm>
        </choice>
      </w>

      <w lemma="within">
        <choice>
          <me:facs></me:facs>
          <me:dipl></me:dipl>
          <me:norm>within</me:norm>
        </choice>
      </w>

      <w lemma="house">
        <choice>
          <me:facs>house</me:facs>
          <me:dipl>house</me:dipl>
          <me:norm>house</me:norm>
        </choice>
      </w>

      <w lemma="in">
        <choice>
          <me:facs>in</me:facs>
          <me:dipl>in</me:dipl>
          <me:norm></me:norm>
        </choice>
      </w>

11. Names in the margin

Question: In the Old Icelandic homily book, the name Bede has been added in the margin by the scribe, apparently to indicate that he thought Bede was the author of the sermon. How should this be encoded?

Answer: We recommend that the name is encoded as an addition, and that it is located at the beginning of the <div> for the sermon:

      <div type="sermon">
        <docAuthor>
          <add hand="scribe" place="margin-left">
            <name type="person">Bede</name>
          </add>
        </docAuthor>
        <head>Title of sermon</head>
        <p>Text of sermon.</p>
      </div>

Additions are discussed in ch. 7.2.1 of the handbook, and names in ch. 9.1.

First published 28.01.2009. Last updated 09.03.2009.