Medieval Unicode Font Initiative


A proposal for supplementary characters in Unicode: Medieval Nordic

 

Subrange 2: Diacritical characters

As explained in the introduction to this proposal, Unicode already contains a large number of precomposed characters in the Latin alphabet, but new additions of this kind are accepted only very reluctantly. This subrange has therefore been greatly reduced in the present version, and now contains only nine characters, listed in sections 2.1, 2.2 and 2.3.

The remaining composite characters are discussed in section 2.4, and are no longer included in the proposal. However, since the inventory of combinations is of interest to font developers, lists of expected combinations are included in this section.

Also note that characters for metrical analysis (i.e. characters with combinations of macron, acute and breve) have been moved to subrange 11.1.

 

2.1 Characters with a loop

In Old Norse manuscripts, the vowel "o" may have a loop. The loop should probably be interpreted as a reduced form of the character "e" or "a", but the resulting character ought to be distinguished from the ligatures "oe" and "ao". The loop may be placed in a high position to the right, or in a low position to the left.

There is no combining loop in Unicode 3.2. Since the loop only appears in connection with the vowel "o", it is probably best to encode each combination of "o" and a loop as a separate character.

Entity | Unicode | Descriptive name
&olll; | 0000 | LATIN SMALL LETTER O WITH LOWER LEFT LOOP
&Olll; | 0000 | LATIN CAPITAL LETTER O WITH LOWER LEFT LOOP
&ourl; | 0000 | LATIN SMALL LETTER O WITH UPPER RIGHT LOOP
&Ourl; | 0000 | LATIN CAPITAL LETTER O WITH UPPER RIGHT LOOP

 

2.2 Characters with complex diacritics

In some Icelandic sources, the character "y" may appear with a combination of a dot above and an acute accent, placed side by side. There is no single combining diacritical mark of this type, and since this combination only appears with the character "y", it is suggested that a composite character be introduced rather than a new combining mark.

The character "o ogonek" is included in Unicode 3.2 in the range Latin Extended-B (01EB and 01EA). However, the accented form is not included, and must be encoded as a combination of LATIN SMALL LETTER O WITH OGONEK (01EB) or LATIN CAPITAL LETTER O WITH OGONEK (01EA) and COMBINING ACUTE ACCENT (0301). These two characters have in my opinion the most prominent position of all characters in this proposal, since they are the only characters needed for the display and printing of normalised Old Norse orthography that are not included in a composite form in Unicode. In view of recent additions of composite characters in Unicode, such as several characters in Latin Extended-B (01F8, 01F9, 0218, 0219, 021A, 021B, 021E, 021F, 0226, 0227, 0228, 0229, 022A, 022B, 022C, 022D, 022E, 022F, 0230, 0231, 0232, 0233), these two composite characters ought to be seriously considered for inclusion.
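As a minimal sketch of how the accented "o ogonek" behaves as a combination, the following Python snippet uses the standard unicodedata module (the variable names are illustrative only). It builds the character from 01EB and the combining acute accent (0301), and checks that normalisation to NFC finds no single precomposed character for it.

```python
import unicodedata

O_OGONEK = "\u01EB"          # LATIN SMALL LETTER O WITH OGONEK
COMBINING_ACUTE = "\u0301"   # COMBINING ACUTE ACCENT
COMBINING_OGONEK = "\u0328"  # COMBINING OGONEK

# Accented "o ogonek" written as a precomposed base plus a combining mark.
accented = O_OGONEK + COMBINING_ACUTE

# NFC composes as far as possible; NFD decomposes fully.
nfc = unicodedata.normalize("NFC", accented)
nfd = unicodedata.normalize("NFD", accented)

# Alternative spelling: "o with acute" (00F3) followed by a combining ogonek.
alternative = "\u00F3" + COMBINING_OGONEK
equivalent = unicodedata.normalize("NFC", alternative) == nfc

print(len(nfc), [f"{ord(c):04X}" for c in nfd], equivalent)
```

Both spellings normalise to the same two-code-point sequence, so an application that compares normalised text should treat them as the same character, although the quality of the rendering still depends on the font.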

Entity | Unicode | Descriptive name
&ydaac; | 0000 | LATIN SMALL LETTER Y WITH DOT ABOVE AND ACUTE
&Ydaac; | 0000 | LATIN CAPITAL LETTER Y WITH DOT ABOVE AND ACUTE
&ohbrac; | 0000 | LATIN SMALL LETTER O WITH HOOK BELOW RIGHT AND ACUTE
&Ohbrac; | 0000 | LATIN CAPITAL LETTER O WITH HOOK BELOW RIGHT AND ACUTE


2.3 Characters with a hook above

In Medieval Nordic manuscripts, the characters "a", "e", "i", "j" and "y" may appear with a hook, which is placed above the character, facing to the left. It has been suggested that this hook could be rendered with the combining hook above used as a tone mark in Vietnamese (0309). However, the hook above in Medieval Nordic manuscripts has a slightly different form, and should be defined and designed as a horizontally and vertically turned ogonek. Like the ogonek, it usually overlaps with the base character.

Entity | Unicode | Descriptive name
&comhal; | 0000 | COMBINING HOOK TO THE LEFT ABOVE

Recommendation: characters with a hook above are encoded with the combining mark proposed here.

List of expected combinations of base characters and a combining hook above

Note: In combination with the character "o", the hook above may face to the right. This hook is attested, but is very unusual, and it is an open question whether it should be recognized as a separate mark. See the link above for examples.

 

2.4 Characters with other types of diacritics

Medieval Nordic characters may appear with a number of existing combining marks in Unicode. It is suggested that such combinations be treated as decomposed characters, i.e. as a combination of a base character and a combining diacritical mark. Note, however, that since this proposal includes several base characters not (yet) included in Unicode, many of these hitherto "unknown" characters, such as the ligatures "au", "av", "oo" etc., may appear with diacritical marks.

For this reason, expected combinations are listed below, so that font developers can take them into consideration.
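To illustrate what treating a combination as a decomposed character means in practice, here is a small Python sketch using the standard unicodedata module. The helper name split_marks is hypothetical and not part of the proposal; it decomposes a string with NFD and reports each base character together with the names of the combining marks that follow it.

```python
import unicodedata

def split_marks(text):
    """Return (base, [mark names]) pairs for the fully decomposed text."""
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        # combining() is non-zero for marks that attach to a preceding base.
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(unicodedata.name(ch))
        else:
            clusters.append((ch, []))
    return clusters

# "a with acute" (00E1) followed by "o with diaeresis" (00F6)
result = split_marks("\u00E1\u00F6")
print(result)
```

A font developer could run a helper of this kind over a transcription to see which base and mark pairs actually occur in it.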

 

2.4.1 Characters with a single acute

The acute is widely used in Medieval Nordic sources, primarily over vowels but also over some consonants. It is often used simply as a distinctive mark, especially over "i", which frequently is dotless and easily mistaken for part of an "m", "n" or "u" (minims). In some manuscripts the acute is used to denote length, and this is the usage in standard orthography.

Recommendation: characters with a single acute are encoded with the combining acute accent (0301).

List of expected combinations of base characters and the combining acute accent
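The dotless "i" with an acute is a convenient example of this recommendation. The Python sketch below (standard unicodedata module, illustrative variable names) builds it from LATIN SMALL LETTER DOTLESS I (0131) and the combining acute accent (0301), and compares it with the ordinary dotted "i", which does have a precomposed accented form.

```python
import unicodedata

DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT

dotless_i_acute = DOTLESS_I + ACUTE
names = [unicodedata.name(c) for c in dotless_i_acute]

# No precomposed form exists, so NFC leaves the sequence unchanged.
composed_dotless = unicodedata.normalize("NFC", dotless_i_acute)

# The dotted "i" plus acute composes to a single character (00ED).
composed_dotted = unicodedata.normalize("NFC", "i" + ACUTE)

print(names, len(composed_dotless), len(composed_dotted))
```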

 

2.4.2 Characters with a double acute

The double acute is used in Hungarian over the vowels "o" and "u". In Medieval Nordic manuscripts, especially late Icelandic ones, the double acute accent is sometimes used to denote length, and is found over all vowels, over consonants (semivowels) such as "j", "v" and "w", and over some of the ligatures.

Recommendation: characters with a double acute are encoded with the combining double acute accent (030B).

List of expected combinations of base characters and the combining double acute accent
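The following Python sketch checks which base characters have a precomposed form with the double acute. The helper has_precomposed is a hypothetical name; it simply tests whether NFC normalisation, as implemented in the standard unicodedata module, collapses base and mark into one code point.

```python
import unicodedata

DOUBLE_ACUTE = "\u030B"  # COMBINING DOUBLE ACUTE ACCENT

def has_precomposed(base, mark):
    """True if NFC composes base + mark into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

vowels = "aeiouy"
precomposed = [v for v in vowels if has_precomposed(v, DOUBLE_ACUTE)]
semivowels = [c for c in "jvw" if has_precomposed(c, DOUBLE_ACUTE)]

print(precomposed, semivowels)
```

Only the two Hungarian vowels compose; every other combination mentioned above remains a sequence of base character and combining mark, and must be positioned by the font.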

 

2.4.3 Characters with a single dot above

Single dots above are used for some Old English characters such as "c" and "g", and in general as a length mark in Medieval Nordic manuscripts, above consonants (geminates) as well as above vowels. In Old Norse standard orthography dots above are not used, but they are found in diplomatic editions.

Recommendation: characters with a single dot above are encoded with the combining dot above (0307).

List of expected combinations of base characters and the combining dot above

 

2.4.4 Characters with a single dot below

Characters with a dot below form a special category of signs, typically indicating an uncertain reading. As such they do not appear in the manuscripts themselves, but they are quite frequent in diplomatic editions of Medieval Nordic texts. They are also frequently encountered in epigraphical contexts, e.g. in runic inscriptions (namely in the transliteration of runes into the Latin alphabet).

Recommendation: characters with a single dot below are encoded with the combining dot below (0323).

List of expected combinations of base characters and the combining dot below
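As a rough sketch of how an editor's tool might apply this recommendation, the hypothetical helper mark_uncertain below adds the combining dot below (0323) after every letter of a reading. It assumes that the input is plain text and that every letter in it is to be marked as uncertain; the sample readings are invented for illustration.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

def mark_uncertain(reading):
    """Add a combining dot below to every letter of the reading."""
    out = []
    # Decompose first, so the dot lands directly after each base letter
    # and before any marks above it (which is also the canonical order).
    for ch in unicodedata.normalize("NFD", reading):
        out.append(ch + DOT_BELOW if ch.isalpha() else ch)
    return "".join(out)

marked = mark_uncertain("ru")
print([f"{ord(c):04X}" for c in marked])
```

Spaces, punctuation and existing combining marks are passed through unchanged, since only letters receive the dot.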

 

2.4.5 Characters with a double dot above

Double dots above, diaeresis, are widely used over vowels, as in modern German and Swedish. In Medieval Nordic manuscripts, especially late Icelandic ones, diaeresis is found over vowels and ligatures, as well as "v" and "w".

Recommendation: characters with a double dot above are encoded with the combining diaeresis (0308).

List of expected combinations of base characters and the combining diaeresis

 

2.4.6 Characters with a hook below

A few vowels, especially "o" and "e", may appear with a hook in Old Norse manuscripts. The latter combination, "e caudata", is common in Latin manuscripts, in which the letter form alternates with the ligature "æ". This hook, also called ogonek, is placed below the base character and faces to the right. In addition to the vowels, the character "t" may appear with a hook.

Recommendation: characters with a hook below are encoded with the combining ogonek (0328).

List of expected combinations of base characters and the combining ogonek
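The relation between "e caudata" and the combining ogonek can be inspected directly. The Python sketch below (standard unicodedata module, illustrative variable names) shows that the precomposed LATIN SMALL LETTER E WITH OGONEK (0119) decomposes into "e" and 0328, whereas "t" with a hook exists only as a combining sequence.

```python
import unicodedata

E_OGONEK = "\u0119"          # LATIN SMALL LETTER E WITH OGONEK
COMBINING_OGONEK = "\u0328"  # COMBINING OGONEK

# Canonical decomposition mapping, as space-separated hex code points.
mapping = unicodedata.decomposition(E_OGONEK)
decomposed = unicodedata.normalize("NFD", E_OGONEK)

# "t" has no precomposed form with ogonek, so NFC leaves it as two code points.
t_ogonek = "t" + COMBINING_OGONEK
t_composed = unicodedata.normalize("NFC", t_ogonek)

print(mapping, len(decomposed), len(t_composed))
```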

 


Version 2.0, 5 February 2003 OEH