
Special Characters in XML

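The ampersand, greater-than sign, less-than sign, apostrophe, and quotation mark are all predefined in XML (as &amp;, &gt;, &lt;, &apos;, and &quot;). Other named special characters, such as Old English letters, must be declared in your document before you can refer to them by name. To do this, include the following entity declarations immediately after your XML declaration. They belong inside a DOCTYPE declaration, between its square brackets, as in <!DOCTYPE root [ ... ]> where "root" is the name of your document's root element: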

<!-- Replace "root" with the name of your document's root element. -->
<!DOCTYPE root [
<!ENTITY Thorn "&#222;">
<!ENTITY thorn "&#254;">
<!ENTITY Eth "&#208;">
<!ENTITY eth "&#240;">
<!ENTITY Aesc "&#198;">
<!ENTITY aesc "&#230;">
<!ENTITY Amacron "&#256;">
<!ENTITY Emacron "&#274;">
<!ENTITY Imacron "&#298;">
<!ENTITY Omacron "&#332;">
<!ENTITY Umacron "&#362;">
<!ENTITY Ymacron "&#562;">
<!ENTITY AEmacron "&#482;">
<!ENTITY amacron "&#257;">
<!ENTITY emacron "&#275;">
<!ENTITY imacron "&#299;">
<!ENTITY omacron "&#333;">
<!ENTITY umacron "&#363;">
<!ENTITY ymacron "&#563;">
<!ENTITY aemacron "&#483;">
<!ENTITY nbsp "&#160;">
<!ENTITY copy "&#169;">
<!ENTITY mdash "&#8212;">
<!ENTITY ldquo "&#8220;">
<!ENTITY rdquo "&#8221;">
]>

This code also defines vowels with macrons over them, a non-breaking space (equivalent to (X)HTML &nbsp;), a copyright symbol, an em dash, and left and right curly double quotation marks.
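If you want to confirm which character a decimal code in the list stands for, Python's standard unicodedata module can look up the official Unicode name. This is only a minimal sketch for checking codes, not part of the XML itself; the helper name describe is made up for illustration.

```python
import unicodedata

def describe(code):
    """Return the character and its official Unicode name for a decimal code."""
    char = chr(code)
    return char, unicodedata.name(char)

# A few of the decimal codes used in the entity list above.
for code in (222, 254, 160, 169, 8212, 8220, 8221):
    char, name = describe(code)
    print(code, name)
```

Running the same check on any other number in the list is a quick way to catch a code that points at the wrong character.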

Once you have done this, you can use the names that follow the word "ENTITY" to encode special characters as you would in (X)HTML: write an ampersand, the name, and a semicolon. So &Thorn; will give you a capital thorn.
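To see the entities in use, the sketch below parses a small complete document with Python's standard xml.etree.ElementTree module. The element names poem and line and the sample text are made up for illustration, and only a few of the declarations are repeated to keep it short. It assumes the declarations sit inside a DOCTYPE declaration directly after the XML declaration.

```python
import xml.etree.ElementTree as ET

# A made-up sample document; "poem" and "line" are illustrative names.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE poem [
<!ENTITY Thorn "&#222;">
<!ENTITY aesc "&#230;">
<!ENTITY mdash "&#8212;">
]>
<poem>
<line>&Thorn;&aesc;t w&aesc;s god cyning &mdash; &quot;that was a good king&quot;</line>
</poem>"""

# The parser expands the declared entities and the predefined &quot;.
root = ET.fromstring(DOCUMENT)
line_text = root.find("line").text
print(line_text)
```

The declared names (&Thorn;, &aesc;, &mdash;) and the predefined &quot; are all replaced by the characters they stand for before your program sees the text.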

For those who are interested in how this works, you are using the <!ENTITY> declaration to define a named piece of replacement text that can be used anywhere in the document. When an XML processor sees a reference of the form &...;, it looks up the entity named between the ampersand and the semicolon. In our example, that name is "Thorn". The processor finds the entity called "Thorn" and sees that it has been mapped to the numeric character reference "&#222;". The number 222 is the decimal Unicode code point for the capital thorn (in hexadecimal the same reference would be written "&#xDE;"). So the processor replaces "&Thorn;" with that character. Anything displaying the result (like a web page) will then show the capital thorn. You could just type "&#222;" in your XML code, but the number is harder to remember than "Thorn".
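The lookup described above can be observed directly. The following Python sketch uses the standard xml.etree.ElementTree parser as one example of an XML processor. It suggests that the named entity, the decimal reference, and the hexadecimal reference &#xDE; all produce the same character, and that a name with no declaration is rejected. The element name t and the helper parse_text are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# A one-line DOCTYPE declaring a single entity; "t" is an illustrative root name.
DTD = '<!DOCTYPE t [<!ENTITY Thorn "&#222;">]>'

def parse_text(body, dtd=DTD):
    """Parse a one-element document and return its text content."""
    return ET.fromstring(f"{dtd}<t>{body}</t>").text

named = parse_text("&Thorn;")        # entity declared in the DOCTYPE
decimal = parse_text("&#222;")       # decimal character reference
hexadecimal = parse_text("&#xDE;")   # same code point written in hexadecimal

print(named == decimal == hexadecimal == chr(222))

# Without a declaration, the parser cannot resolve the name.
try:
    parse_text("&Thorn;", dtd="")
    undefined_ok = True
except ET.ParseError:
    undefined_ok = False
print(undefined_ok)
```

Numeric references work without any declaration, which is why the named entities are a convenience rather than a requirement.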
