komlenic.com

making noise since 1977

Escaping CDATA

Fri September 12, 2008

CDATA sections are used in XML to mark content that the parser should treat as character data only (i.e. not markup). Quite often you'll need to store regular text intermixed with HTML or other markup inside a set of XML tags. Without a CDATA section, the XML parser would be thrown off by the intermixed markup, mistaking those tags for tags belonging to the XML document itself.
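To make the difference concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The `<note>` element and its contents are made up for illustration; they are not from any real document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the same text with and without a CDATA wrapper.
with_cdata = "<note><![CDATA[Use <b>bold</b> tags & ampersands]]></note>"
without_cdata = "<note>Use <b>bold</b> tags</note>"

wrapped = ET.fromstring(with_cdata)
bare = ET.fromstring(without_cdata)

# Inside CDATA, the markup survives as plain characters and no child
# elements are created.
print(wrapped.text)
print(len(wrapped))

# Without CDATA, the parser treats <b> as an element of the XML document,
# so the text of <note> stops where <b> begins.
print(bare.text)
print([child.tag for child in bare])
```

In the second case the `<b>` tag becomes a child element of `<note>`, which is exactly the "mistaken for document markup" behavior described above.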

Wikipedia (as usual) does a great job of further explaining the use of CDATA sections.

Filed under "something which has been escaping (pun intended) us for a long time": how to handle the rare but troublesome instances where you need to include the string "]]>" within a CDATA section. Because this sequence of characters is the ending delimiter for CDATA sections, including it would cause the parser to close the section prematurely. Wikipedia provides the simple solution:

The preferred approach to using CDATA sections for encoding text that contains the triad "]]>" is to use multiple CDATA sections by splitting each occurrence of the triad just before the ">". For example, to encode "]]>" one would write:

<![CDATA[]]]]><![CDATA[>]]>

This means that to encode "]]>" in the middle of a CDATA section, replace all occurrences with the following:

]]]]><![CDATA[>

(This effectively ends the current CDATA section right after the "]]" and starts a new one just before the ">".)
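The replacement rule is easy to apply in code. The following is a minimal sketch in Python, not a definitive implementation: `wrap_in_cdata` is a hypothetical helper name, and the `<code>` element and sample string are invented for the demonstration. It splits every "]]>" across two CDATA sections and then round-trips the result through the standard library parser to check that the original text comes back.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting each "]]>" across two sections.

    Hypothetical helper for illustration only.
    """
    # End the current section after "]]" and reopen a new one before ">".
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


# Made-up sample: code that happens to contain the "]]>" sequence.
sample = "if (a[b[0]]>c) { return; }"
document = "<code>" + wrap_in_cdata(sample) + "</code>"

# The parser joins the adjacent CDATA sections back into one text value.
parsed = ET.fromstring(document).text
print(parsed == sample)
```

Because adjacent CDATA sections are read as one continuous run of character data, the consumer of the XML never sees the split.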
