Encoding Entities to Work with CKEditor 3

Tue February 8, 2011

If Google is any indication, I'm not the only developer who has spent hours trying to work out how best to integrate CKEditor so that it handles code snippets, problematic entities and pre-formatted text. Wrap your code snippet in a <pre> or <code> tag and everything works more or less as you would expect, until the snippet contains the character '<' (which is, of course, very common if you're working with XML, HTML, PHP, ASP, etc.). This is an old problem that code monkeys have handled forever by simply escaping/encoding any less-than symbol as &lt; so that it displays on screen as an angle bracket but is not parsed by the browser as the possible start of an HTML element.

That works great for displaying special characters in the browser, but when a CKEditor instance is populated with them, the entities are decoded back to the original characters (and start causing all sorts of problems).
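To make the failure concrete, here is a minimal PHP sketch. It uses html_entity_decode() purely as a stand-in for the decoding that happens when the editor loads; that is an assumption for illustration, not CKEditor's actual code path, and the variable names are invented for the example.

```php
<?php
// A code snippet as an author would type it.
$snippet = '<?php phpinfo();';

// Escape it for display inside a <code> tag, so "<" becomes "&lt;".
$stored = '<p>Example: <code>' . htmlspecialchars($snippet) . '</code></p>';

// Stand-in for the decoding step on editor load (an assumption, see above).
$seenByEditor = html_entity_decode($stored);

echo $stored, "\n";        // the entity is intact: safe to show in a browser
echo $seenByEditor, "\n";  // the entity is gone: "<?php" is now parsed as markup
```

The first line printed is what you want to keep; the second is roughly what the editor ends up working with.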

After hours of searching, I finally found this post on the CKEditor forums which at least described the problem:

...I think the actual problem is that CKEditor automatically converts bracket entities to actual brackets when initializing. So if the saved field HTML is:

<p>A paragraph starts with <code>&lt;p&gt;</code>.</p>

CKEditor will automatically convert the `&lt;` and `&gt;` entities to `<` and `>` while initializing, and then parse it as an actual `<p>` tag.

This is exactly what I observed, and no combination of configuration options in the CKEditor API seems capable of disabling this behavior. (Believe me, I tried, and even hacked at the source for a few hours in vain.) In my case, I am pulling HTML-formatted text from a database (which frequently contains code snippets with encoded < symbols), editing it in CKEditor, and inserting it back into the database. How can I preserve a string like &lt;?php when it enters CKEditor, so that it displays properly and can be edited and re-saved?

The Solution

Double-encode all entities found in the source data before CKEditor initializes. Specifically, this means converting every occurrence of the ampersand character, &, to &amp; before CKEditor gets its hands on the data. With PHP:

// before being fed to the textarea of CKEditor
$content = str_replace('&', '&amp;', $content);

You may be thinking: "Doesn't PHP have htmlspecialchars() and/or htmlentities() functions that could be called instead of a string replacement?" The answer is yes, but these functions encode much more than we need here (making CKEditor's source view look like a mess of entities instead of HTML with a few special entities inserted where needed). We don't need every < > " ' & converted to an entity; in fact, we only need '&' converted to '&amp;'.
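A quick side-by-side shows the difference. This is only a sketch with a made-up sample string, run with PHP's default flags for htmlspecialchars():

```php
<?php
// Stored HTML: real markup plus one already-encoded snippet.
$content = '<p>Open with <code>&lt;?php</code></p>';

// Ampersand-only replacement, as used in this post.
$ampOnly = str_replace('&', '&amp;', $content);

// htmlspecialchars() also encodes the angle brackets of the real tags.
$everything = htmlspecialchars($content);

echo $ampOnly, "\n";
// <p>Open with <code>&amp;lt;?php</code></p>

echo $everything, "\n";
// &lt;p&gt;Open with &lt;code&gt;&amp;lt;?php&lt;/code&gt;&lt;/p&gt;
```

With the ampersand-only version the <p> and <code> tags are still tags; with htmlspecialchars() they have become text.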

The flow here looks something like this for various character sequences:

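Pulled from DB   Action                What is sent     What displays       Action   Inserted into
or file                                to CKEditor      in CKEditor                  DB or file
--------------   --------------------  --------------   -----------------   ------   -------------
<                replace & with &amp;  <                [used for markup]   none     <
>                replace & with &amp;  >                [used for markup]   none     >
&lt;             replace & with &amp;  &amp;lt;         <                   none     &lt;
&gt;             replace & with &amp;  &amp;gt;         >                   none     &gt;
&amp;            replace & with &amp;  &amp;amp;        &                   none     &amp;
&amp;amp;        replace & with &amp;  &amp;amp;amp;    &amp;               none     &amp;amp;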
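The same round trip can be walked through in code. In this sketch the editor is modeled as simply stripping one level of &amp; when it loads; that model and both function names are assumptions made for illustration, not anything taken from CKEditor itself.

```php
<?php
// What this post does before the data reaches the textarea.
function encode_for_editor(string $stored): string
{
    return str_replace('&', '&amp;', $stored);
}

// Hypothetical model of the editor load: one level of &amp; is stripped.
// Illustration only; this is not CKEditor code.
function model_editor_load(string $sent): string
{
    return str_replace('&amp;', '&', $sent);
}

$rows = ['<', '>', '&lt;', '&gt;', '&amp;', '&amp;amp;'];

foreach ($rows as $stored) {
    $sent  = encode_for_editor($stored);
    $saved = model_editor_load($sent);
    // Print stored value, value sent to the editor, value saved back.
    printf("%-12s %-16s %s\n", $stored, $sent, $saved);
}
```

Under that model every row comes back out exactly as it went in, which matches the first and last columns of the table.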

The critical things to note here are:

  1. The literal < and > characters maintain state throughout, as they should - you want your real <p> and other tags to stay real tags the whole way through.
  2. To display the literal &lt; or &gt; or &amp; strings on screen, they will have to be stored double-encoded in the database or file (and therefore triple-encoded like &amp;amp;amp; when sent to CKEditor).
  3. This method requires that the following CKEditor configuration options be used:

    config.entities = false;
    config.htmlEncodeOutput = false;
  4. CKEditor essentially strips off the double-encoding when it loads, so there is no need to perform any decoding or special replacements with the data submitted by the editor - it will be saved correctly in the database or file.
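
Putting the pieces together, a page might build its editor roughly like this. Treat it as a sketch: render_editor() and the 'body' id are names invented for the example, it assumes ckeditor.js has already been loaded on the page, and it passes the two options inline to CKEDITOR.replace() on the assumption that this is equivalent to setting them in config.js.

```php
<?php
// Hypothetical helper: builds the textarea plus the CKEditor 3 bootstrap.
// The function name and the element id are illustrative only.
function render_editor(string $id, string $content): string
{
    // Double-encode ampersands so the entities survive the editor load.
    $encoded = str_replace('&', '&amp;', $content);

    $attr = htmlspecialchars($id, ENT_QUOTES); // safe for the HTML attributes
    $jsId = json_encode($id);                  // safe as a JavaScript string

    return '<textarea id="' . $attr . '" name="' . $attr . '">' . $encoded . "</textarea>\n"
         . "<script>\n"
         . 'CKEDITOR.replace(' . $jsId . ", { entities: false, htmlEncodeOutput: false });\n"
         . "</script>\n";
}

echo render_editor('body', '<p>Starts with <code>&lt;?php</code></p>');
```

On submit, the posted field can be written to the database as it arrives. One caveat: because real < characters pass through untouched, stored content that contains a literal </textarea> would close the textarea early.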

As this post (with a ton of entities needed to display properly) was created and edited in a system which uses the technique, I'd say it works!

