
Encoding Entities to Work with CKEditor 3

Tue February 8, 2011

If Google is any indication, I'm not the only developer who has spent hours trying to work out how best to integrate CKEditor so that it handles code snippets, problematic entities and preformatted text. Wrap your code snippet in a <pre> or <code> tag and everything works more or less as you would expect, until your code snippet contains the character '<' (which is, of course, a very common character if you're working with XML, HTML, PHP, ASP, etc.). This is an old problem, which code monkeys have been handling forever by simply escaping (encoding) every less-than symbol as &lt; so that it displays on screen as an angle bracket but is not parsed by the browser as the possible start of an HTML element.

That works well for displaying special characters in the browser. But when a CKEditor instance is populated with such content, the entities are decoded back into the original characters, and those characters start causing all sorts of problems.

After hours of searching, I finally found a post on the CKEditor forums that at least described the problem:

...I think the actual problem is that CKEditor automatically converts bracket entities to actual brackets when initializing. So if the saved field HTML is:

<p>A paragraph starts with <code>&lt;p&gt;</code>.</p>

CKEditor will automatically convert the `&lt;` and `&gt;` entities to `<` and `>` while initializing, and then parse it as an actual `<p>` tag.
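To make the quoted behavior concrete, here is a small PHP sketch. It assumes, as the forum post suggests, that CKEditor's initialization acts like one pass of entity decoding. PHP's htmlspecialchars_decode() is used only as a stand-in for that step; it is not CKEditor's own code.

```php
<?php
// Rough model of the quoted problem. ASSUMPTION: CKEditor's load step is
// treated as a single pass of HTML entity decoding, imitated here with
// htmlspecialchars_decode(). This is not CKEditor code.

$saved = '<p>A paragraph starts with <code>&lt;p&gt;</code>.</p>';

// One level of decoding turns the escaped &lt;p&gt; back into a real tag,
// which the editor would then parse as markup.
$seenByEditor = htmlspecialchars_decode($saved);

echo $seenByEditor, PHP_EOL;
// <p>A paragraph starts with <code><p></code>.</p>
```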

This is exactly what I observed, and no combination of configuration options in the CKEditor API seems capable of disabling this behavior. (Believe me, I tried, and even hacked the source for a few hours in vain.) In my case, I am pulling HTML-formatted text from a database (which frequently contains code snippets with encoded < symbols), editing it in CKEditor, and inserting it back into the database. How can I preserve a string like &lt;?php when it enters CKEditor, so that it displays properly and can be edited and re-saved?

The Solution

Double-encode all entities found in the source data before CKEditor initializes. Specifically, this means converting every occurrence of the ampersand character, &, to &amp; before CKEditor gets its hands on the data. With PHP:

// before being fed to the textarea of CKEditor
$content = str_replace('&', '&amp;', $content);

You may be thinking: "Doesn't PHP have htmlspecialchars() and htmlentities() functions that could be called instead of a string replacement?" The answer is yes, but these functions encode much more than we need here, which makes CKEditor's source view look like a mess of entities instead of HTML with a few special entities inserted where needed. We don't need every <, >, ", ' and & converted to an entity; in fact, we only need '&' converted to '&amp;'.
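A quick PHP comparison of the two approaches on a made-up sample string ($stored is illustrative only, not data from the post):

```php
<?php
// Compare htmlspecialchars() with the ampersand-only replacement on a
// small, made-up piece of stored HTML.

$stored = '<p>A &lt;p&gt; tag</p>';

// htmlspecialchars() encodes every <, >, & (and quotes), so the real
// <p> markup gets escaped along with the existing entities.
$everything = htmlspecialchars($stored);

// Replacing only & leaves real markup alone and double-encodes entities.
$ampOnly = str_replace('&', '&amp;', $stored);

echo $everything, PHP_EOL; // &lt;p&gt;A &amp;lt;p&amp;gt; tag&lt;/p&gt;
echo $ampOnly, PHP_EOL;    // <p>A &amp;lt;p&amp;gt; tag</p>
```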

The flow here looks something like this for various character sequences:

Pulled from DB or file | Sent to CKEditor (after replacing & with &amp;) | What displays in CKEditor | Inserted into DB or file (no action)
< | < | [used for markup] | <
> | > | [used for markup] | >
&lt; | &amp;lt; | < | &lt;
&gt; | &amp;gt; | > | &gt;
&amp; | &amp;amp; | & | &amp;
&amp;amp; | &amp;amp;amp; | &amp; | &amp;amp;
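The "Sent to CKEditor" column can be reproduced with a short PHP script. encodeForEditor() is a hypothetical helper that wraps the str_replace() call shown earlier. As an assumption, the editor's load-time decoding is again modeled by a single htmlspecialchars_decode() pass. The script checks that this one level of decoding gives back exactly what was pulled from the database.

```php
<?php
// Reproduce the first two columns of the table and check the round trip.
// ASSUMPTION: CKEditor's load-time decoding is modeled as one pass of
// htmlspecialchars_decode(); the real editor also parses the result as HTML.

// Hypothetical helper: the only transformation applied before CKEditor
// sees the data.
function encodeForEditor(string $fromDb): string
{
    return str_replace('&', '&amp;', $fromDb);
}

$rows = ['<', '>', '&lt;', '&gt;', '&amp;', '&amp;amp;'];

foreach ($rows as $fromDb) {
    $sent = encodeForEditor($fromDb);
    // A single decoding pass strips exactly the extra &amp; added above.
    $loaded = htmlspecialchars_decode($sent);
    printf("%-10s %-15s round trip %s\n", $fromDb, $sent,
        $loaded === $fromDb ? 'ok' : 'BROKEN');
}
```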

The critical things to note here are:

  1. The literal < and > characters pass through unchanged, as they should: you want your real <p> and other tags to stay real tags the whole way through.
  2. To display the literal strings &lt;, &gt; or &amp; on screen, they have to be stored double-encoded in the database or file (and are therefore triple-encoded, like &amp;amp;amp;, when sent to CKEditor).
  3. This method requires that the following CKEditor configuration options be used:

    config.entities = false;
    config.htmlEncodeOutput = false;
  4. CKEditor essentially strips off the double-encoding when it loads, so there is no need to perform any decoding or special replacements on the data submitted by the editor; it will be saved correctly in the database or file.
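Putting points 3 and 4 together, a page that loads an editor might be rendered roughly as follows. renderEditor() and the field name "body" are hypothetical, and the page is assumed to load ckeditor.js already. CKEDITOR.replace() is CKEditor 3's usual way to turn a textarea into an editor. Treat this as a sketch rather than a drop-in implementation.

```php
<?php
// Hypothetical helper that renders a CKEditor 3 instance for stored HTML.
// ASSUMPTIONS: ckeditor.js is already loaded on the page, and $name is a
// trusted identifier (it is not escaped here).

function renderEditor(string $name, string $fromDb): string
{
    // Double-encode entities; real tags such as <p> pass through untouched.
    $content = str_replace('&', '&amp;', $fromDb);

    return '<textarea name="' . $name . '" id="' . $name . '">' . $content . "</textarea>\n"
        . "<script>\n"
        . "CKEDITOR.replace('" . $name . "', {\n"
        . "    entities: false,\n"
        . "    htmlEncodeOutput: false\n"
        . "});\n"
        . "</script>\n";
}

echo renderEditor('body', '<p>Start with <code>&lt;?php</code></p>');

// On submit, $_POST['body'] would be stored as-is: per point 4, no
// decoding or extra replacement is applied to the editor's output.
```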

As this post (which needs a ton of entities to display properly) was created and edited in a system that uses this technique, I'd say it works!
