Encoding Entities to Work with CKEditor 3
Tue February 8, 2011
If Google is any indication, I'm not the only developer who has spent hours trying to understand how best to integrate CKEditor to handle code snippets, problematic entities and pre-formatted text. Wrap your code snippet in a `<pre>` or `<code>` tag and everything works more or less as you would expect, until your code snippet contains the character `<` (which is, of course, a very common character if you're working with XML, HTML, PHP, ASP, etc.). This is an old problem which code monkeys have been handling forever by simply escaping/encoding any less-than symbols as `&lt;` so that they display on screen as an angle bracket, but are not parsed by the browser as a possible start to an HTML element.
That works great for displaying special characters in the browser, but when populating a CKEditor instance with them, they are decoded back to the original character (and start causing all sorts of problems).
After hours of searching, I finally found this post on the CKEditor forums which at least described the problem:
...I think the actual problem is that CKEditor automatically converts bracket entities to actual brackets when initializing. So if the saved field HTML is:
<p>A paragraph starts with <code>&lt;p&gt;</code>.</p>
CKEditor will automatically convert the `&lt;` and `&gt;` entities to `<` and `>` while initializing, and then parse it as an actual `<p>` tag.
This is exactly what I observed, and no combination of configuration options in the CKEditor API seems capable of disabling this behavior. (Believe me, I tried, and even hacked the source for a few hours in vain.) In my case, I am pulling HTML-formatted text from a database (which frequently contains code snippets with encoded `&lt;` symbols), editing it in CKEditor, and inserting it back into the database. How can I preserve a string like `&lt;?php` when it enters CKEditor, so that it displays properly and can be edited and re-saved?
The Solution
Double-encode all entities found in the source data before CKEditor initializes. Specifically, this means converting all occurrences of the ampersand character, `&`, to `&amp;` before CKEditor gets its hands on the data. With PHP:
// before being fed to the textarea of CKEditor
$content = str_replace('&', '&amp;', $content);
You may be thinking: "doesn't PHP have `htmlspecialchars()` and/or `htmlentities()` functions that could be called instead of a string replacement?" The answer is yes, but these functions will encode much more than what we need here (thus making CKEditor's source view look like a mess of entities instead of HTML with a few special entities inserted where needed). We don't need every `<`, `>`, `"`, `'` and `&` converted to an entity; in fact, we only need `&` converted to `&amp;`.
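To make the difference concrete, here is a minimal sketch in JavaScript (the language CKEditor itself runs in) that compares the ampersand-only replacement with a full escape. Both function names are hypothetical, and `escapeAllSpecialChars` is only a rough stand-in for what a function like `htmlspecialchars()` does, not a faithful port of it.

```javascript
// Hypothetical helpers, named for illustration only.

// Encode only ampersands: the approach described above.
function encodeAmpersands(html) {
  return html.replace(/&/g, '&amp;');
}

// Rough stand-in for a full special-character escape; an assumption made
// for comparison, not a faithful port of any PHP function.
function escapeAllSpecialChars(html) {
  return html
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

const stored = '<p>Use <code>&lt;?php</code> to open a script.</p>';

console.log(encodeAmpersands(stored));
// <p>Use <code>&amp;lt;?php</code> to open a script.</p>

console.log(escapeAllSpecialChars(stored));
// &lt;p&gt;Use &lt;code&gt;&amp;lt;?php&lt;/code&gt; to open a script.&lt;/p&gt;
```

With the ampersand-only version the real tags stay readable as markup, while the full escape turns every tag into entities as well.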
The flow here looks something like this for various character sequences:
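Pulled from DB or file | Action | What is sent to CKEditor | What displays in CKEditor WYSIWYG | Action | Inserted into DB or file
---|---|---|---|---|---
`<` | replace all `&` with `&amp;` | `<` | [used for markup] | none needed | `<`
`>` | replace all `&` with `&amp;` | `>` | [used for markup] | none needed | `>`
`&lt;` | replace all `&` with `&amp;` | `&amp;lt;` | `<` | none needed | `&lt;`
`&gt;` | replace all `&` with `&amp;` | `&amp;gt;` | `>` | none needed | `&gt;`
`&amp;` | replace all `&` with `&amp;` | `&amp;amp;` | `&` | none needed | `&amp;`
`&amp;amp;` | replace all `&` with `&amp;` | `&amp;amp;amp;` | `&amp;` | none needed | `&amp;amp;`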
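The round trip in the table can be sketched in a few lines of JavaScript. This is a simplified model, not CKEditor's own code: `prepareForEditor` and `decodeOneLevel` are hypothetical names, and the decoder handles only the three entities used in the table, on the assumption that exactly one level of encoding is removed on load.

```javascript
// Step 1 (server side): double-encode by escaping every ampersand.
function prepareForEditor(stored) {
  return stored.replace(/&/g, '&amp;');
}

// Step 2 (assumption): model the single level of entity decoding that
// happens when the editor loads. Only &lt;, &gt; and &amp; are handled.
function decodeOneLevel(text) {
  const entities = { '&lt;': '<', '&gt;': '>', '&amp;': '&' };
  return text.replace(/&(?:lt|gt|amp);/g, (match) => entities[match]);
}

// Values taken from the "Pulled from DB or file" column of the table.
const rows = ['<', '>', '&lt;', '&gt;', '&amp;', '&amp;amp;'];

for (const stored of rows) {
  const sent = prepareForEditor(stored);
  const editorSource = decodeOneLevel(sent);
  // editorSource is what the editor works with and later submits.
  console.log(`${stored} -> ${sent} -> ${editorSource}`);
}
```

In this model the value the editor ends up with always equals the stored value, which is why no extra decoding step is needed on save.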
The critical things to note here are:
- The literal < and > characters maintain state throughout, as they should - you want your real <p> and other tags to stay real tags the whole way through.
- To display the literal `&lt;`, `&gt;` or `&amp;` strings on screen, they will have to be stored double-encoded in the database or file (and are therefore triple-encoded, like `&amp;amp;amp;`, when sent to CKEditor).
- This method requires that the following CKEditor configuration options be used:
config.entities = false;
config.htmlEncodeOutput = false;
- CKEditor essentially strips off the double-encoding when it loads, so there is no need to perform any decoding or special replacements with the data submitted by the editor - it will be saved correctly in the database or file.
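As one possible way to apply the two configuration options from the list above, they can be passed inline when the editor is created rather than set in a shared config file. This is only a sketch: the textarea id `content` is a hypothetical name, and the guard is there so the snippet does nothing on a page where CKEditor is not loaded.

```javascript
// The two options named in the list above.
const editorConfig = {
  entities: false,          // leave characters alone instead of converting them to entities
  htmlEncodeOutput: false,  // do not HTML-encode the editor's output
};

// 'content' is a hypothetical textarea id. The typeof guard keeps this
// snippet harmless where the CKEDITOR global is not defined.
if (typeof CKEDITOR !== 'undefined') {
  CKEDITOR.replace('content', editorConfig);
}
```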
As this post (with a ton of entities needed to display properly) was created and edited in a system which uses the technique, I'd say it works!