Encoding Entities to Work with CKEditor 3
Tue February 8, 2011
If Google is any indication, I'm not the only developer who has spent hours trying to understand how best to integrate CKEditor to handle code snippets, problematic entities and pre-formatted text. Wrap your code snippet in a `<code>` tag and everything works more or less as you would expect, until your code snippet contains the character `<` (which is, of course, a very common character if you're working with XML, HTML, PHP, ASP, etc.). This is an old problem, which code monkeys have been handling forever by simply encoding any less-than symbols as `&lt;` so that they display on screen as an angle bracket but are not parsed by the browser as a possible start to an HTML element.
That works great for displaying special characters in the browser, but when populating a CKEditor instance with them, they are decoded back to the original character (and start causing all sorts of problems).
After hours of searching, I finally found this post on the CKEditor forums which at least described the problem:
...I think the actual problem is that CKEditor automatically converts bracket entities to actual brackets when initializing. So if the saved field HTML is: `<p>A paragraph starts with <code>&lt;p&gt;</code>.</p>`
CKEditor will automatically convert the `&lt;` and `&gt;` entities to `<` and `>` while initializing, and then parse it as an actual `<p>` tag.
This is exactly what I observed, and no combination of configuration options in the CKEditor API seems capable of disabling this behavior. (Believe me, I tried, and even hacked the source for a few hours in vain.) In my case, I am pulling HTML-formatted text from a database (which frequently contains code snippets with encoded `&lt;` symbols), editing in CKEditor, and inserting back into the database. How can I preserve a string like `&lt;?php` when it enters CKEditor, so that it displays properly and can be edited and re-saved?
Double-encode all entities found in the source data before CKEditor initializes. Specifically, this means converting all occurrences of the ampersand character, `&`, to `&amp;` before CKEditor gets its hands on the data. With PHP:
    // before being fed to the textarea of CKEditor
    $content = str_replace('&', '&amp;', $content);
You may be thinking: "doesn't PHP have functions like `htmlentities()` that could be called instead of a string replacement?" The answer is yes, but these functions will encode much more than we need here (thus making CKEditor's source view look like a mess of entities instead of HTML with a few special entities inserted where needed). We don't need every `<`, `>`, `"`, `'` and `&` converted to an entity; in fact, we only need `&` converted to `&amp;`.
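To make the difference concrete, here is a minimal sketch in JavaScript (the language of the CKEditor configuration, not the PHP used on the server). The helper names `encodeAmpersands` and `encodeAllSpecialChars` are hypothetical, and the second one is only a rough stand-in for a full entity encoder; PHP's real functions differ in detail.

```javascript
// Hypothetical helper: the JavaScript analogue of replacing '&' with '&amp;'.
function encodeAmpersands(html) {
  return html.replace(/&/g, "&amp;");
}

// Hypothetical helper: a rough stand-in for a full entity encoder.
// It touches every special character, including the ones used for real markup.
function encodeAllSpecialChars(html) {
  return html
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Sample content as it might be stored: real tags plus one encoded '<'.
const stored = "<p>Start PHP with <code>&lt;?php</code></p>";

// Only the existing entity gains an extra level; the tags stay real tags.
const ampersandOnly = encodeAmpersands(stored);

// Every tag is turned into entities as well, which is more than is wanted here.
const everything = encodeAllSpecialChars(stored);

console.log(ampersandOnly);
console.log(everything);
```

With the ampersand-only version the markup is still readable in source view, which is the reason for preferring a plain string replacement.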
The flow here looks something like this for various character sequences:
| Pulled from DB | What is sent to CKEditor | What displays in CKEditor | Inserted into DB |
|---|---|---|---|
| `<` | `<` | [used for markup] | `<` |
| `>` | `>` | [used for markup] | `>` |
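The same flow can be traced for encoded sequences with a small simulation. This is only a sketch under a stated assumption: `decodeOneLevel` is a hypothetical stand-in for the single level of entity decoding that happens on load, not CKEditor's actual code, and it knows just four named entities.

```javascript
// Named entities this sketch understands; a real editor handles many more.
const ENTITY_MAP = { amp: "&", lt: "<", gt: ">", quot: '"' };

// Server-side step from the post: add one level of encoding to every ampersand.
function encodeForEditor(stored) {
  return stored.replace(/&/g, "&amp;");
}

// Assumed model of the editor: exactly one level of entities is decoded.
// A single regex pass never re-scans its own output, so only one level is removed.
function decodeOneLevel(text) {
  return text.replace(/&(amp|lt|gt|quot);/g, (match, name) => ENTITY_MAP[name]);
}

// Trace one stored string through the four stages of the flow.
function traceFlow(stored) {
  const sent = encodeForEditor(stored);    // what is sent to the editor
  const saved = decodeOneLevel(sent);      // what the editor hands back for saving
  const displayed = decodeOneLevel(saved); // what the reader sees on screen
  return { stored, sent, displayed, saved };
}

const phpTag = traceFlow("&lt;?php");
const literalAmp = traceFlow("&amp;amp;");

console.log(phpTag);
console.log(literalAmp);
```

Under this model the saved value always equals the stored value, which is why no extra decoding is needed after the editor submits.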
The critical things to note here are:
- The literal < and > characters maintain state throughout, as they should - you want your real <p> and other tags to stay real tags the whole way through.
- To display the literal `&amp;` strings on screen, they will have to be stored double-encoded in the database or file (and therefore triple-encoded like `&amp;amp;amp;` when sent to CKEditor).
- This method requires that the following CKEditor configuration options be used:
      config.entities = false;
      config.htmlEncodeOutput = false;
- CKEditor essentially strips off the double-encoding when it loads, so there is no need to perform any decoding or special replacements with the data submitted by the editor - it will be saved correctly in the database or file.
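For context, one place these two options could live is the editor's `config.js`, via `CKEDITOR.editorConfig`. The sketch below assumes that setup; the function name `applyEntitySettings` is hypothetical, and the guard simply lets the file load without error where CKEditor is not present.

```javascript
// Hypothetical helper that applies the two settings the technique relies on.
function applyEntitySettings(config) {
  // Do not convert special characters to named entities in the output.
  config.entities = false;
  // Do not HTML-encode the whole output when the editor data is submitted.
  config.htmlEncodeOutput = false;
  return config;
}

// Register the settings only when CKEditor has actually been loaded.
if (typeof CKEDITOR !== "undefined") {
  CKEDITOR.editorConfig = applyEntitySettings;
}
```

The same two properties can also be passed per instance in the options object given to `CKEDITOR.replace`, if a global `config.js` is not wanted.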
As this post (with a ton of entities needed to display properly) was created and edited in a system which uses the technique, I'd say it works!