Daniel Harrison's Personal Blog

Personal blog for daniel harrison

Options and Tradeoffs for Rich Text Editing in HTML5ish Technologies March 11, 2011

Filed under: development,internet — danielharrison @ 9:22 pm

There are a number of options for adding rich text editing to your website. All of them come with tradeoffs, and the right choice will be guided by the amount of control you need.

Content Editable

Content editable is the default solution for text editing on the web. Originating from Microsoft’s pioneering work in early versions of Internet Explorer, the basic API is now supported by all the major desktop browsers. It’s the technology behind most rich editors: TinyMCE, YUI editor, CKEditor. The problem is that the technology is quite old in internet time and the API doesn’t smell quite right by today’s standards. It won’t feel familiar to developers used to JavaScript, jQuery and DOM manipulation. It lives at a higher level of abstraction, via document.execCommand. If you apply the bold command to a piece of text it doesn’t return a selection, the new element or the set of elements it created; it doesn’t really care about the DOM at that level. If you do want to take a DOM centric approach you’ll need to attach listeners for node operations and get a bit clever about working out what changed. Most frameworks abstract this away sufficiently that you don’t really need to care, and it’s easy to have a competent, performant solution ready in a couple of hours.

contentEditable does take care of some of the complexity of complex formatting that you’d have to solve yourself if you owned the whole editing stack. For example, applying bold or converting to a list works on nested content and gets it right enough. It doesn’t produce what would be considered the cleanest HTML, e.g. every empty paragraph is <p><br></p> (<div><br></div> in WebKit based browsers). It’s the good enough solution: if you’re happy to make it a desktop browser based experience and want something quick, this is the easiest route. You also get things like spell checking for free (most browsers now support this by default).

One extension to contentEditable is to use the selection API. It has facilities to surround content, insert elements at the start of the selection and otherwise manipulate HTML based on user input. In some ways the selection API is easier to use, as it has a DOM based view of the world, which makes it much easier to integrate with bleeding edge technologies like HTML5 history.
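
To make the contrast between the two styles concrete, here is a minimal sketch. The interfaces `CommandDocument` and `RangeLike` and the helpers `boldViaCommand` and `wrapSelection` are hypothetical names made up for this example; they only describe the two real browser calls involved, `document.execCommand` and `Range.surroundContents`, so that the sketch can be exercised outside a browser.

```typescript
// Minimal structural types (assumptions for this sketch) covering only the
// members used below. A browser's `document` and `Range` satisfy them.
interface CommandDocument {
  execCommand(commandId: string, showUI?: boolean, value?: string): boolean;
}

interface RangeLike<E> {
  surroundContents(newParent: E): void;
}

// Command style: ask the browser to bold the current selection.
// All you get back is a success flag, not the elements that were created.
function boldViaCommand(doc: CommandDocument): boolean {
  return doc.execCommand("bold", false);
}

// DOM style: you create the wrapper yourself, so you still hold a reference
// to it after the range has been wrapped.
function wrapSelection<E>(range: RangeLike<E>, wrapper: E): E {
  range.surroundContents(wrapper);
  return wrapper;
}
```

In a page you would pass the real `document` to the first helper, and the range from `window.getSelection().getRangeAt(0)` plus something like `document.createElement("strong")` to the second. Note that `surroundContents` throws if the range only partially covers a non-text node, so real code needs to handle selections that cross element boundaries.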

I’ve been keenly monitoring the ADC (Apple Developer Connection) for news of when content editable will be supported on the iPad in Mobile Safari, but it doesn’t seem to be a near term priority. It’s still not supported in the latest iOS release, 4.3. So contentEditable is ruled out if you’re targeting the iPad; I’m not so sure about other tablets. To some extent this is not surprising: getting the experience right on tablet devices is going to take some thinking, given that it certainly wasn’t envisaged with tablets in mind.

Bind to an element, monitor keystrokes, insert into DOM

The ‘you bought it, you own it’ solution. The advantage over contentEditable is that you can make it work on the iPad and other devices that don’t support content editable. I believe this is the solution that Google now uses in its Docs experience. If text editing is a core competency you need to own, and you’re developing a custom solution anyway, then this is a feasible option. It’s a lot of work, but owning everything gives you great power, and it uses standard DOM operations so it is well supported by the browsers you’ll care about. If you’ve got a product where you’re using operational transformation (OT) or causal trees to synchronise changes in a collaborative environment, this works well, as you likely already have the information you need to send to the server to synchronise user edits anyway.
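
A minimal sketch of what owning the stack looks like, assuming a plain string as the document model. The names `Op`, `EditorState`, `KeyResult` and `handleKey` are hypothetical; key names follow `KeyboardEvent.key`. Each keystroke updates the model and yields an operation, which is the same information a collaborative setup would send to the server.

```typescript
// One edit, in a shape that could also be sent to a server for synchronisation.
type Op =
  | { kind: "insert"; at: number; text: string }
  | { kind: "delete"; at: number; length: number };

interface EditorState {
  text: string;
  caret: number; // index into `text` where the next character is inserted
}

interface KeyResult {
  state: EditorState;
  op: Op | null; // null when the key only moved the caret or was ignored
}

// Apply one key (named as in KeyboardEvent.key) to the model.
function handleKey(state: EditorState, key: string): KeyResult {
  const { text, caret } = state;
  if (key === "Backspace") {
    if (caret === 0) return { state, op: null };
    return {
      state: { text: text.slice(0, caret - 1) + text.slice(caret), caret: caret - 1 },
      op: { kind: "delete", at: caret - 1, length: 1 },
    };
  }
  if (key === "ArrowLeft") {
    return { state: { text, caret: Math.max(0, caret - 1) }, op: null };
  }
  if (key === "ArrowRight") {
    return { state: { text, caret: Math.min(text.length, caret + 1) }, op: null };
  }
  // Treat any single UTF-16 code unit as a printable character (a simplification).
  if (key.length === 1) {
    return {
      state: { text: text.slice(0, caret) + key + text.slice(caret), caret: caret + 1 },
      op: { kind: "insert", at: caret, text: key },
    };
  }
  return { state, op: null }; // Shift, Enter, etc. are ignored in this sketch
}
```

In a page this would sit inside a `keydown` listener on the bound element: call `handleKey` with `event.key`, keep the returned state, re-render the element from `state.text`, and queue any non-null `op` for synchronisation. Formatting, selections and multi-line content are all left out here, which is where the ‘a lot of work’ comes in.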

Canvas

Canvas is the newest technology you can implement text editing with. This is another solution where you need to own the whole stack, monitor keystrokes and draw the glyphs yourself. Canvas is fast, very fast, which makes things like displaying graphics a very fluid experience in modern browsers. It has a pixel coordinate system which gives you fine grained control over everything, even more so than any HTML generating approach.

My early prototypes did raise a blocker that ruled it out for me though. The canvas API uses methods like fillText to write text and measureText to determine the space it’s going to take. One of the core features of a text editor is overlaying a cursor to indicate the position of active editing. The problem I hit is that measureText only works reliably for this on fixed width (monospace) fonts, which is why it works in programming environments like Bespin/Skywriter, which use code oriented monospaced fonts.

measureText gives you the width in pixels. With a proportional font this width will not be consistent, due to anti-aliasing and the proportional spacing algorithms that make text look pretty on your screen. Take the term ‘cat’. Measuring ‘cat’ will give you the width of the whole word. If you want to move the cursor to between the a and the t you’ll need to know how much of the whole word ‘ca’ takes up. Because of the way the calculation works (particularly once you start worrying about bold and italics), the measureText of ‘ca’ will include a few extra pixels to account for the fact that the a is now the last letter of a word. So what measureText gives you is the total space needed to print ‘ca’ as a word in its own right, including all styles applied to the font and padding after the end letter. If you overlaid a cursor next to the ‘a’ in ‘cat’ using measureText to calculate where the a ended, then by default you’d end up with the cursor sitting somewhere in the ‘t’. Obviously being off by a few pixels matters in the UI.
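
The naive cursor placement described above can be sketched as follows. `Measure`, `caretX` and `perLetterDrift` are hypothetical names for this example; the measuring function is passed in so that in a browser it can be `(s) => ctx.measureText(s).width`, with `ctx.font` already set. The drift check is an assumed rough diagnostic, not something from the original prototypes.

```typescript
// A function returning the rendered width of a string in pixels.
// In a browser: (s) => ctx.measureText(s).width
type Measure = (s: string) => number;

// Naive caret placement: the x offset of the caret sitting before text[index],
// taken as the measured width of everything to its left.
function caretX(measure: Measure, text: string, index: number): number {
  const clamped = Math.max(0, Math.min(index, text.length));
  return measure(text.slice(0, clamped));
}

// Rough diagnostic: how far the sum of the individual letter widths drifts
// from the measured width of the whole string. With an ideal monospace font
// this is zero; a non-zero value means per-letter positions can't simply be
// added up.
function perLetterDrift(measure: Measure, text: string): number {
  let sum = 0;
  for (const ch of text.split("")) {
    sum += measure(ch);
  }
  return sum - measure(text);
}
```

For the example in the text, the cursor between the a and the t of ‘cat’ would be drawn at `caretX(measure, "cat", 2)`, which is exactly the width of ‘ca’ measured on its own.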
As the layout of proportional fonts is quite complex and goes into low level technology, more information is needed than is currently available in order to determine a feasible cursor position. In proportional fonts, particularly when dealing with italics, letters technically overlap. In an italic ‘la’, for example, the l actually pushes into the space above the a, depending on the font, so where should the cursor go? At the end of the l, or at the beginning of the a (which puts it on top of some of the l)? The obvious solution would be to add this information to the API so that it can report where letters start and end and their general dimensions. That said, given canvas’s lack of accessibility and the fact it’s not meant to be a text editing environment, there are good reasons why the API designers probably don’t want to facilitate this madness.

There are hacks of course to figure this out. I played with writing the text to a white background, getting the written text as an image and then using pixel sampling to determine where the letter really started. Yuck! It’s a lot of work, and when you care more about input than absolute control over display, contentEditable or rolling your own direct DOM manipulation solution is the quickest and easiest path.
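
For the curious, the pixel sampling hack looks roughly like this. It is not the original prototype code, just a minimal sketch of the idea, assuming dark text on an opaque white background. `Pixels` and `firstInkColumn` are hypothetical names; `Pixels` mirrors the `width`, `height` and `data` fields of the `ImageData` object that `ctx.getImageData(...)` returns in a browser.

```typescript
// The subset of the browser's ImageData used here: RGBA bytes, row by row.
interface Pixels {
  width: number;
  height: number;
  data: ArrayLike<number>; // 4 entries (r, g, b, a) per pixel
}

// Return the x of the first column containing a pixel darker than `threshold`
// on any colour channel, or -1 if the image is entirely (near) white.
// Alpha is ignored because the background is assumed to be opaque.
function firstInkColumn(img: Pixels, threshold = 250): number {
  for (let x = 0; x < img.width; x++) {
    for (let y = 0; y < img.height; y++) {
      const i = (y * img.width + x) * 4;
      if (
        img.data[i] < threshold ||
        img.data[i + 1] < threshold ||
        img.data[i + 2] < threshold
      ) {
        return x;
      }
    }
  }
  return -1;
}
```

Drawing a single letter with fillText at a known x and comparing that x with the column returned here tells you how far the visible glyph really sits from its nominal origin. Doing that for every letter, style and font size is where the yuck comes from.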