Daniel Harrison's Personal Blog

Personal blog for Daniel Harrison

Brisbane YOW Talk October November 3, 2011

Filed under: business,development,Uncategorized — danielharrison @ 3:17 am

I went to the latest Brisbane YOW talk in October which had a focus on cloud computing and analytics.

First was Readify and MYOB's move to the .NET cloud. This was interesting, as I haven't come across the .NET cloud in practice, it being late on the scene comparatively, and most projects I've come across for distributed computing tend not to be .NET. I've tended to favour an environment where there's more control, a la EC2 and Java/Scala/C based solutions, even over App Engine. It seems like a competent solution that has adopted common standard practices: SQL hosting or a BigTable-like data store, workers and worker queues, etc. It's been a few years since I've shipped products in .NET, but it gave me confidence that if I was stuck without any alternative, the same patterns and practices I've been using could be brought across pretty easily.

One point of interest was single-tenant vs multi-tenant data hosting. My experience is that multi-tenancy is much, much harder architecturally, particularly for things like managing upgrades and concurrent version support. While it's the holy grail for its potential efficiencies, it seems to have lost the impetus it once had as the shining light on the hill. The pattern I seem to be seeing is that multi-tenancy is losing to virtualised single-tenancy stacks. Given the speed and cost effectiveness of spinning up on-demand instances, the ease of backup, and tools like Chef and Puppet that make provisioning much easier, single tenancy seems to be becoming the default. My theory is that it's become _so_ cheap to run virtualised stacks with a public cloud provider that the cost of architecting and developing multi-tenant solutions isn't cost efficient for most classes of problems.

One thing I don't think we got a really clear answer on was the legal implications of cloud and offshore hosting. From my understanding, even if data is encrypted, requirements like PCI-DSS and others make it almost impossible to use an offshore cloud for data persistence. In Australia this rules out most public cloud providers, but I strongly suspect most companies are rolling this out at a dev level and not really concerning themselves with the legal implications. I was really hoping we'd get an Amazon cloud here, but it seems like Singapore will be the local hub for the generic cloud providers. Given the sheer size of the government market I can see a few providers lured onshore, but with expensive fibre until the NBN really gets cranking it doesn't seem like it would be very cost effective for them.

Dave Thomas's was the presentation I was looking forward to most. It focussed on end user computing for what he terms thinkers (analysts, data scientists, economists, etc). This is a topic dear to my heart, my original degree being in economics, and for a new project I'm looking at kicking off I'll be working with some exceptional analysts that we'll need to empower. I've been thinking a lot about how to harvest and collect data and, with some kind of cooperative process, build a toolchain for experimental and then deployable models. This is an area that is awash with hype and money at the moment due to the promise it can deliver. It really feels like the early days of Hari Seldon. The main takeaway I had was that empowering these users means that, to be effective, the tools we write as engineers must cross the boundaries from high performance computing to language design and, most importantly, usability with the analyst at the centre. These are all individually hard problems to solve as it is, and we're in very early days. It explains why companies such as Palantir et al are growing so fast and getting a lot of serious attention and money. If you get a good solution, I think it's very easy to see it revolutionising business data processing as the database did before it.

The tool he demoed would be particularly useful as a generic data analysis tool; it seemed to me a general purpose way to start understanding and visualising the data with a view to determining a specific answer. It was a very brief glimpse, but it seemed oriented towards solving segmentation queries, eg tracking down a subset of a larger population given various tracers, patterns etc. It seemed pretty effective, and gave analysts the ability to mine large amounts of data and segment down to some subset of interest in what seemed close to realtime. Part of what I see as an excellent data modelling habit is to get down and play with the dirty, dirty data; you need to understand its characteristics, and this tool would fit the bill. It's weird when you think about it in a way: these very expensive tools are processing peta- and terabytes of data to produce formats where an analyst can apply their superior pattern recognition ability to solve the problem and draw often non-intuitive deductions. It's all about getting it to a format our highly fallible brains can work on. Both this tool, and what I've seen of the new trend of tools such as Palantir, mean you can process massive amounts of data to identify, visualise and segment interesting items that only years ago were simply too slow to respond to in any meaningful way. You can run lots of experiments, visualise the data, then go on and discover more interesting trends and pointers in realtime, so I really see these tools changing the face of the analytics profession. In uni we would run through data and get some dodgy little black and white line graph that was next to unintelligible, and would have to kill -9 if you were overly ambitious in your data usage; it's changed so much in a decade.
With this ability to record everything everywhere, and now to analyse it quickly and get initial results in near real time, businesses and government can be much more responsive in dealing with everything from planning to breaking emergencies. While I think this is a boon for social research and for faster, improved responsiveness from governments, I strongly suspect it's really going to be most used in finance and in getting us to buy more, faster 😉

It did get me thinking though, and spurred a few conversations with colleagues doing big, big data analytics. From my experience in economic modelling and some peripheral fraud detection, getting an answer is the /start/ of the job; the next step is to build a tunable predictive model and hook it up to some actions. My feeling is that /most/ of the time you are trying to build a model that then learns (in a constrained manner) and reacts on its own. It will be customised and monitored by less analytical staff, who tweak parameters based on current trends and observations. Identifying the trend is obviously the first half of this model, but you need to do something with it, and I wonder where tools will take us. I guess in engineering parlance, instead of returning a value I'm returning a function that changes based on the inputs. How do we develop tools that allow the building of dynamic models we can use as filters, event drivers and adapters in the systems we ship today? Things that are not static but, given core parameters and a stream of information to eat, adjust in a predictable manner. Will we ever have a scenario where an analyst will model, analyse data and output a compiled artifact we slot into our systems as a core observer and action-initiating blob? It seems to me like we're heading to some kind of model which is part rule system, part integration code and part tunable analysis system. My previous role at Oracle was leading development of a high performance rule modelling system for policy experts. I think, coupled with a dynamic and probabilistic model, it would be possible to put something together that operates this way over large, real time data sets and streams.
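A toy sketch of the "return a function, not a value" idea in JavaScript (all names and numbers here are mine, not from any tool or system mentioned above): the "model" is a factory whose knobs (alpha, tolerance) are the parameters less analytical staff could tweak, and it returns a classifier that keeps adjusting itself, within constrained bounds, from the stream it eats.

```javascript
// Hypothetical sketch: a "model" that returns a function rather than a value.
// The returned observer keeps a running mean of the stream it consumes and
// flags values that drift too far from it.
function makeDriftDetector({ alpha = 0.2, tolerance = 3 } = {}) {
  let mean = null;
  return function observe(value) {
    if (mean === null) {
      mean = value;
      return { value, mean, anomaly: false };
    }
    const anomaly = Math.abs(value - mean) > tolerance;
    // Constrained learning: the mean only moves by a fraction of the error.
    mean = mean + alpha * (value - mean);
    return { value, mean, anomaly };
  };
}

const detector = makeDriftDetector({ alpha: 0.5, tolerance: 2 });
[10, 10.5, 9.8, 17, 10.2].forEach(v => {
  const r = detector(v);
  if (r.anomaly) console.log('anomaly:', r.value);
});
```

The point of the shape is that the closure is the deployable blob: you slot `observe` into a pipeline as a filter or event driver, and ops staff retune it by building a new one with different parameters.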

Overall the YOW nights are excellent, and this was no exception. It's great that these high quality speakers are coming to Oz now, and I'm really looking forward to the conference in December.


Things to watch out for in HTML5 IndexedDB as at 21 June 2011 June 21, 2011

Filed under: development,internet,javascript,web — danielharrison @ 6:59 am

I'm between contracts at the moment, so I'm taking the opportunity to play with some bleeding edge technology. With everyone seemingly jumping on the HTML5 bandwagon, even Microsoft with Windows 8, it seemed like a good opportunity to restart my side project playing with the latest web tech.

So there are a few things to note if you pick up IndexedDB. It's bleeding edge, and that's to be expected, but here are my experiences over the last week.

IndexedDB is in WebKit (Chrome) and Firefox, but not yet in Safari. The database visualisation in the WebKit developer tools isn't linked in yet, so you can't manage the database that way. You can't delete a database programmatically yet in either Chrome or Firefox; if you're writing unit tests this is going to be a bit of a pain ;). Also, you can't yet access IndexedDB from web workers; at this stage it's attached to the window. One of the things I'm playing with is a stemming and text sorting index which was all running via web workers. It's an easy workaround: you just take the results from the web workers and, at a convenient time, merge and store them instead of doing it directly. Still, it will be cool when this works.
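A minimal sketch of that workaround (the index shape and names are my own invention, not from my actual project): the workers post back partial term → document-id maps, the main thread merges them, and only the main thread touches IndexedDB. The merge step is pure, so it's easy to test outside a browser.

```javascript
// Hypothetical sketch: web workers build partial indexes and post them back;
// the main thread merges them, then persists the merged index in one go.
function mergeIndexes(partials) {
  const merged = {};
  for (const partial of partials) {
    for (const [term, docIds] of Object.entries(partial)) {
      merged[term] = (merged[term] || []).concat(docIds);
    }
  }
  return merged;
}

// In the browser this would run from the workers' onmessage handlers;
// guarded here since IndexedDB only exists on the window.
function storeIndex(db, merged) {
  if (typeof indexedDB === 'undefined') return; // not in a browser
  const store = db.transaction('index', 'readwrite').objectStore('index');
  for (const [term, docIds] of Object.entries(merged)) {
    store.put({ term, docIds });
  }
}
```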

The other thing I've noticed is that it feels very different from other data stores, even other KVP stores such as Cassandra. It really is a javascript data store. The feeling I get is that the asynchronous model is the preferred interaction method, which again feels different from other APIs. I'm still getting the feel of it, but it feels right for client side javascript. In my opinion, if I had to choose between the SQLite model and this, I'd choose this as the better technology direction for browser based client structured storage. SQLite would have just recreated the SQL feeling of datastores, and I don't think it would have felt quite right for javascript in the long term.
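To show what that asynchronous feel looks like in practice, here's a small sketch (the store and field names are invented; guarded so it degrades outside a browser): every operation hands back a request object you attach handlers to, rather than a result.

```javascript
// Sketch of the event-driven IndexedDB style: nothing blocks, and the caller
// gets the result (or error) via callbacks attached to the request object.
function readNote(db, id, onResult, onError) {
  if (typeof indexedDB === 'undefined') {
    return onError(new Error('IndexedDB unavailable'));
  }
  const request = db.transaction('notes').objectStore('notes').get(id);
  request.onsuccess = () => onResult(request.result);
  request.onerror = () => onError(request.error);
}
```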

I'm sure these will be addressed pretty shortly. I'm running the Chrome alpha and dev channels and Firefox 5, and will post back when I notice a change.


Your cafe has a UX problem May 20, 2011

Filed under: Uncategorized — danielharrison @ 8:35 am

Summoning the cranky curmudgeon.


See the fancy handle-less milk cup. See the milk drop on the table that will need cleaning? It's almost impossible to pour the milk into the tea without spilling milk everywhere. So instead of having customers like me drink and leave, every table needs to have the milk cleaned off before the next customer comes along, necessitating staff to clean said milk. They must be spending thousands in wages on cleaning instead of serving customers.

My current reading is Universal Principles of Design. Great book.


Services and Contracts May 19, 2011

Filed under: development,web — danielharrison @ 10:10 am

I've been playing a bit this week with WADL. The service I'm playing with is a JSON REST service that represents its service contract in WADL. It's got me thinking about the role of service descriptions in a post-WSDL world. Fundamentally, if I release a service to the world, must there be a method of specifying, in a published standard, A) the endpoints exposed by the service and B) the data format it accepts? So really the question is: if I release a service, what's the best way of helping people write a client that will interact with it? Or maybe: if the integrators are knife wielding coding maniacs, what should I do?

First, my thoughts on WADL. WADL works well with XML based services. The data can be typed with XSD and, in combination with the param path (an XPath to the node of interest), there is sufficient information to generate an implementation that interacts with the service. It breaks down when trying to represent JSON (or alternate formats): path is undefined, and as JSON doesn't have an official schema definition, there's no way of specifying the contract of complex JSON types or the path in the payload. There are almost-schemas such as JSON Schema of course, and a number of notable others, so it's possible with a hodgepodge of almost-standards to fully specify a JSON REST service that could theoretically be used for code generation, the major impediment being tool support. At the moment, with a JSON service you end up with multiple points of truth: the WADL, and then the documentation around the JSON payload; what the parameters mean and their business logic, +/- sample integrations.
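For illustration, here's the sort of thing a JSON Schema description can express that WADL alone can't: typed, nested JSON structure. This fragment is hypothetical (the endpoint and field names are invented, and it uses the draft-style inline `required` flag of the era), a sketch rather than the schema of any real service.

```json
{
  "description": "A customer record accepted by a hypothetical /customers endpoint",
  "type": "object",
  "properties": {
    "name": { "type": "string", "required": true },
    "email": { "type": "string" },
    "accounts": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": { "id": { "type": "integer" } }
      }
    }
  }
}
```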

So if WADL by itself isn't sufficient for JSON, the question is: how do you hand over a JSON service to an integration team and get them to use it effectively? At the moment the answer seems to be: here's REST, learn it; here's JSON, it's simple; here's our documentation about the data we expect. It's easy as an experienced developer to expect that these technologies are mastered and that it's a simple half-day task to get up and running. However, having shipped and supported a product that external developers have had to write an integration against, the lesson I've learned is that it's never simple enough! Other developers will not bother to understand the technologies, will not read your documentation, and will consider it your problem. JSON and REST are fundamentally simple building blocks… once you've mastered a number of other technologies and building blocks. My experience is that the people writing integrations against your API (mainly in the enterprise space) are time pressed and often the most inexperienced developers. So how do you cater for them?

The main benefit of WSDL, in my experience, is code generation. Integration developers don't need to understand SOAP; they point at the WSDL (WS-I compliant of course) and boom, get a client with a mostly understandable business object model. Put your app on top, populate the data and let the generated code take care of the rest. Does this need to exist for JSON REST services? The immediate answer is YES, of course! But it runs a bit deeper. The downside of the WSDL approach is that it's a lot of magic. When things work, it works well. As soon as a problem crops up, it can be an almost impossible task to understand what and where it's not working. By not having to fundamentally understand the technology stack, and by relying on the generated magic, the masked complexity becomes an insurmountable problem; complexity always escapes. The WSDL stack is actually quite deep and complex, the solution to that complexity being code generation, wizards and, well, magic. JSON REST, in my view, is a fundamental shift in the solution to the problem. Not "let's specify this more completely, to ensure better interoperability with another standard and ensure we can generate software where the complexity is masked", but a re-orientation where, with a bit of basic knowledge, the problem is sufficiently simple not to require that additional complexity. If JSON REST services get sufficiently complex as to require the overhead of complex integration specifications and code generation, then to some extent they've failed; the technology stack has failed to be sufficiently simple!

So here are my conclusions. Like most complex problems in software development, it's a human problem. I think the choice to use JSON is a choice about the users you want and expect to use your service. WSDL may be a more appropriate solution, particularly in the enterprise space. If shipping a JSON REST service: ship documentation, an example stub program in each of the languages you want to officially support, and JSON samples (drop the WADL). In the best of both worlds you ship BOTH and allow users to self select. Most of the core concepts are identical, and with a little clever architecture in your product's service it's pretty easy to do.


Options and Tradeoffs for Rich Text Editing in HTML5ish Technologies March 11, 2011

Filed under: development,internet — danielharrison @ 9:22 pm

There’s a number of options for adding rich text editing to your website, all have a number of tradeoffs that will be guided around the amount of control you need.

Content Editable

Content editable is the default solution for text editing on the web. Originating from Microsoft's pioneering work in the 4.0 browsers, all browsers now support the basic API. It's the technology behind most rich editors: TinyMCE, YUI editor, CKEditor. The problem, though, is that the technology is quite old in internet time and the API doesn't smell quite right in 2010. The API isn't one that will feel familiar to developers used to javascript, jQuery and DOM manipulation; it lives at a higher abstraction via document.execCommand. If you apply the bold command to a set of text, it doesn't return a selection or the new element or set of elements, and doesn't really care about the DOM at that level. If you do want to take a DOM centric approach, you'll need to attach listeners for node operations etc and get a bit clever about understanding what changed. Most frameworks mean you don't really need to care, and abstract it away sufficiently that it's easy to have a competent, performant solution ready in a couple of hours. The contentEditable technology does address some of the complexity that can arise in complex formatting, which you'd have to solve yourself if you took an ownership position. For example, applying bold or converting to a list works on nested content and gets it right enough. It doesn't produce what would be considered the cleanest HTML, eg every paragraph is <p><br></p> (<div><br></div> in WebKit based browsers). It's the good enough solution, and if you're happy to make it a desktop browser based experience and want a quick solution, this is the easiest. You also get things like spell checking for free (most browsers now support this by default).
One extension to contentEditable is to use the selection API. This has facilities to surround content, insert elements at the start of a selection, and otherwise manipulate HTML based on user input. In some ways the selection API is easier to use, as it has a DOM based view of the world, which makes it much easier to integrate with bleeding edge technologies like HTML5 history.
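A small sketch of that DOM-centred style (the bolding approach is my own illustration, not lifted from any particular editor, and it's guarded so it only does real work in a browser): wrap the current selection with a Range rather than going through execCommand, so you keep a handle on the element that changed.

```javascript
// Sketch: bold the current selection via the selection/Range API instead of
// document.execCommand('bold'). Returns the new element, giving the caller a
// DOM-level view of what changed, which execCommand doesn't provide.
function boldSelection(doc) {
  if (!doc || !doc.defaultView || !doc.defaultView.getSelection) {
    return null; // not running in a browser
  }
  const selection = doc.defaultView.getSelection();
  if (selection.rangeCount === 0 || selection.isCollapsed) return null;
  const range = selection.getRangeAt(0);
  const strong = doc.createElement('strong');
  range.surroundContents(strong); // throws if the range partially splits an element
  return strong;
}
```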

I've been keenly monitoring the ADC for news of when contentEditable will be supported on the iPad in mobile Safari, but it doesn't seem like this is a near term priority; it's still not supported in the latest 4.3 iOS release. So contentEditable is ruled out if you're targeting the iPad; other tablets I'm not so sure of. To some extent this is not surprising, as getting the experience right for tablet devices is going to take some thinking, given the experience certainly wasn't envisaged with tablets in mind.

Bind to an element, monitor keystrokes, insert into the DOM

The "you bought it, you own it" solution. The advantage over contentEditable is that you can make it work on the iPad and other devices that don't support contentEditable. I believe this is the solution Google now uses in its Docs experience. If text editing is a core competency you need to own, and you're developing a custom solution, then this is a feasible option. It's a lot of work, but owning everything gives you great power, and it uses standard DOM operations so is well supported by the browsers you'll care about. If you've got a product where you're using OT or causal trees to synchronise changes in a collaborative environment, this works well, as you likely already have that information to send to the server to synchronise user edits anyway.
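A bare-bones sketch of the roll-your-own approach (the edit shape and handler are my own invention, and the browser wiring is guarded): intercept keydown, translate it into an explicit edit operation, apply the edit yourself, and hand the same edit record to whatever synchronises with the server. The edit application is deliberately pure, since that's the piece OT-style systems reason about.

```javascript
// Sketch: every keystroke becomes an explicit edit operation. applyEdit is
// pure string logic; attachEditor wires it to the DOM in a browser.
function applyEdit(text, edit) {
  if (edit.type === 'insert') {
    return text.slice(0, edit.pos) + edit.chars + text.slice(edit.pos);
  }
  if (edit.type === 'delete') {
    return text.slice(0, edit.pos) + text.slice(edit.pos + edit.count);
  }
  return text;
}

function attachEditor(el, onEdit) {
  if (typeof document === 'undefined' || !el) return; // not in a browser
  let caret = 0; // toy caret: append-at-end editing only
  el.addEventListener('keydown', e => {
    const edit = e.key.length === 1
      ? { type: 'insert', pos: caret++, chars: e.key }
      : (e.key === 'Backspace' && caret > 0
          ? { type: 'delete', pos: --caret, count: 1 }
          : null);
    if (!edit) return;
    e.preventDefault();
    el.textContent = applyEdit(el.textContent, edit);
    onEdit(edit); // e.g. queue for server-side synchronisation
  });
}
```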


Canvas

Canvas is the newest technology you can implement text editing with. This is another solution where you need to own the whole stack, monitor keystrokes and insert glyphs. Canvas is fast, very fast, which makes things like displaying graphics a very fluid experience in modern browsers. It has a pixel coordinate system which gives you fine grained control over everything, even more so than any HTML generating approach. My early prototypes did raise a blocker that ruled it out for me though. The canvas API uses methods like fillText to write text and measureText to determine the space it's going to take. One of the core features of a text editor is the overlay of a cursor to indicate the position of active editing. The problem is that measureText only works reliably with fixed width (monospace) fonts. This is why it works in programming environments like Bespin/SkyWriter, which use code oriented monospaced fonts. measureText gives you the width in pixels. When using a proportional font this width will not be consistent, due to aliasing and the proportional algorithms that make it look pretty on your screen. For example, take the word 'cat'. Measuring 'cat' will give you the width of the whole word. If you want to shift the cursor to between the a and the t, you'll need to know how much space 'ca' takes within the whole word. Due to the calculation (particularly if you start worrying about bold and italics), the measureText of 'ca' will include a few extra pixels to account for the fact that a is now the end letter of a word. So measureText returns the total space to print 'ca' as a word, including all styles applied to the font and padding at the end letter. If you wanted to overlay a cursor next to the 'a' in 'cat', using measureText to calculate where the a ended, by default you'd end up with the cursor sitting somewhere in the 't'. Obviously being off by a few pixels matters in the UI.
As the calculation of proportional fonts is quite complex and goes into low level technology, determining a feasible cursor position needs more information than is currently available. In proportional fonts, particularly when dealing with italics, letters technically overlap; eg in 'la' the l actually pushes into the top space over the a, depending on the font. So where should the cursor go? At the end of the l, or at the beginning of the a (the beginning of the a, on top of some of the l)? The obvious solution would be to add this information to the API so that it can report where letters start and end and their general dimensions. That said, given the non-accessibility of canvas and the fact that it's not meant to be a text editing environment, there are good reasons why the API designers probably don't want to facilitate this madness. There are hacks to figure it out, of course. I played with writing the text on a white background, getting the written text as an image, and then using pixel sampling to determine where the letter really started; yuck! It's a lot of work, and when you care more about input than absolute control over display, contentEditable or rolling your own direct DOM manipulation solution are the quickest and easiest paths.
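The measureText-based cursor placement described above can be sketched like this (the measurer is injected so a fake monospace one stands in here; with a real canvas you'd pass something wrapping ctx.measureText): the cursor x position before character N is just the measured width of the first N characters, which is exact for monospace and a few pixels off for proportional fonts.

```javascript
// Sketch: place the cursor before character `index` by measuring the width of
// the text prefix. With a monospace measurer this is exact; with a real
// proportional font the prefix width includes end-of-word padding, the
// off-by-a-few-pixels problem described above.
function cursorX(measure, text, index) {
  return measure(text.slice(0, index)).width;
}

// Fake monospace measurer: every glyph is 8px wide.
const mono = s => ({ width: 8 * s.length });
console.log(cursorX(mono, 'cat', 2)); // x position between 'a' and 't' → 16
```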


Qunit November 30, 2010

Filed under: Uncategorized — danielharrison @ 1:54 am

Started using jQuery's QUnit for testing javascript today. I needed a framework that could run in a single page, and was looking for something that would write out content indicating what passed and what didn't. I'd started writing my own, with some simple output after each test fn, and thought this *must* have been done before. Sure enough, QUnit fit the bill. I had a bug with Chrome not liking it being served in the wrong character encoding (ISO-8859-1 instead of UTF-8), but apart from that, a simple, straightforward framework that made it easy.


Ahhh, newspapers November 9, 2010

Filed under: Uncategorized — danielharrison @ 9:16 am

I bought the SMH iPhone app today; lo and behold, before I could actually use it I had to register for an account before I could see any content. I'm not kidding, first screen in the app. Too much trouble, so I closed it, fired up the free ABC app, and wrote it off as a 3 buck lesson learned. I didn't want to comment, I wanted the news; you'd think of all people a newspaper would understand that.