Daniel Harrison's Personal Blog

Personal blog for Daniel Harrison

Brisbane YOW Talk October November 3, 2011

Filed under: business,development,Uncategorized — danielharrison @ 3:17 am

I went to the latest Brisbane YOW talk in October which had a focus on cloud computing and analytics.

First was Readify and MYOB to .NET Cloud. This was interesting as I haven’t come across the .NET cloud in practice, partly because it was comparatively late on the scene and partly because most distributed computing projects I’ve come across have tended not to be .NET. I’ve tended to favour environments where there’s more control, à la EC2 with Java/Scala/C based solutions, even over App Engine. It seems like a competent solution that has adopted common standard practices: SQL hosting or a Bigtable-like data store, workers and worker queues, etc. It’s been a few years since I’ve shipped products in .NET, but it gave me confidence that if I was stuck without any alternative, the same patterns and practices I’ve been using could be brought across pretty easily.

One point of interest was single-tenant vs multi-tenant data hosting. My experience is that multi-tenancy is much, much harder architecturally, particularly when it comes to managing upgrades and supporting concurrent versions. While it’s the holy grail for its potential efficiencies, it seems to have lost the impetus it once had as the shining light on the hill. The pattern I seem to be seeing is that multi-tenancy is losing to virtualised single-tenancy stacks. Given the speed and cost effectiveness of spinning up on-demand instances, the ease of backup, and tools like Chef and Puppet that make provisioning much easier, single tenancy seems to be becoming the default. My theory is that it’s become _so_ cheap to run virtualised stacks on a public cloud provider that architecting and developing multi-tenant solutions isn’t cost efficient for most classes of problems.

One thing I don’t think we got a really clear answer on was the legal implications of the cloud and offshore hosting. From my understanding, even if the data is encrypted, requirements like PCI-DSS and others make it almost impossible to use an offshore cloud for data persistence. In Australia this rules out most public cloud providers, but I strongly suspect most companies are rolling this out at a dev level and not really concerning themselves with the legal implications. I was really hoping we’d get an Amazon cloud here, but it seems like Singapore will be the local hub for the generic cloud providers. Given the sheer size of the govt market I can see a few providers being lured onshore, but with fibre being expensive until the NBN really gets cranking, it doesn’t seem like it would be very cost effective for them.

Dave Thomas’s presentation was the one I was looking forward to most. It focussed on end user computing for what he terms thinkers (analysts, data scientists, economists, etc). This is a topic dear to my heart: my original degree was in economics, and for a new project I’m looking at kicking off I’ll be working with some exceptional analysts that we’ll need to empower. I’ve been thinking a lot about how to harvest and collect data and, with some kind of cooperative process, build a toolchain for experimental and then deployable models. This is an area that is awash with hype and money at the moment due to the promise it can deliver. It really feels like the early days of Hari Seldon. The main takeaway I had was that to empower these users effectively, the tools we write as engineers will have to cross the boundaries from high performance computing to language design and, most importantly, usability, with the analyst at the centre. These are all individually hard problems to solve as it is, and we’re in very early days. It explains why companies such as Palantir et al are growing so fast and getting a lot of serious attention and money. If you get a good solution I think it’s very easy to see that it will revolutionise business data processing as the database did before it.

The tool he demoed would be particularly useful as a generic data analysis tool, and seemed to me a general purpose way to start understanding the data, visualising it etc with a view to determining a specific answer. It was a very brief glimpse but it seemed oriented towards solving segmentation queries, eg tracking down a subset of a larger population given various tracers, patterns etc. It seemed pretty effective and gave analysts the ability to mine large amounts of data and segment down to some subset of interest in what seemed close to realtime. Part of what I see as an excellent data modelling habit is to get down and play with the dirty dirty dirty data. You need to understand its characteristics and this tool would fit the bill. It’s weird when you think about it in a way: these very expensive tools are processing terabytes and petabytes of data to produce formats where an analyst can apply their superior pattern recognition ability to solve the problem and draw often non-intuitive deductions. It’s all about getting it into a format our highly fallible brains can work on. Both this tool and, from what I’ve seen, the new wave of tools such as Palantir mean you can process massive amounts of data to identify, visualise and segment interesting items, where only a few years ago this was simply too slow to respond to in any meaningful way. You can do lots of experiments, visualise the data and then go on to discover more interesting trends and pointers in realtime, so I really see these tools changing the face of the analytics profession. In uni we would run through data and get some dodgy little black and white line graph that was next to unintelligible, and would have to kill -9 the job if we were overly ambitious in our data usage. It’s changed so much in a decade.
With this ability to record everything everywhere, analyse it quickly and get initial results in near real time, businesses and govt can be much more responsive in dealing with everything from planning to breaking emergencies. While I think this is a boon for social research and for faster, improved responsiveness from governments, I strongly suspect it’s really going to be most used in finance and in getting us to buy more, faster ;)

It did get me thinking though, and spurred a few conversations with colleagues doing big big data analytics. From my experience in economic modelling and some peripheral fraud detection, getting an answer is the /start/ of the job; the next step is to build a tunable predictive model and hook it up to some actions. My feeling is that /most/ of the time you are trying to build a model that then learns (in a constrained manner) and reacts on its own. It will be customised and monitored by less analytical staff, who tweak parameters based on current trends and observations. Identifying the trend is obviously the first half of this, but you need to do something with it and I wonder where tools will take us. I guess in engineering parlance, instead of returning a value I’m returning a function that changes based on the inputs. How do we develop tools that allow the building of dynamic models we can use as filters, event drivers and adapters in the systems we ship today? Things that are not static but, given core parameters and a stream of information to eat, adjust in a predictable manner. Will we ever have a scenario where an analyst models and analyses data and outputs a compiled artifact that we slot into our systems as a core observer and action-initiating blob? It seems to me like we’re heading towards some kind of model which is part rule system, part integration code and part tunable analysis system. My previous role at Oracle was leading development for a high performance rule modelling system for policy experts. I think that, coupled with a dynamic and probabilistic model, it would be possible to put something together that would operate this way over large, real time data sets and streams.
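
To make the “returning a function instead of a value” idea a bit more concrete, here’s a rough Python sketch. Everything in it is made up for illustration: the name make_threshold_model, the parameters alpha, sensitivity, max_step and warmup, and the use of a simple exponentially weighted moving average as a stand-in for whatever model an analyst would really build. The point is only the shape: the analyst hands over a function, staff tune a few parameters, and the function adjusts itself within set limits as data streams through.

```python
from typing import Callable


def make_threshold_model(alpha: float = 0.2, sensitivity: float = 3.0,
                         max_step: float = 10.0,
                         warmup: int = 5) -> Callable[[float], bool]:
    """Return a function that flags unusual values and adapts as it sees data.

    All names and defaults are hypothetical, chosen for illustration only.
    alpha       -- how quickly the baseline follows new observations
    sensitivity -- how many typical deviations count as unusual
    max_step    -- cap on how far one observation may move the baseline,
                   which keeps the learning constrained
    warmup      -- number of observations to see before flagging anything
    """
    baseline = None   # running estimate of the "normal" value
    deviation = 0.0   # running estimate of the typical distance from baseline
    seen = 0          # how many observations have been consumed so far

    def observe(value: float) -> bool:
        nonlocal baseline, deviation, seen
        seen += 1
        if baseline is None:
            baseline = value
            return False
        distance = abs(value - baseline)
        flagged = seen > warmup and distance > sensitivity * deviation
        # Learn from the observation, but clamp how far the baseline can move.
        step = max(-max_step, min(max_step, alpha * (value - baseline)))
        baseline += step
        deviation = (1 - alpha) * deviation + alpha * distance
        return flagged

    return observe
```

Feeding the returned function the stream 100, 101, 99, 100, 102, 100, 500 with the default parameters flags only the final 500. Raising sensitivity far enough (say to 1000) makes it let that value through as well, which is the kind of knob a less analytical operator could own.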

Overall the YOW nights are excellent and this was no exception. It’s great that these high quality speakers are coming to Oz now and I’m really looking forward to the conference in December.


Your cafe has a UX problem May 20, 2011

Filed under: Uncategorized — danielharrison @ 8:35 am

Summoning the cranky curmudgeon.

[Photo: 20110520-084700.jpg]

See the fancy handle-less milk cup? See the milk drop on the table that will need cleaning? It’s almost impossible to pour the milk into the tea without spilling milk everywhere. So instead of customers like me just drinking and leaving, every table needs to have the milk cleaned off before the next customer comes along, which means staff have to clean up said milk. They must be spending thousands in wages on cleaning instead of serving customers.

My current reading is Universal Principles of Design. Great book.

 

Qunit November 30, 2010

Filed under: Uncategorized — danielharrison @ 1:54 am

Started using jQuery’s QUnit for testing JavaScript today. I needed a framework that could run in a single page and was looking for something that would write out some content indicating what passed and what didn’t. I’d started writing my own, with some simple output after each test fn, and thought this *must* have been done before. Sure enough, QUnit fit the bill. I had a bug with Chrome not liking it being served in the wrong character encoding (ISO-8859-1 instead of UTF-8), but apart from that it’s a simple, straightforward framework that made testing easy.
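
As an aside on the encoding issue, here’s a tiny Python illustration of the general failure mode. It isn’t a diagnosis of what Chrome actually choked on, and the sample string is made up: it just shows what happens to a non-ASCII character when UTF-8 bytes are read as ISO-8859-1.

```python
# A string with one non-ASCII character, as it might sit in a UTF-8 file
source_text = "café"
raw_bytes = source_text.encode("utf-8")

# What a client sees if it is told the bytes are ISO-8859-1
misread = raw_bytes.decode("iso-8859-1")

# What it sees when the declared charset matches the real one
correct = raw_bytes.decode("utf-8")

print(misread)  # cafÃ©
print(correct)  # café
```

Plain ASCII survives the mix-up untouched, which is why this sort of bug can sit unnoticed until a single accented character or symbol turns up in a script.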

 

Ahhh, newspapers November 9, 2010

Filed under: Uncategorized — danielharrison @ 9:16 am

I bought the SMH iPhone app today and, lo and behold, I had to register for an account before I could see any content. I’m not kidding, it’s the first screen in the app. Too much trouble, so I closed it, fired up the free ABC app and wrote it off as a 3 buck lesson learned. I didn’t want to comment, I wanted the news. You’d think that, of all people, a newspaper would understand that.

 

Playing with Cassandra Again September 30, 2010

Filed under: cassandra,development,internet,Uncategorized — danielharrison @ 12:59 am

I’ve recently been playing with the latest version of Cassandra again. One thing going in a direction I like is that it seems to be growing into a more enterprise keystore model, rather than something that only solves a specific high volume website’s requirements. To me it felt like there’d been a lot of work on beefing up the management and making it solve a more generic problem. The programmatic ad hoc schema editing was a good improvement and, based on the direction, 1.0 is shaping up to be really good.

My previous access code used the Thrift API directly. For this prototype I tried out a couple of libraries: Pelops and Hector. Both still seemed to be Thrift focused and I’m not sure how this will work with the change to Avro. Thrift always felt clumsy to me. Technologies like Thrift and Avro, where you’re expressing a language independent communication protocol that various languages need to talk, in my view can’t help bleeding those idioms and that generality up to the client. It means client access code often feels, well, slightly awkward. It feels a bit like the good old days of IIOP/CORBA and EJB communication. My personal preference is targeted, hand coded adapters that feel like a good fit for the language, but the downside of course is that the clients can lag and aren’t always available for your choice of language. So it’s a tradeoff as always. Hector seems like it’s actively trying to avoid this but still has wrappers where it feels a bit thrifty, eg HKSDef instead of KSDef used to create a keyspace. If you are trying out and evaluating these libraries I would highly recommend you bite the bullet, get the Cassandra source for your targeted library and build it yourself. Because things are moving so fast, it looks like the current releases are out of date, and to get it working you really need the latest trunk versions of everything. For example, I don’t think beta2 of Cassandra 0.7 is available as a package, but it seems to be required by the current versions of Pelops and Hector. Pelops is source only on GitHub, so you’ll likely be building things yourself anyway. I was impressed by both; it feels like there’s a lot of room for future improvement and both seem to be shaping up as strong client access libraries.

Another good thing is that some valuable resources seem to be coming through. At the moment it’s a lot of Google and reading the forums to nut out problems. I bought the ‘Cassandra: The Definitive Guide’ Rough Cuts book from O’Reilly and it seems to have taken a lot of that information, focused it and turned it into a good source for explanations of idioms and general wisdom. So my recommendation would be to buy it, as it looks like it’s going to be an invaluable reference.

My biggest problem with using Cassandra at the moment is support for multi-tenancy. The problem I have in mind requires text indexing and content that is private per account. With a model like Cassandra’s you need to know what you will be searching for up front, and you basically build column families representing those indexes. In my case I have users, accounts (each with many users), objects (storing text) and various indices around that text that drive my application. Think of something a little bit like an RDF store with accounts and users. In a traditional database model I would probably store this as a separate database for each account. This may mean each running datastore instance has tens to thousands of databases. With Cassandra and the way it is structured this would not be advisable. Each keyspace maintains its own memory etc, and to take advantage of its replication model it’s more advisable to have fewer keyspaces. One of the easy wins of having separate databases per account in the database server world is that you’re guaranteed not to see other accounts’ data: you’re connecting to the datastore for that client, which makes it very easy to guarantee and maintain security. Under Cassandra this is an application concern at the moment. For my prototype I wasn’t happy with the extent to which this invaded my code and required extra indices to make it all work, all of which increased the cognitive load of developing the application. There’s work afoot around multi-tenancy requirements, but until that’s addressed, for me at least, it rules Cassandra out. The Cassandra team are working on it and there are some interesting proposals (the namespace one seems interesting), and I’m sure once it’s complete it will really make Cassandra the first choice for an enterprise keystore.
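
To show what “an application concern” looks like in practice, here’s a rough Python sketch. It uses plain dictionaries as a stand-in for a single shared keyspace and is not Cassandra client code. All the names (TenantScopedStore, AccountView, the objects and objects_by_word column families) are hypothetical. It covers the two things mentioned above: every row key gets prefixed with the account, and the word index has to be built by hand as its own column family.

```python
from collections import defaultdict


class TenantScopedStore:
    """In-memory stand-in for one shared keyspace (not a Cassandra client)."""

    def __init__(self):
        # column family name -> row key -> {column name: value}
        self._column_families = defaultdict(dict)

    def for_account(self, account_id):
        # Reject ids containing the separator so prefixed keys cannot collide.
        if not account_id or ":" in account_id:
            raise ValueError("account id must be non-empty and contain no ':'")
        return AccountView(self._column_families, account_id)


class AccountView:
    """The handle application code works with; all keys are account-prefixed."""

    def __init__(self, column_families, account_id):
        self._cfs = column_families
        self._prefix = account_id + ":"

    def put_object(self, object_id, text):
        # Main column family holding the object itself.
        self._cfs["objects"][self._prefix + object_id] = {"text": text}
        # Hand-built index: one row per (account, word), one column per object.
        for word in set(text.lower().split()):
            row = self._cfs["objects_by_word"].setdefault(self._prefix + word, {})
            row[object_id] = ""

    def find_by_word(self, word):
        row = self._cfs["objects_by_word"].get(self._prefix + word.lower(), {})
        return sorted(row)


store = TenantScopedStore()
acme = store.for_account("acme")
globex = store.for_account("globex")
acme.put_object("doc1", "Quarterly revenue forecast")
globex.put_object("doc9", "Revenue recognition policy")
print(acme.find_by_word("revenue"))    # ['doc1']
print(globex.find_by_word("revenue"))  # ['doc9']
```

Even in a toy like this, the isolation guarantee lives entirely in the wrapper: forget the prefix in one code path and accounts can see each other’s rows. That’s the cognitive load I’m complaining about, compared with simply connecting to a per-account database.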

 

Australia == end of the earth September 3, 2010

Filed under: Uncategorized — danielharrison @ 12:19 am

Australia tends to have comparatively overpriced books due to various protectionist government policies. Being the bookish sort, this means I tend to order a lot of books online. My preferred provider is Book Depository as it has free shipping to Oz and books take between 4 days and 2 weeks to get here. Today I had to order a few books from Amazon as they weren’t available on Book Depository (Steve Blank’s The Four Steps to the Epiphany, if you’re interested) and got a bit of a shock. Std shipping is 18-32 days!? Expedited shipping to match Book Depository is ~AUD$50, which is almost equivalent to the order cost. You hit this occasionally with vendors in the US but I didn’t really expect it from Amazon. Books I’ve ordered from Pragmatic Programmers and O’Reilly with std shipping have tended to get here in a week or so.

My current thinking is it must be coming via balloon. If it took 80 days in the proverbial book, then this timetable would be about right.

 

Thoughts on blocking autoplaying content, firefox4 September 2, 2010

Filed under: Uncategorized — danielharrison @ 1:23 am

I recently started using the Firefox 4 beta. It seems like it’s really coming along and should be a great new version. I have noticed an interesting side effect though. As most of the plugins I use aren’t supported yet, I get the default experience, which includes Flash content playing by default. I previously had the Flashblock plugin, which really sped up my browsing and made the experience much better. As an example, one of the news sites I used to read, smh.com.au, autoplayed video broadcasts with no site wide way to turn it off. Memo to news sites: I can read faster than you can read it to me, and autoplaying just makes me go somewhere else, including not buying the physical copy. The thing I find most annoying is disruptive sound via Flash. I’m typically playing music on the computer as I work, and surfing to a site that suddenly has spoken or other audible content is quite jarring, particularly when it comes through at maximum volume.

I was thinking about the implications for HTML5 video and multimedia. Flashblock stops any animation and sound coming through because most of this type of content is Flash, which makes it easy to screen. When HTML5 video really takes off, how would you achieve the same effect? It’s a core HTML feature, so unless there are browser preferences to disable sound and stop video running by default, this could quickly get annoying. Maybe a good candidate for a plugin. I’m guessing someone’s already thought about this, but I couldn’t see any options in beta4 at the moment.