I went to the latest Brisbane YOW night in October, which focused on cloud computing and analytics.
First up was Readify and MYOB on the .NET cloud. This was interesting, as I haven’t come across the .NET cloud in practice: it was comparatively late on the scene, and most of the distributed computing projects I’ve encountered haven’t been .NET. I’ve tended to favour environments with more control, à la EC2 and Java/Scala/C-based solutions, even over App Engine. It seems like a competent solution that has adopted common standard practices: SQL hosting or a BigTable-like data store, workers and worker queues, etc. It’s been a few years since I’ve shipped products in .NET, but it gave me confidence that, if I were stuck without any alternative, the same patterns and practices I’ve been using could be brought across pretty easily.
One point of interest was single-tenant vs multi-tenant data hosting. In my experience, multi-tenancy is much, much harder architecturally, particularly for things like managing upgrades and concurrent version support. While it remains the holy grail for its potential efficiencies, it seems to have lost the impetus it once had as the shining light on the hill. The pattern I seem to be seeing is that multi-tenancy is losing to virtualised single-tenancy stacks. Given the speed and cost-effectiveness of spinning up on-demand instances, the ease of backup, and tools like Chef and Puppet that make provisioning much easier, single tenancy seems to be becoming the default. My theory is that it’s become _so_ cheap to run virtualised stacks with a public cloud provider that the cost of architecting and developing multi-tenant solutions isn’t cost-efficient for most classes of problems.
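To make the trade-off concrete, here’s a minimal sketch (in Python, with entirely made-up names and in-memory dicts standing in for databases) of the two data-access patterns: a shared-schema multi-tenant store where every query must be scoped by tenant, versus single-tenant routing where each tenant gets its own isolated store.

```python
class MultiTenantStore:
    """One shared store: every record and every query carries a tenant_id.
    Upgrades and version support must handle all tenants at once."""

    def __init__(self):
        self.rows = []  # one shared "table"

    def insert(self, tenant_id, record):
        self.rows.append({"tenant_id": tenant_id, **record})

    def query(self, tenant_id):
        # Forgetting this filter leaks data across tenants -- one reason
        # multi-tenancy is architecturally harder.
        return [r for r in self.rows if r["tenant_id"] == tenant_id]


class SingleTenantRouter:
    """One isolated store per tenant: provisioning a tenant is just spinning
    up a new instance (cheap on virtualised stacks), and each tenant can be
    upgraded or backed up independently."""

    def __init__(self):
        self.stores = {}  # tenant_id -> that tenant's own "database"

    def provision(self, tenant_id):
        self.stores[tenant_id] = []

    def insert(self, tenant_id, record):
        self.stores[tenant_id].append(record)

    def query(self, tenant_id):
        return list(self.stores[tenant_id])
```

In the single-tenant version the isolation comes for free from the routing, which is roughly what spinning up per-customer virtualised stacks buys you at the infrastructure level.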
One thing I don’t think we got a really clear answer on was the legal implications of cloud and offshore hosting. From my understanding, even if the data is encrypted, requirements like PCI-DSS make it almost impossible to use an offshore cloud for data persistence. In Australia this rules out most public cloud providers, but I strongly suspect most companies are rolling this out at a dev level and not really concerning themselves with the legal implications. I was really hoping we’d get an Amazon cloud here, but it seems Singapore will be the local hub for the generic cloud providers. Given the sheer size of the government market I can see a few providers being lured onshore, but with expensive fibre, until the NBN really gets cranking it doesn’t seem like it would be very cost-effective for them.
Dave Thomas’s was the presentation I was looking forward to most. It focussed on end-user computing for what he terms thinkers (analysts, data scientists, economists, etc.). This is a topic dear to my heart: my original degree was in economics, and for a new project I’m looking at kicking off, I’ll be working with some exceptional analysts we’ll need to empower. I’ve been thinking a lot about how to harvest and collect data and, through some kind of cooperative process, build a toolchain for experimental and then deployable models. This is an area awash with hype and money at the moment because of the promise it can deliver. It really feels like the early days of Hari Seldon. The main takeaway I had was that empowering these users effectively means the tools we write as engineers must cross the boundaries from high-performance computing to language design and, most importantly, usability, with the analyst at the centre. These are all individually hard problems as it is, and we’re in very early days. It explains why companies such as Palantir et al. are growing so fast and attracting serious attention and money. If someone gets a good solution, I think it’s easy to see it revolutionising business data processing as the database did before it.
The tool he demoed would be particularly useful as a generic data analysis tool; it seemed to me a general-purpose way to start understanding and visualising data with a view to answering a specific question. It was a very brief glimpse, but it seemed oriented towards segmentation queries, e.g. tracking down a subset of a larger population given various tracers, patterns, etc. It looked pretty effective, giving analysts the ability to mine large amounts of data and segment down to a subset of interest in what seemed close to real time. Part of what I see as an excellent data-modelling habit is to get down and play with the dirty, dirty, dirty data: you need to understand its characteristics, and this tool would fit the bill. It’s weird when you think about it: these very expensive tools process peta- and terabytes of data into formats where an analyst can apply their superior pattern-recognition ability and draw often non-intuitive deductions. It’s all about getting the data into a format our highly fallible brains can work on. Both this tool, and what I’ve seen of newer tools such as Palantir, mean you can process massive amounts of data to identify and visualise interesting items that only years ago were simply too slow to respond to in any meaningful way. You can run lots of experiments, visualise the data, and then go on to discover more interesting trends and pointers in real time, so I really see these tools changing the face of the analytics profession. At uni we would run through data, get some dodgy little black-and-white line graph that was next to unintelligible, and have to kill -9 the job if we were overly ambitious with our data usage; it’s changed so much in a decade.
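The essence of those segmentation queries can be sketched in a few lines: successively narrowing a population by composable predicates (the “tracers”). This is a toy illustration in Python, not anything from the demoed tool, and the field names are made up.

```python
# A tiny population; a real tool would be chewing through terabytes.
population = [
    {"id": 1, "region": "QLD", "visits": 40, "flagged": False},
    {"id": 2, "region": "QLD", "visits": 3,  "flagged": True},
    {"id": 3, "region": "NSW", "visits": 55, "flagged": True},
    {"id": 4, "region": "QLD", "visits": 60, "flagged": True},
]

def segment(rows, *predicates):
    """Apply each predicate in turn, narrowing the subset at every step."""
    for p in predicates:
        rows = [r for r in rows if p(r)]
    return rows

subset = segment(
    population,
    lambda r: r["region"] == "QLD",  # tracer 1: geography
    lambda r: r["visits"] > 10,      # tracer 2: activity pattern
    lambda r: r["flagged"],          # tracer 3: prior flag
)
```

The interesting part in practice isn’t the filtering itself but doing it interactively over enormous data sets fast enough that the analyst can keep experimenting.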
With this ability to record everything everywhere, and now to analyse it quickly and get initial results in near real time, businesses and government can be much more responsive in dealing with everything from planning to breaking emergencies. While I think this is a boon for social research and for faster, improved government responsiveness, I strongly suspect it’s really going to be most used in finance and in getting us to buy more, faster 😉
It did get me thinking, though, and spurred conversations with a few colleagues doing big, big data analytics. From my experience in economic modelling and some peripheral fraud detection, getting an answer is the _start_ of the job; the next step is to build a tunable predictive model and hook it up to some actions. My feeling is that _most_ of the time you are trying to build a model that then learns (in a constrained manner) and reacts on its own. It will be customised and monitored by less analytical staff, who tweak parameters based on current trends and observations. Identifying the trend is obviously the first half of this model, but you need to do something with it, and I wonder where tools will take us. In engineering parlance, instead of returning a value I’m returning a function that changes based on the inputs. How do we develop tools that allow the building of dynamic models we can use as filters, event drivers, and adapters in the systems we ship today? Things that are not static but, given core parameters and a stream of information to eat, adjust in a predictable manner. Will we ever have a scenario where an analyst models, analyses data, and outputs a compiled artifact we slot into our systems as a core observing, action-initiating blob? It seems to me we’re heading towards some kind of model that is part rule system, part integration code, and part tunable analysis system. My previous role at Oracle was leading development of a high-performance rule-modelling system for policy experts. Coupled with a dynamic and probabilistic model, I think it would be capable of operating this way over large, real-time data sets and streams.
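A toy sketch of that “return a function, not a value” idea: a filter whose internal estimate adapts to the stream it consumes, but only within bounds set by the core parameters an analyst (or less analytical operator) would tune. Everything here is illustrative; the parameters and thresholds are invented.

```python
def make_adaptive_filter(alpha=0.1, threshold=3.0, floor=1.0):
    """Return a closure that eats a stream of values one at a time,
    updates a running mean/deviation estimate, and flags values that
    deviate too far from the current estimate.

    alpha bounds how fast the model drifts, floor stops the deviation
    estimate collapsing to zero -- the "constrained learning" knobs."""
    state = {"mean": None, "dev": floor}

    def observe(x):
        if state["mean"] is None:
            state["mean"] = x
            return False  # first observation, nothing to compare against
        flagged = abs(x - state["mean"]) > threshold * state["dev"]
        # Adjust the estimates in a predictable, bounded manner.
        state["mean"] = (1 - alpha) * state["mean"] + alpha * x
        state["dev"] = max(floor,
                           (1 - alpha) * state["dev"]
                           + alpha * abs(x - state["mean"]))
        return flagged

    return observe
```

You could slot something like this into a pipeline as a filter or event driver: it isn’t static, but it also can’t run away from the behaviour its core parameters permit, which is roughly the shape of deployable model I have in mind.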
Overall the YOW nights are excellent, and this was no exception. It’s great that these high-quality speakers are coming to Oz now, and I’m really looking forward to the conference in December.