Daniel Harrison's Personal Blog

Personal blog for Daniel Harrison

Brisbane YOW Talk October November 3, 2011

Filed under: business,development,Uncategorized — danielharrison @ 3:17 am

I went to the latest Brisbane YOW talk in October which had a focus on cloud computing and analytics.

First was Readify and MYOB to .NET Cloud. This was interesting as I haven’t come across the .NET cloud in practice, due to it being comparatively late on the scene and most of the distributed computing projects I’ve come across tending not to be .NET. I’ve tended to favour an environment where there’s more control, à la EC2 and Java/Scala/C based solutions, even over App Engine. It seems like a competent solution that has adopted common standard practices: SQL hosting or a Bigtable-like data store, workers and worker queues, etc. It’s been a few years since I’ve shipped products in .NET, but it gave me confidence that if I was stuck without any alternative, the same patterns and practices I’ve been using could be brought across pretty easily.

One point of interest was that of single tenant vs multi-tenant data hosting. My experience is that multi-tenant is much, much harder architecturally, particularly when it comes to things like managing upgrades and supporting concurrent versions. While it’s the holy grail for the potential efficiencies, it seems to have lost the impetus it once had as the shining light on the hill. The pattern I seem to be seeing is that multi-tenancy is losing to virtualised single tenancy stacks. Due to the speed and cost effectiveness of being able to spin up instances on demand, the ease of backup, and tools like Chef and Puppet that make provisioning much easier, it seems like single tenancy is becoming the default. My theory is that it’s become _so_ cheap to run virtualised stacks in a public cloud provider that architecting and developing multi-tenant solutions isn’t cost efficient for most classes of problems.

One thing I don’t think we got a really clear answer on was the legal implications of the cloud and offshore hosting. From my understanding, even if the data is encrypted, various requirements like PCI-DSS and others make it almost impossible to use an offshore cloud for data persistence. In Australia this rules out most public cloud providers, but I strongly suspect most companies are rolling this out at a dev level and not really concerning themselves with the legal implications. I was really hoping that we’d get an Amazon cloud here, but it seems like Singapore will be the local hub for the generic cloud providers. Just given the sheer size of the govt market I can see a few providers being lured onshore, but with expensive fibre until the NBN really gets cranking it doesn’t seem like it would be very cost effective for them.

Dave Thomas was the presentation I was looking forward to most. It was focussed on end user computing for what he terms thinkers (analysts, data scientists, economists, etc). This is a topic dear to my heart, with my original degree being in economics, and for a new project I’m looking at kicking off I’ll be working with some exceptional analysts that we’ll need to empower. I’ve been thinking a lot about how to harvest and collect data and, with some kind of cooperative process, build a toolchain for experimental and then deployable models. This is an area that is awash with hype and money at the moment due to the promise it can deliver. It really feels like the early days of Hari Seldon. The main takeaway I had was that to empower these users effectively, the tools we write as engineers will have to cross the boundaries from high performance computing to language design and, most importantly, usability, with the analyst at the centre. These are all individually hard problems to solve as it is, and we’re in very early days. It explains why companies such as Palantir et al are growing so fast and getting a lot of serious attention and money. If you get a good solution, I think it’s very easy to see that it will revolutionise business data processing as the database did before it.

The tool he demoed would have been particularly useful as a generic data analysis tool, and seemed to me a general purpose tool for starting to understand the data, visualising it, etc, with a view to determining a specific answer. It was a very brief glimpse, but it seemed oriented towards solving segmentation queries, eg tracking down a subset of a larger population given various tracers, patterns, etc. It seemed pretty effective and gave analysts the ability to mine large amounts of data and segment down to some subset of interest in what seemed close to realtime. Part of what I see as an excellent data modelling habit is to get down and play with the dirty, dirty, dirty data. You need to understand its characteristics, and this tool would fit the bill. It’s weird when you think about it, in a way: these very expensive tools are processing petabytes and terabytes of data to produce formats where an analyst can apply their superior pattern recognition ability to solve the problem and draw often non-intuitive deductions. It’s all about getting it into a format our highly fallible brains can work on. Both this tool and, from what I’ve seen, the new wave of tools such as Palantir mean you can process massive amounts of data to identify, visualise and segment interesting items at a scale that only years ago was simply too slow to respond to in any meaningful way. You can do lots of experiments, visualise the data, and then go on and discover more interesting trends and pointers in realtime, so I really see these tools changing the face of the analytics profession. In uni we would run through data and get some dodgy little black and white line graph that was next to unintelligible, and you would have to kill -9 if you were overly ambitious in your data usage. It’s changed so much in a decade.
With this ability to record everything everywhere, and now to analyse it quickly and get initial results in near real time, businesses and govt can be much more responsive in dealing with everything from planning to breaking emergencies. While I think this is a boon for social research and for faster and improved responsiveness from governments, I strongly suspect it’s really going to be most used in finance and in getting us to buy more, faster 😉

It did get me thinking though, and spurred a few conversations with a few colleagues doing big, big data analytics. From my experience in economic modelling and some peripheral fraud detection, getting an answer is the /start/ of the job; the next step is to build a tunable predictive model and hook it up to some actions. My feeling is that /most/ of the time you are trying to build a model that then learns (in a constrained manner) and reacts on its own. It will be customised and monitored by less analytical staff, who tweak parameters based on current trends and observations. Identifying the trend is obviously the first half of this model, but you need to do something with it, and I wonder where tools will take us. I guess in engineering parlance, instead of returning a value I’m returning a function that changes based on the inputs. How do we develop tools that allow the building of dynamic models we can use as filters, event drivers and adapters in the systems we ship today? Things that are not static but, given core parameters and a stream of information to eat, adjust in a predictable manner. Will we ever have a scenario where an analyst models and analyses data and outputs a compiled artifact that we slot into our systems as a core observer and action initiating blob? It seems to me like we’re heading towards some kind of model which is part rule system, part integration code and part tunable analysis system. My previous role at Oracle was leading development of a high performance rule modelling system for policy experts. I think that, coupled with a dynamic and probabilistic model, it would be possible to put something together that operates this way over large, real time data sets and streams.
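
As a rough sketch of the "returning a function" idea, assuming Python and a deliberately simple running-average model: the names make_anomaly_filter, alpha and threshold are made up for illustration and don’t come from any of the tools mentioned here. The analyst picks the core parameters, and what gets handed over is a function that carries its own state and adjusts as it eats the stream.

```python
from typing import Callable, Optional


def make_anomaly_filter(alpha: float = 0.2,
                        threshold: float = 3.0) -> Callable[[float], bool]:
    """Return a stateful filter that flags values far from a running average.

    alpha (how fast the model adapts) and threshold (how many standard
    deviations count as unusual) are hypothetical tunable parameters.
    """
    mean: Optional[float] = None  # exponentially weighted running mean
    var = 0.0                     # exponentially weighted running variance

    def observe(value: float) -> bool:
        nonlocal mean, var
        if mean is None:
            # The first observation just seeds the model.
            mean = value
            return False
        deviation = value - mean
        is_anomaly = var > 0 and abs(deviation) > threshold * var ** 0.5
        # Learn in a constrained manner: only normal values update the model,
        # so a single spike cannot drag the baseline along with it.
        if not is_anomaly:
            mean += alpha * deviation
            var = (1 - alpha) * (var + alpha * deviation ** 2)
        return is_anomaly

    return observe


# The returned function is what would be slotted into a system as a filter.
flag = make_anomaly_filter(alpha=0.2, threshold=3.0)
flags = [flag(v) for v in [10, 12, 9, 11, 10, 100, 10]]
```

The less analytical staff in this picture would only ever touch alpha and threshold; the shape of the model stays whatever the analyst shipped.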

Overall the YOW nights are excellent and this was no exception. It’s great that these high quality speakers are coming to Oz now, and I’m really looking forward to the conference in December.


Congratulations to Bitbucket September 30, 2010

Filed under: business,development,mercurial,startups — danielharrison @ 12:01 am

I saw that Bitbucket has been acquired by Aussie company Atlassian. I was a pro user as I had a few private repositories (hg didn’t originally support sub repositories). I was always impressed by the customer service at Bitbucket, and from my dealings I got the impression they were good guys who put the customer’s interests first. I changed credit cards and PayPal subscriptions stopped working for me, and rather than make a big deal out of it, Jesper basically stopped charging me money. I got it working again eventually, but it’s that kind of attitude that convinced me they had my interests as a customer first and that I’d made a good choice over competitors or doing it myself. I know this experience is why I recommended them, and as an early stage startup it’s an experience I’ll remember when I’ve (hopefully) got paying customers 😉

So I saw my billing had been cancelled, and now it looks like with my current usage I won’t have to pay anything. It also looks like there have been a few UX changes around teams etc. I like the strategy of having it rebranded and working at the same time as announcing it. I previously introduced Atlassian’s suite into my former workplace (Confluence, Bamboo, JIRA, GreenHopper, Crowd, … ) running over Subversion, and it always seemed that not having an SCCM system was a weak point relative to their competitors, so this seems like a good strategic investment. When evaluating tools, the competitors for the most part seemed to be SCCM companies with a layer on top. The reason I chose Atlassian was that the integrated layer on top, with Confluence, Bamboo, JIRA etc, gave our internationally distributed team the focal point for development that we needed. It will be interesting to see if this is offered for on-premises installation, as Atlassian tools are Java based and Bitbucket with hg I suspect is Python based. I looked at running hg with Jython when it first came out, but it had a few native modules which would have had to be ported from C to get it running. Maybe Python is OK though. My experience is that the people who look after and maintain these systems tend to be biased towards a particular model, eg Java or .NET; Python might be OK for Unix guys, but for Windows I’m not sure. Asking either to play outside their comfort area was playing with fire in terms of support; at least in my previous company, that’s why we maintained ‘native’ versions with some neat technologies that were baked in house.

So congratulations Bitbucket, and I’m looking forward to seeing where it goes from here.


Tradies, Urghhhh! July 7, 2010

Filed under: business,me — danielharrison @ 2:30 am

There was a post up on the RiotACT from someone looking for other providers for a trades service, as the previous ones dumped cigarette butts all over the yard.

We had a similar experience with Knebel kitchens when we renovated, so I feel the pain. Our particular experience included cigarettes being dumped on the ground, and then went as far as taking out our phone line with their truck and even tiling the floor too high so that doors couldn’t close. Their solution to that was to trim off the bottom of the door, BTW. Ultimately it was a month late, and the price premium we paid for a single project manager to coordinate everything was wasted money.

It flabbergasts me that tradies in Canberra can think that this behaviour is acceptable; it must be costing them business. There are some gems, but it seems like a crap shoot to me.