Daniel Harrison's Personal Blog

Personal blog for Daniel Harrison

Playing with Cassandra Again September 30, 2010

Filed under: cassandra,development,internet,Uncategorized — danielharrison @ 12:59 am

I’ve recently been playing with the latest version of Cassandra again.  One change I like is that it seems to be growing into a more general enterprise keystore rather than something that only solves the requirements of one specific high-volume website.  To me it felt like a lot of work had gone into beefing up the management side and making it solve a more generic problem.  The programmatic ad hoc schema editing is a good improvement and, based on the direction, 1.0 is shaping up to be really good.

My previous access code used the Thrift API directly.  For this prototype I tried out a couple of libraries: Pelops and Hector.  Both still seem to be Thrift focused and I’m not sure how this works with the change to Avro.  Thrift always felt clumsy to me.  Technologies like Thrift and Avro, where you’re expressing a language-independent communication protocol that various languages need to talk in, in my view can’t help bleeding those idioms and that generality up to the client.  It means client access code often feels, well, slightly awkward.  It feels a bit like the good old days of IIOP/CORBA and EJB communication.  My personal preference is targeted, hand-coded adapters which feel like a good fit for the language, but the downside of course is that the clients can lag and aren’t always available for your choice of language.  So it’s a tradeoff, as always.

Hector seems to be actively trying to avoid this but still has wrappers where it feels a bit thrifty, eg HKSDef instead of KSDef used to create a keyspace.  If you are trying out and evaluating these libraries I would highly recommend you bite the bullet, get the Cassandra source for your targeted library and build it yourself.  Because everything is moving so fast, the current releases look to be out of date, and to get it working you really need the latest trunk versions of everything.  For example, I don’t think beta2 of Cassandra 0.7 is available as a package, but it seems to be required by the current versions of Pelops and Hector.  Pelops is source only on GitHub, so you’ll likely be building things yourself anyway.

I was impressed by both.  It feels like there’s a lot of room for future improvement and both seem to be shaping up as strong client access libraries.

Another good thing is that some valuable resources seem to be coming through.  At the moment it takes a lot of googling and reading the forums to nut out problems.  I bought the Rough Cuts edition of ‘Cassandra: The Definitive Guide’ from O’Reilly, and it seems to have taken a lot of that information, focused it and turned it into a good source for explanations of idioms and general wisdom.  So my recommendation would be to buy it, as it looks like it’s going to be an invaluable reference.

My biggest problem with using Cassandra at the moment is support for multitenancy.  The problem I have in mind requires text indexing and content that is private per account.  With a model like Cassandra’s you need to know what you will be searching for first, and you basically build column families representing those indexes.  In my case I have users, accounts (many users), objects (storing text) and various indices around that text that drive my application.  Think of something a little bit like an RDF store with accounts and users.

In a traditional database model I would probably store this as a separate database for each account.  This may mean each running datastore instance has tens to thousands of databases.  With Cassandra this would not be advisable: each keyspace carries its own memory overhead, and to take advantage of its replication model it’s better to have fewer keyspaces.  One of the easy wins of having separate databases per account in the database server world is that you’re guaranteed not to see other accounts’ data; you’re connecting to the datastore for that client, which makes it very easy to guarantee and maintain security.  Under Cassandra this is an application concern at the moment.

For my prototype I wasn’t happy with the extent to which this was invading my code, and it required extra indices to make it all work, all of which increased the cognitive load of developing the application.  There’s work afoot around multi-tenancy requirements, but until that’s addressed, for me at least, it rules Cassandra out.  The Cassandra team are working on it and there are some interesting proposals (the namespace one seems interesting), and I’m sure once it’s complete it will really make Cassandra the first choice for an enterprise keystore.
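
To give a feel for what "an application concern" means here, below is a minimal sketch of account isolation done in application code.  It is not my prototype code and it doesn't talk to Cassandra at all: a plain Python dict stands in for a shared column family, and the class name TenantScopedStore, the "account:key" row key format and the method names are all made up for illustration.

```python
class TenantScopedStore:
    """Hypothetical wrapper that confines one account to its own row keys.

    `backing` is a plain dict standing in for a column family shared by
    every account; a real client library would sit here instead.
    """

    def __init__(self, backing, account_id):
        # The separator must not appear in the account id, otherwise one
        # account's prefix could be forged from another account's keys.
        if ":" in account_id:
            raise ValueError("account id must not contain ':'")
        self._backing = backing
        self._prefix = account_id + ":"

    def put(self, key, columns):
        # Every write is silently rewritten to "<account>:<key>".
        self._backing[self._prefix + key] = dict(columns)

    def get(self, key):
        # Reads can only ever address keys under this account's prefix.
        return self._backing.get(self._prefix + key)

    def keys(self):
        # Listing requires scanning and filtering by prefix, which is the
        # kind of extra bookkeeping a per-account database gives for free.
        plen = len(self._prefix)
        return sorted(k[plen:] for k in self._backing if k.startswith(self._prefix))


shared = {}
acme = TenantScopedStore(shared, "acme")
globex = TenantScopedStore(shared, "globex")
acme.put("doc1", {"text": "hello"})
globex.put("doc1", {"text": "something private"})
```

Both accounts can use the key "doc1" without seeing each other's rows, but only as long as every single access goes through the wrapper.  That is exactly the part that ends up spread through the application.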


Congratulations to Bitbucket

Filed under: business,development,mercurial,startups — danielharrison @ 12:01 am

I saw that Bitbucket has been acquired by Aussie company Atlassian.  I was a pro user as I had a few private repositories (hg didn’t originally support sub repositories).  I was always impressed by the customer service at Bitbucket, and from my dealings I got the impression they were good guys who put the customers’ interests first.  I changed credit cards and PayPal subscriptions stopped working for me, and rather than make a big deal out of it, Jesper basically stopped charging me money.  I got it working again eventually, but it’s that kind of attitude that convinced me they had my interests as a customer first and that I’d made a good choice over competitors or doing it myself.  I know this experience is why I recommended them, and as an early-stage startup it’s an experience that I’ll remember when I’ve (hopefully) got paying customers 😉

So I saw my billing had been cancelled, and it now looks like with my current usage I won’t have to pay anything.  It also looks like there have been a few UX changes around teams etc.  I like the strategy of having it rebranded and working at the same time as it’s announced.  I previously introduced Atlassian’s suite into my former workplace (Confluence, Bamboo, JIRA, GreenHopper, Crowd, … ) running over Subversion, and it always seemed that not having an SCCM system was a weak point compared with their competitors, so this looks like a good strategic investment.  When evaluating tools, the competitors for the most part seemed to be SCCM companies with a layer on top.  The reason I chose Atlassian was that the integrated layer on top, with Confluence, Bamboo, JIRA etc, gave our internationally distributed team the focal point for development that we needed.

It will be interesting to see if this is offered for on-premises installation, as Atlassian tools are Java based and Bitbucket with hg is, I suspect, Python based.  I looked at running hg with Jython when it first came out, but it had a few native modules which would have had to be ported from C to get it running.  Maybe Python is OK though.  My experience is that the people who look after and maintain these systems tend to be biased towards a particular model, eg Java or .NET; Python might be OK for Unix guys, but for Windows I’m not sure.  Asking either to play outside their comfort area was playing with fire in terms of support; at least in my previous company that’s why we maintained ‘native’ versions with some neat technologies that were baked in house.

So congratulations Bitbucket, and I’m looking forward to seeing where it goes from here.


Australia == end of the earth September 3, 2010

Filed under: Uncategorized — danielharrison @ 12:19 am

Australia tends to have comparatively overpriced books due to various protectionist government policies. Being the bookish sort, this means I tend to order a lot of books online. My preferred provider is Book Depository as it has free shipping to Oz and it takes between 4 days and 2 weeks for books to get here. Today I had to order a few books from Amazon as they weren’t available on Book Depository (Steve Blank’s The Four Steps to the Epiphany, if you’re interested) and got a bit of a shock. Standard shipping is 18-32 days!? Expedited shipping to match Book Depository’s delivery time is about AUD $50, which is almost equivalent to the order cost. You hit this occasionally with vendors in the US, but I didn’t really expect it from Amazon. Books I’ve ordered from Pragmatic Programmers and O’Reilly with standard shipping have tended to get here in a week or so.

My current thinking is it must be coming via balloon. If it took 80 days in the proverbial book, then this timetable would be about right.


Thoughts on blocking autoplaying content, firefox4 September 2, 2010

Filed under: Uncategorized — danielharrison @ 1:23 am

I recently started using the Firefox 4 beta. It seems like it’s really coming along and should be a great new version. I have noticed an interesting side effect though. As most of the add-ons I use aren’t supported yet, I get the default experience; this includes Flash content playing by default. I used to run the Flashblock add-on, and it really sped up my browsing and made the experience much better.  As an example, one of the news sites I previously used, smh.com.au, autoplayed video broadcasts with no site-wide way to turn it off.  Memo to news sites:  I can read faster than you can read it to me, and autoplaying just makes me go somewhere else, including not buying the physical copy.  The thing that I find most annoying is disruptive sound via Flash.  I’m typically playing music on the computer as I work, and surfing to a site that suddenly has spoken or other audible content is quite jarring, particularly when it comes through at maximum volume.

I was thinking about the implications for HTML5 video and multimedia.  Flashblock stops any animation and sound coming through because most of this type of content is Flash, which makes it easy to screen out. When HTML5 video really takes off, how would you achieve the same effect?  It’s a core HTML feature, so unless there are browser preferences to disable sound and to stop video running by default, this could quickly get annoying.  Maybe a good candidate for an add-on.  I’m guessing someone’s already thought about this, but I couldn’t see options in beta4 at the moment.
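
One rough way to picture a blocker for this: HTML5 media elements start on their own when they carry the autoplay attribute, so a filter could strip that attribute before the page is rendered.  The sketch below does this with Python's standard html.parser module.  The names MEDIA_TAGS, AutoplayStripper and strip_autoplay are made up for illustration, and where such a filter would actually run (an add-on, a local proxy) is left open.  It is a sketch of the idea, not how any browser or add-on does it.

```python
from html import escape
from html.parser import HTMLParser

MEDIA_TAGS = {"video", "audio"}  # elements that honour `autoplay`


class AutoplayStripper(HTMLParser):
    """Re-emits HTML as parsed, except <video>/<audio> lose `autoplay`."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _emit_start(self, tag, attrs):
        if tag not in MEDIA_TAGS:
            # Non-media tags are passed through exactly as written.
            self.parts.append(self.get_starttag_text())
            return
        kept = [(name, value) for name, value in attrs if name != "autoplay"]
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in kept
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_autoplay(html_text):
    """Return html_text with `autoplay` removed from media elements."""
    parser = AutoplayStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


page = '<p>News</p><video src="clip.webm" autoplay controls></video>'
cleaned = strip_autoplay(page)
```

This only covers the declarative case.  A page script can still start playback by calling play() on the element, so a real blocker would need to handle that as well, which is why a browser preference seems like the better home for it.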


Wave Good Bye August 5, 2010

Filed under: collaboration,development,internet,Uncategorized — danielharrison @ 3:44 am

It looks like Google Wave has been sent to the knackers.  It was an ambitious product trying to change the technology we use to collaborate.  I’m sure we’ll see it come back in various products, but as a standalone product it looks like it won’t be around any more.  I remember when it first came out the general consensus, at least in the office I was working in, was: neat technology, but what problems can it help me solve, and is this really that much better than email?  There have been a lot of casualties in the groupware space and I guess Google Wave is another victim in the war on collaboration.  The current email communication hegemony seems like ripe pickings for disruption: a technology stack from another era, and massive implications and cost savings if you can make people more productive.

My startup knowtu operates in the enterprise collaboration and communication market, which is roughly where Wave played, and the lessons I think I see are:

  • Email still rules and will for the foreseeable future.
  • Technology is important but by itself doesn’t solve problems.
  • Good enough wins.

While I think Wave had its issues, it’s disappointing to see it end, particularly as it had a local Australian connection with a large contingent in Google’s Sydney office.  I always felt that over time the tech platform meant neat stuff would be built on top and slowly it would succeed.  A lot of the tech is open source, so maybe it will come back at some point in the future.  I guess we’ll just have to wait and see.


Turning off delicious automatic posting August 3, 2010

Filed under: Uncategorized — danielharrison @ 11:43 pm

I have a Delicious job that posts my bookmarks to this blog every day, on the rationale that if I’ve found something interesting, someone else might too. It also keeps some activity going on the blog when I don’t really have time to keep it up to date.

I’ve been thinking about this for a while, and I think it’s a bit noisy. It posts every day, which quickly drowns out the human-authored content. What I plan to do is take a more curated approach, either posting once a week or only posting references with commentary on my thoughts, which seems more appropriate for a stream of consciousness blog. I’ve seen this on other blogs and I think it works better. In the meantime, if you are interested in what I’m reading and finding interesting on the web, you can see my bookmarks here: http://delicious.com/daniel.harrison.au


links for 2010-08-02

Filed under: Uncategorized — danielharrison @ 2:02 am