Daniel Harrison's Personal Blog

Personal blog for daniel harrison

Playing with Cassandra Again September 30, 2010

Filed under: cassandra,development,internet,Uncategorized — danielharrison @ 12:59 am

I’ve recently been playing with the latest version of Cassandra again. One new thing heading in a direction I like is that it seems to be growing into a more general enterprise keystore, rather than something that only solves one high-volume website’s requirements. It felt to me like a lot of work had gone into beefing up the management side and making it solve a more generic problem. The programmatic ad hoc schema editing is a good improvement and, based on the direction, 1.0 is shaping up to be really good.

My previous access code used the Thrift API directly. For this prototype I tried out a couple of libraries: Pelops and Hector. Both still seem to be Thrift focused and I’m not sure how this will work with the change to Avro. Thrift always felt clumsy to me. Technologies like Thrift and Avro, where you’re expressing a language-independent communication protocol that various languages need to talk in, can’t in my view help bleeding those idioms and that generality up to the client. It means client access code often feels, well, slightly awkward. It feels a bit like the good old days of IIOP/CORBA and EJB communication. My personal preference is targeted, hand-coded adapters that feel like a good fit for the language, but the downside of course is that the clients can lag and aren’t always available for your choice of language. So it’s a tradeoff, as always. Hector seems to be actively trying to avoid this but still has wrappers where it feels a bit thrifty, e.g. HKSDef instead of KSDef being used to create a keyspace.

If you are trying out and evaluating these libraries, I would highly recommend you bite the bullet, get the source for Cassandra and your chosen library, and build it all yourself. Things are moving fast enough that the current releases look out of date, and to get it working you really need the latest trunk versions of everything. For example, I don’t think beta2 of Cassandra 0.7 is available as a package, but it seems to be required by the current versions of Pelops and Hector. Pelops is source only on GitHub, so you’ll likely be building things yourself anyway. I was impressed by both; it feels like there’s a lot of room for future improvement and both seem to be shaping up as strong client access libraries.

Another good thing is that some valuable resources seem to be coming through. At the moment it’s a lot of Google and reading the forums to nut out problems. I bought the Rough Cuts edition of ‘Cassandra: The Definitive Guide’ from O’Reilly, and it seems to have taken a lot of that information, focused it and turned it into a good source for explanations of idioms and general wisdom. So my recommendation would be to buy it, as it looks like it’s going to be an invaluable reference.

My biggest problem with using Cassandra at the moment is support for multi-tenancy. The problem I have in mind requires text indexing and content that is private per account. With a model like Cassandra’s you need to know up front what you will be searching for, and basically you build column families representing those indexes. In my case I have users, accounts (each with many users), objects (storing text) and various indices around that text that drive my application. Think of something a little bit like an RDF store with accounts and users. In a traditional database model I would probably store this as a separate database for each account. This may mean each running datastore instance has tens to thousands of databases. With Cassandra and the way it is structured this would not be advisable. Each keyspace maintains its own memory and so on, and to take advantage of its replication model it’s more advisable to have fewer keyspaces. One of the easy wins of having separate databases per account in the database server world is that you’re guaranteed not to see other accounts’ data: you’re connecting to the datastore for that client, which makes it very easy to guarantee and maintain security. Under Cassandra this is an application concern at the moment. For my prototype I wasn’t happy with the extent to which this invaded my code and required extra indices to make it all work, all of which increased the cognitive load of developing the application. There’s work afoot around multi-tenancy requirements, but until that’s addressed, for me at least, it rules Cassandra out. The Cassandra team are working on it and there are some interesting proposals (the namespace one seems interesting), and I’m sure once it’s complete it will really make Cassandra the first choice for an enterprise keystore.
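To make the "application concern" a bit more concrete, here is a minimal sketch in Python of what that isolation tends to look like when every account shares one keyspace. It is not tied to any real Cassandra client: plain dicts stand in for column families, and the names `SharedKeyspace`, `AccountView`, `objects` and `term_index` are made up for illustration. The idea is only that every row key gets an account prefix, and the text index needs its own per-account rows as well.

```python
from collections import defaultdict


class SharedKeyspace:
    """Stand-in for a single shared keyspace (hypothetical, in-memory only).

    Maps column family name -> row key -> {column name: value}.
    """

    def __init__(self):
        self.column_families = defaultdict(dict)


class AccountView:
    """All reads and writes for one account go through this wrapper.

    Isolation depends entirely on the application always prefixing row
    keys with the account id; nothing in the store itself enforces it.
    """

    def __init__(self, keyspace, account_id):
        if ":" in account_id:
            raise ValueError("account id must not contain ':'")
        self._ks = keyspace
        self._prefix = account_id + ":"

    def _row_key(self, key):
        return self._prefix + key

    def put_object(self, object_id, text):
        objects = self._ks.column_families["objects"]
        objects[self._row_key(object_id)] = {"text": text}
        # The extra index: one row per (account, term), one column per object.
        index = self._ks.column_families["term_index"]
        for term in set(text.lower().split()):
            index.setdefault(self._row_key(term), {})[object_id] = ""

    def get_object(self, object_id):
        row = self._ks.column_families["objects"].get(self._row_key(object_id))
        return None if row is None else row["text"]

    def search(self, term):
        index = self._ks.column_families["term_index"]
        return sorted(index.get(self._row_key(term.lower()), {}))


keyspace = SharedKeyspace()
acme = AccountView(keyspace, "acme")
globex = AccountView(keyspace, "globex")
acme.put_object("doc1", "Quarterly revenue report")
globex.put_object("doc1", "Revenue forecast")
print(acme.search("revenue"), globex.search("forecast"))
```

Both accounts can use the same object id without colliding, but only because every code path remembers to go through `AccountView`. That is the part that ends up spread through the application.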


Congratulations to Bitbucket

Filed under: business,development,mercurial,startups — danielharrison @ 12:01 am

I saw that Bitbucket has been acquired by Aussie company Atlassian. I was a pro user as I had a few private repositories (hg didn’t originally support sub repositories). I was always impressed by the customer service at Bitbucket, and from my dealings I got the impression they were good guys who put the customers’ interests first. When I changed credit cards my PayPal subscription stopped working, and rather than make a big deal out of it, Jesper basically stopped charging me money. I got it working again eventually, but it’s that kind of attitude that convinced me they had my interests as a customer first and that I’d made a good choice over going with competitors or doing it myself. I know this experience is why I recommended them, and as an early-stage startup it’s an experience I’ll remember when I’ve (hopefully) got paying customers 😉

So I saw my billing had been cancelled, and it now looks like with my current usage I won’t have to pay anything. It also looks like there have been a few UX changes around teams etc. I like the strategy of having it rebranded and working at the same time as announcing it. I previously introduced Atlassian’s suite into my former workplace (Confluence, Bamboo, JIRA, GreenHopper, Crowd, …) running over Subversion, and not having an SCCM system always seemed to be a weak point relative to their competitors, so this looks like a good strategic investment. When I was evaluating tools, the competitors for the most part seemed to be SCCM companies with a layer on top. The reason I chose Atlassian was that the integrated layer on top, with Confluence, Bamboo, JIRA etc., gave our internationally distributed team the focal point for development that we needed.

It will be interesting to see if this is offered for on-premises installation, as Atlassian’s tools are Java based and Bitbucket with hg is, I suspect, Python based. I looked at running hg with Jython when it first came out, but it had a few native modules which would have had to be ported from C to get it running. Maybe Python is OK though. My experience is that the people who look after and maintain these systems tend to be biased towards a particular model, e.g. Java or .NET; Python might be OK for Unix guys, but for Windows I’m not sure. Asking either to play outside their comfort area was playing with fire in terms of support. At least in my previous company, that’s why we maintained ‘native’ versions with some neat technologies that were baked in house.

So congratulations, Bitbucket, and I’m looking forward to seeing where it goes from here.


Australia == end of the earth September 3, 2010

Filed under: Uncategorized — danielharrison @ 12:19 am

Australia tends to have comparatively overpriced books due to various protectionist government policies. Being the bookish sort, this means I tend to order a lot of books online. My preferred provider is Book Depository as it has free shipping to Oz and it takes between 4 days and 2 weeks for books to get here. Today I had to order a few books from Amazon as they weren’t available on Book Depository (Steve Blank’s The Four Steps to the Epiphany, if you’re interested) and got a bit of a shock. Standard shipping is 18-32 days!? Expedited shipping, to get the same delivery time as Book Depository, is ~$AUD50, which is almost equivalent to the order cost. You hit this occasionally with vendors in the US, but I didn’t really expect it from Amazon. Books I’ve ordered from Pragmatic Programmers and O’Reilly with standard shipping have tended to get here in a week or so.

My current thinking is it must be coming via balloon. If it took 80 days in the proverbial book, then this timetable would be about right.


Thoughts on blocking autoplaying content, firefox4 September 2, 2010

Filed under: Uncategorized — danielharrison @ 1:23 am

I recently started using the Firefox 4 beta. It seems like it’s really coming along and should be a great new version. I have noticed an interesting side effect though. As most of the plugins I use aren’t supported yet, I get the default experience; this includes Flash content playing by default. I previously had the Flashblock plugin and it really sped up my browsing and made the experience much better. As an example, one of the news sites I used to read, smh.com.au, autoplayed video broadcasts with no site-wide way to turn it off. Memo to news sites: I can read faster than you can read it to me, and autoplaying just makes me go somewhere else, including not buying the physical copy. The thing I find most annoying is disruptive sound via Flash. I’m typically playing music on the computer as I work, and surfing to a site that suddenly has spoken or other audible content is quite jarring, particularly when it comes through at maximum volume.

I was thinking about the implications for HTML5 video and multimedia. Flashblock stops any animation and sound coming through because most of this type of content is Flash, which makes it easy to screen. When HTML5 video really takes off, how would you achieve the same effect? It’s a core HTML feature, so unless there are browser preferences to disable sound and stop video running by default, this could quickly get annoying. Maybe a good candidate for a plugin. I’m guessing someone’s already thought about this, but I couldn’t see any options in beta 4 at the moment.
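As a rough idea of what such a filter might do, here is a small sketch in Python using the standard library’s `html.parser`. It drops the HTML5 `autoplay` attribute from `video` and `audio` tags and passes everything else through. The names `AutoplayStripper` and `strip_autoplay` are hypothetical, and this only rewrites static markup: a real browser plugin would work on the live page and would also have to deal with playback started from script, which this does not attempt.

```python
import html
from html.parser import HTMLParser


class AutoplayStripper(HTMLParser):
    """Re-emits HTML, minus the autoplay attribute on media elements."""

    MEDIA_TAGS = {"video", "audio"}

    def __init__(self):
        # Keep character references as written so they can be passed through.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _render(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            if tag in self.MEDIA_TAGS and name == "autoplay":
                continue  # the one thing this filter removes
            if value is None:
                parts.append(name)  # boolean attribute such as "controls"
            else:
                parts.append(f'{name}="{html.escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" /" if self_closing else "") + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_autoplay(markup):
    parser = AutoplayStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(strip_autoplay('<video src="news.webm" autoplay controls></video>'))
```

The video would still be on the page with its controls, it just wouldn’t start by itself, which is roughly the Flashblock experience.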