Friday, May 9, 2014

Las Juju Baby!

Last week was my first sprint with the Juju team. We flew in from all over the world and gathered at the Flamingo, Las Vegas for a week-long sprint. It was the first time I met everyone in person. It was also the first time I lost my voice (due to a bug I caught on the plane).

As I silently observed the dynamics of the group and absorbed as much knowledge as I could, I became impressed by the organisation and productivity of the company. The lightning talks at the end of each day gave me shivers as I realised the potential and momentum of this thing called Juju and the talented and professional team that I was now a part of.

IBM was also in town the same week, which was no coincidence. During their conference, IBM announced their next generation of Power Systems, which will provide the backbone for their scale-out and cloud computing services. Canonical took the opportunity to announce that Ubuntu 14.04 LTS, and thus Juju, supports the new IBM POWER8 Linux servers. We were given a six-minute slot to demo Juju deploying Hadoop, WebSphere, SugarCRM with MariaDB and Memcached onto a POWER8 system. All services were fully deployed and running in 178 seconds. The audience was impressed and excited. We had three minutes to spare.

So, what did it take to get Juju running on POWER8? As a newbie to the core team I was put on bug hunting. We were about two weeks out from our first demo with IBM. Juju on PPC was not a happy camper: many tests failed. Dave Cheney spun up a PPC VM for me; I SSHed in and got hacking. One by one, bugs were filed and fixed, and tests passed. A theme emerged among the bugs I was finding and fixing. They each relied on an erroneous assumption about the environment they were running in. One assumed SSH keys had been generated, another that the host architecture was amd64.
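To make the amd64 assumption concrete, here is a minimal sketch in Go. It is not Juju code: `ToolsName` and `HostToolsName` are hypothetical helpers invented for illustration. The fragile check quietly depends on the machine running it, while the isolated check passes the architecture in as explicit input.

```go
package main

import (
	"fmt"
	"runtime"
)

// ToolsName is a hypothetical helper (not part of Juju's API) that builds an
// identifier for agent binaries from a series and an architecture.
func ToolsName(series, arch string) string {
	return fmt.Sprintf("tools-%s-%s", series, arch)
}

// HostToolsName uses the architecture the running binary was built for,
// so its result changes from machine to machine.
func HostToolsName(series string) string {
	return ToolsName(series, runtime.GOARCH)
}

func main() {
	// Fragile check: hard-codes amd64, so it only holds on an amd64 host.
	fragile := HostToolsName("trusty") == "tools-trusty-amd64"

	// Isolated check: the architecture is an explicit input, so the result
	// is the same on every host.
	isolated := ToolsName("trusty", "ppc64el") == "tools-trusty-ppc64el"

	fmt.Println("fragile check passes:", fragile)
	fmt.Println("isolated check passes:", isolated)
}
```

On an amd64 machine both checks pass, which is exactly why the hidden assumption goes unnoticed until the code runs somewhere else.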

Broadly speaking, the bugs were indicative of poor test isolation. The tests were usually run using the gc compiler on a machine with network access and an amd64 architecture. Change to the gccgo compiler, turn off network access (or in the VM's case, use proxied network access) and use a ppc64el architecture and you'll soon see the many unexamined assumptions made by the test suites. Like a ship in new waters, gaps in the test fixtures sprang leaks which soon sank the boat. Taking the matter of test isolation further, many tests (as of this writing) rely unnecessarily on building jujud. This bug captures the many tests that fail if jujud does not compile. With luck, I will land a branch to resolve this in the coming week.
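One common way to isolate a test from the network is to put the outside world behind a small interface and hand the test an in-memory fake. The sketch below shows the idea under stated assumptions: `Fetcher`, `fakeFetcher` and `ToolsSize` are hypothetical names for illustration, not Juju's actual types, and the URL is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// Fetcher abstracts "download something from somewhere" (hypothetical).
// Production code would implement it with a real HTTP client.
type Fetcher interface {
	Fetch(url string) ([]byte, error)
}

// fakeFetcher serves canned responses from memory, so a test that uses it
// never touches the network.
type fakeFetcher map[string][]byte

func (f fakeFetcher) Fetch(url string) ([]byte, error) {
	data, ok := f[url]
	if !ok {
		return nil, errors.New("not found: " + url)
	}
	return data, nil
}

// ToolsSize is the code under test: it depends only on the Fetcher
// interface, not on how the bytes are obtained.
func ToolsSize(f Fetcher, url string) (int, error) {
	data, err := f.Fetch(url)
	if err != nil {
		return 0, err
	}
	return len(data), nil
}

func main() {
	// Placeholder URL; nothing is ever requested from it.
	const url = "http://example.invalid/tools.tgz"
	fake := fakeFetcher{url: []byte("fake tools")}

	n, err := ToolsSize(fake, url)
	fmt.Println("size:", n, "err:", err)
}
```

The same shape applies to other environmental dependencies, such as a compiled binary or generated SSH keys: the test supplies a stand-in instead of assuming the real thing is present.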

When I say 'erroneous assumption' or 'unexamined assumption' I really mean an unacknowledged assumption. Such assumptions are unavoidable and necessary to make any progress. What is important is to accept this fact and welcome the moment when these assumptions are revealed. In any reasoning, you are relying on, to a greater or lesser degree, an unexamined a priori assumption that will be challenged and usually trumped by a posteriori knowledge. Running Juju on POWER8 was an experience that revealed and challenged many assumptions in the code base. Addressing those assumptions has not only allowed Juju to run on POWER8, but has hardened the code base to sail further into uncharted waters.
