Monday, February 22, 2010

Primitive Obsession obsessions

After seeing the Primitive Obsession code smell crop up in a few different places recently I got to thinking about it in the context of static and dynamic languages.

Essentially this code smell refers to using a primitive data type (int, string, etc), or a cluster thereof, where you could instead be using an explicitly defined type. Probably the most common variant of Primitive Obsession would be a method like:

def is_point_within_bounds( x, y )
# ...
end

I would call this variant Primitives Travelling in a Pack. You'll probably see these two x and y values hanging out together all over the place. Often this variant will be addressed with the Introduce Parameter Object refactoring, leading to something like:

class Point < Struct.new(:x,:y)
end

def is_point_within_bounds( point )
# ...
end

This is a pretty uncontroversial refactoring. It tends to make the code easier to read, and is often the initial stage in the 'budding off' of a full-fledged class (I believe I have the wonderful GOOS book to thank for that term). Once the class is there, it often becomes a locus for a little cluster of methods, frequently pulled in from surrounding client code. This is a powerful way of letting your design evolve based on needs.
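For instance, once Point exists it tends to start attracting behaviour that previously lived in client code. A minimal sketch of that next step (the bounds object and method name here are invented for illustration, not from the original code):

class Point < Struct.new(:x, :y)
  # Behaviour pulled in from client code; bounds is assumed to expose
  # min_x/max_x/min_y/max_y accessors.
  def within_bounds?( bounds )
    x.between?( bounds.min_x, bounds.max_x ) &&
      y.between?( bounds.min_y, bounds.max_y )
  end
end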

There are other more subtle variants of Primitive Obsession. Some people would say that two methods like:

public class Rental {
  void increaseRentalPeriod( int numWeeks ) {
   //...
  } 

  int getRentalPeriod() {
   //...
  }
} 

would be better refactored to:

public class WeekSpan {
    public int numWeeks;
}

public class Rental {
  void increaseRentalPeriod( WeekSpan period ) {
   //...
  } 

  WeekSpan getRentalPeriod() {
   //...
  }
} 

Applying this refactoring can again be the first step in budding off a new class. However I would say that the main advantage of this refactoring is the explicit typing it adds in a statically typed language. Before the refactoring there was an implicit typing convention in play which said "ints in this context actually represent a time period, in weeks". After the refactoring that typing has been made explicit. One outcome is that type errors which would previously have slipped through unnoticed will now be caught by the type system. Before the refactoring you could have written client code like:

someRental.increaseRentalPeriod( numDaysToIncreaseRentalBy() );

and your compiler wouldn't utter a squeak, despite the fact that you're passing in a number of days, rather than the number of weeks which the method expects. With the explicit WeekSpan type, that error wouldn't sneak through.
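To make that concrete, here's a sketch of the corrected call site after the refactoring (the division is purely illustrative; how you convert days to weeks is up to you):

WeekSpan period = new WeekSpan();
period.numWeeks = numDaysToIncreaseRentalBy() / 7; // the conversion is now forced into the open
someRental.increaseRentalPeriod( period );

// ...whereas the original mistake no longer compiles:
// someRental.increaseRentalPeriod( numDaysToIncreaseRentalBy() );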

This explicit typing also helps with future refactoring. If we later decided that it would be better to represent the rental period using a number of days rather than a number of weeks we could use an automated refactoring tool to trivially replace all instances of WeekSpan with DaySpan (or we could just refactor WeekSpan itself into a generic TimePeriod). If we didn't have the explicit WeekSpan class then we'd be reduced to manually searching for all the instances of the primitive int type which happened to actually be representing a time period in weeks.
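If we took the generic route, TimePeriod might start life as something like this (one possible shape, not from the original post):

public class TimePeriod {
    public enum Unit { DAYS, WEEKS }

    public int length;
    public Unit unit;
}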

This is all good stuff, and makes an awful lot of sense in a static language. However, I personally am not convinced that introducing these single-field data types makes as much immediate sense in a dynamically typed language. Both of the typing advantages are lost in a dynamic language. We don't know the type of anything until runtime so we can't check for type violations (though our unit tests help with that), and our refactoring tools can't take advantage of static type information.
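To illustrate, in Ruby the equivalent mistake sails through regardless of whether a WeekSpan wrapper exists (the method names here are hypothetical):

def increase_rental_period( num_weeks )
  # ...
end

# Nothing complains at parse time; at best a unit test or a runtime
# check will catch the days-for-weeks mix-up.
increase_rental_period( num_days_to_increase_rental_by )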

The other advantages of introducing these non-primitive types remain, but I would err towards the YAGNI principle. Leave the primitives where they are, and make a mental note that they might need some attention in the future. Once you see a need to move to the next step in the budding off process (e.g. moving a method into the nascent data type), then that's the time to apply the refactoring. Doing so just for the sake of having an explicit type smacks of premature optimization to me.

Postscript

It's interesting to note that some languages (such as Google's Go) allow you to declare a distinct named type which is backed by nothing more than a primitive. For example in Go you could write

type WeekSpan uint

This might be the best of both worlds. You get the explicit typing and attendant type safety, without the potential overkill of creating a new single-field class. If you later discovered that budding off a new class was required it would be trivial to refactor that semi-primitive WeekSpan type into a full-fledged WeekSpan class.
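Here's a quick sketch of how that plays out (the function and variable names are invented for illustration):

package main

type WeekSpan uint

func increaseRentalPeriod(period WeekSpan) { /* ... */ }

func main() {
	increaseRentalPeriod(WeekSpan(2))    // fine
	days := uint(14)
	increaseRentalPeriod(WeekSpan(days)) // fine too, but the conversion is explicit
	// increaseRentalPeriod(days)        // would not compile: cannot use days (type uint) as WeekSpan
}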

Friday, February 19, 2010

Encapsulating user interaction events in Flex

When developing the presentation layer in a Flex application I like to follow an MVC/MVP pattern. I also like to keep my views nice and skinny, with as much logic as possible in the controller/presenter. However, I do like to encapsulate some of the details of the UI itself within the view, and so I shy away from exposing raw user interaction events (button clicks, list selections, etc) outside of the view. Instead I like to have the view capture those events and translate them into what I call user action events, which represent higher-level, user-centric interactions. So instead of publishing a 'list item selected' event, the view publishes a 'blog post selected' event. For example, here's what a typical event handler would look like in a view's MXML file:

private function onListItemDoubleClick(event:ListEvent):void {
  dispatchEvent( new Event(EVENT_postSelected) );
}

This allows small user experience tweaks (e.g. double clicks to select vs single clicks) without affecting clients of the view. More importantly it helps the view expose a consistent, user-centric level of abstraction - its interface talks in terms of user actions, not UI minutiae. A view's controller would register listeners for these user action events via the generic IEventDispatcher::addEventListener(...) exposed by the view, and process them appropriately in the registered listener callback:

public class Controller
{
  //... 

  public function bindToView(view:IView):void {
    _view = view;
    //...
    _view.addEventListener( View.EVENT_postSelected, onPostSelected );
  }
  
  //...

  private function onPostSelected(event:Event):void {
    //... process user action here
    //...
  }
}

One thing to note in passing (we'll come back to it) is the ugly fact that while bindToView(...) is dealing with an abstract interface (IView), it still needs to reference the concrete View implementation in order to get to the View.EVENT_postSelected constant. In real terms the controller class has a dependency on the concrete view implementation.

Back to the story. I want to make sure that my controller processes these user actions correctly, which for me means I need good unit test coverage of the listener callbacks which the controller registers. These callbacks are private methods, so therefore I need some way of simulating the occurrence of these user actions within the unit tests for my controller. Typically when unit testing a controller I arrange for it to be bound to a stub implementation of the view. To simulate these user actions I could have that stub implementation derive from EventDispatcher. During test setup the controller would be bound to the stub view, and as part of that process would subscribe to these user action events. Subsequently my test code could artificially fire the appropriate event within a test case using EventDispatcher::dispatchEvent(...) in order to simulate a given user action occurring.

public class StubView extends EventDispatcher implements IView
{
  // other stub stuff here
}

public class ControllerTests 
{

  [Before]
  public function setUp():void {
    // instantiate controller, instantiate stub view, bind controller to stub view, etc...
  }

  [Test]
  public function doesSomethingWhenPostSelected():void {
    // ... set up test
    simulatePostSelected();
    // ... verify expectations
  }

  // .. other tests

  private function simulatePostSelected():void {
    _stubView.dispatchEvent( new Event( View.EVENT_postSelected ) );
  }
}

This has always felt a little hacky to me, and I started to wonder if that feeling indicated a design flaw (I find that when something feels wrong when writing tests it often points towards a design deficiency). So recently I started experimenting with another approach. Instead of the controller registering event callbacks on the view directly with addEventListener(...), the view exposes methods which do the registration on the controller's behalf. I call these interest registration methods. Instead of the controller calling view.addEventListener( SOME_EVENT_TYPE, my_callback ), it calls view.addUserActionListener( my_callback ).

// ... in the view MXML
// ...
public function addPostSelectedListener(listener:Function):void {
  addEventListener( EVENT_postSelected, listener );
}

// ... in the controller
// ...
public function bindToView(view:IView):void {
  _view = view;
  //...
  _view.addPostSelectedListener( onPostSelected );
}

In production code the view implements these interest registration methods in the same way as the controller did before - by calling addEventListener(...) with the appropriate event type and the callback supplied by the caller. The code is the same, it's just moved from the controller to the view.

The effect this has is quite powerful however. The fact that events are used as the mechanism for invoking the callback is hidden within the view. This becomes interesting when you come to test your controller. Instead of subscribing to events, your stub implementation of the view can implement the interest registration methods by just holding the callback in an instance variable. When the test wants to simulate a user action it can ask the view for the registered callback and call it directly, bypassing the Flex eventing mechanism entirely.

public class StubView implements IView
{
  public var postSelectedListener:Function;
  public function addPostSelectedListener(listener:Function):void {
    postSelectedListener = listener;
  }

   // .. the rest of IView implemented here
}

public class ControllerTests
{
  // ... everything else as before

  private function simulatePostSelected():void {
    _stubView.postSelectedListener( new Event( 'ignored' ) );
  }
}

What's more interesting is the effect this change has on the interface exposed by the view. Previously we had a view which exposed implementation details: the fact that it was handling client callback registration using EventDispatcher, and most tellingly the string constants which identify the different event types. We also had different levels of abstraction being exposed in the same interface. Typically the other parts of the view interface worked at the high level of user actions, not at the low level of events of a particular type being fired. With the migration to the interest registration methods we have a consistent level of abstraction, and we can hide all the messy implementation details of the eventing mechanism. Those annoying public event type strings become private and hidden away within the view; an implementation detail. We can even totally encapsulate the fact that the view inherits from EventDispatcher. The explicit interface also feels more typesafe to me. It's impossible to accidentally register for an invalid event code, and it's immediately obvious from the view's interface which user actions it reports.

There are some drawbacks to this approach. Because we're abstracting away the details of the event dispatching mechanism we're also hiding some of the advanced facilities that mechanism provides. For example the approach as described doesn't allow a client of the view to unregister its interest, which would usually be accomplished using IEventDispatcher::removeEventListener(...). I would argue that this is a reasonable price to pay. If that functionality is required in some situations it would be straightforward to add, as sketched below, and because it would be explicitly added as a new method in the view's interface it would be clear that the view was expecting clients to use that method.
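For example, the view could grow a matching removal method along these lines (a sketch, not something from the original code):

// ... in the view MXML
public function removePostSelectedListener(listener:Function):void {
  removeEventListener( EVENT_postSelected, listener );
}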

All in all I'm very happy with how this experiment turned out. By paying attention to the 'smells' that my tests were exposing I discovered a valid issue with the way different parts of my code interacted. I would argue that changing that interaction to make the tests cleaner does seem to have led to a cleaner design overall.

Thursday, February 18, 2010

Ruby Facets: the mash method

I keep meaning to write up some of the useful gems (if you'll pardon the pun) which are hidden in the super-handy Facets gem. Today I'll cover Enumerable#mash.

Let's say you have a list of Users, and you'd like to create a hash which lets you look up the Users based on their login. You might write something like:

def create_login_hash_for( users )
  user_login_hash = {}
  users.each do |user|
    user_login_hash[user.login] = user
  end
  user_login_hash
end

With Enumerable#mash, you can trim that down to:

def create_login_hash_for( users )
  users.mash do |user|
    [user.login,user]
  end
end

This is much more succinct. More importantly, it expresses the intention more clearly.
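The mechanics are simple: the block should return a [key, value] pair for each element, and mash builds a hash from those pairs. A quick illustration, assuming the whole library is pulled in with a bare require 'facets':

require 'facets'

[1, 2, 3].mash { |n| [n, n * n] }  #=> {1=>1, 2=>4, 3=>9}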

Wednesday, February 3, 2010

Partial commits with git

I've been using git for a few months now. Like all the best dev tools it has slowly permeated my workflow to the point where I'd feel pretty crippled without it. In fact, I don't think I would ever go back to a centralized system like svn at this point. In a pinch I'd fall back to using git locally and then using the appropriate git-to-foo integration to push my local changes from my git repo to whatever the centralized repository was.

So, git is great. However, there's one practice which is common amongst git users which I am still uncomfortable with (even though I do it myself on occasion). I'm referring to the practice of staging some, but not all, of the changes in your working tree to the index, and then committing that partial change set. For example, let's say I've been hacking away on my stamp collector application. I added the ability to 'tag' stamps (folksonomy now being mainstream enough to appeal to your average philatelist). While I was working, I also fixed a couple of badly named methods that I happened to come across during my other changes. With git I can decide to commit these changes separately, through judicious use of the index. I can say 'add changes to files x and y to the index, but leave out z for now', then commit, then add the changes to file z to the index, and then commit that change. If I want to get really fancy I can even add some of the changes to a file to the index, but not all changes. This way I can separate my changes into two logical commits, one for the feature addition of tagging stamps, and one for the miscellaneous method name cleanup.
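In concrete terms the workflow looks something like this (the file names are invented for the example):

git add lib/stamp_tagging.rb lib/tag_cloud.rb   # stage just the feature work
git commit -m "Add tagging of stamps"
git add -p lib/stamp_collection.rb              # interactively stage only the renaming hunks
git commit -m "Clean up badly named methods"

git add -p is the 'really fancy' part: it lets you pick individual hunks within a file to stage, leaving the rest in the working tree.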

This is definitely one of those features that is very cool once you realize the power it offers. It means someone else reviewing my commits will have an easier time, and it even means that I could roll back my feature addition change but still keep the method name cleanup work. Still, I would submit that this can be a Bad Idea.

Why does this concern me? Because any such partial commit hasn't been tested before being committed. At this point you might be thinking "Well, I don't know about you, but I run my tests before each commit". The point here is that the tests you run before you commit are run against your working tree, which in the world of git isn't necessarily the same as your commit. In the Good Old Days of svn, the changeset in your working tree was always the same as the changeset in your commit. With git and the index (aka the stage, aka the cache, don't get me started on that) that's not the case. Your working tree may contain changes to files x, y and z, but you've decided to only commit the changes to files x and y. Or even more extreme, you've decided to commit the changes to file x but only some of the changes to file y (maybe the other changes were related to your method renaming binge). So you're running your tests against one set of code (your working tree, with changes to x, y, and z), but your commit is another set of code. I think that ninety-nine times out of a hundred this distinction won't matter. Your "technically untested" commit would still have passed the tests anyway. Plus, you're almost certainly following it up straight away with another commit which will be the same as your working tree. What are the chances that someone will manage to stumble upon an invalid state like that? Probably really low. But still, kinda scary I think. It seems a bit like saying "well, this race condition is really unlikely to happen, right...". I'd rather not have to even think about that. Another thing to consider is the 'cool' use case that someone could decide to revert one commit but not the other. At that point you have code sitting at HEAD that has never been tested.

One solution to this would be to ban the use of commits that don't include all changes in the working tree. Yeah, right. Not likely, and not helpful. Staging is definitely a useful way to add more structure to your change history, and not something I'd like to leave behind.

I wonder if a better solution could be achieved with the judicious use of commit hooks. What if we had a pre-commit script that would take the change set that was about to be committed (NOT whatever is in the working tree) and run the tests against that tree? If the tests fail, the commit fails. Now we would be left in a place of greater confidence than before. We wouldn't even need the discipline to run the tests by hand, because even if we forgot then git would be running them for us anyway. To be honest I'm not sure of the feasibility of creating this kind of setup. I'd love to find out more about how feasible it would be.
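One plausible shape for such a hook uses git stash --keep-index to temporarily make the working tree match what's actually being committed before running the tests (the rake test line is a stand-in for whatever runs your suite; treat this as a sketch rather than battle-tested tooling):

#!/bin/sh
# .git/hooks/pre-commit
# Stash away everything that is NOT staged, so the working tree
# matches the exact change set about to be committed.
git stash save --keep-index --quiet

rake test
RESULT=$?

# Restore the unstaged changes, whatever the test outcome.
git stash pop --quiet

# A non-zero exit code aborts the commit.
exit $RESULT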

Option number 3 would be to have a CI pipeline such that you don't ever commit changes to a branch that others can touch until the CI system has had a chance to run tests against your commit. Instead, you would always be committing changes to a private branch. The CI system would detect your commit, run a build, and then merge your changes into a shared branch if and only if your commit passed muster. I don't think this would prevent commits in the middle of a batch of commits being pushed to your private branch from being untested, but it would prevent the system ever getting into a state where the shared head is untested. This is an idea I'm planning to blog about more at some point in the future.

In conclusion, while I find myself sometimes taking advantage of this powerful feature of git, I always feel a little nervous doing so, and try and take extra care.