Anagram Code Kata Part 5 – Domain Objects Over Language Primitives

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link which points to the introductory post, also containing an index of all posts in the series.

In this post, we will discuss some reasons why you might want to avoid using language primitives directly in place of domain objects.  Specifically, I have been using String variables and objects to represent words up to this point.  Esteban suggested I create a Word class for a few good reasons which I’ll layout for you.  Out of sheer luck and good timing, Dru Sellers of CodeBetter.com also wrote on this subject shortly afterward and confirmed Esteban’s reasoning.

Use String Class or Create Word Domain Object?

I mentioned in the previous post that Esteban had suggested creating a Word class instead of just passing around strings everywhere through my application.  One good reason is that we don’t own the String class and so further modification to our string-based implementation is difficult because the logic is scattered throughout the application, instead of centralized under the responsibility and definition of one domain class.  It is much more difficult to refactor the word-specific logic with its behaviors and attributes spread throughout the code.  Even if your Word class doesn’t grow any more beyond a seemingly unnecessary wrapper of the String class, the code is more cohesive and ready for change should the need arise.

But more importantly, you designed the code thinking in a true object-oriented mindset.  You have to keep in mind that the String class is someone else’s implementation, and is hardly ever sufficient in and of itself as a domain object within your solution.  Think about it, the behaviors that a string object performs are so generic and multipurpose that you likely don’t need two-thirds of the class as defined (and there’s likely a few behaviors you really need that aren’t there).  Of course nearly every application on earth makes use of the String class; but because of this fact, it has no meaning in and of itself within any given application.  You would have to look at how the stings are actually used within the code in order to understand its unique application within the app’s context.  Of course that task of research is much easier for everyone (including the original author of the code) if it’s all encapsulated within a dedicated domain object class.  True object-orientation means describing in code form the properties, behaviors, and interactions/relationships of real world objects within your problem domain.

Esteban gave me a great example to illustrate these points.  He said that you can always represent money as a decimal (and even when you use a domain object, it’s got to have a decimal language primitive underneath the covers).  However, what happens when you need to attach metadata to the amount (like currency denomination), or if you need to change decimal precision?  You would have to go through all of the code and make sure your use of decimal language primitives is modified uniformly in order to retain consistency.  Also, mathematic operations involving money are hardly ever the same as their counterparts involving standard decimals, because currency deals with discrete values to a certain decimal precision.  Typically when the behaviors and properties within our system begin to get complex, we are cognizant enough to create domain objects in order to bring it all under one class.  We definitely don’t want to over-architect features and interactions before we need them, but I think there is power in this principle of abstracting away language primitives and instead encapsulating their use within domain objects located in just one place in your codebase.  I believe it is one thing we can keep in mind to help guide us to better object-oriented thinking, and avoid language-oriented coding.

As mentioned above, Esteban’s thoughts were confirmed nearly verbatim by Dru Sellers of CodeBetter.com in his blog post that he wrote just a few days after I had the conversation with Esteban.  A great coincidence no doubt, and worth a read; here’s the link:

Business Primitives (1/2) 

Word Class Implementation

So basically I created the following class implementation and then replaced string with a reference to this new class, Word:

public class Word
{
    private string wordStr = string.Empty;

    public Word(string wordStr)
    {
        this.wordStr = wordStr;
    }

    public override string ToString()
    {
        return wordStr;
    }

    public override bool Equals(object obj)
    {
        return wordStr == ((Word)obj).ToString();
    }

    public override int GetHashCode()
    {
        return base.GetHashCode();
    }

    public int GetCanonicalHashCode()
    {
        char[] letters = wordStr.ToCharArray();
        Array.Sort<char>(letters);
        return new string(letters).GetHashCode();
    }
}

I have defined an overridden implementation for Equals(object) (so that the test assertions and other IEnumerable.Contains() queries work) and GetHashCode() (solely to satisfy a compiler warning).  I also moved the GetCanoncialHashCode() method from AnagramGrouper, in order to better encapsulate it as a behavior a Word knows how to do innately.

One other change I had to make was to convert strings into Word objects in our NewlineFileParser, which I accomplished by using an IEnumberable.Select() call as shown below:

return File.ReadAllLines(filePath).Select<string, Word>(x => new Word(x));

Let’s Revisit AssertWasCalled One More Time

Trust me, I am groaning with you, even as I wrote that heading text.  The last thing we need is for me to rehash the topic again and flip flop my stance yet another time.  Yes, that’s right I’ve changed my mind again.  First I couldn’t understand what utility asserting methods were called would have under normal test scenarios.  Then I changed my mind that perhaps using it would help my test specifications have clearer intent of what I am asserting.  After an email conversation with David Tchepak, I think I’m now back to my original stance.  Here is what Dave said that had me reconsidering, and I think it’s pretty sound reasoning (emphasis added by me):

“But seeing as you asked for it, here goes. :)

“I try and avoid AssertWasCalled like the plague. Generally I don't care that some method was called, I care that my class does the work it needs to. If that involves calling a dependency then great, but that is not my class' reason for existence.

“I prefer your original approach of stubbing out everything in the setup and having that tested indirectly. One reason I prefer this is I find it makes it easier to refactor: the assertions in my test don't change, the class still does the same thing. However I can add or change dependencies by changing some wiring in the setup, and then make sure I haven't stuffed anything up as my assertions still pass. I prefer that my tests specify what I want, not how it does it. To me, AssertWasCalled reeks of over-specification. The one exception is where I hit the external boundaries of my code, so where I want to send an email or something without side effects that I can test. Then the core of the behaviour is the call itself, so I'm happy to assert on that then.”

And with that I’ll promise to never bring this up again…unless of course I get swayed by someone else. :)  In all seriousness, I think this is an interesting discussion and so if you have insight, please share via comment below.

Summary

Please leave your thoughts in the comments below in regard to creating domain objects over using language primitives (or heaven forbid, the AssertWasCalled debate).  As far as this coding kata exercise, I need to take a high-level look at where this should go next.  Perhaps the next post will tie up loose ends and see how our code performs on large text files as input.  It may be that we will need to refactor our architecture to achieve better runtimes.  If not, maybe we can still discuss where would could have headed if it had been necessary.

View Full Article

Anagram Code Kata Part 4 – Will it Write My Code for Me?

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link which points to the introductory post, also containing an index of all posts in the series.

In this post, we work on implementing the other main dependency, IAnagramsFinder.  In doing so, you will discover that I have been a little naïve with this BDD methodology. Apparently I expected a different experience than what I ran into doing the exercise.  In gathering feedback, I had a very interesting discussion about Object-Oriented Programming with Esteban that I will summarize and ask for further discussion from all of you.  I’ll also show you one way to debug your code when running the MSpec tests via the console test runner.

New Thoughts on Mocks vs. Stubs

During Part 2 of this series, Tobias left a comment about using mocks and AssertWasCalled, rather than stubs and defining fake responses.  I didn’t understand how it would help, as it appeared they would accomplish roughly the same thing.  With stubs and fake responses, if you don’t get back the return you are looking for, then you can infer that the dependencies weren’t called since you had previously hard-coded their return values when called.  With mocks and asserting methods were called, you directly assert that the manager class did in fact call it’s dependencies.

When I looked more closely into using the AssertWasCalled method in Rhino.Mocks, I noticed that my strategy of using stubs and faked return values wasn’t very explicit in signifying that the ultimate responsibility of the manager class was to delegate to its dependencies.  I mean the test context does rigorously stub out the dependencies, but the actual assertions weren’t clear what we were testing.  Therefore, I decided to pop in a pair of AssertWasCalled statements (as found below), leaving the rest of the test the same as it was back in Part 2.

It should_result_in_list_of_anagram_sets_and_both_dependencies_should_have_been_called = () =>
{
    fileParser.AssertWasCalled(x => x.ExtractWordListFromFile(filePath));
    anagramGrouper.AssertWasCalled(x => x.FindAnagramSets(wordListFromFile));

    result.ShouldEqual(expected);
};

However, when I thought to simplify the test code by removing most of stubbing setup, it appeared I had to leave most of the stub logic in tact in order to test if anagramGrouper.FindAnagramSets() was called with parameter wordListFromFile, which was one of the stubbed fake responses.  The one dependency (the file parser) needed to pass it’s return value to the next dependency (the anagrams finder).  I could not think of a way to accomplish testing that the manager class facilitated that without the stubbing logic (fake responses) that I created.

It would be great if someone could enlighten me since I’m such a newbie to mocking and stubbing.  Otherwise, I feel the intent of the test is now more clear with the AssertWasCalled checks that I added, even if that I means I’m needlessly mixing stubbing and mocking.  But one last question:  should I try to stick more closely to the rule of thumb guiding me to stick to one all-encompassing assertion per test if possible?  Let me know your thoughts.

Implementing the AnagramGrouper Dependency

So let’s continue onward with some more actual design and code.  We are tackling the AnagramGrouper dependency, whose sole responsibility is to take in the parsed collection of words and find anagram sets.  Here is the test spec (one thing to note is that in some of my earlier tests for the AnagramsFinder manager class and IFileParser, I had used string arrays liberally; I decided to go with IList<string> collections instead):

public class AnagramGrouperSpecs
{
    [Subject(typeof(AnagramGrouperSpecs))]
    public class when_given_word_list_collection
    {
        static AnagramGrouper sut;
        static IEnumerable<IList<string>> result;
        static IEnumerable<string> wordList = new[] { "wordA", "wordB", "Aword", "wordC" };

        Establish context = () =>
        {
            sut = new AnagramGrouper();
        };

        Because of = () =>
        {
            result = sut.FindAnagramSets(wordList);
        };

        It should_result_in_anagram_sets_collection_of_length_1 = () =>
        {
            result.Count().ShouldEqual(1);
        };

        It should_contain_2_specific_words_in_the_anagram_set = () =>
        {
            result.First().ShouldContain("wordA", "Aword");
        };
    }
}

Here too I have basically two assertions (but split it up differently than last time); should I also try harder to avoid this as well?

When contemplating how to further breakdown this responsibility of finding anagram sets into subtasks, I figure we would need one piece of code that would compare two words and answer whether they are anagrams of each other, and another piece of code that would keep track of the various anagram sets (perhaps a collection or dictionary of string arrays).  With what BDD has revealed to me thus far, I figure this means we are going to see two more dependencies (and interfaces, and spec tests) added to the codebase.

So I start with the word comparison task, thinking I could perhaps leverage the String.GetHashCode() method for assigning meaningful value to the list of characters in the string.  However, string hash codes give importance to letter casing and order of characters, so a modified strategy would have to be utilized, even though GetHashCode() seems quite close to filling our need.

Now I probably should have come up with this ingenious algorithm all by myself, but…it is what it is and this is the best, most concise solution that I can put together to solve the problem.  With that said, I ended up doing some internet searching in order to proof-of-concept my GetHashCode() idea.  I came across pure genius in the form of Brian Mullen’s blog post entitled LINQ group by and GroupBy, specifically the second-to-last code snippet.  Just String.ToCharArray() the word and then Array.Sort() the resultant array; now turn it back into a string and do GetHashCode() now.  (As a side note, we can worry about case-insensitivity later if desired by simply doing a String.ToLower() on the word before getting the hash code.)  Now anagrams will have the same hash code (Brian uses the term “canonical”, which I think is very fitting); it’s literally perfect in every way!

So now for the other dependency, I figure we could use a Dictionary<int, IList<string>>, where the key is the canonical hash code and the value is a string list collection representing an anagram set of words.  These two dependency implementations are so simple and straight forward that I have a hard time seeing a reason to continue breaking out full-fledged object dependencies with interfaces and Dependency Injection.  However, I will make a case for moving the canonical hash code logic outside this AnagramsGrouper class, but that will come later.

So let’s write some code to pass our test/spec:

public class AnagramGrouper : IAnagramGrouper
{
    public IEnumerable<IList<string>> FindAnagramSets(IEnumerable<string> wordList)
    {
        Dictionary<int, IList<string>> results = new Dictionary<int, IList<string>>();

        foreach (string word in wordList)
        {
            int canonicalHashCode = GetCanonicalHashCode(word);

            if (results.ContainsKey(canonicalHashCode))
            {
                results[canonicalHashCode].Add(word);
            }
            else
            {
                results.Add(canonicalHashCode, new List<string>(new[] { word }));
            }
        }

        return results.Values;
    }

    public static int GetCanonicalHashCode(string word)
    {
        char[] letters = word.ToCharArray();
        Array.Sort<char>(letters);

        return new string(letters).GetHashCode();
    }
}

Maybe you noticed already, but there is a bug in my logic.  At that time, I didn’t see it and needed to debug the code while running.  Setting a debug break point in your code doesn’t work like you’re probably used to, and this is because we’re running the MSpec tests via its console test runner.  I had to do some searching to figure this one out, and that’s what we’ll cover next.

How to Debug MSpec Specifications and the Code They Test

After a short ride on the Google super highway, I found a blog post containing a November 2009 comment from Aaron Jensen, author of MSpec, that gave a few possible solutions.  I opted to go with the Debugger.Break() solution, which I put at the very top of my FindAnagramSets method.  It is a tad awkward because it will show a Windows error dialog box that says the program encountered a problem and Windows is searching the web for a solution.  If you wait through that, it will give you an option to debug the program via a Visual Studio dialog where you select the currently running instance of Visual Studio as your debugger instance.  You just remove the statement when you’re done (and you’ll really want to remember to remove it from your production code).  Does the trick, though it is a bit obtrusive to your code base.

If you haven’t already figured it out, the bug is that my FindAnagramSets method is returning all anagram sets found in the dictionary.  An anagram set of just one word is not really an anagram set, which is why our test specification which is based on that assumption failed.  The fix is to filter out anagram word sets with less than two words before returning, like this:

// Return anagram sets with more than one word in it
return results.Values.Where(x => x.Count > 1);

That will make the code pass the tests now.  Remember to remove the debugger statement we added.

BDD, OOP…WTH?!

Just kidding.  But I do want to discuss a reaction I had to BDD after implementing this anagram finding solution.  After I completed the code and finished the tests, I was feeling pretty great.  But then I started thinking about how I didn’t really go down the path that I felt BDD was originally trying to lead me down with AnagramGrouper, which was to create two more dependencies, just as I had done for the first managing class AnagramsFinder.  I mean I felt pretty frustrated because I was digging BDD to this point.  I’m not quite sure what my original expectations were, but I might have thought BDD was magically going to right my algorithms for me as well.

I decided to discuss the whole thing with Esteban to get some perspective on the issue.  He told me he didn’t know a thing about BDD and how it’s supposed to work, but he figured BDD would lead me to decoupled, cohesive code design, but he couldn’t see how it would aide with algorithmic solutions to the problems at hand.  I’m not sure why I was so naïve, but it makes complete sense.  I mean these methodologies are designed to keep you disciplined to some proven, guiding principles, but you still need to use your brain and creativity to actually solve the problem.  I guess it would help to get some feedback and suggestions from you all in regard to knowing when it is time to switch from BDD architecture design mode to algorithm design mode within the broken down dependencies that handle the very specific sub-responsibilities.  Please let me know if you have found some guiding concept or question you’ve come up with that you put to the test in order to lead you into the right coding mindset for that particular context (design or algorithm; or do I even have the right categories?).

Perhaps it’s just that this design process feels so new to me.  Maybe I have solely been in the coding, algorithm mindset this whole time throughout my career.  Perhaps it will become more natural with practice and experience.  I guess it just feels like I’m learning to drive a manual, stick-shift car or something, and I am a little rough with the clutch when transitioning between gears.

To take this discussion further, Esteban made a comment in our conversation about his personal thoughts on these methodologies (like BDD) and Object-Oriented Programming.  He has come to believe that we invent these methodologies like flavors of the week “because as programmers we don’t really get OOP.”  Perhaps OOP just comes more naturally to his thought process that most people (or he admits it’s possible he could just be completely ignorant), but he feels that there’s really just a few simple rules and guidelines to keep in mind at all times in order to keep your objects behaving like real life objects and not acting like data constructs.  I encouraged him to give his theory some additional thought and make it a little more formal by writing it down.  He did this and the following is a link to the resultant blog post:

3 Simple Rules to Good Object Oriented Code

Please read through that post and comment on it.  His idea needs to be criticized and/or validated.  He admits he may have no clue what he’s talking about, but I said that’s why it needs to be put to the test by posting it in the open.  We both think it’s an interesting theory, but we’re unsure if it even holds any water.  Perhaps Esteban’s theory is too simplistic to be practical for many programmers.  Provide feedback and lots of it.

Summary

So I think I need to wrap this post up about now.  I think I have raised enough discussion points to write a book, but please do take the time to comment on what I’m doing wrong, what I’m doing right, or how to make the BDD mentality more flowing and natural.  I promised to talk about having a Word class as opposed to using primitive strings everywhere, and that will definitely be in the next post.  Also, I think our AnagramGrouper class may need some more testing and flushing out of how it should handle spaces, punctuation, casing, and so on.

Thanks for sticking with me thus far.

View Full Article

Anagram Code Kata Part 3 – Use Common Sense

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link which points to the introductory post, also containing an index of all posts in the series.

In this post, we will discuss how far to take the idea of creating mockable contracts via interfaces for testing purposes and to keep responsibilities modular.  As I solicited feedback from commenters, I was getting hung up on applying this concept to every object and dependency.  We’ll talk about how I’m now thinking that overly strict adherence to the practice can drive you nuts and decrease productivity.  It is meant to help guide your design process along rather than hinder it.  We’ll finish this post off with one of our two dependencies implemented in an extremely simple manner.

IFileParser Dependency

When we left off last time, we had defined two separate responsibilities that AnagramsFinder was going to push down into its dependencies.  As I started to write a specification test for the newly created NewlineFileParser, I realized it would be difficult to stub based on providing a file path to a text file as the only parameter.  I began having thoughts that maybe I could break up another pair of dependencies for file parsing, namely obtaining a file stream (or file contents string) from a path and then parsing the file contents into a list of multiple lines.  But then again, my first of the two dependencies is still dependent on the physical file system and I cannot get away from this unless I somehow mock up an in-memory file system.  This began to feel silly to me as I was hoping I would just use the built-in .NET APIs for file systems and string parsing.

After emailing Dave and Esteban asking for guidance, the revelation I received via these two fine gentlemen is that file parsing is not an essential purpose of this application nor exercise (at least not yet; we won’t be focusing on performance unless it appears to be grossly deficient as we go along).  What we could do is write integration tests instead, where we have a set of simple test files sitting in a folder that we run our NewlineFileParser against to ensure it working as we expect.  But I don’t think I’m even going to do this at this point.  I really do want to get to the core of the application.  Plus, I feel certain that I can trust the implementation of File.ReadAllLines() from the .NET Framework.

Unless it needs to be changed further down the road, the following code implementation will likely be sufficient for the remainder of this coding exercise:

public class NewlineFileParser : IFileParser
{
    public IEnumerable<string> ExtractWordListFromFile(string filePath)
    {
        return File.ReadAllLines(filePath);
    }
}

I feel great about this because I haven’t gone insane trying to mock and test everything in sight, nor have I broken up every single responsibility in such small chunks that it becomes tedious.  Instead, I have broken up the two core responsibilities of AnagramsFinder which means I can change and test the two dependencies independent of each other and from their manager class.  Even though I haven’t written a ton of code yet, I feel the tests specs have driven my design into a cohesive, decoupled design that should prove flexible if requirements change or if I encounter friction later on and need to rethink my design.  A flexible architecture for the application is being flushed out based on a “top-to-bottom” design view of the requirements.

Create Stubs or Assert Methods were Called

Tobias Walter left a comment on my last post with some suggestions that I would like to get clarification on and discuss further with my growing audience of three now.  He compared some of my implementation to Dave’s implementation on his Favour test driving logic over data post, but the comparison was lost on me.  It seemed Tobias was saying that I had gone too far breaking up the responsibilities into such small dependencies.  I am unsure what his argument was, but as far as I could tell both Dave and I broke up a managing class into two separate responsibilities so that the more crucial of the two could be tested independently.  Also in doing so, we both created stubs that were given specific dummy responses to method calls using Rhino.Mocks.  One more comment I would like to say is that I broke the file parsing responsibility away into its own dependency more because of the need to isolate the anagrams grouping than because file parsing is of itself a difficult task.  If Tobias or anyone else could try to clarify his argument, that would really help me understand the merits of his suggestions.

Another suggestion that Tobias brought up that I would like to petition feedback on is to use the mocking feature of testing whether a method was called instead of creating a stub programmed with a dummy response.  I have to admit, I’m not sure when to use one method over another.  I guess I’ve felt in the past that testing whether a method was called seems silly, since it seems odd to have your test know the implementation and inner method calls of the subject under test.  But I am now realizing that my stubs appear to be doing the same thing, but with dummy input and output programmed in.  If you have an opinion or experience on this subject, please leave some feedback in the comments of this post.

What’s Next?

In my next post I hope to summarize this mocking debate brought up above regarding two differing methods of verifying my manager class is correctly managing its dependencies.  I also hope to begin driving out the design and implementation of the AnagramGrouper dependency via behavior specification tests.  Thanks and any feedback is greatly appreciated.

View Full Article

Anagram Code Kata Part 2 – Mocking and SRP

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link which points to the introductory post, also containing an index of all posts in the series.

In this post, we will rewrite our first specification/test from the previous post, where we didn’t feel very confident about the direction we were heading.  We got some great feedback and will now better focus on test driving logic and not just data; we will also be more careful in extracting the requirements from problem statement and in giving our classes a Single Responsibility (the first of the SOLID principles of Object-Oriented Design).  As we are test driving our design from a “top-down” perspective, we will encounter the need for dependencies not yet implemented while creating the higher level classes.  Instead of halting our rhythm and progress, we will utilize a mocking framework to stub out non-existent implementations according to interfaces.  This also encourages us to use Dependency Inversion (last of the SOLID principles; same link as above) and we can prove the validity of our modules by testing them in complete isolation from their dependencies.

Don’t Focus on Data (Yet) and SRP (Single Responsibility Principle)

I had a feeling I was focusing too much on testing data, and I also knew it seemed to odd to be writing my very first test to be dependent on solving a specific instance of the problem (especially a test case with thousands of words in it), instead of working on solving the problem generally.  After some terrific feedback from David Tchepak in the comments of my last post, I now understand that wasn’t my only mistake.  I was also trying to give my AnagramsFinder class too many responsibilities (especially with just one ParseTextFile method), namely extracting the words out of the text file and then grouping the anagrams together.  Not only that, I also was returning a count of anagram sets when the problem statement specifically asked for output of the anagram sets themselves.  Even though this is a simple problem scenario, it can be foolish to make assumptions or simplifications like I did that are not part of the requirements.

Let us first focus on the responsibilities required of our AnagramsFinder class.  It needs to parse a list of words out of a text file given the file path and it also needs to group words together into sets that are anagrams of each other.  However, we just named two responsibilities; implementing both in the same class would make it less cohesive and therefore less maintainable.  This is because our class would have more than one responsibility that could change in the future, and because these responsibilities are in the same class, they could be considered coupled and one could be affected by modifications to the other.  All of this train of thought falls under the Single Responsibility Principle from the SOLID principles (see links at the beginning of the post).

To solve this predicament, we will make AnagramsFinder a managing class of dependencies that individually solve these separate concerns.  Each of these dependencies will adhere to the Single Responsibility principle, as does this higher abstraction manager class.  Its responsibility can be summarized as managing the inputs, outputs, and execution order of its dependencies.  We shall name the dependencies directly after each of their responsibilities, namely FileParser and AnagramGrouper.  However, I don’t want to go implement these dependencies and throw off the flow of fleshing out the design and logic of my manager class.  We are trying to design from a more “top-down” approach, instead of focusing on the lower level data concerns of the solution too early in the design process.  To accomplish completing the design of our manager class without actually implementing the dependencies, we will code to interfaces (namely IFileParser and IAnagramGrouper).

DIP (Dependency Inversion Principle)

The advantages of using interfaces are that we abstract out the actual implementation of the dependencies, allowing us to be more modular and less coupled between dependencies.  One class is not strongly tied to a specific implementation of its dependency, but rather the general contracts made available by the interface’s defined method signatures.  What this really means is that we can swap out actual implementations of the dependency without the consuming class being modified in the least bit.  This makes for truly maintainable code, especially when requirements change down the road after version one of the application is up and running.  This is what the Dependency Inversion Principle (again, from SOLID) is all about.

We can take this a step further by passing the necessary dependencies into our consuming class at construction.  This allows for us to “inject” into the object any implementation of the interface we see fit; this technique is aptly named Dependency Injection (DI).  Our manager class doesn’t have to fret about any concrete details regarding its dependencies’ actual implementations, nor about any of the construction ceremony associated with “newing” up and initializing said dependencies.  We won’t be using any DI containers (as our dependency needs are very light in this exercise), but the use of the Dependency Inversion Principle does set us up nicely to use a mocking framework so that we can finish fleshing out our tests without implementing any of the dependencies yet.

Mocking Dependencies

We would like to finish the design of our top level AnagramsFinder manager class, but not fully commit to how its dependencies will be implemented yet.  We have a general idea of how we want our consuming class to interact with its dependencies via interfaces.  Let’s go ahead and look at our interfaces we named earlier:

public interface IFileParser
{
    IEnumerable<string> ExtractWordListFromFile(string filePath);
}

public interface IAnagramGrouper
{
    IEnumerable<string[]> FindAnagramSets(IEnumerable wordList);
}

Now, we could create concrete implementations of the interfaces to be used in our tests that return simple dummy results.  An example could look like this:

public class TestFileParser : IFileParser
{
    public IEnumerable<string> ExtractWordListFromFile(string filePath)
    {
        return new[] { "wordA", "wordB", "Aword", "wordC" };;
    }
}

This is helpful because it keeps the one class we are interested in from being dependent on any real logic in its dependencies.  We want to be able to test our modules in isolation from any dependencies, so that no secondary or outside influences skew the test results.  We are basically providing dummy dependencies to our tests that have no logic that could cause side effects.

However, the software development community has provided tools called mocking frameworks that can create these dependency stubs for you.  This can save you from creating concrete interface implementations that have no use except for in testing.  The mocking framework I will try out is Rhino.Mocks from Ayende.  To get set up, you merely need to download the latest build zip archive, extract out the Rhino.Mocks.dll library to our Libraries folder created in the last blog post, and add a reference to the DLL in our Visual Studio solution.  I will next show you how to use Rhino.Mocks as we rewrite our first specification test.

Rewrite of Our First Specification/Test

Armed with SRP, DIP, mocking, a more strict adherence to the actual problem statement, and a renewed focus on “favoring test driving logic over just testing data,” we now rewrite our first specification as follows:

[Subject(typeof(AnagramsFinder), "Finding Anagrams")]
public class when_given_text_file_with_word_on_each_line
{
    static AnagramsFinder sut;
    static IEnumerable<string[]> result;
    static string filePath = "dummy_file_path.txt";
    static IEnumerable<string> wordListFromFile = new[] { "wordA", "wordB", "Aword", "wordC" };
    static IEnumerable<string[]> expected = new[] { new[] { "anagram", "gramana" } };

    Establish context = () =>
    {
        var fileParser = MockRepository.GenerateStub<IFileParser>();
        fileParser.Stub(x => x.ExtractWordListFromFile(filePath)).Return(wordListFromFile);

        var anagramGrouper = MockRepository.GenerateStub<IAnagramGrouper>();
        anagramGrouper.Stub(x => x.FindAnagramSets(wordListFromFile)).Return(expected);

        sut = new AnagramsFinder(fileParser, anagramGrouper);
    };

    Because of = () =>
    {
        result = sut.ExtractAnagramSets(filePath);
    };

    It should_result_in_list_of_anagram_sets = () =>
    {
        result.ShouldEqual(expected);
    };
}

I should mention a few notes about the code.  First, we need to add a using Rhino.Mocks statement at the top of our specification class file.  Second, our AnagramsFinder instance variable named sut stands for the “Subject Under Test,” a mannerism I picked up from David Tchepak, whom I’ve mentioned several times throughout this post series.

Our use of Rhino.Mocks is found when we call the MockRepository.GenerateStub method against an interface.  We then proceed to tell the stubbed object how to behave by specifying dummy return values when given methods of the object are called with given parameters.  The last line of our “context” setup is to then inject these two newly generated, stubbed dependencies into our manager class for testing.  It’s also interesting to note that none of my parameters and expected outputs really make much sense.  This is done on purpose to show that the data really doesn’t matter to this class, as we are not testing the logic of any data processing by the dependencies (remember, test in isolation).  It is true that this test really isn’t testing any real logic at this point, and it may even never evolve into testing any real logic either.  However, I think the important point is that it aided us in fleshing out the design, which we previously didn’t know how we were going to implement.  Not all tests will created equal in regard to validating our logic, but they all will play a part in driving the design of our code.  Hopefully these statements are true, and I would love to hear feedback on this topic in the comments.

I also want to touch on the Specification pattern of Behavior Driven Design and testing.  Specification tests are commonly set up into a workflow of context establishing, behavior performing, and results asserting.  The MSpec framework is designed to encourage this test organization.

Red, Green, Refactor and MSpec Test Runner Output

To make our first test run, we need to create the basic outline of our AnagramsFinder class as outlined in our test.  This includes a constructor that takes our two dependencies and an ExtractAnagramSets method:

public class AnagramsFinder
{
    private IFileParser fileParser;
    private IAnagramGrouper anagramGrouper;

    public AnagramsFinder(IFileParser fileParser, IAnagramGrouper anagramGrouper)
    {
        this.fileParser = fileParser;
        this.anagramGrouper = anagramGrouper;
    }

    public IEnumerable<string[]> ExtractAnagramSets(string filePath)
    {
        throw new NotImplementedException();
    }
}

I am using the ConsoleRunner according to the method described in Rob Conery’s introductory MSpec and BDD post, including output to an HTML report.  After running our specifications, we get the following output:

Specs in AnagramCodeKata:

AnagramsFinder Finding Anagrams, when given text file with word on each line
¯ should result in list of anagram sets (FAIL)
System.NotImplementedException: The method or operation is not implemented.
   ... Stack Trace here ...

Contexts: 1, Specifications: 1
  0 passed, 1 failed

…and the HTML report:

htmlReportError

A highly encouraged tenet of Test Driven Development is the practice of “Red, Green, Refactor.”  It is meant to denote the evolution of the results and state of your tests.  You are encouraged to write the test and then do the minimum work necessary to get the code base to compile and run.  You are first running your test in a control state where you know it should fail.  Most test runners will show the color Red in regard to failed tests, and thus the name of the first step.  Your next stage is to implement the code to make the test pass and turn the output color to Green, commonly indicating passing tests.  The Refactor stage is a time to pause and see if any code can be reorganized or simplified.  Then “rinse and repeat as necessary”, an applicable instruction from shampoo bottles.

To make our test pass, we implement real logic to coordinate the calls into our dependencies, like so:

public IEnumerable<string[]> ExtractAnagramSets(string filePath)
{
    var wordList = fileParser.ExtractWordListFromFile(filePath);
    return anagramGrouper.FindAnagramSets(wordList);
}

…and the successful outputs of our test runner:

Specs in AnagramCodeKata:

AnagramsFinder Finding Anagrams, when given text file with word on each line
¯ should result in list of anagram sets

Contexts: 1, Specifications: 1

 

htmlReportSuccess

Final Thoughts and Questions

So what do you think of the new test?  Are we better able to see the Single Responsibility of the AnagramsFinder manager class?  Are we more inline with solving the actual problem statement?  Your feedback would be much appreciated.

In the next post, we will begin work on test driving the design and implementation of the dependencies…unless of course I get feedback that I should be focusing on a different area first or that this last rewrite still needs more help.

View Full Article

Anagram Code Kata Part 1 – Getting Started

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link which points to the introductory post, also containing an index of all posts in the series.

This post will provide resources that serve as an introduction to the MSpec BDD testing framework.  We will also set up our Visual Studio solution, explain the Anagram Code Kata problem statement, and write our first specification and test.

Machine.Specifications (MSpec)

The author of this BDD testing framework is Aaron Jensen of CodeBetter.com.  Probably the best introduction to the framework (and lightly touching on general principles of BDD) is Rob Conery’s post entitled Make BDD Your BFF.  MSpec is hosted on GitHub using the Git source control software (and here is the link for the repository again).  Download the code and build it to get the necessary libraries that you need into a Bin folder.  My MSpec libraries are a few months old, as I haven’t upgraded my source since my initial download back then.  I have not set up TestDriven.NET and am not sure it will be necessary for this simple coding exercise.  I am using NUnit for the actual testing framework underneath MSpec; the following article may be of interest to you if you would like to use a newer version of NUnit than what MSpec targets:

Using Latest NUnit Version with MSpec

MSpec helps you organize your code into Specifications, a core BDD concept.  It is inspired by early BDD testing frameworks like Ruby’s RSpec.  The syntax and arrangement can be a little odd for a static language like C#, but I think I like the direction it is heading.  Again, the best way to get familiar with the syntax, set up the MSpec Test Runner, and configure HTML reporting output is to read Rob’s introductory post on BDD and MSpec.

Setting Up the Visual Studio Solution

Let’s begin by creating a C# console application named AnagramCodeKata targeting .NET Framework 3.5.  The solution name (and solution folder) will be the same.  For our dependencies on MSpec (and NUnit), I have created a physical file system folder named Libraries and made it a sibling folder to the project’s physical folder (same level as the solution file).  That folder contains the following DLL libraries from the Bin folder of the compiled MSpec source code:

  • CommandLine.dll
  • Machine.Specifications.ConsoleRunner.exe
  • Machine.Specifications.dll
  • Machine.Specifications.NUnit.dll
  • Machine.Specifications.Reporting.dll
  • nunit.framework.dll

Then within our Solution Explorer, we only need to make references to 3 of these libraries (namely Machine.Specifications.dll, Machine.Specifications.NUnit.dll, and nunit.framework.dll).  I have also created two folders within our project in Solution Explorer, naming them Reports and Specifications.  The Reports folder will contain the HTML-generated report of running our tests every time.  The Specifications folder will hold our specifications and tests.  This is what our current Solution Explorer looks like now:

anagramScreenshot

Also notice the MSpecRunner button at the top, which I’ve created to execute a custom External Tool that points to Machine.Specifications.ConsoleRunner.exe and passes in the appropriate parameters.  This helpful setup was put forth by Rob Conery in the link to his introductory blog post that I provided above.

Anagram Problem Statement

So let’s go ahead and look at the Anagram problem we are trying to solve.  Here’s a quick summary of what we are trying to accomplish:

“The challenge is fairly simple: given a file containing one word per line, print out all the combinations of words that are anagrams; each line in the output contains all the words from the input that are anagrams of each other.”

Unfortunately, Dave Thomas’ wordlist.txt link is broken on his blog post, and when responding to commenters asking for a fixed link to the text file, he pointed us all to some website that aggregates links to several word lists from around the Web.  Many of us readers wanted to know the exact word list file he used so that we could compare our results to his.  In his post, he states that there should be 2,530 sets of anagrams and 5,680 total words participating in those anagram sets.  The tough part about this is that we’ve got to have the same word list text file.  I searched the web and found a few, but I only found one that might be the same list.  It is from some guy’s personal GitHub repository:

http://github.com/krist0ff/code_kata/tree/ba975c3a64f11a81db2f3716b40de046f1ca7ef4/kata6

If you’re visiting this blog post, months or years after it was originally written, I obviously can’t guarantee this file will still be out there.  You can try emailing me or commenting here and I can hopefully send it to you via email if I still have it.  But more importantly, as we progress with this series, we’ll find out if it’s even the right word list or not.

Lastly, I’d like to look at the objectives for the coding kata that Dave Thomas outlines:

“Apart from having some fun with words, this kata should make you think somewhat about algorithms. The simplest algorithms to find all the anagram combinations may take inordinate amounts of time to do the job. Working though alternatives should help bring the time down by orders of magnitude. To give you a possible point of comparison, I hacked a solution together in 25 lines of Ruby. It runs on the word list from my web site in 1.5s on a 1GHz PPC. It’s also an interesting exercise in testing: can you write unit tests to verify that your code is working correctly before setting it to work on the full dictionary.”

I don’t think we’ll focus on runtime as much, but we’ll certainly revisit our algorithm as necessary if we have unacceptable wait times for the code to finish.  To complete this section, let’s go ahead and add the wordlist.txt to our solution at the root of of our Project node.

How About Some Actual Code?!

So let’s now start off by creating our first specification.  I have created a new C# file named AnagramsFinderSpecs.cs under our Specifications folder in our solution.  I attempted to write our first test, but already feel like I’m not inline with “favoring test driving logic over just testing data.”

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using Machine.Specifications;

namespace AnagramCodeKata.Specifications
{
    [Subject(typeof(AnagramsFinder), "Finding Anagrams")]
    public class when_given_text_file_with_word_on_each_line
    {
        static int result;
        static AnagramsFinder sut;

        Establish context = () =>
        {
            sut = new AnagramsFinder();
        };

        Because of = () =>
        {
            result = sut.ParseTextFile("wordslist.txt");
        };

        It should_result_in_count_of_2530_anagram_sets = () =>
        {
            result.ShouldEqual(2530);
        };
    }
}

If you can’t make out the funky structure and syntax, again refer back to Rob Conery’s introductory post to all of this stuff that I link to at the top.  He does a great job of explaining what’s going on here.

Also, please remember that I have no clue what I’m doing and I’m just trying to copy people who know what they are doing.  I am hoping that I’ll be corrected by those who have a better idea of how to approach all of this.  This is a good stopping point because of my desire for feedback and the feeling of uncertainty I’m already getting.

Seriously, please guide me in the right direction or validate where I’m headed, because even I’m not sure where that is yet.  I will say this, I’m pretty sure I shouldn’t be reading in wordslist.txt yet, and that this might be where I need to start using a mocking framework.

Stay tuned, there’s more to come.  I just can’t tell what it will be about yet.  It’ll still be fun though, I promise.

View Full Article

Custom Setup Bootstrappers

At work, we have a line of desktop products for one of clients installed on the user’s Windows machine via MSI installers.  We use Visual Studio Setup Projects to create the MSI installers.  Visual Studio also provides the ability to create setup bootstrappers to make sure the user has all the prerequisites needed for the MSI install technology and any dependencies required by your desktop product.  These bootstrappers are unmanaged code, because you can’t always depend on the .NET framework already being there (at least not until everyone is off of XP and lower).  I have developed a solution that has worked for the last year or more, but has felt clumsy and slapped together.  Recently I found the open source project called dotNetInstaller which has better provided me with the control and flexibility I need for custom logic during bootstrap and install time.  I’ll first introduce you to my “hack” solution.

It has been difficult for me to find ways to tie into the bootstrapper logic workflow to add in custom logic during installs, and Visual Studio is very rigid in its flexibility and basic in its configuration options in this regard.  Until recently, I had settled on writing managed, C# code for this custom logic meant to run before or after the MSI installer, and then creating another unmanaged setup.exe bootstrapper to ensure the .NET framework was installed just so that executable could run.  So the user would run this new unmanaged setup.exe bootstrapper (making sure .NET framework was installed), which would then call my managed custom logic executable, which would then call the MSI installer’s unmanaged, Visual Studio-generated setup.exe bootstrapper (ensuring more prerequisites were installed), which would then finally install the MSI, and which would then lastly return to my managed custom logic executable for any post-install, clean up logic.  I describe this solution in full (and how to generate the same bootstrappers that Visual Studio creates using MSBuild outside the IDE) in a StackOverflow.com answer that I submitted:

http://stackoverflow.com/questions/1106768/how-can-i-reverse-engineer-an-installer-that-was-written-with-ghost-installer/1121846#1121846

Well, recently I found the dotNetInstaller project on CodePlex.com and have found it to be the control and flexibility I need for custom logic when bootstrapping installers.  Not only does it give you great control over prerequisite install order, it also gives you the ability to run code after MSI install and to also filter on CPU architecture types, OS versions, and other registry or file system conditions.  I have currently setup branching scenarios for the prerequisites based on if the user has a 32-bit (x86) or 64-bit (x64) OS.  This utility will even let you brand the bootstrapper with a banner and icon, which is a change of pace from the plain and unlabeled bootstrapper that Visual Studio and MSBuild produce.

If you have custom bootstrapping needs, I have to highly recommend you give dotNetInstaller a look.  Hopefully it will prove as flexible and powerful for your needs as it has for mine.

View Full Article

Anagram Code Kata – BDD & MSpec

Many forms of martial arts utilize the kata as a means for detailed, focused practice and training.  The power of this practice comes from repetition and focus on simple fundamentals, “while attempting to maintain perfect form.”  The principle goal is to “try out different combinations of techniques in a safe, practice environment to ultimately find out how to defeat your opponent”.  In the last few years, this concept has taken hold within the software development world in the form of Code Kata, a phrase coined by Pragmatic Programmer Dave Thomas (and here is his blog dedicated to the Code Kata).

My inspiration for all of this is David Tchepak’s Calculator Kata series of posts.  Hopefully he will stop by and provide some helpful guidance and advice.  I have selected the Anagram Kata (code kata number 6) for me to practice.  My goals for this series of posts are:

Below I will keep an updated list of blog posts in this series.  Hope you enjoy it!

View Full Article

Trying Out SyntaxHighlighter

This is just a test post to see if I'm smart enough to add the javascript SyntaxHighlighter to my blog. ;)

public int ScoreDice(int[] selectedDice)
{
    int score = 0;

    foreach (IScoringCombo scoreCombo in ScoringCombos)
    {
        IEnumerable<int> satisfyingIndices = scoreCombo.SatisfiedBy(selectedDice);

        if (satisfyingIndices.Count() > 0)
        {
            score += scoreCombo.ScoreDice(selectedDice);

            // Remove the dice at satisfyingIndices from selectedDice collection so not double-scored
            selectedDice = selectedDice
                .Select((value, index) => index)
                .Except(satisfyingIndices)
                .ToArray();
        }
    }

    return score;
}
View Full Article

Web Deployment Tool

It appears that the IIS team at Microsoft has officially completed a 1.0 version of the deployment utility I mentioned at the end of my previous post about Visual Studio Web Deployment Projects almost two years ago.  It includes a command-line tool named MSDeploy that helps with deploying and syncing web projects across all of the web servers in your server farm.  The installation also includes all kinds of packaging and deployment features that seem really handy.  You can even do migrations from IIS 6 to IIS 7 or package your apps to be distributed via the Microsoft Web Platform Installer.

I thought that maybe the tool creates MSI installers, but it appears the package may actually just be a ZIP file that you can import and export through the IIS Manager or various other command-line tools or PowerShell Cmdlets.

Here is the page where you can read more about the toolset and download the installer:  http://www.iis.net/extensions/WebDeploymentTool

View Full Article

Variable Declaration Restrictions

I came across a surprising restriction on variable declarations that I thought I would share.

My own investigation into C# was prompted by an answer given to a Stack Overflow question regarding an odd Objective-C compiler error.  The answer seemed to allude to a language specification somewhere that doesn’t allow variable declarations in one-liner embedded statements of control structures (loops, switch statements, if-else statements, etc.).  Not only that, Objective-C wasn’t unique in this restriction; it appears Java and even C itself have had this restriction for quite a while as well.

Naturally this got my curiosity going and I decided to see if C# had also carried over such a restriction.  The answer is yes; it appears that Java, C#, and Objective-C have all kept this language restriction from their common ancestor C.  Let’s get right to what it looks like:

code_screenshot

The red-underlined errors read “embedded statement cannot be a declaration or labeled statement” and the green-underlined warnings read “the variable ‘i’ is assigned but its value is never used.”  You’ve probably seen the warning quite a few times before, but the compiler error really caught me by surprise.  I mean it makes sense that declaring a variable in the one-line body of a control structure is pointless, as it would immediately go out of scope (and usefulness) after declared.  But I thought it odd for the compiler to actually deny such a program statement, as if it put the compiler in an invalid state or maybe it was somehow dangerous.

One thing I do want to point out that appears to be different from the other languages (or at the very least Objective-C) is that the variable declaration is allowed in the switch statement, as illustrated in the image above.

To figure out why this isn’t allowed, I decided to actually look into the C# 3.0 Language Specification to see if there were any comments or explanations regarding this issue.  I ended up finding the C# language grammar in Appendix B.  I was able to parse through the grammar and find this restriction in the grammar itself.  Take a look:

B.2.5 Statements

statement:
labeled-statement
declaration-statement
embedded-statement

embedded-statement:
block
empty-statement
expression-statement
selection-statement
iteration-statement
jump-statement
try-statement
checked-statement
unchecked-statement
lock-statement
using-statement
yield-statement

block:
{ statement-listoptional }

statement-list:
statement
statement-list statement

...

selection-statement:
if-statement
switch-statement

if-statement:
if ( boolean-expression ) embedded-statement
if ( boolean-expression ) embedded-statement else embedded-statement

switch-statement:
switch ( expression ) switch-block

switch-block:
{ switch-sectionsoptional }

switch-sections:
switch-section
switch-sections switch-section

switch-section:
switch-labels statement-list

switch-labels:
switch-label
switch-labels switch-label

switch-label:
case constant-expression :
default :

...

You’ll notice that there is the concept of a statement and an embedded-statement; also note that statement is a superset of embedded-statement, adding two other types of statements: labeled-statement and declaration-statement.  If a particular grammar rule requires an embedded-statement, then those two latter types of statements are not allowed in that context.  If we wanted to allow those two statement types, our grammar rule should specify the generic statement category instead.

Also noteworthy is that an embedded-statement can resolve to a block, which includes braces around a statement-list.  A statement-list is one or more statement entities, which would now allow us to use the two statement types excluded from embedded-statement.

Something interesting that is not included in the grammar snippet above, the following statement constructs (and language keywords) require the block statement construct, bypassing the option of a grammar resolution to embedded-statement directly and allowing you to leave off surrounding braces: try, catch, finally, checked, and unchecked.

Almost all of the other statement constructs allow you the option of a one-liner embedded-statement (or which can be resolved into a block), including if, if-else, while, do-while, for, and foreach.  The one surprising exception is the switch construct, which allows for a statement-list after each switch-label.  As a side note, this allows you to put useless braces (via block) around your statement-list (or anywhere inside it) after any of your switch-label statements if you wanted.  But back to the point, these facts explain the existence of (and in one case, lack of) compiler errors regarding embedded statements in the code screenshot near the top.

Now you will know exactly what the compiler means when it gives you the error that an “embedded statement cannot be a declaration or labeled statement.

View Full Article