Hi. This is Mike from Software Confidence. Welcome to another episode of Software Success Disciplines.
In the previous episodes we put together a simple web application and got it under source code control.
In this episode we're going to introduce continuous integration. It's an automated way of assessing the health of our check-ins to Subversion, so we'll soon be aware if we introduce a compilation error, or break a test.
This is a great service to yourself - because it lets you relax as you evolve your software - and also a great service to your team, because it helps make each team member responsible for what they commit.
We start by looking at how continuous integration fits into the software development process.
Because continuous integration works hand in hand with source code control, I'm going to set the scene by showing how source code control fits into my day to day workflow as a software developer.
Usually, it all starts with a requirement statement of some sort, written on an index card. Say I pick up a card stating that a logging capability must be added to the say method in the hello world project.
Fine, I say, no problem - the next thing to do is to have a cup of tea….with my nan.
Then I open up a terminal window, and check out the trunk of the hello world project.
Open the project in my favourite text editor and get ready to make the change.
Usually I'd start by writing a test to capture the essence of the required change, and I suggest you should too, but I'll skip it in this case, because what I'm trying to do here is establish the position of continuous integration in the software development process, not demonstrate the perfect process from beginning to end.
Having wriggled out of that one, I'm now going to go ahead and just make the required change.
I know from previous projects that I can use commons-logging for what I need to do, so I download the jar … and place it in the appropriate lib directory - production in this case because it's going to be used by production code.
Then I write the necessary code, using the LogFactory class, and write out an information message that the say method was called.
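For anyone following along without the video, the change to the say method might look roughly like the sketch below. It is only a sketch: the class source is not shown here, so the class name, the return value and the message text are assumptions, and java.util.logging stands in for commons-logging so that the example compiles without an extra jar. With commons-logging on the classpath, the logger would come from LogFactory.getLog(HelloWorld.class) instead.

```java
import java.util.logging.Logger;

// Hypothetical shape of the hello world class; the real source is not shown in the episode.
final class HelloWorld {
    // Stand-in for commons-logging's LogFactory.getLog(HelloWorld.class),
    // used here only so the sketch compiles with the JDK alone.
    static final Logger LOG = Logger.getLogger(HelloWorld.class.getName());

    String say() {
        // The information message required by the index card.
        LOG.info("say method was called");
        return "Hello World";
    }

    public static void main(String[] args) {
        System.out.println(new HelloWorld().say());
    }
}
```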
After making a change, I discipline myself to run the tests - and I find I have a compilation error. I'm being told that the compiler doesn't know anything about this commons-logging LogFactory class. Clearly I need to make the commons-logging.jar available to the compiler when it's compiling the production code, so I check the compile target in the ant file to see what its classpath is.
Well, when I look at the compile-production target, I see the javac task used there doesn't have a classpath at all, so it's no wonder I have compile errors. Adding a classpath is straightforward - a nested classpath element, containing a path element - copy it, paste it, and get it to point to the production jar files.
Now, I can try again….and I get past the compilation step, but the tests fail. I check what the junit report tells me by looking at it in Firefox….one failure….which is a NoClassDefFoundError for my new commons-logging class.
So back to the ant file I go again….and return the compliment to the unit tests target by pasting in the path to the production libraries. As an aside, the smell of duplication is beginning to intensify in this build file now, so I may have to look at that in the near future. I'll make a note of it on an index card, so I can be sure to come back to it.
Right - now I re-run the tests…and we're green.
It's time to check in to Subversion. svn ci - ci stands for check in - with an appropriate comment using the -m switch …. adding logging ….press enter … and I get confirmation that my changes were committed to the server. On the blackboard, I'll summarise what I just did, like this.
Now imagine John is working away on the same code. This terminal window represents John at work.
At some stage after I checked in the commons-logging changes to Subversion, John tells the server to give him any changes that other people made. This is called updating, and it's not something we have covered yet, but we should still be able to follow along with what John did. He would have issued the svn up command - up stands for update, and it brings his working copy up to date with the latest changes made in the repository.
I'll show you where to go if you want to read up on all the various subversion commands.
Point your browser to http://svnbook.red-bean.com/
This is an online version of an O'Reilly book on subversion.
Scroll down a little and you'll see that there is a released version dealing with subversion 1.5, and an in-progress version dealing with 1.6.
1.5 is fine for our purposes, so click on the multiple page html edition.
There is a very extensive table of contents and you're more than welcome to browse through it if you wish, especially if you're on vacation at the beach, but I'd like to get us down to chapter 9 as quickly as possible - because I'm not on vacation at the beach.
Chapter 9 is the subversion command reference section, and at the very bottom of this list you'll find svn update.
Click on it - and you get a nice summary of what it's about.
I'll draw your attention to these letters here - A B D U and the rest. Don't try to remember the letters, just remember that they're mentioned here, but note especially that U means Updated.
Before that detour, we were sitting with John, watching him do his subversion update. Alright John, show us what you did.
You can see the changes to the build file and hello world.java coming in. Both changes are prefixed by the capital letter U. I drew your attention to this letter two seconds ago when we were reading the Subversion book - so Subversion is telling us that these two files are being updated. We expected this, because we watched me update them and check them in.
Now what happens if John runs his tests - he gets a failure - the same failure I got earlier when I didn't have commons-logging on the compile classpath. But why is that?
I'll cut straight to the chase and tell you that his lib/production directory doesn't contain commons-logging.jar. That's because adding a file to a working copy, as I did by copying commons-logging.jar into lib/production, is not sufficient to have Subversion manage it for you.
Let's go back to my original working copy for a minute. Now it's time for another Subversion command. svn st - st stands for status - when I run it, I see a question mark beside commons-logging.jar.
The question mark means the file is not managed by subversion, and is only sitting here on disk in my working copy. I need to tell subversion to manage the file for me, and I do that using the svn add command. svn add lib/production/commons-logging.jar. And now again I need to check in, with an appropriate comment…and I see confirmation that the jar was committed.
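That status check can be sketched in a few lines of code: scan the output of svn st for lines beginning with a question mark, and list the files that still need an svn add. The class and method names are made up for illustration, and the sample lines imitate svn st output rather than quoting a real run.

```java
import java.util.ArrayList;
import java.util.List;

final class UnversionedFiles {
    // Returns the paths that `svn st` flags with '?', meaning Subversion is not managing them yet.
    static List<String> find(List<String> statusLines) {
        List<String> unversioned = new ArrayList<>();
        for (String line : statusLines) {
            if (line.startsWith("?")) {
                // The path follows the status columns, which are blank for unversioned files.
                unversioned.add(line.substring(1).trim());
            }
        }
        return unversioned;
    }

    public static void main(String[] args) {
        // Imitation of `svn st` output: one modified file and one unversioned file.
        List<String> sample = List.of(
                "M       build.xml",
                "?       lib/production/commons-logging.jar");
        for (String path : find(sample)) {
            System.out.println("svn add " + path);
        }
    }
}
```

Running a check like this before every check-in would have caught the missing jar before John ever saw it.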
Putting on John's hat again, I do a subversion update with svn up, and now I can run my tests successfully.
Back at the blackboard, I'll sketch what happened, with the check in that caused the problem represented as a red line that flows through to John's working copy. Note that as far as Mike is concerned, all is well with the world while John's working copy is broken.
As it stands, this picture doesn't seem like an especially bad problem, because John can be sure, when he inherits a breakage after doing a Subversion update, that Mike was the cause. But what if more people were involved? What if 4 or 5 other people had committed changes before John did his update? John knows that something was wrong with one of those check ins, but not which one specifically, so he has to go on a hunt to find out, potentially debugging all kinds of unfamiliar code, and then inform the originating developer so they can fix it.
This scenario is an especially annoying way to develop software, and can cause disharmony on teams, as you open up the possibility of that software development classic "well, John - it works fine on MY machine".
Continuous integration helps get around this "it works on my machine" problem by setting up an automated team member whose single responsibility is to check out every check in, and make sure it compiles and passes its tests and can be deployed. It does this as soon as a check in is made, so problems are caught early, while they're still fresh in the minds of those who introduced them. Because it's an automated team member, a machine in other words, it has an independent impartial view, so any temptation we may have to say "it works on my machine" is pretty much eliminated.
It works like this. Set up a build server that represents the automated team member. The build server monitors for check ins to the source code control server. When it finds one, it checks out the code, compiles it, tests it, and makes whatever other kinds of checks we have configured it to do. At the end it produces a report. Notifications of build breakages can be issued in all kinds of imaginative ways: sending emails, systray notifications, making big, highly visible screens red, changing the colour of lava lamps, electrocuting people. That's my favourite - electrocuting people.
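In code, the heart of that loop might be sketched like this. It is a toy model, not how TeamCity or any other build server is implemented: the type names, the revision numbers and the compileAndTest check are all invented for illustration, and the monitoring of the source code control server is replaced by a ready-made list of check ins. The point is that each check in gets its own build, so a red result points at exactly one check in and one author.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BuildServerSketch {
    // A check in as the build server sees it (hypothetical fields).
    record CheckIn(int revision, String author, String comment) {}

    // The outcome of building one check in.
    record Report(CheckIn checkIn, boolean green) {
        String summary() {
            return "r" + checkIn.revision() + " by " + checkIn.author()
                    + (green ? ": green" : ": RED - " + checkIn.comment());
        }
    }

    // Builds every check in on its own, in order. compileAndTest stands in for
    // "check out, compile, run the tests" and returns true when all of that succeeds.
    static List<Report> buildEach(List<CheckIn> checkIns, Predicate<CheckIn> compileAndTest) {
        List<Report> reports = new ArrayList<>();
        for (CheckIn checkIn : checkIns) {
            reports.add(new Report(checkIn, compileAndTest.test(checkIn)));
        }
        return reports;
    }

    public static void main(String[] args) {
        List<CheckIn> checkIns = List.of(
                new CheckIn(1, "john", "tidy up"),
                new CheckIn(2, "mike", "adding logging"),
                new CheckIn(3, "mike", "adding commons-logging jar"));
        // Pretend revision 2 is the broken one, as it was for John without the jar.
        for (Report report : buildEach(checkIns, c -> c.revision() != 2)) {
            System.out.println(report.summary());
        }
    }
}
```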
The point here is that in this case, when Mike checked in a broken build, the build server identified it very soon afterwards, and there was no doubt that Mike's check in introduced the problem. Usually this means that on receipt of notification from the build server, in whatever form it comes, Mike will go ahead and fix the problem, before other team members waste time checking what went wrong and who was responsible.
So the real benefit of continuous integration is that we are always aware of the health of the software checked into Subversion, and when it's unhealthy, we get a chance to act on it early.
As with source code control, there are many options to choose from when it comes to build servers. I'm going to avoid that discussion entirely, and just go with JetBrains TeamCity. It comes in two flavours. Professional is free and supports up to twenty builds, I think, and Enterprise is not free and is pretty much unlimited in terms of number of projects.
Choose a directory to install TeamCity in. I'm going to put it in tools. Decompress the gzip archive, then untar the tar archive. This will create a TeamCity directory for you. There are several subdirectories and files in here, but we'll ignore them for now and go and see what's in the bin directory. Again lots of files, some of which are part of a Tomcat installation, judging by the word catalina, but we'll focus just on the teamcity-server shell script. On Windows you'll use the .bat file.
If we run the script it tells us that it expects to be told whether to start or stop the server. Let's start it and see what happens. It logs to a directory above us. Here you can see a mixture of standard Tomcat log files, and some TeamCity-specific log files. Let's see what TeamCity has to say for itself. Lots of stuff scrolling by, but I can just about make out the magic words server started - about five lines from the bottom. That should mean we have something to look at in our web browser.
TeamCity listens on port 8111, and because this is the first time through, we have to accept the license agreement.
The server requires a username and password, I'm just gonna use teamcity for both, for the sake of this demo, but you may make the decision on your team to use personal ids so you can track who did what to the server.
The next step is to tell TeamCity about our project. First we state what the name is - hello world, then we tell it how to build the project by adding a build configuration.
What's the difference between a project and a build configuration, you may ask?
In TeamCity terms, a project really isn't much more than a name. Each project can have one or more build configurations, and each build configuration expresses what we want to build and how we want to test it. You may have one build configuration for unit tests, which runs very quickly and gives you early warning of build breakages. You might have another that runs your integration tests, and might be a bit slower, and a third that runs your acceptance tests and is slower still.
We've really only got one build configuration suggesting itself at the moment. We're just going to build trunk, so hello world trunk is a good name to start with. The rest of the settings can be left defaulted for now, and we can move on to telling it about our source code control server.
The first thing we have to do is define where our server is located, by creating and attaching a VCS root. As mentioned earlier, there are many source code control systems out there, so we start by telling TeamCity which one we're using - Subversion.
After a second or so, TeamCity updates the page, ready to take all the Subversion-specific details. We give our VCS root a name - hello world trunk seems pretty descriptive, and then we enter the Subversion repository url, which hopefully will look familiar from the import into Subversion. You may need a username and password, but that will be the same as what you used when doing the Subversion import. Test the connection.
If you get errors, hopefully TeamCity will give you good descriptions of what went wrong, but the first couple of things to check would be the Subversion url - is it correct? - and whether you need to enter a username and password, and if so, whether they are correct. In my case, the connection has been successful, so I can move on. We get confirmation that the VCS root has been created, so we go to the next step.
Now we have to say what tool we're using to build our project. We're using ant, which is the default, but as you can see there are many possibilities. We get a chance to say what the name of our ant file is, and again we're in line with the default of build.xml, so all that remains for us to do is state what ant target we'd like run for us. The only sensible one we have at the minute is the war target. The rest of the settings here are fine at their defaults for what we want to do. We may come back to some of them as our needs get more sophisticated, but until that time we confirm what we've got. Confirmation that the trunk build configuration was saved is presented at the top, and a run button has appeared towards the right.
I can never resist pressing a run button. Never. Oh, do I love run buttons.
Let's see what happens. We get a warning saying that there are no agents capable of running this build configuration.
Let's step out of this for a minute and talk about agents. This TeamCity process that we've been interacting with in Firefox for the last couple of minutes is the TeamCity server. It's a central process that knows about what we want to build - what Subversion repository, what ant target to call and so on - the build configuration. When it comes to actually running a build, the server delegates the work to another process, and this process is called an agent. It's the agent that does the work of compiling and testing our code, under the instruction of the server. It's a two-way conversation, as the agent updates the server on progress and the final result.
There can be any number of agents … running on any number of machines on our network … allowing us to assemble what's known as a build farm. You can scale the build farm to the size that your particular project needs - within reason.
So agents are a pretty essential part of the TeamCity story, and right now we need one. Because we have one build in the build queue on the server … and zero agents.
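The relationship between server and agents can be sketched as a queue on the server that only drains when an agent is connected. This is a toy model with invented names, not TeamCity's actual API or protocol, and it leaves out the agent reporting progress back and becoming idle again once a build finishes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BuildQueueSketch {
    private final Deque<String> queuedBuilds = new ArrayDeque<>();
    private final Deque<String> idleAgents = new ArrayDeque<>();

    // Pressing "run" only queues the build; nothing runs until an agent can take it.
    void queue(String buildConfiguration) {
        queuedBuilds.addLast(buildConfiguration);
    }

    // An agent that has started up and connected to the server.
    void connect(String agentName) {
        idleAgents.addLast(agentName);
    }

    // Hands queued builds to idle agents and returns one line per build started.
    List<String> dispatch() {
        List<String> started = new ArrayList<>();
        while (!queuedBuilds.isEmpty() && !idleAgents.isEmpty()) {
            started.add(idleAgents.removeFirst() + " runs " + queuedBuilds.removeFirst());
        }
        return started;
    }

    int queued() {
        return queuedBuilds.size();
    }

    public static void main(String[] args) {
        BuildQueueSketch server = new BuildQueueSketch();
        server.queue("hello world trunk");
        System.out.println(server.dispatch()); // no agents yet, so nothing starts
        server.connect("agent-1");             // hypothetical agent name
        System.out.println(server.dispatch());
    }
}
```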
If we go back to the command line for a moment, and look again at what came in the TeamCity installation, we see the invitingly named directory buildAgent. It contains some files and subdirectories, but as usual we make a beeline for the bin directory.
agent.sh looks like a useful place to start, and indeed it tells us that it is the place to start. agent.sh start seems to kick off something, and we're told there is a log file to look at. Log files are about the only form of data file I can bring myself to look at, so let's do that.
A bunch of stuff suggesting the agent is downloading stuff from the server, which doesn't sound unreasonable, and then the agent suggests it has exited for an upgrade. Looking at the server again, we see that we now have one disconnected agent registered, and it's promising us that it will upgrade. So we sit twiddling our thumbs for a little while, and then we're told we have zero disconnected agents, and one connected agent. Clicking to see what the connected agent is doing, we find that not only is it connected, but it's running the build. And then it stops.
Checking back with the project page shows us that we have a successful build of hello world, with exactly one passing test. TeamCity allows us to view all kinds of things about a build by clicking on it. We can get a breakdown of the tests run, which looks a little useless with only one test but you will come to this page a lot as the project evolves. We can see the build log, which is essentially System.out from running the build, and other messages from TeamCity itself. This can be very useful when it comes to debugging broken builds, or misconfigured TeamCity instances. Just to get familiar with what's in here, we can see the junit output confirming the run of our unit test, and the result of the ant war task confirming creation of the war.
A continuous integration server earns its keep when it detects build breakages, so let's check that our arrangement of TeamCity can do that for us. Let's deliberately break the build.
Locate our hello world test code, and make it impossible for it to pass by changing hello to goodbye. Goodbye world - this surely must be the most depressed test case I've ever written. I guess I need chocolate.
Now we need to check in our change to Subversion to see if TeamCity catches the broken build. svn st - remember, Subversion status - shows us the status of our working copy - we see that helloworldtest has an M before it, meaning it has been modified, confirming the fact that we need to check it in.
svn ci - remember, ci stands for check in - like many of the other Subversion commands, takes a comment in the form of a -m switch. So I check in my changes, with the comment "making test fail to ensure teamcity is working", and we get some lines confirming that the changes have been committed to the server.
What does TeamCity tell us about our change? Clicking on trunk tells us nothing just yet. But if we wait a while - the wait has been cut out of this video clip to save on time - then we see that there is one pending change. We can see that it's our expected change with the right comment … and it contains the right file, and we're given the option to see details of the change. If we click on the edited file, we get to see the differences in this file relative to the previous version. We get confirmation that hello was changed to goodbye. So all seems well so far.
If we go back to look at our build configuration we see that the change is still pending, so TeamCity has not run it. Clicking around shows nothing obvious. So we go towards the settings for the build configuration. When we first set things up, we manually ran the build by impulsively clicking on the run button as soon as it appeared. Now we see that there is a step 4 in the process of setting up a build called build triggering, and it sounds like it might be the place we need to look. Straight away we see that the very first check box is to enable building on VCS check ins. Checking that box and saving the change will probably get us over the hump.
Looking at the project page again shows us that trunk has started building - in fact it's just updating itself with the latest contents of Subversion, and soon afterwards it registers a test failure. The drop down summary confirms that it's the test we expect (well, we only have one test) and clicking on it gives us a fully detailed explanation of the failure. Now there is no escaping the fact that Mike broke the build by changing hello to goodbye, so he'd better fix it.
Easy to do, just change goodbye back to hello, … check into Subversion with an appropriate comment, … wait for TeamCity to pick up the change, … and confirm that we're back to green.
It's all very well having TeamCity check the health of our builds for us, but it's useful only if the team quickly becomes aware when a build break happens. Earlier I mentioned electrocution as a means of notifying people of breakages, but now I think it's time to say that I was only joking. We don't have the technology to do that just yet, so we have to make do with something more mundane.
One very effective way of doing that is to have a large screen placed such that the entire team can see it, displaying a summary of the build status. Here you can see the idea. The entire background is green, indicating a successful build. If the build breaks, the screen goes red.
The software that does that is called Piazza, and we're going to install it.
Go to Google Code to download it, … and then unzip it. It creates a piazza directory with the piazza jar in it. Copy that jar into webapps/ROOT/WEB-INF/lib of your TeamCity installation ... and start or restart the TeamCity server. Remember to also have at least one agent running.
Go to localhost:8111 to view the TeamCity web app again, … and click on our hello world project. Now there's a link to the Piazza Build Monitor. Click to see what happens - we're told there are no builds being monitored by Piazza - we need to enable the status widget of whatever builds we do want to monitor.
Ok, back we go, sounds like there is some TeamCity configuring to be done. Choose the Trunk build configuration and edit its settings. The very last check box, you'll see, is enable status widget. Check this, and save the changes.
Navigate back to the hello world project, and have a look at Piazza now. Look at all that green.
When TeamCity finds changes in subversion and starts a new build, Piazza gives us a live view of the build progress. We can see build 4 is in progress, updating its source code, compiling, and finally going green.
Unfortunately build 5 happened so fast we didn't even get to see a progress bar, but you do get to see the end point of a broken build - a big red screen.
Piazza is a great way to summarise our build status, and it's really hard to ignore on a big screen. Red means electrocution, green means tea. Speaking of which, let's take a break.
Hey! Who ate all the chocolate biscuits! All you've left me is ginger nuts. Man! I hate ginger nuts.
Remember this.
Let's go back and look at that duplication.
The duplication alarm bells started when I added this classpath to the compile production target.
Let's see if we can extract it out….by declaring an independent path element….with a meaningful id.
And now let's refer to that path, rather than repeat its contents.
Leaning on the good habit of taking small steps, let's check if we can still compile, test and build a clean war.
Good - we can.
Now let's use our production compile path in the other place it's required - when running the unit tests.
And again check that we can build.
And we can.
We'll apply the same trick to the test compile path.
Extract out the path and make it standalone.
Give it an id.
And refer to it.
One more check to make sure we've not broken anything.
We're still good.
The unit-tests target uses this path too, so we'd better go down there and clean it up.
That makes more sense to read, doesn't it? The unit test run uses the production compile path, the test compile path, and the directory containing the test classes themselves.
One final check to make sure it all works.
It does.
This is better than it was. Some would argue that more can be done, but personally I'm happy, so I'll cross the task off as done, and go out and get some proper biscuits. There is no way I'm eating those ginger nuts.
While I'm away, you can listen to a summary of what we've done, then treat yourself in whatever way you feel is appropriate.
We saw that when working with a team, when people make breaking check ins, as they will, we all do, the effect can ripple through the team. It can be hard to debug, it's annoying and it slows you down.
Continuous integration addresses this problem by being an automated team member solely responsible for checking the status of the build.
There are many build server options available, but we chose TeamCity Professional. This is very much a personal choice. I really like TeamCity, but it's the doing of continuous integration that's important - not the particular software behind it. If you choose Hudson, or CruiseControl, or whatever, then that's perfectly fine.
It's critical that the team gets notified early and in a manner that's hard to ignore when a build break happens. Again, many ways to go about this, and you can choose your particular mix, but a highly visible build status screen like we put together with Piazza is very effective.
Knowing that my back is covered by my build server relaxes me. It lessens the potential for drama and nasty surprises in my day to day process, allowing me to focus on writing the best software I can write. Having it in place tips the balance just that little bit more in favour of me having a successful project. So I would say that continuous integration, like source code control, is not an option if you want your project to succeed. And as you've just learned, it's not that hard to put in place.
Indirectly we saw some habits that are good to cultivate. Writing things down on index cards, coming back to clean things up, taking small steps. These are just some things to consider and ponder on, and allow them to seep into your own working practice, if you see fit.
Next time we're going to make our software a little more complex than a jsp that says hello. We're going to start evolving a system that enables us to do stock quoting using a web service. It'll be a great way to talk about integration testing. I'm sure you can't wait.
One final thing. My friend Doug has been complaining that my screencasts are too boring. So Doug, this one's for you.
Thanks for watching and see you next time.