Distributed builds

Current version by Kohsuke Kawaguchi on Apr 11, 2014 01:21.

Jenkins supports the "master/slave" mode, where the workload of building projects is delegated to multiple "slave" nodes, allowing a single Jenkins installation to host a large number of projects or to provide the different environments needed for builds and tests. This document describes this mode and how to use it.

A "master" is an installation of Jenkins. Without master/slave support, a master is all you have. Even in master/slave mode, the role of the master remains the same: it serves all HTTP requests, and it can still build projects on its own.

Slaves are computers that are set up to build projects for a master. Jenkins runs a separate program called the "slave agent" on slaves. This means there is no need to install the full Jenkins (package or compiled binaries) on a slave node. There are various ways to start slave agents, but in the end the slave agent and the Jenkins master need to establish a bi-directional byte stream (for example, a TCP/IP socket).

When slaves are registered with a master, the master starts distributing the workload to them. The exact delegation behavior depends on the configuration of each project: some projects may choose to "stick" to a particular machine for a build, while others may roam freely between slaves. For people accessing the Jenkins website, this is mostly transparent. You can still browse javadoc, see test results, and download build results from the master without ever noticing that the builds were done by slaves. In other words, the master becomes a sort of "portal" to the entire build farm.


Follow the [Step by step guide to set up master and slave machines] to quickly start using distributed builds.

Jenkins has a built-in SSH client implementation that it can use to talk to a remote sshd and start a slave agent. This is the most convenient and preferred method for Unix slaves, which normally have sshd out of the box. Click Manage Jenkins, then Manage Nodes, then "New Node." In this setup, you supply the connection information (the slave host name, user name, and ssh credentials). Note that the slave needs the master's public ssh key copied to \~/.ssh/authorized_keys. ([This is a decent howto|http://www.linuxproblem.org/art_9.html] if you need ssh help.) Jenkins does the rest of the work by itself, including copying the binary needed for the slave agent and starting/stopping slaves. If your project has external dependencies (like a special \~/.m2/settings.xml or a specific version of Java), you'll need to set those up yourself, though.
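
On the slave side, installing the master's key amounts to appending one line to {{\~/.ssh/authorized_keys}} with the restrictive permissions sshd expects. A minimal bash sketch, assuming the master's public key (a single-line {{.pub}} file) has already been copied to the slave; the function name {{add_authorized_key}} is hypothetical:

{code}
#!/bin/bash
# add_authorized_key PUBKEY_FILE [SSH_DIR]
# Hypothetical helper, run ON THE SLAVE as the account Jenkins will log in as.
# Appends the first line of PUBKEY_FILE (the master's public key) to
# SSH_DIR/authorized_keys (default ~/.ssh) unless that exact line is present.
add_authorized_key() {
  local pubkey_file="$1"
  local ssh_dir="${2:-$HOME/.ssh}"
  local auth_file="$ssh_dir/authorized_keys"
  local key
  key="$(head -n 1 "$pubkey_file")" || return 1
  if [ -z "$key" ]; then
    echo "no key found in $pubkey_file" >&2
    return 1
  fi

  # With its default StrictModes setting, sshd ignores authorized_keys files
  # that other users could modify, so keep both paths private.
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1

  # -x matches whole lines, -F treats the key as a fixed string ('+', '/').
  if grep -qxF -- "$key" "$auth_file"; then
    return 0
  fi
  # If the file's last line lacks a newline, add one before appending.
  if [ -s "$auth_file" ] && [ -n "$(tail -c 1 "$auth_file")" ]; then
    printf '\n' >> "$auth_file"
  fi
  printf '%s\n' "$key" >> "$auth_file"
}
{code}

For example, after copying the key with {{scp \~/.ssh/id_rsa.pub jenkins@myslave:master.pub}} from the master, you would run {{add_authorized_key \~/master.pub}} on the slave. Running it twice is harmless. Where password login still works, OpenSSH's {{ssh-copy-id}} does the same job from the master side.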

If you are on Windows and don't have the ssh commands available (through Cygwin, for example), you can use a tool such as PuTTY and PuTTYgen to generate your private/public key pair.

For connecting to Windows slaves through cygwin sshd, see [SSH slaves and Cygwin] for more details.


h2. Write your own script to launch Jenkins slaves

If the turn-key solutions above do not provide the flexibility you need, you can write your own script to start a slave. You place this script on the master and tell Jenkins to run it whenever it needs to connect to a slave.

Typically, your script uses a remote program execution mechanism such as SSH, RSH, or similar means (on Windows, the same protocols are available through [cygwin|http://www.cygwin.com/], or you can use tools like [psexec|http://technet.microsoft.com/en-ca/sysinternals/bb897553.aspx]), but Jenkins doesn't assume any specific method of connectivity.

What Jenkins expects from your script is that, in the end, it executes the slave agent program, such as {{java \-jar slave.jar}}, on the right computer, with the agent's stdin/stdout connected to your script's stdin/stdout. For example, a script that runs {{ssh myslave java \-jar \~/bin/slave.jar}}, where {{myslave}} is the slave's host name, satisfies this.
(The point is that you let Jenkins run this command, because Jenkins uses this stdin/stdout as the communication channel to the slave agent. For the same reason, running it manually from your shell [will do you no good|JENKINS:Launching slave.jar from from console].)
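
A minimal bash sketch of such a script, assuming passwordless ssh from the master to the slave and {{slave.jar}} already present in {{\~/bin}} on the slave; the script and function names are hypothetical. The important detail is {{exec}}: it replaces the script with {{ssh}}, so {{ssh}} inherits exactly the stdin/stdout that Jenkins attached to the script, and nothing else may write to stdout.

{code}
#!/bin/bash
# launch-over-ssh SLAVE_HOST [REMOTE_JAR]
# Hypothetical launch script that Jenkins runs on the MASTER. Jenkins talks to
# the agent over this script's stdin/stdout, so the script itself must not
# print anything to stdout; diagnostics go to stderr.

# Command line to run on the slave (path relative to the remote home directory).
remote_agent_cmd() {
  local jar="${1:-bin/slave.jar}"
  printf 'java -jar %s\n' "$jar"
}

launch_slave() {
  local host="$1"
  echo "launching slave agent on $host" >&2
  # BatchMode=yes makes ssh fail fast instead of trying to prompt for a
  # password or passphrase, since nobody is there to answer.
  exec ssh -o BatchMode=yes "$host" "$(remote_agent_cmd "${2:-}")"
}

# Launch only when given a host, so the functions can also be sourced alone.
if [ "$#" -gt 0 ]; then
  launch_slave "$@"
fi
{code}

In the node configuration, the command Jenkins runs on the master would then be something like {{/var/jenkins/bin/launch-over-ssh myslave}}.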

A copy of {{slave.jar}} can be downloaded from {{[http://yourserver:port/jnlpJars/slave.jar]}}. Many people write their scripts so that this 160K jar is downloaded each time the script runs, ensuring that a consistent version of {{slave.jar}} is always used. This approach eliminates the {{slave.jar}} update issue discussed below. Note that the [SSH Slaves|SSH Slaves plugin] plugin does this automatically, so slaves configured with this plugin always use the correct {{slave.jar}}.
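
One way to write such a script, sketched in bash under these assumptions: the slave has {{curl}} and {{java}} on its {{PATH}}, can reach the master over HTTP, uses a POSIX-compatible login shell, and the URL and directory contain only ordinary characters. The script name, function names, and the {{jenkins-agent}} work directory are placeholders.

{code}
#!/bin/bash
# launch-fresh-agent SLAVE_HOST JENKINS_URL [REMOTE_DIR]
# Hypothetical launcher run on the MASTER: each launch re-downloads slave.jar
# from the master, so the agent always matches the running Jenkins version.

# Builds the remote command. curl writes only to the file (-o) and stays
# quiet (-s) except for errors on stderr (-S), because stdout is reserved
# for the agent's channel to Jenkins.
fresh_agent_cmd() {
  local jenkins_url="${1%/}"          # drop a trailing slash, if any
  local dir="${2:-jenkins-agent}"
  printf 'mkdir -p %q && cd %q && curl -fsS -o slave.jar %q && exec java -jar slave.jar\n' \
    "$dir" "$dir" "$jenkins_url/jnlpJars/slave.jar"
}

launch_fresh_agent() {
  local host="$1" jenkins_url="$2" dir="${3:-}"
  exec ssh -o BatchMode=yes "$host" "$(fresh_agent_cmd "$jenkins_url" "$dir")"
}

# Launch only when given a host and a URL.
if [ "$#" -ge 2 ]; then
  launch_fresh_agent "$@"
fi
{code}

Jenkins would then run something like {{launch-fresh-agent myslave http://yourserver:port}}. The cost is one small download per connection.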

{info:title=Updating slave.jar}Technically speaking, in this setup you should update {{slave.jar}} every time you upgrade Jenkins to a new version. In practice, however, {{slave.jar}} changes infrequently enough that it is also practical to put off updating it until you see a fatal problem at start-up.{info}

Another way to start a slave agent is through Java Web Start (JNLP). In this approach, you interactively log on to the slave node, open a browser, and open the slave's page on the master. You are then presented with the JNLP launch icon. When you click it, Java Web Start kicks in and launches a slave agent on the computer where the browser is running.

This mode is convenient when the master cannot initiate a connection to the slaves, for example when the master runs outside a firewall while the slaves are behind it. On the other hand, if the machine running the slave agent goes down, the master has no way of relaunching the agent on its own.
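
Because the connection starts on the slave, the launch can also be scripted there instead of clicked in a browser. A hedged bash sketch: it assumes the node is registered on the master under the name passed in, that Jenkins serves the node's JNLP file (the one behind the launch icon) at {{computer/<name>/slave-agent.jnlp}}, and that either Java Web Start's {{javaws}} launcher or the {{\-jnlpUrl}} option of {{slave.jar}} is available; check both against your Jenkins and Java versions. The function names are hypothetical.

{code}
#!/bin/bash
# jnlp-connect JENKINS_URL NODE_NAME
# Hypothetical slave-side helper: connects this machine to the master as a
# JNLP slave. Runs ON THE SLAVE, since here the slave initiates the connection.

# URL of the node's JNLP descriptor. Node names containing spaces or other
# special characters would need URL-encoding, which this sketch skips.
jnlp_url() {
  printf '%s/computer/%s/slave-agent.jnlp\n' "${1%/}" "$2"
}

jnlp_connect() {
  local url
  url="$(jnlp_url "$1" "$2")"
  if [ -n "${DISPLAY:-}" ] && command -v javaws >/dev/null 2>&1; then
    # Same as clicking the launch icon: Java Web Start starts the agent.
    exec javaws "$url"
  else
    # Headless variant: slave.jar fetches the descriptor itself.
    exec java -jar slave.jar -jnlpUrl "$url"
  fi
}

if [ "$#" -ge 2 ]; then
  jnlp_connect "$1" "$2"
fi
{code}

Since the master cannot relaunch such an agent, you would typically hook a script like this into the slave's own startup mechanism if the agent should come back after a reboot.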

h2. Other Requirements

Also note that slaves form a kind of cluster, and operating a cluster (especially a large or heterogeneous one) is always a non-trivial task. For example, you need to make sure that all slaves have the JDKs, Ant, CVS, and/or any other tools you need for builds, and that the slaves are up and running. Jenkins is not clustering middleware, so it doesn't make this any easier. Nevertheless, [a server provisioning tool|http://en.wikipedia.org/wiki/Provisioning#Server_provisioning] and [configuration management software|http://en.wikipedia.org/wiki/Comparison_of_open_source_configuration_management_software] can help with both aspects.
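
Jenkins will not audit the slaves' tools for you, but a quick check is easy to script. A minimal bash sketch (the function names and tool list are examples): {{missing_tools}} prints each named command that is not on the current {{PATH}}, and {{check_slaves}} runs the same function on remote slaves over ssh, assuming passwordless ssh and bash as the remote login shell.

{code}
#!/bin/bash
# missing_tools TOOL...
# Prints every TOOL that is not found on PATH; returns 1 if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# check_slaves "TOOL..." HOST...
# Hypothetical audit loop run from the master: sends the function definition
# to each host with `declare -f`, runs it there, and reports the gaps.
check_slaves() {
  local tools="$1"; shift
  local host missing rc
  for host in "$@"; do
    missing="$(ssh -o BatchMode=yes "$host" "$(declare -f missing_tools); missing_tools $tools")"
    rc=$?
    if [ "$rc" -eq 255 ]; then
      echo "$host: ssh connection failed" >&2
    elif [ -n "$missing" ]; then
      echo "$host is missing: $(printf '%s' "$missing" | tr '\n' ' ')"
    fi
  done
}
{code}

For example, {{check_slaves "java ant cvs" slave1 slave2}} prints one line per slave that lacks any of those tools.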



h1. Example: Configuration on Unix

This section describes Kohsuke Kawaguchi's setup of Jenkins slaves, which he used at Sun for his day job. His master Jenkins node ran on a SPARC Solaris box, and he had many SPARC Solaris slaves, Opteron Linux slaves, and a few Windows slaves.
* Each computer has a user called {{jenkins}} and a group called {{jenkins}}, and all computers use the same UID and GID for them. (If you have access to NIS, this can be done more easily.) This is not a Jenkins requirement, but it makes slave management easier.
* On each computer, the {{/var/jenkins}} directory is the home directory of user {{jenkins}}. Again, this is not a hard requirement, but having the same directory layout makes things easier to maintain.
* All machines run {{sshd}}. Windows slaves run {{cygwin sshd}}.
* All machines have the NTP client {{/usr/sbin/ntpdate}} installed and regularly synchronize their clocks with the same NTP server.
* The master's {{/var/jenkins}} holds all the build tools: a few versions of Ant, Maven, and JDKs. JDKs are native programs, so there are JDK copies for every architecture in use. The directory structure looks like this:
{noformat}/var/jenkins
+- java-1.5
{noformat}
* The master's {{/var/jenkins/.ssh}} holds a private/public key pair and {{authorized_keys}}, so that the master can execute programs on slaves through {{ssh}} using [public key authentication|http://www.google.com/search?q=ssh+keygen].
* On the master, a small shell script uses {{rsync}} to synchronize the master's {{/var/jenkins}} to the slaves (except {{/var/jenkins/workspace}}). The same script is also used to replicate tools to all slaves.
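* {{/var/jenkins/bin/launch-slave}} is a shell script that Jenkins uses to start the slave agent on each slave. It sets up {{PATH}} and a few other things before launching {{slave.jar}}. Below is a very simple example script.
{code}#!/bin/bash
# Assumed JDK location; use the JDK built for this slave's architecture.
export PATH=/var/jenkins/java-1.5/bin:$PATH
exec java -jar /var/jenkins/bin/slave.jar
{code}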
* Finally, all computers have other standard build tools such as {{svn}} and {{cvs}} installed and available in {{PATH}}.
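
The rsync-based replication mentioned in the list above might look roughly like this. It is a sketch, not the script used at Sun: it assumes {{rsync}} on every machine, the ssh keys described above, and the shared {{/var/jenkins}} layout; the script name and host names are placeholders.

{code}
#!/bin/bash
# sync-jenkins-home HOST...
# Hypothetical replication script run on the MASTER as user jenkins.
# Copies /var/jenkins (tools, scripts, .ssh) to each slave, skipping workspaces.

SRC=/var/jenkins

sync_to() {
  local dest="$1"
  # -a keeps permissions and symlinks, -z compresses over the network.
  # The leading "/" anchors the exclude to the top of SRC, so only
  # /var/jenkins/workspace is skipped, not every directory named workspace.
  rsync -az --exclude=/workspace/ "$SRC/" "$dest"
}

for host in "$@"; do
  if sync_to "$host:$SRC/"; then
    echo "synced $host"
  else
    echo "FAILED: $host" >&2
  fi
done
{code}

Run as, for example, {{sync-jenkins-home slave1 slave2}} after adding a tool under {{/var/jenkins}} on the master.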

Note that in more recent Jenkins packages, the default {{JENKINS_HOME}} (which is also the home directory of the {{jenkins}} user on Linux distributions such as Red Hat, CentOS, and Ubuntu) is {{/var/lib/jenkins}}.


h1. Scheduling strategy

If you have interesting ideas (or better yet, implementations), please let me know.

h1. Node monitoring

Jenkins has a notion of a [“node monitor”|http://javadoc.jenkins-ci.org/hudson/node_monitors/NodeMonitor.html] that checks the status of a slave for various conditions, displays the results, and optionally marks the slave offline accordingly. Jenkins bundles several, which check disk space in the workspace, disk space in the temporary partition, swap space, clock skew relative to the master, and response time.

Plugins can add other monitors.
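
To make the idea concrete, here is a rough bash analogue of the disk-space check, run on a slave. It is not how Jenkins implements its monitors (those run inside Jenkins itself); the function names and the threshold you pass in are assumptions for illustration.

{code}
#!/bin/bash
# free_kb DIR: available space in KiB on the filesystem holding DIR.
# df -P guarantees one line per filesystem in a fixed column order,
# so the 4th field of the 2nd line is the "Available" column.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# check_free_space DIR MIN_KB
# Succeeds if DIR's filesystem has at least MIN_KB KiB available; otherwise
# prints a warning and fails, much like a monitor flagging a slave.
check_free_space() {
  local dir="$1" min_kb="$2" avail
  avail="$(free_kb "$dir")"
  [ -n "$avail" ] || return 2     # df failed or printed nothing usable
  if [ "$avail" -lt "$min_kb" ]; then
    echo "low disk space in $dir: ${avail}K available, ${min_kb}K required" >&2
    return 1
  fi
}
{code}

For example, {{check_free_space /var/jenkins/workspace 1048576}} fails when less than 1 GiB is free there.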

h1. Offline status and retention strategy

Administrators can manually mark slaves offline (with an optional published reason) or reconnect them.

Groovy scripts such as [Monitor and Restart Offline Slaves] can perform batch operations like this. There is also a CLI command to reconnect.
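
With the CLI, reconnecting a list of slaves from a shell could look like the following. This assumes {{jenkins-cli.jar}} has been downloaded from {{http://yourserver:port/jnlpJars/jenkins-cli.jar}} and that your Jenkins version provides the {{connect-node}} command (run {{help}} through the CLI to confirm); the function names and default URL are hypothetical, and authentication against a secured master is not handled.

{code}
#!/bin/bash
# Hypothetical wrapper around the Jenkins CLI. JENKINS_URL and CLI_JAR are
# assumptions: point them at your master and your copy of jenkins-cli.jar.
JENKINS_URL="${JENKINS_URL:-http://yourserver:8080/}"
CLI_JAR="${CLI_JAR:-jenkins-cli.jar}"

jenkins_cli() {
  java -jar "$CLI_JAR" -s "$JENKINS_URL" "$@"
}

# reconnect_nodes NODE...
# Asks the master to reconnect each node in turn; fails if any request failed.
reconnect_nodes() {
  local node failures=0
  for node in "$@"; do
    if jenkins_cli connect-node "$node"; then
      echo "reconnect requested: $node"
    else
      echo "reconnect failed: $node" >&2
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
{code}

For example, {{reconnect_nodes slave1 slave2}}; a cron job could call it with the names of slaves known to go offline.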

There is also a background task that automatically reconnects slaves believed to be back up. The behavior is configurable per slave (or per cloud, if slaves are provisioned from a cloud) via a [“retention strategy”|http://javadoc.jenkins-ci.org/hudson/slaves/RetentionStrategy.html]. Jenkins bundles several strategies, and plugins can contribute others: keep the slave online whenever possible, take it offline when not in use, follow a schedule, or behave according to the cloud’s notion of load.

h1. Transition from master-only to master/slave

Projects newly created on a master/slave-enabled Jenkins are configured by default to roam freely.

h1. Access an Internal CI Build Farm (Master + Slaves) from the Public Internet

One might consider making the Jenkins master accessible on the public network (so that people can see it), while leaving the build slaves within the firewall (typical reasons are cost and security). There are several ways to make this work:
* Equip the master node with a network interface that's exposed to the public Internet (simple to do, but not recommended in general).
* Allow port forwarding from the master to your slaves within the firewall. The port forwarding should be restricted so that only the master, with its known IP address, can connect to the slaves. With this setup in the firewall, as far as Jenkins is concerned it's as if the firewall doesn't exist. If multiple hops are involved, you may wish to investigate how to use an ssh "jump host" transparently with the {{ProxyCommand}} construct. In fact, with a properly configured jump host setup, even the master doesn't need to be exposed to the public Internet at all, as long as the organization's firewall allows port 22 traffic.
* Use JNLP slaves and have the slaves connect to the master, not the other way around. Because the slaves initiate the connection, this works correctly with a NAT firewall.
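
For the jump host approach, OpenSSH's {{ProxyCommand}} option combined with {{ssh \-W}} (available since OpenSSH 5.4) lets the master reach a slave through an intermediate host in one command. A sketch in bash; the gateway host and function names are placeholders, {{slave.jar}} is assumed to sit in {{\~/bin}} on the slave, and the same option can live in {{\~/.ssh/config}} instead:

{code}
#!/bin/bash
# jump-launch JUMP_HOST SLAVE_HOST
# Hypothetical launcher: reaches SLAVE_HOST through JUMP_HOST and starts the
# agent there. ssh expands %h and %p to the target host and port.

proxy_option() {
  printf 'ProxyCommand=ssh -o BatchMode=yes -W %%h:%%p %s\n' "$1"
}

launch_via_jump_host() {
  local jump="$1" slave="$2"
  exec ssh -o BatchMode=yes -o "$(proxy_option "$jump")" "$slave" java -jar bin/slave.jar
}

if [ "$#" -ge 2 ]; then
  launch_via_jump_host "$1" "$2"
fi
{code}

For example, {{jump-launch gateway.example.com myslave}} as the launch command; only port 22 on the gateway needs to be reachable from the master.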
h1. Running Multiple Slaves on the Same Machine

Using a well-established virtualization infrastructure such as [Kernel-based Virtual Machine (KVM)|http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine], it is quite easy to run multiple slave instances on a single physical node. These instances can run various operating systems, such as Linux, \*BSD UNIX, Solaris, or Windows. On Windows, you can install multiple slaves as separate Windows services so that they start at system startup. While the correct use of executors largely obviates the need for multiple slave instances on the same machine, there are some unique use cases to consider:
* You want more configurability between the configured nodes. Say you want one node to be used as much as possible, and the other node to be used only when needed.
* You may have multiple Jenkins master installations building different things, and this configuration allows you to have slaves for more than one master on the same box. That's right, with Jenkins you really can serve two masters.
* You may wish to take advantage of how easy it is to start, stop, and replace virtual machines, perhaps in conjunction with Jenkins plugins such as the [Libvirt Slaves Plugin|https://wiki.jenkins-ci.org/display/JENKINS/Libvirt+Slaves+Plugin].
* You wish to maximize your hardware investment and utilization while minimizing operating costs (e.g. utility expenses for running idle slaves).

Follow these steps to get multiple slaves working on the same Windows box:
* Open a command prompt and cd into the slave's work directory.
* First, run {{jenkins-slave.exe uninstall}} to uninstall the service that the JNLP-launched agent installed. This should remove it from the service list.
* Now edit {{jenkins-slave.xml}}. Modify the id and name values so that your multiple slaves are distinct. I called mine jenkins-slave-a and Jenkins Slave A.
* Run {{jenkins-slave.exe install}}, then check the Windows service list to ensure it is there. Start it, and watch Jenkins to see whether the slave instance becomes active.
* Now repeat this process for a second slave, beginning with configuring the new node in the master config.

h1. Troubleshooting tips

# If you use a binary-unsafe remoting mechanism such as telnet to launch a slave, add the {{\-text}} option to {{slave.jar}} so that Jenkins avoids sending binary data over the network.
# If the same command works fine outside Jenkins, make sure you are testing it with the same user account that Jenkins runs under. In particular, if you run the Jenkins master on Windows, consult [How to get command prompt as the SYSTEM user].
# Feel free to bring your problem to [one of our mailing lists|http://jenkins-ci.org/content/mailing-lists].
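
To compare the two environments, a small report like the one below can be run once as a shell build step in a Jenkins job and once from your own shell (on Unix, or under Cygwin); a different user, home directory, or {{PATH}} usually explains the discrepancy. A minimal bash sketch; the function name and the chosen fields are just examples.

{code}
#!/bin/bash
# env_report: prints the facts that most often differ between an
# interactive shell and a process launched by Jenkins.
env_report() {
  echo "user=$(id -un)"
  echo "home=${HOME:-}"
  echo "pwd=$(pwd)"
  echo "path=$PATH"
  # java -version writes to stderr, so merge it into stdout and keep the first line.
  echo "java=$(java -version 2>&1 | head -n 1)"
}

env_report
{code}

Saving both outputs to files and running {{diff}} on them makes the differences easy to spot.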

h1. Other readings

* Jenkins Build Farm Experience [Volume I|http://blog.sonatype.com/2009/01/the-hudson-build-farm-experience-volume-i/], [Volume II|http://blog.sonatype.com/2009/02/the-hudson-build-farm-experience-volume-ii/], [Volume III|http://www.sonatype.com/people/2009/02/the-hudson-build-farm-experience-volume-iii/] and [Volume IV|http://www.sonatype.com/people/2009/02/the-hudson-build-farm-experience-volume-iv/]
* [HudsonWindowsSlavesSetup|http://community.jboss.org/wiki/HudsonWindowsSlavesSetup]