HOWTO Distcc over SSH with Portage
From Gentoo Linux Wiki
What will we be doing, and why?
distcc allows you to distribute a compilation over several machines, significantly reducing build times. There is already an official Gentoo distcc guide, but it only covers how to set up distcc using its own daemon, distccd. Sometimes you have to use distcc over SSH instead, perhaps because of restrictive firewalls, or because you just don't want another daemon listening on the network. This guide covers how to configure distcc and Portage so that your emerge jobs are distributed by distcc over SSH.
Requirements
- sys-devel/distcc
- net-misc/openssh
- At least two computers running the same version of sys-devel/gcc on the same architecture (e.g. x86, SPARC). To run compilation nodes on slightly different machines, see HOWTO Distcc server on Windows and TIP AMD64-x86-distcc.
- A decision as to which machine should be the front end. This is the one you will run emerge on and the one that will do all the linking, run Autoconf, etc., so it should probably be the machine that will end up using the packages. The compilation nodes can be anything, as long as they have sys-devel/distcc installed and run the same version of gcc as the front end. In particular, they do not need to have the dependencies of the packages you are compiling installed.
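Because every node has to run the same gcc version, it can be worth comparing versions before going any further. The following is only a sketch: the function name all_versions_match is made up for this example, and the commented usage assumes you can already reach the nodes over SSH as some user (the node name is a placeholder).

```shell
#!/bin/bash
# all_versions_match VERSION...
# Succeeds (exit status 0) only if every argument is the same string.
# Hypothetical helper for illustration; not part of distcc or Portage.
all_versions_match() {
    [ "$#" -gt 0 ] || return 1
    local first="$1"
    local v
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# Possible use on the front end (node1 is a placeholder):
#   local_ver=$(gcc -dumpversion)
#   remote_ver=$(ssh root@node1 gcc -dumpversion)
#   all_versions_match "$local_ver" "$remote_ver" || echo "gcc versions differ"
```

The comparison is a plain string match, so it treats any difference in the reported version as a mismatch.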
Setting up distcc
The distcc ebuild creates a distcc user, but that user cannot log in, as is standard security practice for daemon accounts. We will now change that.
First, it needs a home directory. This should be done on all nodes.
# mkdir /etc/distcc
# mkdir /etc/distcc/.ssh
# usermod -d /etc/distcc distcc
On the compilation nodes, it also needs a valid shell:
# usermod -s /bin/bash distcc
Then we set up our SSH keys, since distcc can't supply passwords at login. Do this on the front end machine.
# ssh-keygen -t dsa -f /etc/distcc/.ssh/id_dsa
When ssh-keygen asks for a passphrase, just hit enter so it sets none.
Distribute the public key to all compilation nodes:
# scp /etc/distcc/.ssh/id_dsa.pub <compilation-node>:/etc/distcc/.ssh/authorized_keys
Be aware that the command above overwrites any existing authorized_keys file. If a compilation node already has one, append the new key to it instead. Copy the key over from the front end:
# scp /etc/distcc/.ssh/id_dsa.pub <compilation-node>:/root/id_dsa.pub
Then, on the compilation node:
# cat /root/id_dsa.pub >> /etc/distcc/.ssh/authorized_keys
# rm /root/id_dsa.pub
A more elegant way to append the key, run from the front end:
# for i in <compilation-nodes>; do cat \
    /etc/distcc/.ssh/id_dsa.pub | ssh $i \
    "cat >> /etc/distcc/.ssh/authorized_keys"; done
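Appending with >> adds the key again every time the command is repeated. A minimal sketch of an append that skips keys already present is shown here; append_key_once is a hypothetical helper name, and the function assumes the public key file contains a single line.

```shell
#!/bin/bash
# append_key_once KEYFILE AUTHFILE
# Appends the one-line public key in KEYFILE to AUTHFILE unless an
# identical line is already there. Hypothetical helper for illustration.
append_key_once() {
    local key
    key=$(cat "$1") || return 1
    touch "$2" || return 1
    # -q: quiet, -x: match whole lines only, -F: treat the key as a fixed string
    grep -qxF -- "$key" "$2" || printf '%s\n' "$key" >> "$2"
}

# Possible use on a compilation node, after copying the key to /root:
#   append_key_once /root/id_dsa.pub /etc/distcc/.ssh/authorized_keys
```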
On all nodes, make sure SSH is happy with the permissions:
# chown -R distcc:daemon /etc/distcc
Only on the front end:
# chown portage:portage /etc/distcc/.ssh/id_dsa
# chmod 600 /etc/distcc/.ssh/id_dsa
# chmod 644 /etc/distcc/.ssh/id_dsa.pub
Only on compilation nodes:
# chmod 644 /etc/distcc/.ssh/authorized_keys
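If you want to confirm afterwards that the modes really are what this guide sets, a small check along the following lines may help. It is a sketch only: check_mode is a hypothetical name, and it relies on GNU stat (the -c %a format), which is an assumption about the system it runs on.

```shell
#!/bin/bash
# check_mode FILE EXPECTED_MODE
# Fails with a message on stderr if FILE's octal mode differs from
# EXPECTED_MODE. Hypothetical helper; assumes GNU stat from coreutils.
check_mode() {
    local actual
    actual=$(stat -c %a "$1") || return 1
    if [ "$actual" != "$2" ]; then
        echo "$1: mode is $actual, expected $2" >&2
        return 1
    fi
}

# Possible use, with the paths and modes from this guide:
#   check_mode /etc/distcc/.ssh/id_dsa 600            # front end
#   check_mode /etc/distcc/.ssh/authorized_keys 644   # compilation nodes
```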
Now try to log in as the distcc user from your front end machine to a compilation node using the newly created key:
# ssh -i /etc/distcc/.ssh/id_dsa distcc@<compilation-node>
If you get a prompt for a password, check syslog on the compilation node to see why sshd rejected the SSH key. A common cause is wrong ownership or permissions on /etc/distcc, /etc/distcc/.ssh or authorized_keys. Note that the "PermitEmptyPasswords" option in /etc/ssh/sshd_config only affects password authentication, not keys without a passphrase, so it does not need to be enabled for this setup.
If the login still fails and /var/log/ssh/current contains a message like "[sshd] User distcc not allowed because account is locked", run
# passwd -u distcc
to unlock this account.
If that succeeded, you now need to collect the public SSH host keys of all compilation nodes so distcc doesn't get stuck waiting for you to confirm each host identity. A simple way to do this is to use ssh-keyscan:
# ssh-keyscan -t rsa <compilation-node-1> <compilation-node-2> [...] > /var/tmp/portage/.ssh/known_hosts
# chown portage:portage /var/tmp/portage/.ssh/known_hosts
You should then manually verify each host key, perhaps by logging in at the physical console of each machine and running ssh-keyscan locally. Your paranoia may vary.
Unfortunately, distcc can't supply arguments to ssh, so we need a wrapper script it can call that supplies the correct arguments. Point your favorite editor to /etc/distcc/distcc-ssh and enter:
#!/bin/bash
exec /usr/bin/ssh -i /etc/distcc/.ssh/id_dsa "$@"
and make sure the file is executable:
# chmod a+x /etc/distcc/distcc-ssh
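One possible variation on the wrapper, sketched here as an assumption rather than as part of the original setup, passes -o BatchMode=yes so that ssh fails immediately instead of waiting for a password or a host key confirmation. BatchMode is a standard OpenSSH client option; the SSH_BIN and DISTCC_KEY overrides are hypothetical and exist only so the function can be tried without a real connection.

```shell
#!/bin/bash
# Sketch of an alternative /etc/distcc/distcc-ssh.
# SSH_BIN and DISTCC_KEY are hypothetical overrides added for this example;
# their defaults are the paths used elsewhere in this guide.
distcc_ssh() {
    "${SSH_BIN:-/usr/bin/ssh}" -o BatchMode=yes \
        -i "${DISTCC_KEY:-/etc/distcc/.ssh/id_dsa}" "$@"
}

# In a real wrapper script, the last line would call the function:
#   distcc_ssh "$@"
```

With BatchMode enabled, a misconfigured node should show up as a quick failure in the build output rather than as a build that appears to hang.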
Setting up Portage
distcc is controlled by a couple of environment variables. One place to set such variables so that they will only be used for Portage is in /etc/make.conf. Add the following lines at the bottom:
DISTCC_SSH="/etc/distcc/distcc-ssh"
DISTCC_HOSTS="localhost/2 distcc@<compilation-node>/<num-parallel-jobs> [...]"
(see the distcc manual page for the syntax of the DISTCC_HOSTS line).
To take advantage of all your new compilation power, you need to run a lot of jobs in parallel. Find the MAKEOPTS line, still in /etc/make.conf, and change the number after "-j" to the sum of the allowed number of jobs on each node, e.g.
MAKEOPTS="-j6"
for three machines with two jobs each.
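The sum can also be computed from the host list itself. The sketch below adds up the job limits in a DISTCC_HOSTS-style string; sum_distcc_jobs is a hypothetical helper, the node names in the usage line are placeholders, and entries without an explicit /limit are skipped rather than guessed at.

```shell
#!/bin/bash
# sum_distcc_jobs "HOSTS"
# Adds up the numbers that follow the first "/" of each entry in a
# DISTCC_HOSTS-style string. Hypothetical helper for illustration.
sum_distcc_jobs() {
    local total=0
    local host
    local limit
    # $1 is deliberately unquoted so the list splits on whitespace.
    for host in $1; do
        case "$host" in
            */*)
                limit=${host#*/}            # text after the first "/"
                limit=${limit%%[!0-9]*}     # keep only the leading digits
                if [ -n "$limit" ]; then
                    total=$((total + limit))
                fi
                ;;
        esac
    done
    echo "$total"
}

# Possible use:
#   sum_distcc_jobs "localhost/2 distcc@node1/2 distcc@node2/2"
```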
Lastly, we need to tell Portage that we will be using distcc, so also add the following line:
FEATURES="distcc"
Test drive
That should be it. To watch it in action, start an emerge on the front end and watch top on a compilation node - you should see some gcc processes owned by the distcc user. If it doesn't seem to work, try setting
DISTCC_VERBOSE="1"
in /etc/make.conf on the front end and see if you get any informative messages.
TODO
- I'm not happy with the security implications of letting a daemon account log in, even if it is just with SSH keys. Try to find another way. [See my comments in "discussion and bugs" above.]
- This procedure might be improved by using keychain.
Response: The security of this approach depends on your ability to control access to the keys and to the front end machine that is allowed to log in. Anyone who has an account on the front end could potentially gain access to the compilation nodes. Other than that, this approach should be sufficiently secure. You can add another layer of security by using iptables to limit access by IP address or network. You can also disable the login account whenever it is not in use. A further layer of security would be to set up each compilation node as a special-purpose (untrusted) virtual machine, so that even if it were compromised the attacker could not do much. Such extreme measures aren't really needed, but the isolation provided by a virtual machine could have other benefits.
A very different approach is to use SSH instead of NFS with the method shown in HOWTO Emerge on very slow systems.
Sharing SSH keys
Command to run for sharing SSH keys between two Linux servers so that you can log in without a password
