They Make Mageia – the Sysadmin team : Installation and configuration of software on Mageia servers

In the Mageia project the sysadmin team is responsible for the setup and maintenance of all the Mageia infrastructure, for users and contributors alike. To help people understand what this team does, and to share some ideas with other sysadmins, we will publish a series of posts to explain the things that we do.

Our main tasks are :

Installation of the servers in the datacenter
Installation and configuration of various software on Mageia Servers
Various admin tasks such as user permissions update, package removal, package movement between repositories etc.
Development and maintenance of various tools, such as components of the package build system

This first post describes the process we use to install and configure software on Mageia servers, explains why we do it this way, and why you might want to use a similar process when managing your own servers.

A summary of the process used to set up software on Mageia servers could be :

All software is installed using packages
All packages are built using the build system
All packages are installed and configured using puppet

The reasons for doing this might already be obvious for many Mageia packagers. This post will try to explain it for people who are not necessarily packagers.

Building and installing the software

One of the most common sysadmin tasks is software installation or update. When your Linux distribution provides packages for the software that you want to use, it is easy; the package can be installed using the package manager.

However, in many cases the software that you need won’t be available in the distribution you are using, or will not be the version that you want to use.

Doing manual builds

Many people think that the simplest solution in this case is to download the source code for the software, follow the build instructions and run the provided install script or Makefile. There are however many problems when doing this:

Installing build dependencies :
You need to have all the build dependencies installed on your system. In most cases you will want to build the software on a dedicated build server in order to avoid installing too many build dependencies on your target machine, and then copy the resulting binaries to your machines.
Managing dependencies :
If you are using tarballs or an NFS server to distribute your binary software to the target machines, you will need to find and install the required dependencies on each machine where the software will be used. You will want to write the list of dependencies somewhere, so you can know what needs to be installed when you need to use that software on a new machine.
Updating software :
When building software, you often need to pass specific options to a configure script, or set them in a configuration file. Some software can be difficult to build, or require some complex operations or configuration to build. When updating the software to a newer version, you will probably want to keep the same build configuration as before, to avoid introducing unneeded changes that could cause breakage. All of this is difficult to remember, and unless you do it every day you will probably forget. So you will want to save the build instructions in a file, to be able to reuse them the next time you need to update this software.
Patching the software :
Sometimes the software will require some small modifications to build, or to run, or you want some new feature. You can apply the changes before starting the build process. However those changes will be lost the next time you extract a new source tarball to update the software.
To be able to apply the same changes on the next versions, you will want to save those changes as a patch somewhere.
Keeping a changelog :
It’s easy to forget why you made some changes or updated a particular piece of software. Especially if you are not alone and working within a team, you will want to keep a changelog of the changes you make on the software you build and install.
Knowing what version of your software is installed, uninstalling or updating software :
Knowing what version of a piece of software is currently installed is very useful. It’s also very useful to be able to uninstall software, or install a different version for testing or debugging, or revert to a previous version if there’s a problem. If you use the standard install procedure, most software will install files in many different places on the system. Knowing which files are installed where, and which version is in use, is difficult if not impossible. When installing a new version of software, the install scripts will overwrite the old files, but will not remove obsolete files, which can create confusion or problems. Some people avoid this kind of problem by installing each piece of software in its own directory, including the version number in the directory name. This can, however, create other problems: most software is not intended to be installed this way and will require complex tricks to run. Selecting the version that should be used also needs some complex symbolic links or PATH environment variable updates. This becomes even more complicated when you install different software with dependencies between them. In order to avoid this kind of problem, the best solution is to use some tools to track which files are installed where, and which version.
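
To make the last point more concrete, here is a minimal sketch in Python of the kind of bookkeeping such a tool has to do: remember which package and version owns each file, and report the files a new version no longer ships. The class InstallDatabase and the package name "foo" are hypothetical and only serve as an illustration; real package managers keep a far richer database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InstallDatabase:
    """Toy record of which package (and version) owns each installed file."""

    versions: dict[str, str] = field(default_factory=dict)  # package -> version
    owners: dict[str, str] = field(default_factory=dict)    # file path -> package

    def install(self, package: str, version: str, files: list[str]) -> list[str]:
        """Record a new version and return the files left over from the old one."""
        old_files = {path for path, owner in self.owners.items() if owner == package}
        obsolete = sorted(old_files - set(files))
        for path in obsolete:
            del self.owners[path]
        for path in files:
            self.owners[path] = package
        self.versions[package] = version
        return obsolete

    def uninstall(self, package: str) -> list[str]:
        """Forget the package and return the files that should be deleted."""
        removed = sorted(p for p, owner in self.owners.items() if owner == package)
        for path in removed:
            del self.owners[path]
        self.versions.pop(package, None)
        return removed

    def owner_of(self, path: str) -> Optional[str]:
        """Return the package owning this path, or None if it is untracked."""
        return self.owners.get(path)
```

Upgrading the hypothetical "foo" from 1.0 to 1.1 with a different file list makes install() return the obsolete files, which is exactly the information a plain "make install" never gives you.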

Using packages

To solve all those problems you will want to use specific tools. You might be tempted to start writing your own scripts and tools to manage these kinds of things. However, it makes more sense to use the tools already available than to maintain your own. The best tools to solve this kind of problem are package managers.

If you have never created packages before, you will need to spend some time learning how to do it. However this can save you a lot of time afterward. Using packages to build and install software has a lot of advantages:

Installing build dependencies :
Packages allow you to build your software on a dedicated machine, producing packages that will be installed on the target machines without requiring you to install any of the build dependencies. The package source also allows you to define the list of build dependencies, so that they can be installed automatically on your build machine when you want to build the package, and possibly removed afterward.
Managing dependencies automatically :
Tracking dependencies is time consuming and not always easy. Fortunately, most packaging systems will analyse the files included in the package to automatically detect the dependencies of the software. The perl, python, ruby, PHP, C/C++ libraries will usually automatically be detected during the package build. For the dependencies that cannot automatically be detected, it is also possible to define explicit dependencies in the package. All of those dependencies can then be installed automatically by the package manager when installing the software.
Updating software :
The source package contains the build instructions. The process used to build software will sometimes change a little, but most of the time it will be the same for all versions of the software. In that case, updating a package to a newer version is as simple as updating the version number in the package source and typing the command to start the package build. Rebuilding your software with a different option is also something that can be done easily.
Patching the software :
Source packages allow you to include patches that will be applied during the build of the package. This makes it easier to track the changes you applied to your software. When updating the software to a newer version, the same changes can easily be applied.
Keeping a changelog :
Packages include a changelog that you can use to explain the reasons for your changes, so you or other team members can know why you made some change 6 months ago. If you are using a source control tool like git or subversion to manage your source packages (which is recommended), you can also use the commit logs as a changelog.
Knowing what version of your software is installed, uninstalling or updating software :
Packages allow you to track the version of the software installed on your system. It also allows you to easily install, uninstall or update software, or find which package provides a file.
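
As an illustration of the automatic dependency handling described above, the following Python sketch computes an installation order in which every dependency comes before the package that needs it. It is only a toy model: the function install_order and the package names are hypothetical, and a real package manager also has to deal with versions, conflicts and alternative providers.

```python
from graphlib import TopologicalSorter


def install_order(requested: str, requires: dict[str, set[str]]) -> list[str]:
    """Return the requested package and its dependencies, dependencies first."""
    # Collect everything reachable from the requested package.
    needed: dict[str, set[str]] = {}
    stack = [requested]
    while stack:
        package = stack.pop()
        if package in needed:
            continue
        needed[package] = set(requires.get(package, set()))
        stack.extend(needed[package])
    # static_order() yields each node only after all of its dependencies.
    return list(TopologicalSorter(needed).static_order())


# Hypothetical dependency data, as a package repository might describe it.
requires = {
    "webapp": {"perl-DBI", "apache"},
    "perl-DBI": {"perl"},
    "apache": set(),
    "unrelated": {"perl"},
}
order = install_order("webapp", requires)
```

Here "unrelated" is left out because nothing requested depends on it, and "webapp" always comes last. A dependency loop would make static_order() raise graphlib.CycleError.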

Building packages in a clean environment and deploying your packages

Why you should use a build system

The package manager is a very nice tool to manage the build and installation of software. However this is not enough. The package manager will build the software using the tools and libraries available on the current system, after installing the listed build dependencies.

There are a few problems with this :

Multiple-distribution support :
Sometimes your infrastructure will be using different distributions, or different releases of the same distribution. A package built on one distribution will not necessarily work on another one, because the versions of some components are different or incompatible. In order to avoid this kind of problem, the packages should be built using the same distribution as the one where they will be used.
Clean building environment :
The source packages include a list of build dependencies that should be installed to build the package. However when building the package on your own machine or on a specific build server, there are usually other packages that are already installed and will potentially be used during a package build. It is therefore very easy to forget that you installed some package, forget to include it in the package build dependencies and not notice the problem because the build works for you. Your system might also have some custom configuration or updates that you forgot you installed, that can also impact the package build. The problem will not be noticed until someone needs to rebuild the package on another system. If you want to increase the chances of being able to reproduce a package build in the future, you should use a clean environment for building packages. Usually this is done by creating, for each package build, a new minimal chroot for the selected distribution.

Once your package has been built, it needs to be copied onto the machines where it will be installed. Distributions usually provide tools to manage this (apt-get on Debian, urpmi on Mageia, yum on Fedora, etc.). Those tools will automatically download and install a package and its dependencies from a server sharing a package repository. In order to use those tools, you need to set up a package repository, which is a directory containing all the available packages and some metadata.
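
To give an idea of what "a directory containing all the available packages and some metadata" means, here is a small Python sketch that regenerates a metadata file for a directory of packages. The JSON layout and the file name index.json are assumptions made for this example; they are not the format used by urpmi, apt-get or yum, which each have their own metadata tools.

```python
import hashlib
import json
from pathlib import Path


def build_index(repo_dir: Path) -> dict[str, dict[str, object]]:
    """Describe every .rpm file in repo_dir by size and SHA-256 digest."""
    index = {}
    for package in sorted(repo_dir.glob("*.rpm")):
        data = package.read_bytes()
        index[package.name] = {
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
    return index


def write_index(repo_dir: Path, name: str = "index.json") -> Path:
    """Regenerate the metadata file; call this after adding or removing packages."""
    target = repo_dir / name
    target.write_text(json.dumps(build_index(repo_dir), indent=2, sort_keys=True))
    return target
```

The important point is that the metadata has to be regenerated every time the content of the directory changes, which is one of the steps that is easy to forget when it is done by hand.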

All of this can be managed manually, however it is much better to set up a package build system that will manage all of this for you automatically. Using a package build system has many advantages :

Less error-prone :
Building packages in a chroot, copying the resulting files to the correct repository and regenerating repository metadata is not very difficult, but is time consuming and error-prone if done manually. Using an automated package build system will save time and avoid many errors.
Easier :
If building and installing packages in the repository is a difficult and time consuming task, you or other members of your team will be tempted to avoid it and do it using another solution.
Revision control tools and traceability :
Using a revision control tool like git or subversion to manage your source package changes is recommended. An SCM tied to the build system will ensure that any package available in the package repository is also available in the source control repository, giving traceability to your packages.
Enforcing packaging policies :
Some tools are available to enforce some packaging policies (rpmlint for rpm, lintian for deb). Having some packaging policies is useful to have consistent packages. The package build system can be configured to automatically run some policy tests and refuse upload of packages not complying with the policy.
Monitoring :
The build system allows monitoring of the latest builds. A web interface provides an overview of the latest builds and their build logs. A mailing list can be used to receive build notifications. When working with a team it allows all team members to follow the latest changes.
Automation :
Some specific packages may require additional tasks to be done when they are updated. Sending an email, extracting some files to update a website or other tasks can be scripted so they are done automatically when this specific package is updated. This is what is used to extract the Mageia installer files in the mirror tree when the package is updated.
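
The policy enforcement and automation points can be sketched together in a few lines of Python. This is not how the Mageia build system is implemented; UploadPipeline, require_changelog and the dictionary used to describe a package are hypothetical names chosen for the example. The idea is simply that an upload is refused when a policy check reports problems, and that package-specific hooks run only after a successful upload.

```python
from dataclasses import dataclass, field
from typing import Callable

PolicyCheck = Callable[[dict], list[str]]  # returns the list of problems found
Hook = Callable[[dict], None]              # follow-up task for one package


@dataclass
class UploadPipeline:
    checks: list[PolicyCheck] = field(default_factory=list)
    hooks: dict[str, list[Hook]] = field(default_factory=dict)
    repository: list[dict] = field(default_factory=list)

    def upload(self, package: dict) -> list[str]:
        """Accept the package if every check passes; otherwise return the problems."""
        problems = [msg for check in self.checks for msg in check(package)]
        if problems:
            return problems  # refused: nothing is published, no hook runs
        self.repository.append(package)
        for hook in self.hooks.get(package["name"], []):
            hook(package)    # package-specific follow-up tasks
        return []


def require_changelog(package: dict) -> list[str]:
    """Example policy: every uploaded package must carry a changelog entry."""
    return [] if package.get("changelog") else ["missing changelog entry"]
```

A hook registered under the name of one package could, for example, send a notification or extract files, while uploads of other packages are unaffected.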

How to install a Mageia Build System

This will be the topic of another post.

Configuring your software

Installing software is usually only the first part of the work that is done by sysadmins. The second part is the configuration of the software. The package will do the first part of the initial configuration, however more configuration is usually needed.

There are various ways to do it :

Edit the configuration directly on the server manually
Avoid doing any manual configuration on any of your servers, but use a configuration management tool such as Cfengine, Puppet or Ansible to do it for you.

When using a configuration management tool, you don’t directly update the configuration on your servers, but write some rules in your configuration management repository that will be applied automatically by your configuration management tool. This is not as fast as directly editing the configuration on your server, but there are many advantages :

Manage your infrastructure like a software project :
Configuration tools allow you to manage your infrastructure like any other regular software development project. Many of the tools available to software developers can be used : source revision control, patch submission, code review, automated tests, etc.
Teamwork :
If working with a team, you can store your server configuration rules in a common source revision control repository, making it possible to follow and review all changes to the servers made by all members of the team.
Documentation :
Having some documentation about what software is installed on your servers and how it is configured is very useful, especially when there can be more than one person working on it. However a very common problem with documentation is that someone writes it initially, but nobody maintains it and it quickly becomes outdated. It is very easy to make a change and forget to document it, or forget that documentation exists somewhere and needs to be updated. But having accurate documentation is important, and sometimes having outdated documentation providing false information can be worse than no documentation. The configuration management repository can be used as a kind of documentation on your infrastructure, and as it is what is actually used to set up your infrastructure, it is much more likely to be accurate than any other documentation. In software development, having self-documented code or code that can be understood without documentation is in general better than having documentation maintained separately, and this is the same for system administration.
Testing environment and reproducibility :
Using a configuration tool allows you to easily reproduce the configuration of a server on another server. This is useful when you need to replace or add a server, or if you need to set up a testing environment.
Maintaining a correct configuration :
The configuration management tool will run at regular intervals to check that the configuration is still correct and apply any necessary changes.
Reusability :
As in software development, the use of a configuration management tool allows you to reuse the components that you created. It is usually possible to create modules with parameters to change the behaviour of the module. You can sometimes find existing modules on the internet, but unfortunately most of them will require significant modifications to be adapted for your own use case. Earlier versions of puppet were missing important features such as parameterised classes, so creating reusable components was difficult, but this has evolved.
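
The "Maintaining a correct configuration" point relies on rules that can be applied again and again without side effects. The following Python sketch shows that idea for the simplest possible resource, a file with fixed content. It is not how puppet works internally; the functions ensure_file and apply are hypothetical and only illustrate the check-then-correct principle.

```python
from pathlib import Path


def ensure_file(path: Path, content: str) -> bool:
    """Make path contain exactly content; return True only if a change was made."""
    if path.exists() and path.read_text() == content:
        return False  # already correct, nothing to do
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


def apply(desired: dict[Path, str]) -> list[Path]:
    """Converge every managed file and report which ones had drifted."""
    return [path for path, content in desired.items() if ensure_file(path, content)]
```

Running apply() a second time with the same rules changes nothing and returns an empty list. If someone edits a managed file by hand, the next run restores it and reports it, which is what makes a periodic run safe.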

Which configuration management tool to use ?

When we started the setup of Mageia servers, at the beginning of the Mageia Project, we decided to use puppet, after looking at the different tools available. Puppet looked like the most interesting configuration management tool at the time. In the last few years, things have evolved a lot in this area, and there are now alternatives to puppet that you might want to evaluate before deciding which one to use.

The Mageia puppet modules

The puppet modules that we use to configure the Mageia servers are available in an svn repository.

What needs to be improved

The current process is good, but there are still some things that could be improved. If you want to contribute but don’t know what could be useful, here are some ideas.

Automatic packages generation :
Some languages such as Perl, Python or Ruby provide their own packaging system for libraries. Many people like to use those packages rather than RPM packages because the distribution packages are not always available or up to date. The advantage of those packages is that they are made upstream, so they are immediately available. The problem is that they are usually not very well integrated with the rest of the system, and require using two different packaging systems, which makes things more complicated. In many cases, people are not experienced packagers, so they simply use the available language-specific packages, because they think packaging is too complex or don’t have time to do it. If converting those packages to RPM were simpler, people could benefit from good availability of packages well integrated with the rest of the system. Fortunately those language-specific packages usually contain all the information that is needed for RPM packaging (descriptions, license, dependencies, etc.), so creating an RPM package is usually a simple conversion of that information into RPM format and can be automated. Thanks to the work from Jérôme Quelin on cpan2dist it’s possible to generate Mageia RPM packages from CPAN modules automatically. This is what allows us to have 3300 Perl packages available in the distribution.
We need to have similar tools for python, ruby and other language packages.
Build System setup :
We are using a build system to build our packages because we already have a build system available to build the Mageia distribution, so it’s not much more work to configure it to also have our own package repository. However installing a Mageia build system is currently not an easy task, and people who don’t build a complete distribution will not want to spend too much time configuring a build system. So we need to improve the build system setup process to make it simpler. What is currently missing is some documentation to explain how to do this, using our puppet module. A future post on this blog will explain how to do this.
OBS Support :
Another alternative is to use OBS (Open Build Service), which is a nice build system with support for multiple distributions. However it is still lacking support for Mageia. We need to fix that so that it becomes possible to manage Mageia repositories using OBS.
Mageia support in Ansible and Salt Stack :
Ansible and Salt Stack are both interesting configuration management tools. However, they are still missing urpmi support for package installation. Update: Philippe Makowski has been working on a urpmi module for Ansible, but this is not yet integrated upstream, and contributions are welcome.

7 Responses to They Make Mageia – the Sysadmin team : Installation and configuration of software on Mageia servers

Roller says:

June 6, 2013 at 10:59 pm

Thanks for sharing this with us, I believe that this will help a lot to get more people helping with packaging. I know I am going to start learning more about it. I look forward to seeing the next post.

-Roller
philippem says:

June 7, 2013 at 6:32 am

just for information, I quickly wrote a small module so you can
use urpmi with Ansible (http://ansible.cc/)

seems that it will not be included upstream yet
(https://github.com/ansible/ansible/pull/2972)
but you can find the module here :
https://github.com/pmakowski/ansible/blob/devel/library/packaging/urpm

of course, contributions are welcome.
- boklm says:
  
  June 7, 2013 at 9:31 am
  
  Thanks for the information. I updated the blog post.
Filip says:

June 7, 2013 at 8:01 am

Thanks, boklm. I’m impressed. Looking forward for next great insights.

I found a sentence which could be a bit unclear to non native speaker (negating second part of the sentence or not):
Tracking dependencies is not always easy and time consuming.

This sounds better to me:
Tracking dependencies is time consuming and not always easy.
- boklm says:
  
  June 7, 2013 at 9:35 am
  
  Indeed, this sentence was unclear. I updated it with your suggestion. Thanks !
