Table of contents of this page:
- 1.1. At least 2 SysAdmins with access to each server
- 1.2. Use standard procedures where possible, if there is no good reason to proceed differently
- 1.3. Document your setup, especially the non-standard procedures
- 1.4. Respect the environment set up by another admin
There should be *at least* two sysadmins with access to every server, in order to reduce the bus-factor risk we currently have and to increase our resilience.
Sysadmins are quite allergic to others messing with their systems, and the good ones, out of respect, never do so unless explicitly asked. Nevertheless, login should be possible for more than one person on a standard server, so that no server depends on a single admin. Those people should either be “gifted to automagically understand how things are set up”, or have documentation, written somewhere accessible to them, that helps them understand how that specific server is set up (which might differ from other sysadmins' recipes, each as valid as the other).
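The "at least two sysadmins per server" rule is easy to check mechanically once the access list is written down somewhere. The following is only a minimal sketch: the function name, the server names and the admin names are placeholders invented for illustration, and it assumes the access list is kept as a simple mapping of server to admins.

```python
def servers_below_minimum(access, minimum=2):
    """Return the servers that have fewer than `minimum` distinct admins."""
    return sorted(
        server
        for server, admins in access.items()
        if len(set(admins)) < minimum  # duplicates count once
    )


# Hypothetical inventory: these names are placeholders, not real servers.
access = {
    "server-a": ["admin1", "admin2"],
    "server-b": ["admin1"],
}

for server in servers_below_minimum(access):
    print(f"{server}: needs at least one more sysadmin with access")
```

Any server reported here depends on a single person and is a candidate for granting access to a second admin.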
Some day, another admin might have to enter your setup to help, and the more standard the setup, the easier it is for them to get comfortable patching the system.
Standard procedures are more a concept than a documented reality. As a guideline, let’s say that standard is what is provided by the distro or by packaged tools, and non-standard is what is overly clever. Keep It Simple and Silly, because the next person who needs to touch it might do so in a crisis, in the middle of the night, not fully awake yet.
Every sysadmin has their own recipes for improving this or that part of a server, and these deviate from the more standard procedures documented elsewhere.
Document your changes, so that this important knowledge is not lost (should a bus “visit” you, or something else happen) and so that, in case of need, another person can easily find out what you changed and how to proceed.
To avoid scattering information across wiki pages, we can use the wiki structure Roles and Teams, which has child pages for all Teams in tiki.org.
For instance, anything related to SysAdmin should be found in child pages of the Tiki Admin Group substructure.
There is also the Community Infrastructure Blog, where Community Infrastructure sysadmins log their activities so that all other sysadmins know what has been changed, configured, done, etc.
There should be backups (which should also be documented, as suggested in the previous rules), but it is much better not to have to retrieve and restore them: restoring usually involves unpleasant surprises, or some content lost due to uncontrolled factors.
The safest way to enter a server that was set up by another admin is to fetch data only, place it somewhere else, and experiment or fine-tune it there (to split services, clone servers, etc.).
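One way to "fetch data only" is a pull-style copy that reads from the other admin's server and writes only to your own machine. The sketch below merely builds such an `rsync` command so it can be reviewed before anyone runs it. It assumes `rsync` over SSH is available on both sides; the host name and paths are hypothetical placeholders, and the helper function is not an existing tool.

```python
import shlex


def build_fetch_command(host, remote_path, local_dir, dry_run=True):
    """Build an rsync command that copies FROM the remote server TO a local directory.

    The remote side is only read; nothing is written to or deleted on it.
    """
    cmd = ["rsync", "--archive", "--compress"]
    if dry_run:
        # Show what would be transferred without copying anything.
        cmd.append("--dry-run")
    # Source is remote, destination is local: a pull, never a push.
    cmd += [f"{host}:{remote_path}", local_dir]
    return cmd


# Hypothetical host and paths, for illustration only.
cmd = build_fetch_command(
    "admin@old-server.example.org", "/var/www/site/", "/srv/staging/site/"
)
print(shlex.join(cmd))
```

Defaulting to a dry run keeps the first contact with someone else's server as harmless as possible; the real copy happens only after the printed command has been checked and `dry_run=False` is passed.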
If there is a very good reason to change something (e.g., patching a system after an important vulnerability has been reported while the main sysadmin is unavailable for too long), discuss it with other admins, when possible, before making the change.
When you had to apply changes urgently for some important reason, report them to that other admin, or to the other admins, as soon as possible.
This way the community can stay responsive: if anything bad happens, sites can be fetched from the server and set up on a new one managed by someone else. These concepts are being used successfully by several admins on some servers (amette/Nelson, Xavi/Ferran/Alex, Xavi/Alex, Xavi/Carlos, ...). They have not been tested extensively: amette has not yet found a bus he liked enough to jump in front of (amette). Xavi, however, had a road accident that took him offline for many months, and the servers set up this way evolved organically where needed without users of the services noticing (Xavier de Pedro).