From Misc and sysadmin team
UPDATE: Servers are now back to life and testers are back to the final ISOs!
As some people may have seen, we suffered a severe power outage yesterday, around 00h05 CET, in one of our hosting datacenters.
It seems that an electrical problem at the Lost Oasis server room in Marseille stopped 4 of our servers (valstar, alamut, jonund and ecosse), as well as the virtual machine running on alamut (aka friteuse_tmp). It also impacted all the servers of zarb.org, which still provide some of our services (like www, mailing lists, secondary DNS, SMTP, etc.).
Perenoel, one of the great Lost Oasis guys, went to the building during the night to take care of the issue, and the servers got power again around 00:20 CEST. Lost Oasis people worked until 4 o’clock in the morning to fix all servers.
Now all but 2 servers, Valstar and Jonund, are back online.
Jonund is just a build node; we have another one and we are in freeze, so we can cope with the failure without much trouble.
Valstar is the main SVN and LDAP server, so almost everything depends on it. Impacted services:
- Identity: no access (no account creation)
- forum, bugzilla, transifex: mostly read-only access; no one can log in, but people who are currently logged in are still ok
- most @mageia.org aliases (emails are still in queue on zarb)
- shell access (rabbit, champagne)
- some Sympa lists (@ml.mageia.org), mostly the board one
- buildsystem (no scheduler, no mirror for builders)
- automated administration of all servers (no puppetmaster)
The rest (website, blog, xymon, mailing lists, svnweb) should be ok. We are still looking into it. Lost Oasis told us they would go and look at our server in the afternoon; we will keep you informed of any changes with a mail on our list.
Sysadmins will also be looking at making the infrastructure more resilient to such problems (for example, a second LDAP server would have avoided most of these issues, and this is already planned).
If you have any questions, please ask on the sysadmin mailing list or on the #mageia-sysadm IRC channel on Freenode, where we will be happy to answer you.
Update (13:10 CEST): all systems are back, up and operational now. \o/