[SOLVED] Most of mageia.org temporarily down

For a little over two hours now, most of our servers have been unreachable.

The issue is under investigation and we hope to have all of our servers back online soon.

Update: One of our sysadmins explained that the servers that can no longer be reached are the ones in Marseille.

Update 2: The temperature in the data center had risen too much after a problem with the electricity. Someone from MarsNet in Marseille is willing to try to start our servers tomorrow morning, while checking theirs in the same data center.

Update 3, October 4, 10:25 UTC: The cooling system in the data center is broken and will be repaired. No attempt has been made to restart our servers, because it is still too hot for them to be able to function well.

Update 4, October 5, 09:04 UTC: The two broken cooling systems will be fixed today.

Trebot’s suggestion to temporarily redirect calls to http://www.mageia.org to this blog entry is a good one. Edit: On October 6, a slightly different solution was implemented.

Update 5, October 6, 09:18 UTC: The cooling systems should be good now. The next step is for a sysadmin to travel to Marseille to restart our servers.

Update 6, October 6, 10:05 UTC: It is expected that up to two more days are needed for recovery.

Update 7, October 7, 10:52 UTC: Installing via the mirrorlist has been supported again since last night.

Update 8, October 8, 20:55 UTC: Today we have been waiting for confirmation that the temperature in the data center is low enough now. We hope to get that confirmation when the weekend is over.

Update 9, October 9, 10:22 UTC: It was safe to start the servers again, which was done by someone from the DC. All booted up nicely, apart from the main one.

Update 10, October 9, 14:00 UTC: Everything works again 🙂

35 Responses to [SOLVED] Most of mageia.org temporarily down

  1. Marc says:

    Maybe we should start a donation campaign to buy our own servers for the Mageia organization?

    It looks a bit amateurish that we depend on servers in Marseille that are not cooled. I have never heard of this with other serious Linux distributions.

    • papoteur says:

      The data room is cooled, but the cooling system is out of service.

    • trebot says:

      It is not the first time that a data center has had a cooling system failure that resulted in the shutdown of servers. Cooling systems are rated for a certain maximum external heat. If it is too hot outside, the cooling system will eventually fail and the cooling equipment will eventually burn out. Servers, like all computers, have a maximum ambient temperature at which they can function, and when it is reached they stop working.

      I worked at one place where the data center cooling system failed. It took several days to get the HVAC engineers out with new equipment and a crane to place it on the roof. This leads to the question: should Mageia at some point consider having some sort of failover system in place at another location, even if it is one that just posts a temporary home page giving the status of the servers?

    • Filip says:

      I’m not sure where you got this info. We have our own servers and they are cooled (but yeah this malfunction was very serious).

      How will you help?

  2. Jens Persson says:

    You can not be serious! Quoting John McEnroe.

  3. trebot says:

    It may be useful to temporarily redirect calls to http://www.mageia.org to this blog entry.

  4. Pingback: La mayor parte de mageia.org temporalmente fuera de servicio | Mageia Blog (Español)

  5. Dieter says:

    Anyone who uses Mageia and uses a standard installation will not receive any updates as the mirror list only comes from mageia.org.
    That needs to be changed urgently.

    • marja says:

      Fully agreed. Some of us are trying to write a static mirror list, to be placed on one of our servers that is still online.

      • trebot says:

        If you perform a fresh install, there is no mirror list available locally. I suggest that in future a basic list be installed as a fallback, so that it can be used if the mirrorlist server is not available. I ran into this problem yesterday but knew that I could use the “Specific” option and provide the URL of a mirror. Fortunately I knew the URL https://mirrors.kernel.org/mageia/distrib/9/x86_64/ but many users, especially new users, would not know what URL to use.

  6. Peter Lawford says:

    It is most important to move Mageia’s servers from the data center in France to data centers located in more serious countries in the EU.

  7. Marc Paré says:

    Thanks for keeping us apprised of the downtime issues and thanks to all of you who are working hard on getting the servers back online.

    I would like to suggest that, once the servers are back online, someone post a documented timeline of the events leading to the breakdown, then a timeline of the attempts to fix the issue, followed by a list of suggested fixes to prevent this from happening again. There should obviously be some failsafe in case the datacentre shuts down again suddenly.

    This will keep the membership well-informed of the issues and perhaps convince some to donate to the organization to help fund the needed fixes.

    Marc
    marc.pare@parentreprise.com

    • marja says:

      It would be very nice if we could do that, but I doubt it will happen.
      For such a blog post, we need more people in Sysadmin Team (of course) and in Atelier Team. One of the responsibilities of Atelier Team is to write blog posts. You may have noticed that we have a problem writing enough of them. It would be great if a native English speaker, with enough time to communicate with other teams, would join us to help with this blog.
      Sysadmin Team is short on people, too, but at least three sysadmin candidates are starting to be trained.

      • Don says:

        Marja,

        What is the process of joining the Atelier team? In case someone was interested in helping with those blog posts?

        • marja says:

          Hi Don,

          When all of mageia.org is back online, then creating an account on identity.mageia.org would be mandatory, unless it already exists.
          Then it would be needed to become a member of the atelier-discuss mailing list and to send an introductory mail, containing some personal information (including being a native English speaker, if that is the case), of course only what the new volunteer wants to share, and mentioning the interest in helping with the blog. Subscribing to a mailing list can be done on ml.mageia.org (also when all of mageia.org is back).

      • Marc Paré says:

        I will take a close look at my availability this weekend, but think I will be able to help out with blog translations.

        Contact me directly if you need anything in a hurry in EN.

        I have been with Mageia since the start (Mandrake). Mandriva drives my business computers as well as many of my clients.

        Marc
        marc.pare@parentreprise.com

        • marja says:

          Thanks for the offer, Marc.

          If our servers are still down when you read this, and you see any spelling or grammar errors in the current blog post, then please inform me. Note that both British English and American English are deemed correct. Edit: and errors with regard to the meaning of words, too, of course.

  8. Peter Lawford says:

    All my congratulations and gratitude to those who have worked hard to recover the situation so quickly. I use Mageia daily for my business and it is a work tool for me.
    Unfortunately my sysadmin knowledge is inadequate to help with maintaining
    Mageia. Once more, thank you very much.

  9. Peter Lawford says:

    October 7, 2023 at 4:54 pm: can’t access the mirrors list (403 Forbidden): http://mirrors.mageia.org/distrib

    • katnatek says:

      The mirrorlist is on another server. I don’t know how they solved the issue, but from drakrpm-editmedia you can add the repositories, so I guess the same is valid for fresh installs; I just can’t select a specific mirror directly.

  10. I’d like to see IoT being used to sync everything in case one server or one computer goes down; that would mean that the website or the svn will never be unreachable.

  11. Peter Lawford says:

    October 8, 2023 at 6:01 pm
    Impossible to access:
    bugzilla: https://bugs.mageia.org
    mirrors list: http://mirrors.mageia.org/distrib
    wiki: https://wiki.mageia.org

    • marja says:

      No one has been to the data center yet. The last message we had from the DC was on Friday, saying that it would take up to two days before they would recover from the problems. After that we heard nothing (probably because it is the weekend).

  12. Wladimir says:

    wiki and bugzilla not accessible

    • marja says:

      You’re right. When the cooling problem of the data center is completely fixed, then a volunteer can go there to check our servers and start them.

  13. Pingback: Panne générale de nos serveurs – from a kde POV

  14. Paul Blackburn says:

    Thank you Marja for keeping us up-to-date.

    It is now about 17:30 (UK) and it looks like all the mageia servers are up and running again.

    Bedankt, merci, thank you, grazie, danke, to everyone who helped to get the servers recovered.

  15. Brian R. says:

    Thank you Sysadmin and Atelier teams for your work during the data center outage. I’ve been through a couple of them and they are nightmares. Much appreciated and a big sigh of relief to see Mageia back online.

    • marja says:

      We are very relieved, too. It happens too often that, after such an incident, drives need to be replaced because the heat made them unusable. Fortunately, that did not happen to us. However, not all organizations using the same DC were so lucky.

      • Andrew Piubellini says:

        Out of interest, if this incident (hypothetically) *did* cause the drives in Mageia’s servers to fail, do you have backups that you could have restored from?

        If not, that’s something that Mageia would want to invest in, as soon as possible.

        • neoclust says:

          We had no issues with our hard drives; we did NOT lose any data.

          We (the sysadmin team) are working on renewing the Mageia hardware.

          stay tuned 🙂