Turn off your servers
TL;DR: Turn off your servers. Use the scream test. Address fears. Proceed to Lightswitch Ops.
One easy way to reduce your CO2 emissions (and save some money) is to turn off your servers! Probably not all of them, and not forever, but there are certainly machines that are not fully utilized. You probably know of one or two servers that no one has used in months, or machines that are used only during business hours.
The power consumption of a server running idle is still a significant part of the maximum power consumption.
Source: Green Software Foundation
The scream test
Finding out whether a server is still in use sounds like a complicated task, especially if the owner information is outdated or missing. But Microsoft found a solution for this problem: the scream test.
You can probably imagine how this works: turn off a server that you suspect is never or rarely used, and wait until someone comes screaming.
Microsoft describes their process as follows:
What’s the Scream Test? Well, in our case it was a multistep process:
- Display the message “Hey, is this your server, contact us?” on the sign-in splash page for two weeks.
- Restart the server once each day for two weeks to see whether someone opens a ticket (in other words, screams).
- Shut down the server for two weeks and see whether someone opens a ticket. (Again, whether they scream.)
- Retire the server, retaining the storage for a period, just in case.
Microsoft was apparently able to turn off 15% of their servers this way. This is a significant reduction in energy consumption and a big cost saving.
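The quoted steps can be read as a simple timeline. The following is a minimal sketch, not Microsoft's tooling: the phase names, the `phase_on` function and the `screamed` flag are made up for illustration, and only the two-week durations come from the quote.

```python
from datetime import date, timedelta

# Phase names are hypothetical; the two-week durations follow the quoted steps.
PHASES = [
    ("banner", timedelta(weeks=2)),         # show the "is this your server?" message
    ("daily_restart", timedelta(weeks=2)),  # restart the server once each day
    ("shutdown", timedelta(weeks=2)),       # keep the server powered off
]


def phase_on(start: date, today: date, screamed: bool = False) -> str:
    """Return the scream-test phase a server is in on `today`."""
    if screamed:
        return "keep"  # someone opened a ticket, so the server is still in use
    if today < start:
        return "not_started"
    elapsed = today - start
    for name, length in PHASES:
        if elapsed < length:
            return name
        elapsed -= length
    return "retire"  # retire the server, retaining the storage for a period
```

A daily job could call `phase_on` for every candidate server and act on the result, for example by restarting machines that are in the `daily_restart` phase.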
Servers with a schedule
In your organization, there will certainly be servers that are used on a fixed schedule. Many services are only needed during business hours and can be turned off outside of them. Even if these services have many 9s in their availability requirements, those requirements might only apply during defined hours.
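A schedule like this can be expressed as a small check. The sketch below assumes a hypothetical on-window of Monday to Friday, 07:00 to 19:00 local time; the constants and the function name `should_be_running` are illustrative and need to be adapted to your own business hours.

```python
from datetime import datetime, time

# Hypothetical schedule: Monday to Friday, 07:00 to 19:00 local time.
ON_DAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday is 0
ON_FROM = time(7, 0)
ON_UNTIL = time(19, 0)


def should_be_running(now: datetime) -> bool:
    """True if `now` falls inside the scheduled on-window."""
    return now.weekday() in ON_DAYS and ON_FROM <= now.time() < ON_UNTIL
```

A scheduler could evaluate `should_be_running(datetime.now())` every few minutes and start or stop the machine when the answer changes.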
One thing you need to consider with such systems is batch jobs. They often run during the night. If you turn off the systems at that time, the jobs obviously cannot run. You need to move them into the scheduled on-time and leave enough time for them to finish. Or maybe it is even possible to run them during low-demand hours in the daytime?
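Whether a batch job still fits into the on-time is simple arithmetic. Here is a rough sketch; the function names and the 30-minute safety margin are assumptions for illustration, and the expected runtime has to come from your own measurements.

```python
from datetime import datetime, timedelta

# Assumed buffer between the expected end of a job and the power-down.
DEFAULT_MARGIN = timedelta(minutes=30)


def latest_start(window_end: datetime, expected_runtime: timedelta,
                 margin: timedelta = DEFAULT_MARGIN) -> datetime:
    """Latest moment a job can start and still finish before power-down."""
    return window_end - expected_runtime - margin


def fits_in_window(job_start: datetime, expected_runtime: timedelta,
                   window_end: datetime,
                   margin: timedelta = DEFAULT_MARGIN) -> bool:
    """True if the job is expected to finish, plus margin, before power-down."""
    return job_start <= latest_start(window_end, expected_runtime, margin)
```

With a power-down at 19:00 and a job that usually takes two hours, the job would have to start by 16:30 under these assumptions.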
Also, consider possible dependencies when planning to shut down systems during off-hours. Another service or system might rely on the machine you intend to power down, potentially operating on a different schedule. For instance, a database server or an authentication system might be required by other applications that run batch jobs or process user requests asynchronously. Depending on your organization, you can just test it out (another scream test), try to map the dependencies, or synchronize downtimes across all systems.
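If you do map the dependencies, schedule conflicts can be found mechanically. The sketch below assumes a simplified model in which each service has a set of hours of the day (0 to 23) during which it is on; the service names and the `find_conflicts` function are hypothetical.

```python
from __future__ import annotations


def find_conflicts(on_hours: dict[str, set[int]],
                   depends_on: dict[str, list[str]]
                   ) -> dict[tuple[str, str], set[int]]:
    """Map (service, dependency) to the hours where the service is on
    but the dependency it needs is scheduled to be off."""
    conflicts = {}
    for service, dependencies in depends_on.items():
        for dependency in dependencies:
            missing = on_hours[service] - on_hours[dependency]
            if missing:
                conflicts[(service, dependency)] = missing
    return conflicts


# Hypothetical example: a nightly batch job needs a database that is
# only on during business hours.
on_hours = {
    "database": set(range(7, 19)),
    "web-app": set(range(8, 18)),
    "reporting-batch": set(range(20, 23)),
}
depends_on = {
    "web-app": ["database"],
    "reporting-batch": ["database"],
}
conflicts = find_conflicts(on_hours, depends_on)
```

Here `conflicts` contains one entry, for `reporting-batch` and `database` at hours 20, 21 and 22, which means either the batch job or the database schedule has to move.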
And if it doesn’t come up again?
This is a valid concern that needs to be addressed, and it prevents many people from starting to power down machines. Think of it this way: Would you rather discover potential issues during a controlled test on your own terms? Or would you prefer to face this discovery in an incident situation when the server is going down at the worst possible moment?
The key is to approach shutdowns systematically. Keep detailed records of which servers are being shut down, why, and when. By using a process like the one described in the scream test (regular restarts), you verify again and again that the machines actually come up again.
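Such records do not need special tooling. As a minimal sketch, assuming a hypothetical `ShutdownRecord` structure written as one JSON line per shutdown, it could look like this:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ShutdownRecord:
    server: str    # which server was shut down
    reason: str    # why it was shut down
    shut_down_at: datetime                      # when it was shut down
    came_back_up_at: Optional[datetime] = None  # set once a restart succeeded

    def to_json_line(self) -> str:
        """Serialize the record as one JSON line, e.g. for an append-only log."""
        data = asdict(self)
        # datetime objects are not JSON-serializable, so store ISO 8601 strings.
        for key in ("shut_down_at", "came_back_up_at"):
            if data[key] is not None:
                data[key] = data[key].isoformat()
        return json.dumps(data)
```

A record with an empty `came_back_up_at` long after the planned restart is exactly the kind of issue you want to find during a controlled test.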
Lightswitch Ops
If you can implement all of this in a systematic and consistent manner, it is known as “Lightswitch Ops”. This methodology treats server management with the same simplicity as flipping a light switch: turning resources on and off based on actual need. However, for Lightswitch Ops to truly take off within your organization, it’s crucial to address and overcome the fears and uncertainties often associated with shutting down servers. These fears typically stem from concerns about downtime, unavailability, or operational disruptions if a server isn’t turned back on as expected.
By starting simple and expanding the process to more and more servers, you can build up trust and, in the end, make it a habit. Lightswitch Ops should be your end goal and not the first step (in the Green Software Maturity Matrix, this step comes in Level 3 and not at the beginning).
Conclusion
Turning off unused or underutilized servers is a straightforward step that can significantly reduce energy consumption and operational costs. By identifying where this practice is applicable and implementing it systematically, organizations can contribute to a more sustainable IT landscape.
Small actions, like flipping the switch on an idle server, can lead to substantial impacts. So, ask yourself: Are all your servers truly necessary right now? If not, it’s time to turn them off and take a step toward a greener future.