Lessons From The Global Windows Outage

Cricket Liu, Chief DNS Architect at Infoblox, emphasises that the recent global Windows outage caused by a CrowdStrike bug underscores the significant risk of running DNS and DHCP services on Windows Servers, advocating for a transition to more resilient, dedicated infrastructures.

Cricket Liu, Chief DNS Architect at Infoblox

The massive, worldwide outage of Windows computers caused by a bug in CrowdStrike software underscores a lesson we should all take to heart: You shouldn’t run critical network services (such as DNS and DHCP) on Windows Servers.

On July 18, a bug in a software update from CrowdStrike caused widespread system crashes on Windows computers, disrupting the operations of airlines, retail chains, and many others. Although CrowdStrike quickly withdrew the update, the damage had already been done, and recovering from the resulting outage will take those affected hours, if not days. The outage was massive on its own, but it was amplified because many organizations run mission-critical network services on Windows. When those servers crashed, the network services they hosted went down with them, which extended recovery times.

Most organizations spend millions building robust network infrastructure so that no single networking device’s failure can impact the company’s operations. However, every one of those network devices depends on critical network services, such as DNS and DHCP. Windows Servers are not the appropriate place to host these network services. Windows Servers should be focused on their critical role supporting identity (Active Directory) services.

While the Windows outage caused by the CrowdStrike incident was unusual in its global scale, Windows Server failures are a far-too-common source of network outages. In addition, Windows Servers are a frequent source of vulnerabilities, resulting in a need for constant patching. Their vulnerability makes them a favorite target of attackers, too. For instance, several recent ransomware incidents involved attacks on Windows Servers and resulted in enterprise-wide disruption of networks, which made incident response much harder and the impact and cost of the incidents much larger.

Running critical network services on Windows Servers increases the likelihood of a DNS or DHCP failure, and such a failure can disable the parts of your infrastructure that the original problem did not touch. We strongly recommend that organizations run DNS and DHCP on infrastructure that is separate from their Windows infrastructure and not subject to its vulnerabilities. Dedicated DNS and DHCP servers, running on operating systems other than Windows and hardened against attack, are the best defense against an outage like this occurring again.