In-Depth Technical Document on the CrowdStrike BSOD Incident
Introduction
On July 19, 2024, a critical incident involving CrowdStrike's Falcon® sensor update led to widespread Blue Screen of Death (BSOD) issues on Windows systems globally. This document provides an in-depth analysis of the incident, including historical context, technical breakdown, remediation steps, mapping to the NIS 2 Directive, and recommendations for future prevention.
Historical Context
In 2010, George Kurtz, then CTO of McAfee, faced a significant issue where a faulty update caused McAfee to misidentify the critical system file svchost.exe as a virus. This misidentification led to widespread crashes on Windows XP systems, resulting in the infamous Blue Screen of Death (BSOD). The incident significantly impacted McAfee, contributing to its subsequent sale to Intel.
Fourteen years later, in 2024, George Kurtz, now CEO of CrowdStrike, faced a similar challenge. A CrowdStrike Falcon® sensor update caused Windows systems to crash globally, reviving the dreaded BSOD. This recurrence underlined the importance of rigorous update testing and robust incident response mechanisms.
Incident Summary
On the morning of July 19, 2024, many Microsoft Windows users around the world experienced the Blue Screen of Death (BSOD) following a new update from CrowdStrike. The issue stemmed from a logic error in a CrowdStrike Falcon sensor configuration update, which triggered an out-of-bounds memory read (initially reported as a null pointer dereference) in the csagent.sys driver. This caused Windows hosts running Falcon sensor for Windows version 7.11 and above to crash.
The BSOD affected various critical sectors, including telecommunications, banking, airlines, railways, supermarkets, hospitals, and news networks, highlighting the widespread impact of such software failures.
Insider Stock Sale
Beyond the technical details, it is worth noting that CrowdStrike's Chief Security Officer, Shawn Henry, sold 4,000 shares of CrowdStrike stock on July 15, 2024, just days before the incident. The sale totaled approximately $1.485 million, at an average price of $371.32 per share, and was part of a prearranged 10b5-1 trading plan established on December 20, 2023; such plans are designed to prevent insider trading by allowing insiders to schedule stock sales in advance. Following the transaction, Henry still held 183,091 shares, including unvested restricted stock units.
The proximity of this sale to the subsequent IT outage has raised questions and scrutiny from regulators and shareholders, despite the transaction being part of a prearranged plan. The incident and the timing of the stock sale have led to high volatility in CrowdStrike’s stock, which saw significant drops in value following the outage.
https://www.barrons.com/articles/crowdstrike-insiders-sold-stock-cac5e509
Technical Breakdown
- Faulty Update Deployment:
- Date and Time: July 19, 2024, at 04:09 UTC.
- Action: Sensor configuration update to Windows systems.
- Result: Logic error causing system crashes and BSOD.
- Resolution: Update remediated on July 19, 2024, at 05:27 UTC.
- Note: Not related to a cyberattack.
- Affected Systems:
- System Versions: Falcon sensor for Windows versions 7.11 and above.
- Impact Window: Hosts online between 04:09 UTC and 05:27 UTC that downloaded the updated configuration were susceptible to the crash.
- Symptom: System crash, Blue Screen of Death (BSOD).
- Configuration File Primer:
- Channel Files: Part of Falcon’s behavioral protection mechanisms, updated several times a day.
- Directory: Located at C:\Windows\System32\drivers\CrowdStrike.
- Naming: File names start with “C-”.
- Technical Details:
- Channel File 291:
- Filename: Starts with “C-00000291-” and ends with .sys (a quick on-host check for this file is sketched after this breakdown).
- Role: Controls how Falcon evaluates named pipe execution on Windows systems.
- Error: The update targeted newly observed malicious named pipes used in cyberattacks, but a logic error in the file caused the operating system to crash.
- Correction:
- Action: The logic error was corrected by updating the Channel File 291 content.
- Outcome: No further changes are required. The file continues to protect against named pipe abuse.
- Clarification:
- Not Related to: Null bytes within Channel File 291 or any other Channel File.
- Information: Additional details are available on the CrowdStrike blog and Support Portal.
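As a quick on-host check, the command below lists Channel File 291 in the CrowdStrike driver directory together with its last-modified timestamp. According to CrowdStrike's guidance at the time, copies of this file timestamped 05:27 UTC on July 19, 2024 or later contain the reverted (good) content; treat that cutoff as an assumption to verify against the current advisory, and note that dir reports local time rather than UTC.
rem List Channel File 291 and its last-modified timestamp (shown in local time)
dir C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys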
Remediation Steps
- Booting into Safe Mode or the Windows Recovery Environment:
- Restart the computer and interrupt normal startup (on modern Windows versions, power-cycling three times during boot or using Shift+Restart opens the recovery menu; the F8 key works only where the legacy boot menu is enabled).
- Select Safe Mode or the Windows Recovery Environment.
- Deleting the Faulty Driver File:
- Navigate to C:\Windows\System32\drivers\CrowdStrike.
- Delete the file matching C-00000291*.sys.
- Reboot normally.
- Cloud Environments:
- AWS: Detach the EBS root volume from the affected instance, attach it to a healthy rescue instance, delete the faulty driver file, then reattach the volume to the original instance (see the AWS CLI sketch after these steps).
- Azure: Use Azure CLI to create a rescue VM, run mitigation scripts, and restore the fixed OS disk.
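For the AWS path above, a minimal AWS CLI sketch of the detach/attach cycle is shown below; all instance, volume, and device identifiers are placeholders, and the correct device names depend on the AMI and instance type, so treat this as an outline of the workflow rather than exact commands.
rem Stop the impaired instance and detach its root EBS volume (placeholder IDs)
aws ec2 stop-instances --instance-ids i-0affected0000000000
aws ec2 detach-volume --volume-id vol-0faultyroot00000000
rem Attach the volume to a healthy rescue instance as a secondary disk
aws ec2 attach-volume --volume-id vol-0faultyroot00000000 --instance-id i-0rescue000000000000 --device xvdf
rem On the rescue instance, bring the disk online, delete
rem <drive>:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys, then swap the volume back
aws ec2 detach-volume --volume-id vol-0faultyroot00000000
aws ec2 attach-volume --volume-id vol-0faultyroot00000000 --instance-id i-0affected0000000000 --device /dev/sda1
aws ec2 start-instances --instance-ids i-0affected0000000000
On Azure, Microsoft's published guidance followed a similar pattern using the az vm repair extension (create a repair VM, run the published mitigation script, then restore the repaired OS disk), matching the Azure bullet above.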
Intel vPro Remediation Steps
For IT-managed devices built on Intel vPro with Intel AMT activated, IT departments can remediate the issue remotely, minimizing additional downtime:
- Preparation and Access:
- Access Intel Endpoint Management Assistant (Intel EMA) to find the affected device.
- Use the Hardware Manageability tab to connect to the device using KVM.
- Access Recovery Mode:
- Boot the affected device into the Windows Recovery Environment through the Intel EMA KVM session.
- Navigate to Troubleshoot > Advanced Options.
- Select Command Prompt.
- Handling BitLocker:
- If BitLocker is enabled, enter the Recovery Key.
- Obtain the BitLocker recovery key from your organization's key escrow (for example, Microsoft Entra ID/Intune or Active Directory) or, for personally registered devices, from myaccount.microsoft.com.
- Delete the Faulty Driver File:
- From the command prompt, execute the command to delete the faulty driver file:
del C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys
- Restart the Device:
- Restart using the command line:
shutdown /r
- Alternatively, use Intel EMA to send the Force Reset command.
- Verification:
- Verify that the system is functioning correctly and the BSOD issue is resolved. A consolidated sketch of these console commands, including the BitLocker unlock step, follows below.
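Taken together, the recovery-console portion of these steps amounts to the short command sequence sketched below. The manage-bde unlock step applies only when the OS volume is BitLocker-protected, the 48-digit recovery password is a placeholder, and in the recovery environment the OS volume may be mapped to a drive letter other than C:, so adjust the paths accordingly.
rem Unlock the BitLocker-protected OS volume (replace the placeholder recovery password)
manage-bde -unlock C: -RecoveryPassword 111111-222222-333333-444444-555555-666666-777777-888888
rem Remove the faulty Channel File 291 driver file
del C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys
rem Restart into the normal operating system
shutdown /r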
NIS 2 Directive Mapping
The incident underscores the critical need for compliance with the NIS 2 Directive to enhance cybersecurity resilience. Below is the mapping of the incident response to the NIS 2 Directive:
- Risk Management Measures (Article 21.1):
- Ensuring appropriate and proportionate measures to manage cybersecurity risks.
- Addressing and mitigating risks associated with software updates.
- Specific Measures (Article 21.2):
- Risk Analysis and Information System Security (21.2(a)):
- Importance of strong risk analysis and security policies.
- Incident Handling (21.2(b)):
- Effective incident handling demonstrated by CrowdStrike’s swift response.
- Business Continuity (21.2(c)):
- Highlighted the need for robust business continuity plans, including backup management and disaster recovery.
- Supply Chain Security (21.2(d)):
- Emphasized monitoring updates from suppliers and managing third-party vendors.
- Secure Development and Maintenance (21.2(e)):
- Necessity of secure development practices to prevent such issues.
- Assessing Effectiveness of Cybersecurity Measures (21.2(f)):
- Reviewing and updating procedures following the incident.
- Basic Cyber Hygiene and Training (21.2(g)):
- Ensuring personnel are trained for efficient incident response.
- Cryptography and Encryption (21.2(h)):
- Managing encryption keys, especially when accessing Safe Mode on systems with BitLocker enabled.
- Access Control Policies (21.2(i)):
- Ensuring only authorized personnel can deploy updates.
- Secure Communication (21.2(j)):
- Maintaining secure communication channels for incident response.
- Supply Chain Security (Article 21.3):
- Considering vulnerabilities of suppliers and service providers.
- Ensuring suppliers maintain high security standards.
- Corrective Measures (Article 21.4):
- Taking necessary corrective measures without undue delay.
- Quick identification and rollback of the faulty update by CrowdStrike.
- Coordinated Security Risk Assessments (Article 22):
- Ensuring coordinated risk assessments of critical supply chains at the EU level, a need this incident clearly underscores.
- European Cyber Crises Liaison Organization Network (EU-CyCLONe) (Article 23):
- Establishing a network for coordinated management of large-scale cybersecurity incidents.
- Demonstrating the importance of having a coordinated network like EU-CyCLONe.
Recommendations for Organizations
- Enhanced Testing and Monitoring:
- Implement automated testing and robust monitoring solutions to detect issues early.
- Regularly update and test disaster recovery plans.
- Strong Governance Practices:
- Establish clear guidelines and accountability for software updates.
- Develop comprehensive change control processes with rollback plans.
- Diverse Cybersecurity Infrastructure:
- Avoid relying solely on a single vendor for cybersecurity solutions.
- Regularly assess third-party providers and MSSPs.
- Efficient Rollback Procedures:
- Develop automated rollback mechanisms and version control.
- Ensure all updates are thoroughly tested in controlled environments before deployment.
- Comprehensive Incident Response Plans:
- Create detailed response plans and conduct regular drills.
- Ensure prompt communication with all stakeholders during incidents.
- Cross-border Cooperation:
- Foster information sharing and collaboration among organizations in different countries.
- Participate in coordinated risk assessments and incident response initiatives.
Aftermath at Airports
Atlanta airport is jam-packed. Mothers w/ small children on the floor in the terminal. Line for customer service is 1/2 mile long. No rental cars. Few hotel options. People’ve been stuck here 3 days. If there weren’t bigger national news- @Delta CEO would be dragged to testify. pic.twitter.com/tGSRGyxYG5
— Cole T. Lyle (@ctlyle1) July 22, 2024
Delta has completely dropped the ball on this.
My family and I have been stranded at Atlanta airport for 4 days now.
0 car rentals
0 hotels (was able to get one through a rewards program 25 minutes from the airport for tonight)
0 available flights
4-5 hours in line to even… pic.twitter.com/O6GRssOrdk
— Multifamily Madness (@MultifamilyMad) July 22, 2024
DTW airport #delta terminal screens and delta help desk lines. pic.twitter.com/fBm5JgRqs0
— Candy DS💙💛〽️ (@CSchachat) July 19, 2024
This is the line at Orlando Airport MCO to speak with @Delta. There is one employee working the desk! I asked someone halfway down the line, and they had been in line for 3 hours! pic.twitter.com/8BBCxwSwWA
— MNConservative🇺🇸⭐️ (@RealJMPeterman) July 22, 2024
Conclusion
The CrowdStrike BSOD incident underscores the urgent need for robust cybersecurity practices and compliance with the NIS 2 Directive, highlighting in particular the importance of rigorous testing, strong governance, and effective incident response mechanisms. Organizations must prioritize these areas to better prevent and respond to such failures, ensuring operational continuity and maintaining stakeholder trust in an increasingly interconnected digital landscape. The lessons learned from this event are a crucial reminder of the potential impact of software updates and the value of proactive security measures.