What We Can Learn from the 2024 CrowdStrike Outage
Published 07/03/2025
CSA’s Top Threats to Cloud Computing Deep Dive 2025 reflects on eight recent real-world security breaches. The report presents the narrative of each incident, as well as the relevant cloud security risks and mitigations. Today we’re reflecting on the third incident covered in the Deep Dive: CrowdStrike 2024.
The CrowdStrike outage in July 2024 exposed how much the world depends on centralized security solutions. It highlighted the risk of single points of failure in endpoint protection. With CrowdStrike holding an 18% global market share, numerous companies found themselves impacted directly or through their supply chain. The resulting outage of Microsoft Windows systems, servers, and the many services that rely on them was especially problematic.
Many of the companies affected by the outage see CrowdStrike as the incident's threat actor. However, they often overlook the criminals who exploited the ensuing confusion and panic by launching phishing attacks and distributing malware disguised as legitimate CrowdStrike software updates.
This outage revealed issues with process management, testing, third-party security assessments, risk evaluations, and incident response planning. Many of the Top Threats identified by CSA apply.
Both CrowdStrike and its customers failed to conduct adequate change management testing (Top Threat #1: Inadequate Change Control). This underscores the importance of using multiple types of software testing and safeguards (Top Threat #6: Insecure Software Development).
Many customers did follow best practices, namely keeping the latest revision in QA testing and running production one revision behind (N-1). However, even these customers overlooked critical system components (Top Threat #3: Insecure Interfaces and APIs). The faulty definition files were deployed immediately to all Falcon endpoint agents, outside that versioning policy, which compounded the issue.
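As a rough illustration of that N-1 practice, here is a minimal sketch assuming a hypothetical host-group policy; the version numbers and group names are placeholders, not CrowdStrike's actual release numbering.

```python
# Minimal sketch of an N-1 version policy: QA hosts run the newest agent
# release, production hosts stay one revision behind. Version numbers and
# host groups are hypothetical; real policies are set in the vendor's
# management console.
AVAILABLE_VERSIONS = ["7.14", "7.15", "7.16"]  # oldest -> newest (hypothetical)

def approved_version(host_group: str) -> str:
    """Return the agent version a host group may run under an N-1 policy."""
    latest, n_minus_one = AVAILABLE_VERSIONS[-1], AVAILABLE_VERSIONS[-2]
    return latest if host_group == "qa" else n_minus_one

print(approved_version("qa"))          # newest release, for testing
print(approved_version("production"))  # one revision behind
```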
Organizations lacked adequate risk assessments and supply chain mapping (Top Threat #4: Inadequate Cloud Security Strategy). These practices would have helped them proactively identify risks and implement the associated safeguards.
CrowdStrike did publish remediation guidance on the first day of the outage. However, many customers' incident response plans lacked critical capabilities, and they struggled to implement the remediation steps. A robust incident response plan should account for physical hardware access, along with governance, shared responsibility, and visibility.
Technical Impacts
- Confidentiality: The CrowdStrike incident did not directly contribute to confidentiality failures. No instances of data exposure were publicly reported.
- Integrity: The CrowdStrike outage included numerous accounts of failed recoveries and corrupted backups. To restore functionality, affected systems required manual intervention, such as booting into safe mode to delete specific configuration files (see the sketch after this list). Additionally, devices protected with BitLocker encryption each required entry of a unique 48-digit recovery key.
- Availability: Loss of availability was far and away the biggest lesson reinforced by the incident, most visibly in the headline-grabbing Delta Air Lines disruption. While CrowdStrike produced a fix within a day, Delta dealt with the aftermath for weeks.
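To make that manual remediation concrete, below is a minimal sketch of the kind of cleanup script an administrator might run after booting into safe mode. The directory and file pattern are illustrative placeholders, not CrowdStrike's official remediation paths.

```python
# Minimal sketch: delete a faulty endpoint-agent content file from safe mode.
# AGENT_DIR and FAULTY_PATTERN are illustrative placeholders; the real paths
# and file names came from the vendor's published remediation guidance.
from pathlib import Path

AGENT_DIR = Path(r"C:\Windows\System32\drivers\ExampleAgent")  # placeholder
FAULTY_PATTERN = "C-*.sys"                                     # placeholder

def remove_faulty_files(agent_dir: Path, pattern: str) -> list[Path]:
    """Delete content files matching the faulty pattern; return what was removed."""
    removed = []
    for path in agent_dir.glob(pattern):
        path.unlink()
        removed.append(path)
    return removed

if __name__ == "__main__":
    for path in remove_faulty_files(AGENT_DIR, FAULTY_PATTERN):
        print(f"Removed {path}")
```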
Business Impacts
- Financial: The losses associated with the outage were staggering. CrowdStrike reported a third-quarter loss of $16.82 million USD, and its stock dropped 45% over the 18 days following the outage. Fortune estimates that the impact on Fortune 500 companies included $5.4 billion USD in direct losses.
- Operational: CrowdStrike identified the issue and released a fix on the same day. However, the need for manual intervention on many affected computers created extended outages.
- Compliance: There were no reports of compliance fines for the incident.
- Reputational: Negative coverage was prominent across global media platforms, with mainstream outlets such as Forbes, AP News, and many local news organizations covering the incident directly and critically. The stock price rebounded within four months, reaching all-time highs shortly thereafter.
Preventive Mitigation
- Quality Control: Follow a defined quality change control process with established baselines and system testing standards. CrowdStrike should have built stronger QA into its release process; staged rollouts and rollback mechanisms could have caught the faulty update before it reached production.
- Change Management Technology: Manage the risks associated with applying changes to organization assets. Customers with a structured change management process that included automated rollback could have limited the spread of the faulty update.
- SSRM Supply Chain: Document and manage the Shared Security Responsibility Model throughout the supply chain. Organizations using Falcon Sensor should have conducted regular security reviews and contingency planning for vendor failures.
- Application Security Test Automation: Implement a strategy for automated testing, including criteria for acceptance of new information systems, upgrades, and versions, to provide application security assurance and maintain compliance. Gradual, phased rollouts instead of immediate global deployment of definition files would have reduced the overall impact (see the rollout sketch after this list).
- Equipment Redundancy: Supplement business-critical equipment with redundant equipment. Locate this equipment at a reasonable minimum distance in accordance with industry standards. Organizations should have had a tested disaster recovery plan to quickly switch to alternative endpoint protection.
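The rollout sketch referenced above: a minimal, hypothetical example of ring-based deployment with a health gate between stages. The `fleet` object and its `deploy`, `healthy_fraction`, and `rollback` methods are assumptions for illustration, not a real CrowdStrike or CCM API.

```python
# Minimal sketch of a ring-based (canary) rollout gate. The fleet object and
# its methods are hypothetical; the point is that an update only reaches the
# next ring if the previous ring stays healthy, and is rolled back otherwise.
import time

ROLLOUT_RINGS = [0.01, 0.10, 0.50, 1.00]   # fraction of the fleet per stage
HEALTH_THRESHOLD = 0.99                    # required fraction of healthy endpoints
SOAK_MINUTES = 30                          # observation window before promotion

def staged_rollout(update_id: str, fleet) -> bool:
    """Promote an update ring by ring; abort and roll back on a health regression."""
    for ring in ROLLOUT_RINGS:
        fleet.deploy(update_id, fraction=ring)                    # hypothetical API
        time.sleep(SOAK_MINUTES * 60)
        if fleet.healthy_fraction(update_id) < HEALTH_THRESHOLD:  # hypothetical API
            fleet.rollback(update_id)                             # hypothetical API
            return False
    return True
```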
Detective Mitigation
- Detection of Baseline Deviation: Implement detection measures with proactive notification in case changes deviate from the established baseline. Real-time monitoring of Falcon Sensor definition file updates could have triggered an alert as soon as CrowdStrike pushed the faulty update.
- Security Monitoring and Alerting: Identify and monitor security-related events within applications, and define and implement a system to generate alerts based on those events. Endpoint behavioral monitoring could have flagged the unexpected system crashes immediately, enabling a faster rollback (see the monitoring sketch after this list).
- Incident Response Metrics: Establish and monitor information security metrics. Tracking metrics such as the number of CrowdStrike agents going offline could have detected the faulty sensor update before distribution completed.
- Vulnerability Prioritization: Use a risk-based model for effective prioritization of vulnerability remediation. Threat intelligence feeds that track vendor software stability could have warned organizations about issues with Falcon Sensor updates.
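The monitoring sketch referenced above: a minimal example of baseline-deviation alerting on agent check-ins. The `telemetry` and `notifier` objects and their methods are hypothetical placeholders for whatever SIEM or observability tooling an organization already runs.

```python
# Minimal sketch of baseline-deviation alerting on endpoint agent check-ins.
# A sharp drop in check-ins right after a content update is pushed is treated
# as a signal to page on-call and halt further distribution.
def deviates_from_baseline(checked_in_now: int, baseline: int,
                           max_drop_ratio: float = 0.05) -> bool:
    """Return True when check-ins fall more than max_drop_ratio below baseline."""
    if baseline == 0:
        return False
    return (baseline - checked_in_now) / baseline > max_drop_ratio

def alert_if_deviating(telemetry, notifier) -> None:
    """Poll telemetry and raise an alert when the fleet deviates from baseline."""
    current = telemetry.agents_checked_in_last_5m()   # hypothetical API
    baseline = telemetry.rolling_baseline()           # hypothetical API
    if deviates_from_baseline(current, baseline):
        notifier.page_oncall(                         # hypothetical API
            f"Agent check-ins dropped from {baseline} to {current}; "
            "halt content-update distribution and investigate."
        )
```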
Corrective Mitigation
- Remediation: Establish and maintain a risk-based corrective action plan to remediate incident findings and lessons learned. Review and report remediation status to relevant team members.
- Incident Response Plans: Establish and maintain a security incident response plan that includes relevant internal departments, impacted cloud service customers, and other business-critical relationships. Organizations should have had pre-approved frameworks for quickly engaging third-party vendors (CrowdStrike in this case) so that remediation could be coordinated.
- Supply Chain Service Agreement Compliance: Implement policies requiring all cloud service providers to comply with security, privacy, audit, and service level requirements. Organizations maintaining contracts with multiple vendors can switch security providers in case of an outage. (Though this might be a stretch for endpoint detection and response.)
- Remediation Schedule: Define and evaluate procedures to enable both scheduled and emergency responses to identified vulnerabilities. Organizations could have enforced more robust patch rollback procedures to quickly disable the problematic update (see the rollback sketch after this list).
- Response Plan Exercise: Exercise the disaster response plan annually (or upon significant changes). Ensure disaster recovery drills include third-party software failures. Organizations with regular security software failure simulations would have responded faster to the outage.
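The rollback sketch referenced above: a minimal, hypothetical example of an emergency rollback step that blocks the bad content version, pins the fleet to the last known-good version, and records the action for post-incident review. The `mgmt` and `audit_log` objects are assumptions, standing in for whatever management console and audit trail an organization uses.

```python
# Minimal sketch of an emergency rollback runbook step. The management and
# audit-log objects are hypothetical; the goal is to make "disable the
# problematic update" a single, pre-approved action.
from datetime import datetime, timezone

def emergency_rollback(mgmt, audit_log, bad_version: str, known_good_version: str) -> None:
    """Block the faulty content version, pin the known-good one, and record the action."""
    mgmt.block_version(bad_version)                   # hypothetical API
    mgmt.pin_content_version(known_good_version)      # hypothetical API
    audit_log.record({                                # hypothetical API
        "action": "emergency_rollback",
        "blocked": bad_version,
        "pinned": known_good_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
```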
Key Takeaways from This Incident
- Understand the third-party supply chain risks associated with the cloud shared responsibility model. Take steps to limit exposure where your control is limited but outcomes can be disastrous.
- Consider staggered rollouts or critical infrastructure exceptions.
- While immediate patching can quell an actively exploited zero-day vulnerability, quality assurance testing often pays significant dividends.
- Contracts may be the only enforceable method for correcting harm created by suppliers. Involve legal teams to review the implications and draft language for SLAs and breach-of-contract provisions.
Interested in reading about other recent cyber incidents? CSA’s Top Threats to Cloud Computing Deep Dive 2025 analyzes seven other notable cloud breach cases. Get a detailed breakdown of the Snowflake, Football Australia, Toyota, DarkBeam, Retool/Fortress, FTX, and Microsoft incidents. This breakdown includes:
- A detailed description of the attack
- A description of the threat actor
- The associated top threats
- The technical and business impacts
- Relevant Cloud Controls Matrix (CCM) controls to use for preventive, detective, and corrective mitigation
- Essential metrics to measure control effectiveness
- Key takeaways