Case Study

Facebook October 2021: 6 hours, $100M+ revenue, $60B market cap

On Monday 4 October 2021, a routine BGP maintenance command inadvertently withdrew all of Facebook's route advertisements from the global internet. For approximately six hours, Facebook, Instagram, WhatsApp, Messenger, Workplace, and Oculus were unreachable worldwide. The cascade through Facebook's own internal systems substantially complicated the recovery, including a widely reported failure of physical badge access at the affected data centers. Direct ad revenue loss was estimated above $100 million, and the one-day drop in market capitalisation was roughly $60 billion.

Timeline

What happened, hour by hour

| Time | Event |
| --- | --- |
| 11:39 ET (15:39 UTC) | Routine maintenance command withdraws all Facebook BGP routes |
| ~11:40 ET | Global internet cannot reach Facebook DNS; Facebook, Instagram, WhatsApp, Messenger all unreachable |
| 12:00 ET | External monitoring widely picks up the outage; Cloudflare and others publish analyses |
| 12:30 to 17:00 | Facebook engineers attempt remote recovery, blocked by their own tooling depending on now-unreachable services |
| 17:00 to 17:50 | On-site engineers physically reach the affected routers and restore BGP advertisements |
| 17:50 ET | Services begin restoration; full recovery takes additional time as caches refresh |
| Following days | Telegram reports 70M new signups; Signal reports temporary growth spike |

Timeline from Facebook's official post-incident write-up and contemporaneous reporting from Cloudflare.
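As a quick cross-check of the timeline, the sketch below converts the two bounding timestamps to UTC and computes the length of the acute window. It assumes Eastern Daylight Time (UTC-4) was in effect on the day, and it treats 17:50 ET as the end of the outage even though full recovery took longer.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "ET" in the timeline means Eastern Daylight Time (UTC-4),
# which was in effect in the US on 4 October 2021.
EDT = timezone(timedelta(hours=-4), "EDT")

withdrawal = datetime(2021, 10, 4, 11, 39, tzinfo=EDT)
restoration = datetime(2021, 10, 4, 17, 50, tzinfo=EDT)


def to_utc(moment: datetime) -> datetime:
    """Convert a timezone-aware timestamp to UTC."""
    return moment.astimezone(timezone.utc)


outage = restoration - withdrawal
outage_hours = outage.total_seconds() / 3600

print(to_utc(withdrawal).strftime("%H:%M UTC"))   # 15:39 UTC
print(to_utc(restoration).strftime("%H:%M UTC"))  # 21:50 UTC
print(f"{outage_hours:.2f} hours")                # 6.18 hours
```

The result, 6 hours 11 minutes, matches the "approximately six hours" used throughout.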

Root Cause

The BGP withdrawal and the recovery cascade

During a routine maintenance operation, an engineer issued a command intended to assess global backbone capacity. A bug in the audit tooling that should have validated the command failed to catch a misconfigured variant of it. According to Facebook's write-up, the command instead took down the backbone connections between its data centers, and its DNS servers, which are designed to stop advertising themselves over BGP when they cannot reach the data centers, withdrew their route advertisements. With those routes gone from the global routing table, the rest of the internet could not reach any of Facebook's DNS servers, so DNS resolution for facebook.com, instagram.com, whatsapp.com, and all other Facebook-controlled domains failed worldwide.

The recovery would have been straightforward in principle: re-advertise the routes. In practice, Facebook's own internal communication and operational tooling ran on the same network as the customer-facing services, so engineers could not reach the affected routers remotely. The badge access system that controlled physical entry to the data center buildings, also running on the same internal network, was reportedly affected, complicating physical access for the on-site response. Recovery required physically reaching the affected routers and restoring the route advertisements manually, which took several hours longer than the underlying technical problem would have required.

The cascade is the canonical example of why critical operational tooling should not run on the same infrastructure it manages. Modern operational guidance from Google's SRE community and others has long warned against this pattern. The Facebook incident is the most-cited illustration of why.

Economic Impact

Direct revenue, market cap, and competitive growth

| Cost line | Amount | Note |
| --- | --- | --- |
| Direct ad revenue (Facebook + Instagram) | ~$100M+ | Approximately $13M per hour averaged, higher at peak |
| WhatsApp Business API revenue | Smaller but non-zero | Enterprise customers used SMS or alternates |
| Oculus and Workplace revenue | Smaller | Subscription model, mostly deferrable |
| Share-price impact (one-day) | ~$60B market cap drop | Approximately 4.9% drop on 4 October 2021 |
| Customer-acquisition for competitors | Telegram +70M users | Telegram CEO post; Signal also grew temporarily |
| Reputation tail | Estimated 30 to 90 day spend on regaining narrative | Coincided with whistleblower disclosures |

The direct revenue figure derives from Facebook's ad-revenue run rate at the time: approximately $28 billion in Q3 2021, which annualises to ~$112B, or ~$13M per hour averaged across all 8,760 hours of the year. At that flat average, a ~6 hour outage costs roughly $77 million. Peak hours run higher, and the affected window covered the US late morning and afternoon and the European evening, both heavy ad-serving periods. Estimates of direct ad-revenue loss above $100 million therefore imply a peak weighting of only about 1.3x the average hourly rate, which makes them plausible rather than extreme.
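The run-rate arithmetic can be reproduced in a few lines. This is a minimal sketch using only the figures quoted in the text; the `peak_multiplier` parameter is an illustrative assumption for weighting peak hours, not a published Facebook number.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def hourly_run_rate(quarterly_revenue: float) -> float:
    """Average revenue per hour, annualising a single quarter."""
    return quarterly_revenue * 4 / HOURS_PER_YEAR


def outage_revenue_loss(
    quarterly_revenue: float,
    outage_hours: float,
    peak_multiplier: float = 1.0,  # assumption: 1.0 means a flat average hour
) -> float:
    """Estimated direct revenue lost during an outage window."""
    return hourly_run_rate(quarterly_revenue) * outage_hours * peak_multiplier


Q3_2021_AD_REVENUE = 28e9  # approximate figure quoted in the text
OUTAGE_HOURS = 6

per_hour = hourly_run_rate(Q3_2021_AD_REVENUE)
flat_loss = outage_revenue_loss(Q3_2021_AD_REVENUE, OUTAGE_HOURS)
# Peak weighting needed for the loss to reach $100M.
breakeven_multiplier = 100e6 / flat_loss

print(f"${per_hour / 1e6:.1f}M per hour")              # $12.8M per hour
print(f"${flat_loss / 1e6:.1f}M at the flat average")  # $76.7M at the flat average
print(f"{breakeven_multiplier:.2f}x peak weighting")   # 1.30x peak weighting
```

A flat average gives about $77 million for six hours, so the $100M+ estimates correspond to the outage hours carrying roughly 30% more ad revenue than an average hour.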

The share-price impact is harder to attribute cleanly because the outage coincided with the Wall Street Journal's "Facebook Files" whistleblower disclosures, which were dominating financial-press coverage of the company during the same week. The ~5% one-day drop and ~$60B market cap loss on 4 October 2021 were almost certainly amplified by the existing negative news flow. Most of the share-price drop reversed within several weeks, but the broader 2021-2022 share-price weakness at the company (renamed Meta Platforms later that month) partly traces back to this period of compounding negative narrative.

Competitive Impact

Telegram, Signal, and the migration moment

Telegram CEO Pavel Durov publicly disclosed that the company added 70 million new users on 4 October 2021, an unusually large single-day signup figure attributed primarily to WhatsApp being unavailable globally. Signal also reported a temporary signup spike, though smaller. Most of these new users remained active to some degree, with Telegram's headline MAU figure stepping up materially in the months following.

The competitive-migration cost is a line that does not appear in standard outage cost models. For a customer-facing service operating in a competitive market, an extended outage is a discrete moment when users actively seek and try alternatives. Most return after the original service is restored, but a meaningful fraction (typically 5 to 15% of those who tried an alternative) become long-term users of the alternative. For WhatsApp, with a customer base in the low billions, even a 1% defection to Telegram or Signal represents tens of millions of users. The lifetime-value loss from this defection materialises slowly and is rarely calculated, but it is real.
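The defection arithmetic above can be sketched as a small model. The user base of 2 billion stands in for "low billions", the 70 million Telegram signups are used as a rough proxy for users who tried an alternative, and the per-user lifetime value is a purely hypothetical placeholder; none of these inputs is a measured WhatsApp figure.

```python
def defected_users(user_base: int, defection_rate: float) -> float:
    """Users lost if a given fraction of the base defects."""
    return user_base * defection_rate


def retained_by_alternative(trial_users: int, retention_rate: float) -> float:
    """Users who tried an alternative during the outage and stayed with it."""
    return trial_users * retention_rate


def lifetime_value_loss(lost_users: float, ltv_per_user: float) -> float:
    """Long-run revenue forgone from users who do not come back."""
    return lost_users * ltv_per_user


USER_BASE = 2_000_000_000    # assumption: stand-in for "low billions"
TRIAL_USERS = 70_000_000     # Telegram's reported signups, used as a proxy
HYPOTHETICAL_LTV_USD = 5.0   # hypothetical placeholder, not a real figure

one_percent = defected_users(USER_BASE, 0.01)
low = retained_by_alternative(TRIAL_USERS, 0.05)
high = retained_by_alternative(TRIAL_USERS, 0.15)

print(f"1% of base: {one_percent:,.0f} users")
print(f"5 to 15% of trial users: {low:,.0f} to {high:,.0f} users")
print(f"LTV loss range: ${lifetime_value_loss(low, HYPOTHETICAL_LTV_USD):,.0f} "
      f"to ${lifetime_value_loss(high, HYPOTHETICAL_LTV_USD):,.0f}")
```

The value of writing the model down is less the output than the visibility of its inputs: the result is linear in both the retention rate and the lifetime value, so each assumption can be challenged separately.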

Lessons

What other operators learned

Out-of-band operational tooling

Critical operational tooling (incident communication, remote access, recovery procedures) must not depend on the infrastructure it manages. The Facebook case made this the canonical example, and many large operators now run operational tooling on physically and logically separate networks.
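One way to make this rule checkable is to walk a dependency graph and flag any recovery tool that transitively depends on the infrastructure it is meant to recover. The sketch below is illustrative only: the component names and the graph are hypothetical and do not describe Facebook's actual systems.

```python
from collections import deque


def transitive_dependencies(graph: dict, component: str) -> set:
    """Return every component reachable from `component` via dependency edges."""
    seen = set()
    queue = deque(graph.get(component, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def find_circular_recovery_risks(graph: dict, recovery_tools: list, managed: set) -> dict:
    """Map each recovery tool to the managed infrastructure it depends on."""
    risks = {}
    for tool in recovery_tools:
        shared = transitive_dependencies(graph, tool) & set(managed)
        if shared:
            risks[tool] = sorted(shared)
    return risks


# Hypothetical dependency graph: component -> direct dependencies.
DEPENDENCIES = {
    "remote-router-access": ["corp-vpn"],
    "corp-vpn": ["internal-dns"],
    "internal-dns": ["production-backbone"],
    "badge-access": ["internal-dns"],
    "incident-chat": ["production-backbone"],
    "oob-console": ["cellular-modem"],
}

risks = find_circular_recovery_risks(
    DEPENDENCIES,
    recovery_tools=["remote-router-access", "badge-access", "incident-chat", "oob-console"],
    managed={"production-backbone"},
)
for tool, shared in risks.items():
    print(f"{tool} depends on {shared}")
```

In this hypothetical graph only `oob-console` passes, because its sole dependency is a separate cellular path. A check of this kind is only as good as the dependency inventory behind it, which in practice is the hard part.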

BGP-change validation

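Audit tooling that validates BGP changes before they propagate is now standard at large network operators. RPKI (Resource Public Key Infrastructure) adoption accelerated meaningfully in 2022, partly in response to routing incidents in general. RPKI validates that a network is authorised to originate a prefix, so it guards against hijacks and leaks rather than against an operator withdrawing its own routes; a self-inflicted withdrawal of this kind has to be caught by pre-change validation.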
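A pre-change audit of this kind can be sketched as a comparison between the currently advertised prefixes and the set that would remain after the change. The example below is a simplified illustration, not Facebook's tooling or any vendor's API: the function name, the 10% threshold, and the notion of "protected" prefixes are all assumptions, and the prefixes come from the reserved documentation ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class AuditResult:
    approved: bool
    reasons: tuple


def audit_route_change(current, proposed, protected=(), max_withdraw_fraction=0.1):
    """Reject a change that withdraws too many prefixes or any protected one."""
    current_set = {ip_network(p) for p in current}
    proposed_set = {ip_network(p) for p in proposed}
    protected_set = {ip_network(p) for p in protected}

    withdrawn = current_set - proposed_set
    reasons = []

    # Blast-radius check: how much of the advertised space disappears?
    if current_set:
        fraction = len(withdrawn) / len(current_set)
        if fraction > max_withdraw_fraction:
            reasons.append(f"withdraws {fraction:.0%} of advertised prefixes")

    # Protected prefixes (for example DNS anycast ranges) must never be withdrawn.
    lost_protected = sorted(str(p) for p in withdrawn & protected_set)
    if lost_protected:
        reasons.append(f"withdraws protected prefixes: {', '.join(lost_protected)}")

    return AuditResult(approved=not reasons, reasons=tuple(reasons))


ADVERTISED = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
DNS_PREFIXES = ["192.0.2.0/24"]  # hypothetical protected range

unchanged = audit_route_change(ADVERTISED, ADVERTISED, protected=DNS_PREFIXES)
withdraw_all = audit_route_change(ADVERTISED, [], protected=DNS_PREFIXES)

print(unchanged.approved)     # True
print(withdraw_all.approved)  # False
for reason in withdraw_all.reasons:
    print(reason)
```

The useful design property is that the check fails closed: a change that empties the advertised set trips both rules independently, so a bug in one rule does not silently approve the change.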

Physical access independence

Badge-access systems should not depend on the network they protect. Many operators reviewed and rebuilt physical-access systems on independent infrastructure following the October 2021 incident.

Competitive-migration risk

The Telegram +70M number is the most-cited illustration of competitive-migration risk during major consumer-service outages. For consumer-facing services with viable alternatives, the lifetime-value loss from outage-driven defection often exceeds the direct revenue loss.

Frequently Asked

Common Questions

What caused the Facebook outage on 4 October 2021?
Per Facebook's official summary, a routine BGP maintenance command inadvertently withdrew all of Facebook's BGP route advertisements from the global routing table. A bug in audit tooling failed to catch the misconfigured command. With no routes advertised, the global internet could not reach Facebook's DNS servers, making all Facebook properties unreachable.
How long was Facebook down on 4 October 2021?
Approximately 6 hours from BGP withdrawal at ~15:39 UTC to broad service restoration at ~21:50 UTC. Some downstream services took additional time to fully recover as caches refreshed. WhatsApp, Instagram, Messenger, Workplace, and Oculus were all affected for the same window.
How much did the Facebook October 2021 outage cost?
Direct ad-revenue loss was estimated above $100 million across the ~6 hour window. The one-day share-price drop on 4 October 2021 was approximately 4.9%, equivalent to roughly $60 billion in market capitalisation. Most of the share-price drop reversed within several weeks. Competitive-migration cost (Telegram +70M users) added a long-term lifetime-value loss that is harder to quantify.
Why did recovery take so long?
Facebook's own internal communication and operational tooling ran on the same network as the customer-facing services, so engineers could not reach the affected routers remotely. Badge-access systems at the data centers were reportedly also affected, complicating physical access. Recovery required physically reaching the affected routers and restoring route advertisements manually, which took several hours longer than the underlying technical problem would have required.
How did the outage affect WhatsApp specifically?
WhatsApp was unavailable globally for the full ~6 hour window. In countries where WhatsApp functions as primary messaging infrastructure (much of Latin America, Africa, parts of Asia and Europe), the disruption was substantial. Signal and Telegram both saw signup spikes, with Telegram CEO Pavel Durov disclosing 70 million new users on 4 October 2021 attributed primarily to the WhatsApp outage.
What is the lesson from the Facebook October 2021 outage?
Critical operational tooling must not depend on the infrastructure it manages. The 6-hour acute outage included several hours of recovery delay attributable to Facebook's own remote-access tooling being unreachable. Modern operational guidance has long warned against this pattern; the Facebook incident is the canonical illustration. RPKI adoption and physical-access independence reviews accelerated meaningfully across the industry following the incident.

Updated 2026-04-27