Essential Case Studies: Because those who have not read history are doomed to repeat it.
- 1985-1987: Therac-25 (Summary video | Wikipedia | Paper)
- 1992: Patriot Missile Timekeeping (Wikipedia | Report)
- 1996: Ariane 5 flight 501 (Wikipedia | Report)
- 1999: Mars Climate Orbiter (Wikipedia | Report)
- 2002-2010: Toyota unintended acceleration (UA) (Slides | Full Video | Wikipedia | Long Read)
- 2007: F-22 Date Line Bug (News | Wikipedia)
- 2009: Air France Flight 447 (Wikipedia | Report)
- 2010: Stuxnet (security) (Wired | Wikipedia | Long Read)
- 2011: Wenzhou Train Crash (News | Wikipedia)
- 2013: Asiana Flight 214 (Wikipedia)
- 2015: Seville A400M Crash (News | Wikipedia)
- 2015: Ukraine Power Outage (security) (News | CERT)
- 2017: USS John S. McCain (ProPublica | Wikipedia)
- 2018: Uber Testing Fatality (News | NTSB report | NTSB Hearing)
- 2019: Boeing 737 MAX/MCAS (News | FAA Findings | FAA Timeline | PBS video | Wikipedia)
- 2021: Horizon IT scandal (News | Journal paper | Wikipedia | Computerphile | News Summary)
- 2023: Cruise pedestrian dragging mishap (Wikipedia | Detailed Summary)
Additional Case Studies:
- 1990: Hubble Telescope mirror flaw (NASA | BBC)
- 1997: Mars Pathfinder ("What Really Happened on Mars")
- 1997: USS Yorktown Smart Ship failure (Wired)
- 2004: Spirit Mars Rover file system (Wikipedia)
- 2005: V-22 Osprey Crashes (Wired | Wikipedia)
- 2016: Schiaparelli Mars Lander crash (Wikipedia)
- 2014-...: Tesla Autopilot (Wikipedia)
- 2024: CrowdStrike incident (Wikipedia)
- See also: https://en.wikipedia.org/wiki/List_of_software_bugs
Recommended Supplemental Materials
- Carnegie Mellon University course 18-642 on embedded software code quality, safety, security (Koopman)
- Other reading/viewing
- Discussion of aircraft human/computer interaction issues with automation (New Yorker article)
- They Write the Right Stuff: software for grown-ups, 1996 (Fast Company)
- Leveson (Paper Index)
- Software Safety: why, what and how, 1986 (CiteSeer) (longer version: 1995 Safeware Book)
- High-Pressure Steam Engines and Computer Software
- Engineering a Safer World (link for free PDF download)
- Read safety standards for yourself (only freely available resources included here):
- IEC 61508 functional safety standard (older version, but still representative for learning purposes) (Wikipedia | 2005 Full Text)
- UL 4600 autonomous product system safety standard (2019 Voting Draft | Free digital view of current version)
- US DoD MIL-STD-882E Change 1 System Safety (DLA)
- UK MoD Def Stan 00-55 and 00-56 are available via a free account on StanMIS (web searches will often turn up cached copies)
- US FDA Medical Device Software (Principles | Submissions)
- There are plenty of other safety standards relevant to important domains. But we're only listing the ones that you can read without paying hundreds of dollars for access.
Other Case Studies: (Still important, and should be read by anyone digging deep into safety. But less specifically related to computer-based system risks.)
- 1814: London Beer Flood (Wikipedia)
- 1860: Pemberton Mill (Wikipedia)
- 1889: Johnstown Flood (Wikipedia)
- 1903: Iroquois Theater Fire (Wikipedia)
- 1912: RMS Titanic (Wikipedia)
- 1919: Great Molasses Flood (Wikipedia)
- 1937: LZ 129 Hindenburg (Wikipedia)
- 1940: Tacoma Narrows bridge (Wikipedia)
- 1952: London Great Smog (Wikipedia | Video)
- 1957: Windscale Fire (Wikipedia)
- 1959: Minamata Disease (Wikipedia)
- 1963: USS Thresher (SSN-593) (Inquiry transcripts (Part 1 / Part 2) | Release of Records Story | Wikipedia | Subsafe | Webinar)
- 1966: Aberfan Spoil Tip Collapse (Wikipedia)
- 1972: DC-10 Cargo Door / AA Flight 96 (Wikipedia)
- 1973: PBB contamination in Michigan (Wikipedia | Science)
- 1974: Flixborough Disaster (Wikipedia | Video | 50th Anniversary Video)
- 1976: Seveso Disaster (Wikipedia)
- 1977: Tenerife Airport (Wikipedia)
- 1979: Three Mile Island (Wikipedia | Report | Lecture | NRC Backgrounder)
- 1980: Damascus Titan missile explosion (Wikipedia)
- 1981: Ford Pinto (Wikipedia | Museum Summary with video, Writeup)
- 1981: Kansas City Hotel Walkway Collapse (Wikipedia)
- 1982: Speedbird 9 (Wikipedia)
- 1983: Gimli Glider (Wikipedia)
- 1984: Bhopal (Wikipedia)
- 1986: Challenger Space Shuttle (IEEE Spectrum | Wikipedia | Report)
- 1986: Chernobyl (Wikipedia | INSAG Report | WHO Web Page | Physics Video)
- 1988: Piper Alpha (Wikipedia | Report part 1, part 2 | Video)
- 1991: Lauda Air Flight 004 (Wikipedia) -- said to have prompted the DO-178B update.
- 1993: Lufthansa Flight 2904 (Wikipedia)
- 1998: Eschede derailment (Wikipedia)
- 2000: Air France 4590 Concorde (Wikipedia)
- 2001: Petrobras P-36 (Mishap Report | NASA Case Study)
- 2003: Columbia Space Shuttle (Wikipedia | Report)
- 2006: Nimrod MR2 (Wikipedia | Report)
- 2008: B-2 Spirit Crash (Wikipedia | Investigation)
- 2008: British Airways Flight 38 (Wikipedia)
- 2010: Deepwater Horizon (Wikipedia | Reports)
- 2011: Fukushima Daiichi (Wikipedia | Report Summary | IEEE Spectrum)
- 2012: Knight Capital (Wikipedia)
- 2013: Takata airbags (Wikipedia)
- 2014: GM ignition switch (Wikipedia | Long-Read | Valukas Report)
- 2015: Volkswagen emissions scandal (Wikipedia)
- 2014-2019: Flint MI Water Crisis (Wikipedia)
- 2018: Keyless ignition & Carbon Monoxide (NY Times)
- 2023: Oceangate Titan implosion (Wikipedia | Long-Read | Long-Read | Long-Read)
- 2023: UK NATS Air Traffic Control failure (Report)
- Wikipedia List of Industrial Disasters | List of disasters by death toll | Compendium of computer-related aviation accidents
- Safety Culture (Wikipedia | NASA Handbook)
- Mode Confusion (Wikipedia)
- Compilation of US software-related Automotive Safety Recalls (Blog)
- Spreadsheet horror stories List
- Systems Engineering Body of Knowledge on Safety Engineering (SEBoK)
- NASA Safety library (index | Safety Guidebook)
- NASA Real System Failure story collection by Kevin Driscoll (Home | slides)
- FAA System Safety Handbook (FAA)
- FAA AC 25.1309-1A - System Design and Analysis (FAA)
- USAF System Safety Handbook (USAF)
- List of NHTSA software-related automotive recalls (Blog)
- List of accidents and incidents involving commercial aircraft (Wikipedia)
- SafeComp annual conferences
- Safety Critical Systems Club
- RISKS Digest
- System Safety Mailing List
- Journal of System Safety (open access)
- Safety of Work podcast (Rae & Provan) (Podcast)
- Fundamentals of Dependable Computing (John Knight)
- Better Embedded System Software Blog (Koopman)
- Safe Autonomy Blog (Koopman)
- Security Engineering (Ross Anderson)
- Marsh 100 Largest Losses in the Hydrocarbon Industry 1974-2019
- List of freely available reference books by Stephen Thomas
- To Engineer is Human, Petroski, 1992.
- The Design of Everyday Things, Norman, 2013, Chapter 5: "Human Error? No, Bad Design"
Advanced Specialty Topics/Research & Papers I Personally Recommend:
- Computer System Diversity, Independence, and Bootstrapping Safety (Lorenzo Strigini)
- Radiation-induced upsets. (YouTube, OK for background on the phenomenon, but has an inaccurate summary of Toyota UA findings)
- Knight, "Safety Standards -- a New Approach," on unresolved equity issues in how safety standards are created and accessed: https://scsc.uk/r126/1:1
- Bainbridge, "Ironies of Automation," 1983. https://www.sciencedirect.com/science/article/abs/pii/0005109883900468 (https://en.wikipedia.org/wiki/Ironies_of_Automation)
- Elish, "Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction," 2019, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2757236
NOTE: While Wikipedia is not always an authoritative source, for these sorts of events it tends to present useful summary descriptions.
If you think something important is missing, let me know!
Last update 11/23/2024