Safe Autonomy

Sunday, March 30, 2025

Safety Architecture & Redundancy Patterns

This is a lecture from my CMU graduate course on safety architecture and redundancy patterns.

It describes numerous patterns that involve redundancy used for fault detection (fail silent), redundancy used for availability (fail operational), and redundancy used for integrity (SIL/ASIL) isolation. It also describes combinations such as dual 2-of-2, and high-integrity doer/checker combination.

Lecture slides: https://users.ece.cmu.edu/~koopman/ece642/lectures/Xtra_SafetyArchPatterns.pdf

Youtube lecture video: https://youtu.be/QEHr8J-ByLQ

Saturday, July 20, 2024

Cyber-Hurricanes and Automotive Software: Common cause failures and the risk of widespread automotive losses

The recent CrowdStrike IT outage has focused attention on a topic rarely discussed, but fundamental to mission-critical systems: common cause failures. While this particular outage affected business and airlines more than cars, cars have been and will be affected by similar issues.

The most important thing to think about for highly dependable critical systems is ensuring a lack of common cause failures. This includes single point failures (one component that breaks brings the whole system down), as well as failures due to something shared across multiple otherwise independent subsystems or replicated systems.

We’ll talk about cars here to keep the examples concrete, but the principles apply across all types of systems. A few examples include:

A software design defect that is the same across every car is introduced by a botched update.
A third-party software component that is installed in every car is the subject of a botched update (this is the CrowdStrike situation — a security plug-in installed in massive numbers of Windows computers failed in every computer at the same time due to some sort of defective update distribution).
A novel cyberattack or social media trend emerges that might affect all cars from a particular manufacturer. (For example the Kia TikTok challenge.)
A hardware design defect that is the same across every car is activated by a rare circumstance encountered by every car. (Yes, these happen. The poster child is the Intel FDIV bug.)
A latent, known defect is not fixed in legacy systems, perhaps because they stay in operation longer than expected but are not being updated. There have been multiple rounds of this issue with GPS week rollover. And the Unix 2038 date rollover not as far away as it might seem.
A central service goes down. For example, what if nobody can start their cars because an driver monitoring system to screen for impaired drivers depends on a failed cloud service?
A central database is corrupted. For example, what if a cloud-based mapping service is contaminated with an update that corrupts information?
A requirements gap that is revealed by a novel but broadly experienced situation. For example, a city-wide power outage disables traffic lights, and a whole fleet of self-driving cars cannot handle intersections with dark traffic lights.
A natural disaster invalidates the operational scenarios used by the designers. For example, riders who have given up their cars and depend on robotaxis for transportation need to evacuate a town surrounded by wildfires — but the robotaxis have been programmed not to drive on a road surrounded by flames. With no override capability available.

You might say these are all “rare” conditions or “edge cases,” but they can’t be dismissed due to comparative rarity when people’s lives are on the line. Too many companies seem to believe in the Andrew Carnegie saying: “Put all your eggs in one basket — and then watch that basket!” But experienced safety folks know that never really works out in the long term for technical systems at scale. Defense in depth is the smart play.

Hurricanes are a rare event for any particular town. But they can do a lot of damage when they hit. In particular, a hurricane hit does not take out a single house — but tends to destroy entire towns or groups of towns. It is, in effect, a common cause source of losses for an entire town. Defense in depth does not simply bet a hurricane won’t come to town, but also buys insurance and updates the local building codes for hurricane-resistant roofs and the like to reduce losses when the hurricane eventually shows up.

Regardless of defenses, all buildings will be subject to the same problem at the same time when the hurricane shows up. The damage is not random and independent due to the common cause loss source of the hurricane raging through the town.

We can analogize this type event in the software world as a cyber-hurricane. A large number of systems suffer a loss due to some sort of common cause effect.

The insurance industry has a financial mechanism called reinsurance to spread risk around so if any one company gets an unlucky break due to selling too many policies in a town hit by a hurricane (or earthquake, flood, tornado, etc.) the risk gets spread around on a secondary market. All the companies lose, but an unlucky break probably won’t kill any particular insurance company. Even so, insurance companies might still require mitigation efforts to issue insurance, or at least reward such efforts with lower rates.

If a car company suffers a cyber-hurricane, how might they be able to spread that risk around? The car insurance companies might be fine due to reinsurance. But the car company itself has its name in the news. Just ask Hyundai/Kia how that’s been working out for them.

For a cyber-hurricane the sources of losses might be more diverse than just strong winds and rising flood waters. But you can expect we will see such events on a regular basis as more of our society comes to depend on interconnected software-intensive systems.

If you’re in the car business, ask yourself what a cyber-hurricane might look like for your system and what you should be doing to prevent it.

Sunday, July 14, 2024

Architectural Coupling Killed The Software Defined Vehicle

SDV failures might have poor architectural cohesion and coupling as a critical factor.

We’re reading about high profile software fiascos in car companies, and how they might be handling them, for example: The $5B VW bet on Rivian; Volvo refunding car owners over poor software. And don’t forget a steady stream of recalls over infotainment screen failures related to vehicle status indication and rear-view cameras.

There are business forces at play here to be sure, such as a mad rush to catch up to Tesla for EVs. But I think there might be a system architecture issue that is also playing an outsized role — both technical and business.

The technical side of this issue is directly related to the move from a bunch of boxes from a tiered supplier system to a single big computer that is a key aspect of so-called Software Defined Vehicles.

Architectural coupling and cohesion

Two key architectural principles that differentiate a good architecture from a bad one are cohesion and coupling. High cohesion is good; low coupling is good. The opposite can easily kill a system due to drowning in complexity.

Here are some definitions:

Cohesion: how well all the functions in a particular hardware or software module are related. Are they all cousins (high cohesion)? Or is it miscellaneous cats and dogs, with a hamster tossed in for good measure (low cohesion)? As an analogy, an apartment building (all apartments) has higher cohesion than a mixed use building (shops+dining+offices+apartments+garage+metro station). Low cohesion might have some advantages, but it is more complex to maintain and operate.
Coupling: how many data connections there are into/out of each module. Low coupling is good (a few cleanly defined data types); high coupling bad. High coupling amounts to data flow spaghetti. Not the tasty kind of spaghetti — the kind that causes system design failures analogous to spaghetti code, but in the middleware universe. As a more strained analogy, think of an apartment with a dozen exit doors — a different one for going to the shops, office, a neighbor, the garage, the metro, the sidewalk, the cafeteria, your patio, etc — and what it means to check to make sure all the exit doors are locked at bed time.

The old days: technology & supplier approaches incentivized low coupling and high cohesion

In traditional vehicles, the use of wires to connect different Electronic Control Units (ECUs) placed an artificial limit on coupling. In the old days you only got so many wires in a wire bundle before it would no longer fit in the available space in the car. And with the transition to networks, you only got so many messages per second on a comparatively slow CAN bus (250K or 500K bits/sec in the bad old days).

Moreover, in the old days each ECU more or less did a single function created by a single supplier. This was due in large part to a functional architecture approach in which OEMs could mix-and-match functions inside dedicated boxes from different suppliers.

Sure, there was duplication and potential wasteful use of compute resources. But a single box doing a single function hung onto a low-bandwidth network cable had no choice but to get high cohesion and low coupling.

New technology removes previous architectural incentives

Now we have much higher speed networks (perhaps hundreds of megabits/sec, with the sky being the limit). And if all the software is on the same computer, dramatically faster than that.

We also have middleware that is pushing software architectures from procedure-passing data based on flows of control to publish/subscribe broadcast models (pub/sub). Sure, that started with CAN, but has gotten a lot more aggressive with the advent of higher speed interconnects and middleware frameworks such as the one provided by AUTOSAR.

The combination of higher connection bandwidth between modules and pub/sub middleware has effectively removed the technical cost of high coupling.

Now we are promised that Software Defined Vehicles will let us aggregate all the functions into a single big box (or perhaps a set of medium-big boxes). With high bandwidth networks. And all sorts of functions all competing for resources on shared hardware.

High bandwidth communications, pub/sub models, centralized hardware, and centralized software implicitly incentivize approaches with high coupling and low cohesion.

SDV architectures destroy all the incentives that pushed toward low coupling and high cohesion in older system designs. You should expect to get what you incentivize. Or in this case, stop getting what you have stopped incentivizing.

Any student of system architecture should find it no surprise that we’re seeing systems with high coupling (and likely low cohesion) in SDVs. Accompanied by spectacular failures.

I’m not saying it is impossible to design a good architecture with current system design approaches. And I’m not saying the only solution is to go back to slow CAN networks. Or go back 50 years in software/system architectures. And I’m only arguing a tiny bit that painful constraints build character. What I’m saying is that the incentives that used to push designers to better architectures have evaporated.

Business architecture imitating technical architecture

Consider Conway’s law: organizations and systems they create tend to have similar structures. In my experience this is a two-way street. We can easily get organizations that evolve to match the architecture of the system being built. It is possible that the software/system architecture itself is the tail, and the dog is that organizations over time have aligned themselves with low cohesion/high coupling system approaches, and are therefore suffering from these same architectural problems.

So it might not be the middleware-centric architectural trends that are as much of a problem as the way those reflect back into the organizations that create them.

Despite the headline I don’t think the SDV is actually dead. But the failures we’re seeing will not be resolved simply by hiring more and smarter people. There are some fundamental issues in architecture that need to be addressed. With incentives to do so that are strategic rather than tactical, making it harder to explain the return on investment for doing so. Beyond that, there are some serious issues with how software engineering practices have evolved and their suitability for life critical systems.

I think the spectacular failures are just getting started. It will take some really fundamental changes to get things back on track. And probably more corporate fiascos.

Sunday, July 7, 2024

Mercedes Benz DRIVE PILOT and driver blame

MB has softened their stance on Level 3 liability, but they still don't really have your back when it matters.

Good news: Mercedes Benz has improved their position on driver liability.

Bad news: But they’re not there yet. A soft promise to pay insurance isn’t the biggest issue. Tort liability in a courtroom is.

MB is starting yet another round of “we take responsibility for crashes” for their Level 3 DRIVE PILOT traffic jam assist automation feature, approved for use in California & Nevada as a “Level 3” system and some places outside the US as an ALKS traffic jam assist system. (h/t to Susana Gallun for this gem promising they will cover insurance cost of crashes in Australia.)

Gone is the wording of the driver having to notice “irregularities .. in the traffic situation” while watching a movie on the dashboard. Because that was just plain stupid. And was an admission that they were selling an SAE Level 2+ system and not a Level 3 system, because Level 3 requires no driver attention to what is happening on the roadway.

Now MB has adopted terminology from SAE J3016. The driver “must take control of the vehicle if evident irregularities concerning performance relevant system failures are detected in the vehicle.” Straight out of J3016 Level 3 requirements. So that at least means they are consistent with the standard they invoke, and they are legit selling a Level 3 system according to their description. Too bad J3016 is not actually a safety standard and conformance to Level 3 does not bestow safety.

(Excerpt from Drive Pilot supplement: https://www.mbusa.com/content/dam/mb-nafta/us/owners/drive-pilot/QS_Sedan_Operators_Manual_US.pdf

There are two issues here. The first is what a “performance relevant system failure” inside the vehicle actually might be, and how the driver is supposed to notice if the system does not tell them. Let’s say there is a complete lidar failure for some reason (for example, a common cause failure inside the lidar firmware for all redundant lidar units), and the system does not tell the driver to take over. Crash ensues. That is clearly a “performance relevant system failure” — including the part where the system fails to inform the driver that the ADS is driving blind while the driver is still watching the movie. Yes, conforms to Level 3. But see the part where J3016 is not actually a safety standard.

So the driver is still potentially hung out to dry if the automation fails and there is a crash.

At this point MB typically says something about how smart their engineers are (which is credible) and they will pay insurance (if they don’t walk it back) and it won’t fail because MB is trustworthy (mixed experiences on that). But the fact is the current wording still leaves drivers exposed to significant downside if the technology fails. And technology eventually fails, no matter who builds it. (MB has recalls, just like everyone else.)

But let’s say MB actually honors its soft PR-promise to pay up insurance claims. In practice this kind of doesn’t matter, because the driver’s insurance was going to pay anyway. For anyone who can afford DRIVE PILOT, who pays insurance is not an existential economic threat. That comes from the wrongful death lawsuit, which MB is NOT saying it will cover.

What matters is who takes responsibility for the $10M+ (or whatever) wrongful death tort lawsuit that blows well past the insurance coverage of the driver, and which MB might (or might not) conveniently find an excuse to walk away from.

Mercedes Benz still wants us to think they accept liability in case of a Level 3 computer driver crash. Their current posture is not as ridiculous as it was before, because now they’re flirting with tort liability instead of disingenuously meaning only product liability as they were saying last year.

But we’re still not there yet. Maybe next year we will see them actually say they accept a duty of care (liability) for any crash that occurs when DRIVE PILOT is engaged, and liability returns to the driver only after more than 10 seconds have elapsed from a takeover request or the driver performs a takeover before then. Human factors questions about takeover situational awareness and safety remain, but at least this would be relatively clear-cut and not leave the potential for the human driver to be hung out to dry for watching a movie — as MB encourages them to do — when a crash happens.

For a longer writeup with detailed history, see:
No, Mercedes-Benz will NOT take the blame for a Drive Pilot crash

Friday, June 21, 2024

Time to Formally Define Level 2+ Vehicle Automation

We should formally define SAE Level 2+ to be a feature that includes not only Level 2 abilities but also the ability to change its travel path via intersections and/or interchanges. Level 2+ should be regulated in the same bin as SAE Level 3 systems.

There is a lot to unpack here, but ultimately doing this matters for road safety, with much higher stakes over the next 10 years than regulating completely driverless (Level 4/5) robotaxi and robotruck safety. Because Level 2+ is already on the roads, doing real harm to real people today.

First, to address the definition folks who are losing it over me uttering the term "2+" right now, I am very well aware that SAE J3016 outlaws notation like "Level 2+". My suggestion is to change things to make it a defined term, since it is happening with or without SAE's blessing, and we urgently need consistently defined term for the things that everyone else calls Level 2+ or Level 2++. (Description and analysis of SAE Levels here. Myth 5 talks about Level 2+ in particular.)

From a safety point of view, we've known for decades that when you take away steering responsibility the human driver will drop out, suffering from automation complacency. There have been enough fatalities from plain features said to be Level 2 (automated lane keeping + automated speed), such as cars under-running crossing big rigs, that we know this is an issue. But we also have ways of trying to address this by requiring a combination of operational design domain enforcement and camera-based driver monitoring. This will take a while to play out, but the process has started. Maybe regulatory intervention will eventually resolve the worst of those issues. Maybe not -- but let's leave that for another day.

What's left is the middle ground between next-gen-cruise-control features (lane centering + automated speed) and vehicles that aspire to be robotaxis or robotrucks but aren't quite there. That middle ground includes a human driver so the designers can keep the driver in the loop to avoid and/or blame for crashes. If you thought plain Level 2 had problems with automation complacency, Level 2+ says “hold my beer.” (Have a look at the concept of the moral crumple zone. And do not involve beer in driving in any way whatsoever.)

Expecting a normal human being to pay continuous hawk-like attention for hours while a car drives itself almost perfectly is beyond credibility. And dangerous, because things might seem fine for lots and lots of miles — until the crash comes out of the blue and the driver is blamed for not preventing it. Telling people to pay attention isn’t going to cut it. And I really have my doubts that driver monitoring will work well enough to ensure quick reaction time after hours of monotony.

People just suck at paying attention to boring tasks and reacting quickly to sudden life-threatening failures. And blaming them for sucking won’t stop the next crash. I think the car is going to have to be able to actively manage the human rather than the human managing the car, and the car will have to ensure safety until the human driver has time to re-engage with the driving task (10 seconds, 30 seconds, maybe longer sometimes). That sounds more like a Level 3 feature than a Level 2 feature from a regulatory point of view.

Tesla FSD is the poster child for Level 2+, but over the next 5 years we will see a lot more companies testing these waters as they give up on their robotaxi dreams and settle for something that almost drives itself -- but not quite.

The definition I propose is Level 2+ is a feature that meets the requirements for Level 2 but also is capable of changing roadways at an intersection and/or interchange.

Put simply, if it drives you down a single road, it's Level 2. But if it can make turns or use an exit/entrance ramp it is Level 2+.

One might pick different criteria, but this has the advantage of being simple and relatively unambiguous. Lane changing on the same roadway is still Level 2. But you are at Level 2+ once you start doing intersections, or go down the road (ha!) of recognizing traffic lights, looking at traffic for unprotected left turns, and so on. In other words, almost a robotaxi -- but with a human trying to guess when the computer driver will make a mistake and then potentially getting blamed for a crash.

No doubt there will be minor edge cases to be clarified, probably having to do with the exact definition of “roadway”. Or someone can propose a good definition for that word that takes care of the edge cases. The point here is not to write detailed legal wording, but rather to get the idea across of making turns at an intersection being the litmus test for Level 2+.

From a regulatory point of view, Level 2+ vehicles should be regulated the same as Level 3 vehicles. I realize Level 2+ is not necessarily a strict subset of Level 3, but the levels were never intended to be a deployment path, despite the use of a numbering system. I think they both share a concern of adequate driver engagement when needed in a system that is essentially guaranteed to create driver complacency and slow reaction times due to loss of situational awareness.

How does this look in practice? In the various bills floating around federal and state legislatures right now, they should include a definition of Level 2+ (Level 2 + intersection/interchange capability) and group it with Level 3 for whatever regulatory strategy they propose. Simple as that.

If SAE ORAD wants to take up this proposal for SAE J3016 that's fine too. (Bet some committee members are reading this — happy to discuss at the next meeting if you’re willing to entertain it.) But that document disclaims safety as being out of its scope, so what I care about a lot more are the regulatory frameworks that are currently near-toothless for the not-quite-robotaxi Level 2+ features already being driven on public roads.

Note: Based on proposed legislation I've seen, pulling Level 2+ into the Level 3 bin is the most urgent and viable path to improve regulatory oversight of this technology in the near to mid term. If you really want to do away with the levels I have a detailed way to do this, noting that the cut-line for Supervisory is at Level 2 rather than Level 2+, but is otherwise compatible with this essay. If you want to use the modes but change the cut line, let’s talk about how to do that without breaking anything.

Note: Tesla fans can react unfavorably to my essays and social media posts. To head off some of the “debate” — yes, navigate-on-autopilot counts as Level 2+ in my view. And we have the crashes to prove it. And no, Teslas are not dramatically safer than other cars by any credible analysis I’ve ever seen.

Monday, June 17, 2024

Perspective on Waymo's Safety Progress

I am frequently asked what I think of Waymo's progress on safety. Here are some thoughts on progress, mishaps, and whether we know they are acceptably safe. At current deployment rates it will take Waymo about 20 years with ZERO fatalities to show they are net as safe as average human driver fatality rates (including the old cars, impaired drivers, etc. in that comparison baseline). Their current statements that they are already saving lives are hype.

Safety at scale still remains the biggest question. And even with reasonable growth rates that question will remain open for many years for robotaxi technology. With Waymo currently in the lead, there are even more question marks for the other players with regard to safety.

Waymo has made impressive progress in scaling up operations. Some had previously criticized their ramp-up for being slower than other companies, but they are looking a lot smarter these days for having done that.

We've seen some recent incidents (for example the utility pole crash) and an investigation from NHTSA. I hope those are not signs that they have started scaling up faster than they should due to funding pressure.

This piece in Forbes notes that Waymo is now doing more than 50,000 paid rides a week across three cities and plans to do more launches.

Sounds like a lot! But from a safety point of view not enough to really know how things will turn out.

Waymo is disingenuously messaging that they are already saving lives, but the truth is nobody knows how that will turn out yet. At this rate they will need perhaps 20 years without a single fatality (see math check below) to show they are no worse than an average US human driver. And that is under some wildly favorable assumptions (e.g., software updates never create a new defect -- which is not how things work in the real world.). So for practical purposes the bar is set at perfection right now. We'll have to see how things turn out.

It certainly feels like Waymo has been more aggressive lately, perhaps because they are feeling pressure to show progress to justify further investment with a good news story. The danger is if Alphabet puts too much pressure on Waymo to expand too fast that could generate a bad news story instead of a good one. What happened at Cruise provides a strong cautionary tale for the whole industry. Let's hope Waymo is not pushed into making the same mistakes.

Truths & Myths About Automated Vehicle Safety -- Video Series

The past year has seen both peak hype and significant issues for the automated vehicle industry. In this talk we recap general trends and summarize the current situation for autonomous vehicles such as robotaxis, as well as conventional vehicles that have automated steering features. Many of the issues the industry faces are self-inflicted, stemming from a combination of inflated promises, attempts to scale immature technology too aggressively, and an overly narrow view of safety. Overall, the companies deploying the technology have failed to address legitimate concerns of a wide variety of stakeholders. There are a number of different aspects that still need to be addressed including: legislation, regulation, liability, insurance, driver skills, traffic enforcement, emergency services, vulnerable road users, engineering standards, business models, public messaging, investor pressure, cultural change, ethical/equity concerns, and local oversight. We concentrate on how all these pieces need to fit together to create sustainable automated vehicle technology approaches.

All Released Videos: YouTube Play List | Archive.org big video

Slide deck (acrobat)

Individual Videos:

1. Introduction & Overview

2. Truth or Myth? -- You Can Ride In An Autonomous Vehicle Today

Robotaxi Deployments
Other Testing & Deployments
Remote Operators
Autonomous pilot deployments are already on public roads; testing continues

3. Truth or Myth? -- Can Personally Owned Vehicles Can Drive Themselves?

Personal Vehicles Require Supervision
Things Can Go Very Wrong
IIHS: Only 1 of 14 Automated Steering Systems Is Acceptable
Automated steering requires continuous human driver attention - not really "self driving"

4. Truth or Myth? -- Are People Are Inherently Terrible Drivers?

The Myth of 94% Human Error
It Ain't 94%
Industry: Replace Terrible Human Drivers
Human Drivers Can Improve Over Time
Might We Do Better?
Many Countries Do Better Than the US At Road Safety
Better road safety does not require using computer drivers

5. Truth or Myth? -- Computer Controlled Active Safety Features Can Improve Safety

Active Safety Can Really Work!
Example Car Active Safety Features
Computer-controlled features can improve safety

6. Truth or Myth? -- Does Automated Steering Improve Driving Safety?

Automated Steering Vs. Active Safety
Active Safety Makes The Difference, Not Automated Steering
IIHS: Automated Steering Is Not A Safety Feature
Automated steering is a convenience feature, not a safety feature

7. Truth or Myth? -- People Are Terrible At Supervising Automation

Automation Bias & Complacency
NTSB Recommendations For Supervision Safety
Risk of Degraded Safety Due To Automation Complacency
Driver Monitoring To The Rescue?
Driver attention management is an open challenge

8. Truth or Myth? -- Are Ordinary Drivers Are Qualified To Test Driving Automation?

Public Road Beta Testing
Road Testing Can Cause Real Harm
Customers cosplaying "beta tester" expose everyone to undue risk

9. Truth or Myth? -- Blaming Drivers Deflects Accountability Away from Companies

The Moral Crumple Zone
Uber ATG Autonomous Vehicle Tester Blamed
Tesla Autopilot Double Fatality Driver Blame Story
Blaming drivers protects the company, not necessarily other road users

10. Truth or Myth? -- Does Lots of Sensors Means No Avoidable Collisions?

Perception Builds the World Model
Sensors Alone Do Not Ensure Safety
Sensors aren't enough; perception and prediction are critical for safety

11. Truth or Myth? -- Computers Won't Drive Drunk ... but ...

Human Error ==> Robot Error
Handling Non-Crash Hazards
City of San Francisco Concerns
Beyond Just Avoiding Crashes
Robot drivers will fail -- sometimes differently than human drivers

12. Truth or Myth? -- Safe Enough Requires More Than "Safer Than A Human Driver"

What People Mean By "Safe"
Positive Risk Balance
Other Safety Considerations
AVs need more than improved statistical average safety

13. Truth or Myth? -- Will Insurance Cost Pressure Ensure Acceptable Automated Vehicle Safety?

Insurance leverage for safety
Affordable Insurance vs. Safety
Net Risk Alone Is Not Safety
Insurance pressure alone will not ensure acceptable safety

14. Truth or Myth? -- Autonomous Vehicle Ethics (Not just the Trolley Problem)

The Infamous Trolley Problem
Ethics: Deployment Governance
Equity Concerns for AVs
Ethics/Equity Question that matters: Who decides what/when/where to deploy?

15. Truth or Myth? -- Does 10 Million Good Miles Prove Autonomous Vehicles Are Safe?

2023: Results from 1M+ Miles
How Many Road Testing Miles To Prove Safety?
Are Robotaxis Safer?
Companies predict -- but cannot yet prove -- severe injury/fatality safety

16. Truth or Myth? -- Does Road Testing Make Autonomous Vehicles Safe?

How About a Robot Driver Test?
Brute Force Road Testing
The Challenge Is Covering Everything
Safety Requires an Accurate World Model
Heavy Tail Distribution of Surprises
Heavy Tail Edge Cases Explained
Safety Engineering In A Nutshell
Safety depends on engineering to mitigate rare, high-consequence events

17. Truth or Myth? -- Do Safety Standards Exist? Do They Stifle Innovation?

Standards Set an Expectation of Safety
Safety Standards & Innovation
Case Study: Loss of the Titan Submersible
Safety Standards Deter UNSAFE Innovation

18. Truth or Myth? -- Will Government Regulation Ensure Safe Vehicle Automation?

Robotaxi Regulatory System In Action
US Regulatory Posture
Regulators Struggle with Novel Technology
Trend: System Safety Recalls
Federal Recall-Based Strategy Struggling To Deal With System-Level Safety

19. Truth or Myth? -- Will Product Liability Ensure Safe Vehicle Automation?

Product Liability Is Not Enough
Product Liability Is The Wrong Tool for Most Automated Vehicle Crashes

20. Truth or Myth? -- Current Tort Liability Rules Will Ensure Safe Vehicle Automation

Tort Law For Engineers
Duty of Care for Accountability
Implications of Defining a Computer Driver
Alternate to SAE Levels for Regulation
The Awkward Middle: Supervisory Mode
Urgency of Defining a Computer Driver
Providing A Safety Guardrail
Tort Law Could Help Support Safety -- Via the Computer Driver Concept

21. Conclusions

Saturday, June 15, 2024

The Waymo Utility Pole Crash

Waymo vs. utility pole smackdown: the utility pole won. No apparent extenuating circumstances.
Nobody was injured; the vehicle was empty. The pole suffered a minor dent but is still in service.

This video has an interview with the passenger who was waiting for pickup in Phoenix: https://www.youtube.com/watch?v=HAZP-RNSr0s Waymo did not provide a comment for the story.

Now the Waymo utility pole safety recall report is out (https://static.nhtsa.gov/odi/rcl/2024/RCLRPT-24E049-1733.PDF). Interesting that the vehicle was executing a pullover maneuver at the time it hit the pole. From a validation point of view I'll bet it could go down the center of the alleyway just fine in normal driving, but what bit them was the combination of pulling to the side of the road and a pole happening to be in what the vehicle thought was a safe pullover area due to the map.

Still not addressed is how utility poles being assigned a "low damage score" could have made it all the way through peer reviews, simulation, and road testing -- and needed to be found in a crash which might been worse in other circumstances.

This serves as a stark reminder that these vehicles lack common sense, in this case "thinking" that running smack into a utility pole was no big deal. They are subject to software defects as are all computer-based systems. We still don't know if/when they will be better than human drivers at reducing fatalities. But we know for sure they will make unforced errors in driving, hype notwithstanding.
This is also a good reminder that safety validation needs to consider all operational modes, and it is common for the problems to crop up in unusual or failure recovery modes. While there is no indication of an equipment malfunction in this particular case, safety in abnormal mission termination modes is notoriously difficult because there might also be malfunctioning equipment that triggered the system mode change.

Description of defect: "Prior to the Waymo ADS receiving the remedy described in this report, a collision could occur if the Waymo ADS encountered a pole or pole-like permanent object and all of the following were true: 1) the object was within the the boundaries of the road and the map did not include a hard road edge between the object and the driveable surface; 2) the Waymo ADS’s perception system assigned a low damage score to the object; 3) the object was located within the Waymo ADS’s intended path (e.g. when executing a pullover near the object); and 4) there were no other objects near the pole that the ADS would react to and avoid. "

Since I've had a number of questions here is my best shot at clarifying the collision mechanism:

The alleyway is marked as drivable, because the entire alley road surface is in fact all drivable (no curb; mostly garage entranceways) -- except for utility poles once in a while

The robotaxi's computer driver saw the utility pole in question, and correctly classified it as "utility pole".

A human drive would know "hitting utility pole == bad". However, some data structure somewhere in the computer driver was set that "hitting utility pole == OK". This applies to ALL utility poles EVERYWHERE, not just this particular utility pole.

So the computer driver drove smack into the utility pole thinking it was OK, when if fact it was not.

There was no mapping error involved in the collision. Changing the map is a workaround only.

I can speculate that somewhere there is an object classification system, and somehow (probably manually or semi-manually) each object type has an attribute of "OK to hit?" The global utility pole one was set incorrectly. There are other possibilities, but this is the simplest one.

What is shocking is that such a mistake could make it through quality, safety validation, and testing processes.

Academic Publishing, AI, and Broken Incentives

Since I've been a part of the academic publication ecosystem for decades (as have some of my readers), I'll share some thoughts on the latest publication quality scandals:

Junior faculty are suffering from an ever-increasing squeeze for publication and citation metrics. This has been an issue with growing severity as long as I've been at a university.
Some specialty areas find it easier to publish than others for many legitimate reasons.
ChatGPT is just the latest form of shady authoring technique.
The academic paper mills are symptoms, not the root cause. Effectively they are exploiting doctoral student => postdoc => junior faculty suffering for profit.
Shutting down paper mills does not fix the problem. It just forces it to manifest some other way.
All these pressures combined with MUCH more difficulty in raising research funding means that those same faculty are pressed to peer review more papers with lower quality with far less time -- and effectively no professional compensation for doing so.
On-line venues that exist to increase the number of publication slots available are just the latest form of publication metric arms races.
The top-tier conferences in my field still have excellent peer review. But it is no surprise that other publication venues do not. And the situation is ripe for scam artists to set up what amount to fake journals with fake reviews to make money from low-cost web publications. Sometimes established publication names are duped. Other times it is worse.
The academic publication cash grab has been a thing for a long, long time. (I recall $100 page charges for professional society journals when I was in grad school.) It has simply been weaponized along with the greater erosion of the publication industry we've been seeing overall.

It does feel like this problem has "gone exponential" lately. But it is a long-standing trend (look up Beall's list: https://en.wikipedia.org/wiki/Beall%27s_List) The problem won't be fixed just by killing off the most abusive journals, because the root causes run much deeper.

Ultimately this is difficult to solve because interdisciplinary evaluation committees find it easiest to count number of "scholarly" publications, and things like an H-index. "Impact" is hard to measure, and likely takes longer than the assistant=>associate professor timeline allows.

This is a systemic problem that will require senior faculty across the university ecosystem to resolve via deep cultural change. Clutching our pearls over ChatGPT showing up in papers should be directed to calling attention to the root problems (hence this post).

This is my personal perspective as an academic who, in the worst case, might be forced to retire as already planned in a few months. I hope that many junior faculty who do not feel they can speak up feel seen, even if I can't offer you a solution.

Relevant news story: Flood of Fake Science Forces Multiple Journal Closures

Saturday, June 8, 2024

SAE J3018 for operational safety

Any company testing on public roads should conform to the industry standard for road testing safety: SAE J3018.

When AV road testing first started, it was common for testers to claim that they were safe because they had a "safety driver." However, as was tragically demonstrated in the Tempe AZ testing fatality in 2018, not all approaches to safety driving are created equal. Much more is required. Fortunately, there is an SAE standard that addresses this topic.
SAE J3018_202012 "Safety-Relevant Guidance for On-Road Testing of Prototype Automated Driving System (ADS)-Operated Vehicles" (https://www.sae.org/standards/content/j3018_202012/ -- be sure to get the 2020 revision) provides safety relevant guidance for road testing. It concentrates on guidance for the "in-vehicle fallback test driver" (also known informally as the safety driver).

The scope of J3018 includes:

Fallback test driver training (classroom, simulation, track, on-road)
Workload management
Selection of test routes
Pre-trip protocol (checklist, inspection)
Test driver monitoring
Test driver post-test debrief
Incident response protocol

AV testers should be conform to J3018 to ensure that they are following identified best practices for safety driver training and effectiveness. Endorsing this standard will avoid a DOT having to create their own driver qualification and training requirements.

Taking a deeper look at J3018, it seems a bit light on measuring whether the safety driver is actually providing effective risk mitigation. Rather, it seems to implicitly assume that training will necessarily result in acceptable road testing safety. While training and qualification of safety drivers is essential, it is prudent to also monitor safety driver effectiveness, and testers should be asked to address this issue. Nonetheless, J3018 is an excellent starting point for testing safety. Testers should be doing at least what is in J3018, and probably more.

J3018 does cost money to read, and the free preview is not particularly informative. However, there is a free copy of a precursor document available here: https://avsc.sae-itc.org/principles-01-5471WV-42925L3.html that will give a flavor of what is involved. That having been said, any DOT guidance or requirement should follow J3018, and not the AVSC precursor document.
In addition to following J3018, the safety-critical mechanisms for testing should be designed to conform to the widely used ISO 26262 functional safety standard. (This is not to say that the entire test vehicle -- which is still a work in progress -- needs to conform to 26262 during testing. Rather, that the "Big Red Button" and any driver takeover functions need to conform to 26262 to make sure that the safety driver can really take over when necessary.)

For cargo vehicles that will deploy without drivers, J3018 can still be used by installing a temporary safety driver seat in the vehicle. Or the autonomy equipment can be mounted on a conventional vehicle in a geometry that mimics the cargo vehicle geometry. When the time comes to deploy without a driver physically in the system, you are really testing an autonomous vehicle with a chase car or remote safety supervisor, covered in a following section on testing without a driver.

Wednesday, June 5, 2024

Live Talk: Challenges in Autonomous Vehicle Safety Assessment

Challenges in Autonomous Vehicle Safety Assessment

Recorded live at a US DOT workshop on May 29, 2024.

Knowing whether an autonomous vehicle is safe enough to operate on public roads is an extremely difficult challenge. Assessment must include acknowledging that operating millions of miles does not come close to proving safety, that robot drivers will make mistakes -- often the same mistakes people make, fundamental incompatibilities between conventional safety engineering processes and the machine learning technology used by these systems, a pervasive lack of automotive safety standard adoption by the industry, and important considerations of safety that go far beyond net statistical risk.

Event home page: https://www.transportation.gov/hasscoe/highlights/AI-assurance

Tuesday, June 4, 2024

Waymo's Misleading Claim of Saving Lives

Waymo claims they are "already saving lives" so often, and people are so taken in by that misleading claim, that I'm going to take a moment to explain why it is misleading. And especially harmful when used as justification for loose regulatory policies as it so often is.

The claim: "The data to date indicates the Waymo Driver is already reducing traffic injuries and fatalities." Here is the claim, which has been at the top of the Waymo Safety landing page for quite a while now (https://waymo.com/safety/ as of June 4, 2024; highlighted of those words added):

Having had high school English, I would interpret that sentence as also including an unproven claim of "already reducing fatalities" being supported by data. And I would expect that anyone authoring this sentence would reasonable expect a reader or listener to conclude "already reducing fatalities." Those listeners include federal and state regulators and legislators.

This claim is absurd for a simple reason. US driving data shows human-driven vehicles have ballpark 1 fatal crash per 100M miles (varies by year, zip code, etc. -- for more nuance see this narrated video slide which is in terms of fatal crashes, noting that some such crashes have multiple fatalities). But their latest study is for only 7.1 million miles. They need something like 40 times more data prove they are actually saving lives with statistical confidence (likely even more).

What is really going on here seems to be some sort of word game that is essentially guaranteed to mislead readers. Their 7.1 million mile study talks about a bin called "any-injury-reported" crashes that were lower than human-driven vehicles, and fatalities are a subset of that bin. So the claim being made is (apparently) the bin containing fatalities is better than human drivers. Without mention that the sample size is too small for valid conclusions on fatalities. So maybe they have saved about 0.07 or perhaps even 0.10 lives depending on the baseline you use for human drivers -- and maybe not.

But don't just take my word for it, see for yourself this excerpt from Waymo's own paper saying "Serious injury and fatalities are a subset of this any-injury-reported benchmark, but no statement on these outcome levels can be made at this time based on this retrospective data." In other words, Waymo does not have enough data to know how fatalities will turn out. That's the truth. Waymo's safety landing page claim is something other than the full truth.

Waymo paper: "Comparison of Waymo Rider-Only Crash Data to Human Benchmarks at 7.1 Million Miles" https://arxiv.org/pdf/2312.12675 (top of page 15; highlight added)

Monday, June 3, 2024

Five views into the Cruise Robotaxi Pedestrian Dragging Mishap

On October 2, 2023, a Cruise robotaxi dragged a woman 20 feet underneath the vehicle in San Francisco. The circumstances of the mishap and everything else are complex. But the robotaxi industry was profoundly shaken.

Here are four descriptions of the events and what might be learned from those events. Each is in a different style, intended for a different audience.

J. Mathews, "In a single night, self-driving startup Cruise went from sizzling startup to cautionary tale. Here’s what really happened—and how GM is scrambling to save its $10B bet," Fortune, May 16, 2024.

Mainstream business audience.
Long-form journalism style, includes interview material from former Cruise staff and contractors not available in the external investigation report.

P. Koopman, "Lessons from the Cruise Robotaxi Pedestrian Dragging Mishap," IEEE Reliability Magazine, arXiv preprint June 2024. (Official version/paywall)

General engineering audience; a less dense, more descriptive style than the SafeComp paper below.
Peer reviewed technical magazine style, emphasizing lessons that might be learned.

P. Koopman, "Anatomy of a Robotaxi Crash: Lessons from the Cruise Pedestrian Dragging Mishap," SafeComp 2024 proceedings, arXiv preprint April 29, 2024

Detailed technical summary of the material in the Quinn Emanuel external investigation report.
Peer reviewed academic paper. Condensed style with detailed citations to the external investigation report.
Best guide to rigorously putting all the disjoint pieces of the external investigation report into a more coherent narrative.

Quin Emanuel Trial Lawyers, REPORT TO THE BOARDS OF DIRECTORS OF CRUISE LLC, GM CRUISE HOLDINGS LLC, AND GENERAL MOTORS HOLDINGS LLC REGARDING THE OCTOBER 2, 2023 ACCIDENT IN SAN FRANCISCO, Jan. 24, 2024.

External investigation paid for by Cruise, so expect this to present the facts in the best possible light for Cruise as a business.
The authoritative source to publicly available information. Some pages are heavily redacted.

Missy Cummings analysis that emphasizes the role of remote operations: A Root Cause Analysis of a Self- Driving Car Dragging a Pedestrian, July 2024.

Additional content:

A video podcast where I walk through the mishap events with Junko Yoshida & Bolaji Ojo: https://youtu.be/OaF6IbYoVHQ
Cruise also maintains a safety page. A the time of this writing the big fonts are used to say "transparent about safety" and "continuous improvement". So far not a lot of detail about what has changed and what transparency will mean as they get back on the road.