Thursday, March 15, 2018

Interview on Smarter Cars podcast

I really enjoyed the discussion with Michele Kyrouz on the Smarter Cars podcast (March 15, 2018 episode). Thanks so much for inviting me on as a guest!

Thursday, March 1, 2018

A More Comprehensive Look at Autonomous Vehicle Testing and Validation

Developers should create a transparent safety argument based on testing, simulation, and good engineering practices.

The race to deploy self-driving cars is in full swing, with fleets operating on public roads in increasing numbers. While some vehicles require driver supervision, others are being deployed with no driver controls at all. The big question is: will these vehicles be safe enough?

[Photo: Carnegie Mellon NavLab 11, 2001]

Now that self-driving car technology is maturing, there is increasing recognition that demonstrating adequate levels of safety with on-road testing alone is impractical. Too many billions of miles are needed for a credible statistical safety argument, and it simply costs too much and takes too long. Other safety-critical application areas already have a solution to this problem in the form of internationally accepted safety standards (e.g., aircraft, trains, and even home appliances). While there is an international safety standard for conventional cars (ISO 26262), its use is not currently required by the US government.
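
To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. The fatality rate and confidence level below are illustrative assumptions for the sake of the arithmetic, not figures from any particular study:

    import math

    # Illustrative assumption: roughly one fatality per 100 million miles
    # of human driving (approximately the US rate).
    human_fatality_rate = 1.0 / 100e6   # fatalities per mile
    confidence = 0.95

    # With zero failures observed in N miles, the 95% upper confidence
    # bound on the true failure rate is about -ln(1 - 0.95) / N = 3 / N
    # (the "rule of three"). So merely matching human performance needs:
    miles_needed = -math.log(1.0 - confidence) / human_fatality_rate
    print(f"~{miles_needed / 1e6:.0f} million failure-free miles")
    # Prints ~300 million miles. Demonstrating the system is several times
    # better than a human, or observing even a few failures along the way,
    # pushes the requirement into the billions of miles.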

The stakes are high. Even a single bad line of software code can cause, and has caused, catastrophic failures. While nobody should expect this technology to be perfect, it's important to ensure that it follows accepted practices and is appropriately safe. Given a quantification of how much better than a human driver the technology needs to be, technologists need a way to measure and ensure that self-driving cars actually are that safe.

A starting point for understanding self-driving car safety is defining an effective role for on-road data gathering. Contrary to common discussions about the purpose of vehicle testing, on-road testing should not be used primarily to find bugs in software. (Bugs, which really should be called defects, should be found during development and simulation.) Rather, the critical need for on-road testing is to understand the driving environment and collect data to make sure that simulation will include all the typical, unusual, and just plain weird driving situations that a full-scale self-driving car fleet will experience on real-world roadways. On-road vehicle testing can be used as a graduation exercise before deploying production vehicles to make sure nothing was missed in development, but it should not be the primary strategy for finding defects in the vehicle's autonomy.

Ensuring adequate safety requires two approaches beyond collecting on-road data. The first approach is extensive simulation across many different levels. Closed course testing is a form of simulation in that a real car’s response is measured within an environment that has artificially created scenarios designed to stress key aspects of the system. Pure software simulation can be used in addition to closed course testing to scale up to hundreds or thousands of simulated cars running 24 hours a day in a cloud computing environment. These and other types of simulation can help improve testing coverage. But, experience with other types of critical systems shows that even extensive simulation is not enough to catch all the software defects that can cause avoidable deaths.
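
As a toy illustration of how a cloud-scale scenario sweep can be structured, consider the sketch below. The scenario parameters and the simulator interface are hypothetical placeholders, not any real tool's API:

    import random
    from concurrent.futures import ProcessPoolExecutor

    def make_scenario(seed):
        """Hypothetical randomized traffic scenario built from a seed."""
        rng = random.Random(seed)
        return {"pedestrians": rng.randint(0, 10),
                "weather": rng.choice(["clear", "rain", "fog"]),
                "cut_in_gap_m": rng.uniform(2.0, 30.0)}

    def run_sim(scenario):
        """Stand-in for a full software simulation of the autonomy stack."""
        # A real simulator would exercise perception, planning, and control
        # here and report any safety violations it observed.
        return {"scenario": scenario, "violations": []}

    if __name__ == "__main__":
        # Thousands of seeded scenarios spread across worker processes;
        # the same structure scales out to cloud machines running 24/7.
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(run_sim, map(make_scenario, range(10_000))))
        failures = [r for r in results if r["violations"]]
        print(f"{len(failures)} scenarios produced safety violations")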

In addition to testing and simulation, rigorous design engineering approaches must be used to ensure that the software running self-driving cars is well designed and can deal safely with the inevitable glitches, malfunctions, faults, and surprises that happen in real world systems. Following ISO 26262 and other relevant safety standards is a necessary starting point. Additional methods such as system robustness testing and using a safety net architectural approach can help develop a sufficient level of assurance for the AI-centric vehicle functions. These technical approaches should be created in parallel with the control system side of self-driving car technology. While both the design and the validation of self-driving cars are maturing areas, it is likely that we have the means to ensure an appropriate level of safety. However, it is crucial to ensure that developers actually use these state-of-the-art techniques rather than skimping on safety validation in the race to deploy.
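
To make the safety net idea concrete, here is a minimal doer/checker sketch. The planner, state fields, and deceleration limit are hypothetical; a real safety monitor would itself be engineered and verified to a standard such as ISO 26262:

    def stopping_speed_limit(clear_distance_m, max_decel=4.0):
        """Highest speed from which we can stop in the clear distance (v^2 = 2ad)."""
        return (2.0 * max_decel * clear_distance_m) ** 0.5

    def complex_ai_planner(state):
        """Untrusted "doer": sophisticated, but hard to fully validate."""
        return {"speed_mps": state["desired_speed_mps"]}

    def checker_ok(state, cmd):
        """Trusted "checker": simple enough to verify with conventional rigor."""
        return cmd["speed_mps"] <= stopping_speed_limit(state["clear_distance_m"])

    def control_step(state):
        cmd = complex_ai_planner(state)
        if checker_ok(state, cmd):
            return cmd                     # normal operation
        return {"speed_mps": 0.0}          # failsafe: command a controlled stop

    # Example: 50 m of clear road allows stopping from 20 m/s, so a
    # 15 m/s command passes the checker unchanged.
    print(control_step({"desired_speed_mps": 15.0, "clear_distance_m": 50.0}))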

As self-driving car technology continues to be developed, there are fleets of test vehicles operating on public roads. Public on-road testing clearly has the potential to put pedestrians, other vehicles, and the general public at risk if the test vehicle misbehaves. While safety drivers are charged with ensuring vehicle safety, there are challenges to making this approach effective in practice. For example, continuous safety driver vigilance turns out to be difficult to maintain if the vehicle doesn't fail very often. And it can be difficult for even a vigilant driver to know if a vehicle is about to misbehave without having some sort of indication of the vehicle's near-term plans. In general, it is important to ensure that a test vehicle (or any partially autonomous vehicle that requires human supervision) never paints the safety driver into a corner, leaving that driver in an untenable situation from which recovery within the constraints of reasonable human performance is difficult or impossible. When the driver does intervene, it's important that the vehicle actually responds to the driver and gracefully relinquishes relevant vehicle controls.
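
As a small illustration of what "gracefully relinquishes control" might mean in code, consider this sketch. The thresholds and signal names are made up for illustration; production systems rely on redundant sensing and carefully tuned arbitration:

    BRAKE_OVERRIDE_N = 25.0      # hypothetical brake pedal force threshold
    STEER_OVERRIDE_NM = 3.0      # hypothetical steering torque threshold

    def driver_is_overriding(inputs):
        return (inputs["brake_force_n"] > BRAKE_OVERRIDE_N or
                abs(inputs["steer_torque_nm"]) > STEER_OVERRIDE_NM)

    def arbitrate(autonomy_cmd, driver_inputs):
        """The driver always wins; autonomy must never fight the human."""
        if driver_is_overriding(driver_inputs):
            return {"mode": "MANUAL", "cmd": driver_inputs}
        return {"mode": "AUTONOMOUS", "cmd": autonomy_cmd}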

In the longer term, it will be important that vehicle makers create a transparent argument as to why their autonomous vehicle (AV) is sufficiently safe. That argument must be reviewed by credible, independent safety experts. Well-chosen safety performance metrics can give some insight into how AV technology is progressing, but they tend to come with many disclaimers and limitations. Simplistic metrics such as disengagement reports are no substitute for a thorough understanding of whether an AV has been engineered to an appropriate level of safety.

A lesson learned a long time ago in other safety areas is that independent audits of some type are an absolute requirement for designing safe systems. Transparency in safety arguments does not necessarily mean design information must be made public, nor that the government must perform reviews. But some credible independent party must assess whether an AV has been designed to an appropriate level of engineering rigor to be safe. The safety reports being published by some AV developers are an encouraging first step, but more details are required for independent assessment. A good starting point is to say exactly which appropriate international safety standards are being followed for what aspects of safety, and get external assessment according to those standards.

Incorporating AI technology into life critical computer-based systems such as AVs presents unique challenges. Whether safety is ensured within the scope of existing standards or is supported by arguments in addition to those standards, a transparent safety argument should be checked by an independent assessor to ensure that vehicles really are as safe as they need to be for use on public roads. That safety argument will need to include a lot more than just on-road test results.

About the author:
Dr. Philip Koopman has been working on autonomous vehicle safety for more than 20 years. As a professor at Carnegie Mellon University he teaches safe, secure, high quality embedded system engineering. As co-founder of Edge Case Research LLC he finds ways to improve AV safety, including robustness testing, architectural safety patterns, and structured safety argument approaches. He has a blog on self-driving car safety at:

Tuesday, February 27, 2018

A Driver Test For Self-Driving Cars Isn't Enough

I recently read yet another argument that a driving road test should be enough to certify an autonomous vehicle as safe for driving. In general, the idea was that if it's good enough to put a 16-year-old on the road, it should be good enough for a self-driving vehicle. I see this idea enough that it's worth explaining why it's a really bad one.


Even if we were to assume that a self-driving vehicle is no different than a person (which is clearly NOT true), applying the driving test is only half the driver license formula. The other half is the part about being 16 years old. If a 12-year-old is proficient at operating a vehicle, we still don't issue a driver's license. In addition to technical skills and book knowledge, we as a society have imposed a maturity requirement in most states of "being 16." It is typical that you don't get an unrestricted license until you're perhaps 18. And even then you're probably not a great driver at any age until you get some experience. But, we won't even let you on the road under adult supervision at 12!
The maturity requirement is essential.  As drivers, we're expected to have the maturity to recognize when something isn't right, to avoid dangerous situations, to bring the vehicle to a safe state when something has gone wrong, to avoid operating when the vehicle system (vehicle + driver) is impaired, to improvise when something weird happens, and to compensate for other drivers who are having a bad day (or are simply suffering from only being 16). Autonomous driving systems might be able to do some of that, or even most of it in the near term. (I would argue that they are especially bad at self-detecting when they don't know what's going on.) But the point is that a normal driving test doesn't come close to demonstrating "maturity" -- even supposing we could define that term in a rigorous, testable way. It's not supposed to -- that's why licenses require both testing and "being 16."
To be sure, human age does not correlate perfectly with maturity. But as a society this system works well enough that we're not changing it, apart from occasional tweaks for very young and very old drivers, who historically have higher mishap rates. The big point is that if a 12-year-old demonstrates they are a whiz at vehicle operation and traffic rules, they still don't get a license.  In fact, they don't even get permission to operate on a public road with adult supervision (i.e., no learner's permit at 12 in any US state that I know of).  So why does it make sense to use a human driving test analogy to give a driver's license, or even a learner's permit, to an autonomous vehicle that was designed in the last few months?  Where's the maturity?
Autonomy advocates argue that encapsulating skill and fleet-wide learning from diverse situations could help cut down the per-driver learning curve. And it could reduce the frequency of poor choices such as impaired driving and distracted driving. If properly implemented, this could all work well and could improve driving safety -- especially for drivers who are prone to impaired and distracted driving. But while it's plausible to argue that autonomous vehicles won't make stupid choices about driving impaired, that is not at all the same thing as saying that they will be mature drivers who can handle unexpected situations and in general display good judgment comparable to a typical, non-impaired human driver. Much of safe driving is not about technical skill, but rather amounts to driving judgment maturity. In other words, saying that autonomous vehicles won't make stupid mistakes does not automatically make them better than human drivers.  
I'd want to at least see an argument of technological maturity as a gate before even getting to a driving skills test. In other words, I want an argument that the car is the equivalent of "being 16" before we even issue the learner's permit, let alone the driver's license. Suggesting that a driving test is all it takes to put a vehicle on the road means: building a mind-bogglingly complex software system with technology we're just at the point of understanding how to validate, doing an abbreviated acceptance test of basic skills (a few minutes on the road), and then deciding it's fine to put it in charge of several thousand pounds of metal, glass, and human cargo as it hurtles down the highway. (Not to mention the innocent bystanders it puts at risk.)
This is a bad idea for any software.  We know from history that a functional acceptance test doesn't prove something is safe (at most it can prove it is unsafe if a mishap occurs during the test).  Not crashing during a driver exam is, to be sure, an impressive technical achievement, but on its own it's not the same as being safe!  Simple acceptance testing as the gatekeeper for autonomous vehicles is an even worse idea. For other types of software, we have found in practice that you can't understand software quality for life-critical systems without also considering the rigor of the engineering process. Think of good engineering process as a proxy for "being 16 years old." It's the same for self-driving cars.
(BTW, "life-critical" doesn't mean perfect. It means designed with sufficient engineering rigor to be suitable for its intended criticality. See ISO 26262 or your favorite software safety standard, which is currently NOT required by the government for autonomous vehicles, but should be.)
It should be noted that some think a few hundred million miles of testing can substitute for documenting engineering rigor. That's a different discussion. What this essay is about is saying that a road test -- even an hour-long, grueling road test -- does not fulfill the operational requirement of "being 16" for issuing a driving license under our current system. I'd prefer a different, more engineering-based method of certifying self-driving vehicles for use on public roads. A road test should certainly be part of that approach. But if you really want to base everything on a driver test, please tell me how you plan to argue the part about demonstrating that the vehicle, the design team, or some aspect of the development project has the equivalent of 16-year-old maturity. If you can't, you're just waving your hands about vehicle safety.

(Originally posted on Dec. 19, 2016, with minor edits)

Saturday, February 24, 2018

Welcome To My Blog on Self Driving Car Safety


This blog primarily covers Autonomous Vehicle safety, often known as self-driving car safety.

I'm splitting this blog off from my Better Embedded Software blog to reflect my increased emphasis on Autonomous Vehicle (AV) safety.

In my professor gig, these days my main research is on AV stress testing (the ASTAA and RIOT projects at CMU NREC).

I'm also very active in the startup company that I co-founded with Mike Wagner: Edge Case Research LLC. Over the past couple of years we have emphasized both stress testing and creating safety cases for AVs.

Comments are moderated, and I read them all.  Comments that ask a question typically get approved, although I can't offer specific advice on your particular system this way.  Additional thoughts and responsible contrary views are also typically approved.

While I appreciate the "nice blog" type posts, I typically don't approve them for posting.  (So many of them are comment spam!)  If you really want to be sure I get that type of message, just e-mail it to me directly, and it will be appreciated.  If you'd like it to be posted as a comment on the blog, let me know.  Send to:

A few of the initial posts are duplicates from my other blog, but I'll be posting all my new AV content here.  I hope you enjoy the postings!

-- Phil Koopman
Pittsburgh, PA, USA


TechAD Talk on Highly Autonomous Vehicle Validation

Here are the slides from my TechAD talk on self driving car safety:

Highly Autonomous Vehicle Validation from Philip Koopman

Highly Autonomous Vehicle Validation: it's more than just road testing!
- Why a billion miles of testing might not be enough to ensure self-driving car safety.
- Why it's important to distinguish testing for requirements validation vs. testing for implementation validation.
- Why machine learning is the hard part of mapping autonomy validation to ISO 26262.

(Originally posted on Nov 11, 2017)

SCAV 2017 Keynote: Challenges in Autonomous Vehicle Validation

Challenges in Autonomous Vehicle Testing and Validation from Philip Koopman

Challenges in Autonomous Vehicle Validation
Keynote Presentation Abstract
Philip Koopman
Carnegie Mellon University; Edge Case Research LLC
ECE Dept. HH A-308, 5000 Forbes Ave., Pittsburgh, PA, USA

Developers of autonomous systems face distinct challenges in conforming to established methods of validating safety. It is well known that testing alone is insufficient to assure safety, because testing long enough to establish ultra-dependability is generally impractical. That’s why software safety standards emphasize high quality development processes. Testing then validates process execution rather than directly validating dependability.

Two significant challenges arise in applying traditional safety processes to autonomous vehicles. First, simply gathering a complete set of system requirements is difficult because of the sheer number of combinations of possible scenarios and faults. Second, autonomy systems commonly use machine learning (ML) in a way that makes the requirements and design of the system opaque. After training, usually we know what an ML component will do for an input it has seen, but generally not what it will do for at least some other inputs until we try them. Both of these issues make it difficult to trace requirements and designs to testing as is required for executing a safety validation process. In other words, we’re building systems that can’t be validated due to incomplete or even unknown requirements and designs.

Adaptation makes the problem even worse by making the system that must be validated a moving target. In the general case, it is impractical to validate all the possible adaptation states of an autonomy system using traditional safety design processes.

An approach that can help with the requirements, design, and adaptation problems is basing a safety argument not on correctness of the autonomy functionality itself, but rather on conformance to a set of safety envelopes. Each safety envelope describes a boundary within the operational state space of the autonomy system.

A system operating within a “safe” envelope knows that it’s safe and can operate with full autonomy. A system operating within an “unsafe” envelope knows that it’s unsafe, and must invoke a failsafe action. Multiple partial specifications can be used as an envelope set, with the intersection of safe envelopes permitting full autonomy, and the union of unsafe envelopes provoking validated, and potentially complex, failsafe responses.

Envelope mechanisms can be implemented using traditional software engineering techniques, reducing the problems with requirements, design, and adaptation that would otherwise impede safety validation. Rather than attempting to prove that autonomy will always work correctly (which is still a valuable goal to improve availability), the envelope approach measures the behavior of one or more autonomous components to determine if the result is safe. While this is not necessarily an easy thing to do, there is reason to believe that checking autonomy behaviors for safety is easier than implementing perfect, optimized autonomy actions. This envelope approach might be used to detect faults during development and to trigger failsafes in fleet vehicles.

Inevitably there will be tension between simplicity of the envelope definitions and permissiveness, with more permissive envelope definitions likely being more complex. Operating in the gap areas between “safe” and “unsafe” requires human supervision, because the autonomy system can’t be sure it is safe.
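
Here is a minimal sketch of this envelope set logic, with hypothetical predicate functions standing in for real partial specifications:

    def supervisory_decision(state, safe_envelopes, unsafe_envelopes):
        # Failsafe is triggered by the *union* of unsafe envelopes.
        if any(env(state) for env in unsafe_envelopes):
            return "FAILSAFE"            # validated failsafe response
        # Full autonomy requires the *intersection* of safe envelopes.
        if all(env(state) for env in safe_envelopes):
            return "FULL_AUTONOMY"
        return "HUMAN_SUPERVISION"       # gap region: can't be sure it's safe

    # Hypothetical example envelopes:
    safe = [lambda s: s["speed_mps"] < 25.0,
            lambda s: s["visibility_m"] > 100.0]
    unsafe = [lambda s: s["sensor_fault"]]
    state = {"speed_mps": 10.0, "visibility_m": 200.0, "sensor_fault": False}
    print(supervisory_decision(state, safe, unsafe))   # -> FULL_AUTONOMY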

One way to look at the progression from partial to full autonomy is that, over time, systems can increase permissiveness by defining and growing “safe” envelopes, shrinking “unsafe” envelopes, and eliminating any gap areas.

ACM Reference format:
P. Koopman, 2017. Challenges in Autonomous Vehicle Validation. In
Proceedings of 1st International Workshop on Safe Control of Connected
and Autonomous Vehicles, Pittsburgh, Pennsylvania, USA, April 2017
(SCAV 2017), 1 page.


(Originally posted April 23, 2017)

Autonomous Vehicle Safety: An Interdisciplinary Challenge


By Phil Koopman & Mike Wagner

Ensuring the safety of fully autonomous vehicles requires a multi-disciplinary approach across all the levels of functional hierarchy, from hardware fault tolerance, to resilient machine learning, to cooperating with humans driving conventional vehicles, to validating systems for operation in highly unstructured environments, to appropriate regulatory approaches. Significant open technical challenges include validating inductive learning in the face of novel environmental inputs and achieving the very high levels of dependability required for full-scale fleet deployment. However, the biggest challenge may be in creating an end-to-end design and deployment process that integrates the safety concerns of a myriad of technical specialties into a unified approach.

Read the preprint version here for free (link / .pdf)

Official IEEE version (subscription required):  
DOI: 10.1109/MITS.2016.2583491

IEEE Intelligent Transportation Systems Magazine (Volume: 9, Issue: 1, Spring 2017, pp. 90-96)

"This would require a safety level of about 1 billion operating hours per catastrophic event. (FAA 1988)" should be
"This would require a safety level of about 1 billion operating hours per catastrophic event due to the failure of a particular function. (FAA 1988)"  (Note that in this context a "function" is something quite high level such as the ability to provide sufficient thrust from the set of jet engines mounted on the airframe.)

(Originally posted January 30, 2017)
