Showing posts from December, 2020

Safety Performance Indicator (SPI) metrics (Metrics Episode 14)

SPIs help ensure that assumptions in the safety case are valid, that risks are being mitigated as effectively as you thought they would be, and that fault and failure responses are actually working the way you thought they would. Safety Performance Indicators, or SPIs, are safety metrics defined in the Underwriters Laboratories 4600 standard. The 4600 SPI approach covers a number of different ways to approach safety metrics for a self-driving car, divided into several categories. One type of 4600 SPI safety metric is a system-level safety metric. Some of these are lagging metrics, such as the number of collisions, injuries, and fatalities. But others have some leading-metric characteristics because, while they’re taken during deployment, they’re intended to predict loss events. Examples of these are incidents for which no loss occurs, sometimes called near misses or near hits, and the number of traffic rule violations. While by definition neither of these actually results in a loss, …
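As a minimal sketch of how a leading-style SPI such as a near-miss count might be tracked in practice (the function name and the normalization basis are illustrative assumptions, not taken from the UL 4600 standard), one could normalize event counts per distance driven so they are comparable across fleets and time periods:

```python
def spi_rate_per_million_km(event_count: int, km_driven: float) -> float:
    """Normalize an SPI event count (e.g., near misses) per million km driven."""
    if km_driven <= 0:
        raise ValueError("km_driven must be positive")
    return event_count / km_driven * 1_000_000

# Hypothetical example: 12 near misses observed over 400,000 km of deployment
rate = spi_rate_per_million_km(12, 400_000)  # 30.0 per million km
```

A rising rate over successive reporting windows would be a leading signal that something in the safety case deserves a closer look before a loss event occurs.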

Conformance Metrics (Metrics Episode 13)

Metrics that evaluate progress in conforming to an appropriate safety standard can help track safety during development. Beware of weak conformance claims, such as claiming that only the hardware, but not the software, conforms to a safety standard. Conformance metrics have to do with how extensively your system conforms to a safety standard. A typical software or systems safety standard has a large number of requirements, with each requirement often called a clause. An example of a clause might be something like "all hazards shall be identified," and another clause might be "all identified hazards shall be mitigated." (Strictly speaking, a clause is typically a numbered statement in the standard in the form of a "shall" requirement that usually has a lot more words in it than those simplified examples.) There are often extensive tables of engineering techniques or technical mitigation measures that need to be applied based on the risk presented by each hazard. …
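A minimal sketch of how a conformance metric might be computed, assuming clause status is tracked as a simple satisfied/not-satisfied map (the clause names here are made up for illustration; a real standard has many more clauses, each with evidence attached):

```python
def conformance_fraction(clause_status: dict[str, bool]) -> float:
    """Fraction of a standard's tracked clauses currently satisfied."""
    if not clause_status:
        raise ValueError("no clauses tracked")
    return sum(clause_status.values()) / len(clause_status)

# Hypothetical clause tracking for a work-in-progress safety case
status = {
    "all hazards shall be identified": True,
    "all identified hazards shall be mitigated": False,
    "mitigations shall be verified": False,
}
progress = conformance_fraction(status)  # 1/3 of clauses satisfied so far
```

Tracking this fraction over time gives a development-phase progress metric, though note the episode's warning: a high fraction over only part of the system (e.g., hardware but not software) is a weak claim.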

Surprise Metrics (Metrics Episode 12)

You can estimate how many unknown unknowns are left to deal with via a metric that measures the surprise arrival rate. Assuming you're really looking, an infrequent surprise arrival rate now predicts that surprises will remain infrequent in the near future as well. Your first reaction to thinking about measuring unknown unknowns may be: how in the world can you do that? Well, it turns out the software engineering community has been doing this for decades: they call it software reliability growth modeling. That area’s quite complex with a lot of history, but for our purposes, I’ll boil it down to the basics. Software reliability growth modeling deals with the problem of knowing whether your software is reliable enough, or in other words, whether or not you’ve taken out enough bugs that it’s time to ship the software. All things being equal, if the same complete system test reveals 10 times more defects in the current release than in the previous release, it’s a good bet your new release is not as reliable as your …
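In the spirit of reliability growth tracking, a surprise-arrival-rate metric could be sketched like this (a deliberately crude illustration under assumed units of test miles; real reliability growth models fit statistical curves rather than comparing two windows):

```python
def surprises_per_1000_miles(surprise_miles: list[float], window_miles: float) -> float:
    """Surprise arrival rate: surprises logged per 1,000 miles in a test window."""
    if window_miles <= 0:
        raise ValueError("window_miles must be positive")
    return len(surprise_miles) / window_miles * 1000

def rate_is_declining(early_rate: float, late_rate: float) -> bool:
    """Crude growth-model check: is the recent rate lower than the earlier one?"""
    return late_rate < early_rate

# Hypothetical logs: mile marks at which surprises occurred in two 1,000-mile windows
early_rate = surprises_per_1000_miles([10, 200, 450, 900], 1000)  # 4.0
late_rate = surprises_per_1000_miles([1200, 1800], 1000)          # 2.0
improving = rate_is_declining(early_rate, late_rate)              # True
```

The key caveat from the episode applies: a low rate only predicts few near-future surprises if you are genuinely looking for them.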

Operational Design Domain Metrics (Metrics Episode 11)

Operational Design Domain metrics (ODD metrics) deal with both how thoroughly the ODD has been validated as well as the completeness of the ODD description. How often the vehicle is forcibly ejected from its ODD also matters. An ODD is the designer’s model of the types of things that the self-driving car is intended to deal with. The actual world, in general, is going to have things that are outside the ODD. As a simple example, the ODD might include fair weather and rain, but snow and ice might be outside the ODD because the vehicle is intended to be deployed in a place where snow is very infrequent. Despite designers’ best efforts, it’s always possible for the ODD to be violated. For example, if the ODD is Las Vegas in the desert, the system might be designed for mostly dry weather or possibly light rain. But in fact, in Vegas, …
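A minimal sketch of an ODD membership check, which is the primitive underlying "how often is the vehicle ejected from its ODD" metrics (the ODD contents here are the hypothetical fair-weather-and-rain example from the excerpt, not a real deployment's ODD):

```python
from dataclasses import dataclass

@dataclass
class Conditions:
    weather: str
    speed_limit_mph: int

# Hypothetical ODD: fair weather or rain, roads posted up to 45 mph
ODD_WEATHER = {"fair", "rain"}
ODD_MAX_SPEED_MPH = 45

def within_odd(c: Conditions) -> bool:
    """True if current conditions fall inside the declared ODD."""
    return c.weather in ODD_WEATHER and c.speed_limit_mph <= ODD_MAX_SPEED_MPH

within_odd(Conditions("rain", 35))   # True
within_odd(Conditions("snow", 35))   # False: an ODD exit the design must handle
```

Counting how often `within_odd` flips to `False` during operation, and whether the vehicle responded safely each time, is one way to turn ODD violations into a trackable metric.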

Prediction Metrics (Metrics Episode 10)

You need to drive not where the free space is, but where the free space is going to be when you get there. That means perception classification errors can affect not only the "what" but also the "future where" of an object. Prediction metrics deal with how well a self-driving car is able to take the results of perception data and predict what happens next so that it can create a safe plan. There are different levels of prediction sophistication required depending on operational conditions and desired own-vehicle capability. The first, simplest prediction capability is no prediction at all. If you have a low-speed vehicle in an operational design domain in which everything is guaranteed to also be moving at low speeds and be relatively far away compared to the speeds, then a fast enough control loop might be able to handle things based simply on current object positions. The assumption there would be that everything’s moving slowly, it’s far away, and you can stop your vehicle …
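One step up from "no prediction at all" is constant-velocity extrapolation: assume each object keeps its current velocity and ask where it will be when you arrive. A minimal sketch (positions and velocities here are made-up example values):

```python
def predict_position(pos: tuple[float, float],
                     vel: tuple[float, float],
                     dt: float) -> tuple[float, float]:
    """Constant-velocity prediction: where the object will be after dt seconds."""
    x, y = pos
    vx, vy = vel
    return (x + vx * dt, y + vy * dt)

# Pedestrian at (10 m, 2 m) walking at (0, 1.5) m/s: position 2 s from now
predict_position((10.0, 2.0), (0.0, 1.5), 2.0)  # (10.0, 5.0)
```

A prediction metric would then compare such predicted positions against where objects actually ended up, since errors here translate directly into errors in the "future where" the planner relies on.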

Perception Metrics (Metrics Episode 9)

Don’t forget that there will always be something in the world you’ve never seen before and have never trained on, but your self-driving car is going to have to deal with it. A particular area of concern is correlated failures across sensing modes. Perception safety metrics deal with how a self-driving car takes sensor inputs and maps them into a real-time model of the world around it. Perception metrics should deal with a number of areas. One area is sensor performance. This is not absolute performance, but rather performance with respect to safety requirements. Can a sensor see far enough ahead to give accurate perception in time for the planner to react? Does the accuracy remain sufficient given changes in environmental and operational conditions? Note that for the needs of the planner, farther isn’t better without limit. At some point, you can see far enough ahead that you’ve reached the planning horizon, and sensor performance beyond that might help with ride comfort or efficiency but is not …
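One way to make "can the sensor see far enough ahead" concrete is to compute the range the planner actually needs: the distance covered while the system reacts plus the braking distance. A hedged sketch with assumed example numbers (real requirements would also budget for grades, surface friction, and sensor degradation):

```python
def required_sensing_range_m(speed_mps: float,
                             reaction_s: float,
                             decel_mps2: float) -> float:
    """Reaction distance plus braking distance v^2 / (2a), in meters."""
    if decel_mps2 <= 0:
        raise ValueError("deceleration must be positive")
    return speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)

# 20 m/s (~45 mph), 0.5 s perception/planning latency, 5 m/s^2 braking
required_sensing_range_m(20.0, 0.5, 5.0)  # 50.0 m
```

A sensor-performance metric can then be framed as margin: measured reliable detection range minus this required range, tracked across the environmental conditions in the ODD.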

Planning Metrics (Metrics Episode 8)

Planning metrics should cover whether the planned paths are actually safe. But just as important is whether plans work across the full ODD and account for safety when the vehicle is pushed out of the ODD by external events. Planning metrics deal with how effectively a vehicle can plan a path through the environment, obstacles, and other actors. Often, planning metrics are tied to the concept of various scenarios and actors that a vehicle might encounter, along with the behaviors, maneuvers, and other considerations involved in dealing with them. In another segment, I’ll talk about how the system builds a model of the external world. For now, let’s assume that the self-driving car knows exactly where all the objects are and what their predicted trajectories and behaviors are. The objective is typically to make progress in navigating through the scenario without hitting things. Some path planning metrics are likely to be tied closely to the motion safety metrics. A self-driving car that creates a path plan that …
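Under the excerpt's simplifying assumption of perfect perception and prediction, one simple path-plan safety metric is the minimum clearance between the planned ego path and a predicted obstacle trajectory at matching timesteps. A hypothetical sketch, not any particular planner's actual metric:

```python
import math

def min_clearance(plan: list[tuple[float, float]],
                  obstacle: list[tuple[float, float]]) -> float:
    """Minimum distance between ego plan and a predicted obstacle trajectory,
    compared waypoint-by-waypoint at matching timesteps."""
    return min(math.dist(p, o) for p, o in zip(plan, obstacle))

# Ego drives along y=0 while a predicted obstacle angles toward it
plan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0)]
min_clearance(plan, pred)  # 1.0 m, at the final step
```

A plan whose minimum clearance falls below a chosen safety margin would fail the metric, tying path planning back to the motion safety metrics discussed in the previous episode.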

Motion Metrics (Metrics Episode 7)

Approaches to safety metrics for motion ultimately boil down to a combination of Newton’s laws. Implementation in the real world also requires the ability to understand, predict, and measure both the actions of others and the environmental conditions you’re in. The general idea of a metric for motion safety is to determine how well a self-driving car is doing at not hitting things. One of the older metrics is called "time-to-collision." In its simplest form, this is how long it will take for two vehicles to collide if nothing changes. For example, if one car is following another and the trailing car is going faster than the leading car, eventually, if nothing changes, they’ll hit. How long that will take depends on the relative closing speed and gives you the time-to-collision. The general idea is that the shorter the time, the higher the risk, because there’s less time for a human driver to react and intervene. There are more complicated formulations of this concept …
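The simplest form of time-to-collision described above can be sketched in a few lines (gap and speed values are made-up examples; real formulations also handle acceleration and lateral motion):

```python
def time_to_collision_s(gap_m: float, closing_speed_mps: float) -> float:
    """Simplest TTC: current gap divided by closing speed.

    Returns infinity if the gap is not closing (no collision if nothing changes).
    """
    if closing_speed_mps <= 0:
        return float("inf")
    return gap_m / closing_speed_mps

# Trailing car 30 m behind the lead car, closing at 5 m/s
time_to_collision_s(30.0, 5.0)  # 6.0 s
```

As the excerpt notes, a smaller TTC means higher risk because there is less time for anyone to react and intervene.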

Leading and Lagging Metrics (Metrics Episode 6)

You'll need to use leading metrics to decide when it's safe to deploy, including process quality metrics and product maturity metrics. Here are some examples of how leading and lagging metrics fit together. Ultimately, the point of metrics is to have a measurement that tells us whether a self-driving car will be safe enough, for example, whether it will be safer than a human driver. The outcome we want is a measure of how things are going to turn out on public roads. Metrics that take direct measurements of the outcomes are called lagging metrics because they lag deployment. That’s things like the number of crashes, the number of fatal crashes, and so on. To be sure, we should be tracking lagging metrics to identify problems in the fleet after we’ve deployed. But that type of metric doesn’t really help with a decision about whether to deploy in the first place. You really want some assurance that self-driving cars will be appropriately safe before you start deploying …

Coverage Driven Metrics (Metrics Episode 5)

Coverage-based metrics need to account for both the planning and the perception edge cases, possibly with two separate metrics. It takes way too many road miles to establish whether a self-driving car is safe by brute force. Billions of miles of on-road testing are just not going to happen. Sometimes people say, “Well, that’s okay. We can do those billion miles in simulation.” While simulation surely can be helpful, there are two potential issues with this. The first is that simulation has to be shown to predict outcomes on real roads. That’s a topic for a different day, but the simple version is you have to make sure that what the simulator says actually predicts what will happen on the road. The second problem, which is what I’d like to talk about this time, is that you need to know what to feed the simulation. Consider that if you hypothetically drove a billion miles on the real road, you’re actually doing two things at the same time. The first thing is you’re testing to …

Using a Driving Test as a Safety Metric (Metrics Episode 4)

The part of the driver test that is problematic for self-driving cars is deciding that the system has maturity of judgement. How do you check that it is old enough to drive? At some point, companies are done testing and need to make a decision about whether it’s okay to let their vehicles actually be completely self-driving, driverless cars. The important change here is that there is no longer a human responsible for continuously monitoring operational safety. That changes the complexion of safety, because now there’s no human driver to rely upon to take care of anything weird that might go wrong. That means you’ll need different metrics for safety when you deploy compared to those used during road testing. One way people talk about knowing that it’s time to deploy is to give the car a road test, just like a human gets a driver’s test. Everyone wants something quick, cheap, and painless to decide whether their self-driving car is ready to go, and a driver test has some intuitive appeal …