Saturday, December 12, 2020

Operational Design Domain Metrics (Metrics Episode 11)

Operational Design Domain metrics (ODD metrics) deal with both how thoroughly the ODD has been validated and how complete the ODD description is. How often the vehicle is forcibly ejected from its ODD also matters.

An ODD is the designer’s model of the types of things that the self-driving car is intended to deal with. The actual world, in general, is going to have things that are outside the ODD. As a simple example, the ODD might include fair weather and rain, but snow and ice might be outside the ODD because the vehicle is intended to be deployed in a place where snow is very infrequent.

Despite designers’ best efforts, it’s always possible for the ODD to be violated. For example, if the ODD is Las Vegas in the desert, the system might be designed for mostly dry weather or possibly light rain. But in fact, in Vegas, it rains once in a while, and sometimes it even snows. The day it snows, the vehicle will be outside its ODD, even though it’s deployed in Las Vegas.

There are several types of ODD safety metrics that can be helpful. One is how well validation covers the ODD. What that means is whether the testing, analysis, simulation, and other validation activities actually cover everything in the ODD, or have gaps in coverage.

When considering ODD coverage it’s important to realize that ODDs have many, many dimensions. There is much more to an ODD than just geo-fencing boundaries. Sure, there’s day and night, wet versus dry, and freeze versus thaw. But there are also traffic rules, the condition of road markings, the types of vehicles present, the types of pedestrians present, whether there are leaves on the trees that affect LIDAR localization, and so on. All these things and more can affect perception, planning, and motion constraints.

While it’s true that a geo-fenced area can help limit some of the diversity in the ODD, simply specifying a geo-fence doesn’t tell you everything you need to know, nor does it ensure you’ve covered all the things that are inside that geo-fenced area. Metrics for ODD validation can be based on a detailed model of what’s actually in the ODD -- basically an ODD taxonomy of all the different factors that have to be handled -- and how well testing, simulation, and other validation cover that taxonomy.
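To make that concrete, here’s a minimal sketch in Python of how a taxonomy-based coverage metric might be computed. The dimension names, condition values, and set of validated conditions are all made up for illustration -- a real ODD taxonomy would be far larger:

```python
from itertools import product

# Toy ODD taxonomy: each dimension maps to the conditions the ODD includes.
# Dimension names and values here are illustrative, not a real taxonomy.
ODD_TAXONOMY = {
    "lighting": ["day", "night", "dusk"],
    "road_surface": ["dry", "wet"],
    "lane_markings": ["clear", "faded"],
    "pedestrian_type": ["adult", "child", "wheelchair"],
}

# Conditions actually exercised by testing, simulation, and analysis,
# recorded as (dimension, value) pairs. Hypothetical data.
validated = {
    ("lighting", "day"), ("lighting", "night"),
    ("road_surface", "dry"), ("road_surface", "wet"),
    ("lane_markings", "clear"),
    ("pedestrian_type", "adult"), ("pedestrian_type", "child"),
}

def single_condition_coverage(taxonomy, validated):
    """Fraction of taxonomy entries exercised at least once."""
    total = sum(len(vals) for vals in taxonomy.values())
    hit = sum(1 for dim, vals in taxonomy.items()
              for v in vals if (dim, v) in validated)
    return hit / total

print(f"Per-condition coverage: {single_condition_coverage(ODD_TAXONOMY, validated):.0%}")
# Combinations matter too; even this toy taxonomy has many joint conditions:
print(f"Combinations to consider: {len(list(product(*ODD_TAXONOMY.values())))}")
```

Even this toy example shows why per-condition coverage alone can mislead: touching 70% of individual conditions says nothing about whether, say, night plus wet pavement plus faded markings was ever tested together.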

Another type of metric is how well the system detects ODD violations. At some point, a vehicle will be forcibly ejected from its ODD even though it didn’t do anything wrong, simply due to external events. For example, a freak snowstorm in the desert, a tornado, or the appearance of a completely unexpected new type of vehicle can force a vehicle out of its ODD with essentially no warning. The system has to recognize when it has exited its ODD and remain safe. A related metric is how often ODD violations happen during testing and on the road after deployment.
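One way the violation-rate part of this might be tracked is sketched below, where the `DriveLog` records and their fields are hypothetical stand-ins for real fleet logging:

```python
from dataclasses import dataclass

@dataclass
class DriveLog:
    """One vehicle operating period. Fields are hypothetical log data."""
    hours_operated: float
    odd_exit_events: int   # times the vehicle was forced out of its ODD

def odd_violation_rate(logs):
    """ODD exits per operating hour, aggregated across fleet logs."""
    total_hours = sum(log.hours_operated for log in logs)
    total_exits = sum(log.odd_exit_events for log in logs)
    return total_exits / total_hours if total_hours else float("nan")

# Hypothetical fleet data from testing or post-deployment operation:
fleet = [DriveLog(120.0, 2), DriveLog(300.5, 1), DriveLog(85.0, 0)]
print(f"{odd_violation_rate(fleet):.4f} ODD exits per operating hour")
```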

Another metric is what fraction of ODD violations are actually detected by the vehicle. This can be a crucial safety metric, because if an ODD violation occurs and the vehicle doesn’t know it, it might be operating unsafely. Now, it’s hard to directly count ODD violations that the vehicle can’t detect (and such detection failures should be corrected when found). But this metric can be gathered by root cause analysis whenever there’s been some sort of system failure or incident, because one of the root causes might simply be a failure to detect an ODD violation.
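Here’s a sketch of how that detection fraction might be computed. The counts are hypothetical, and note the built-in limitation: root cause analysis only gives a lower bound on undetected violations, since truly unnoticed ones never enter the count:

```python
def odd_detection_fraction(detected_online, found_by_rca):
    """
    Fraction of known ODD violations the vehicle detected on its own.
    detected_online: violations the vehicle flagged in real time.
    found_by_rca:    undetected violations uncovered later by root cause
                     analysis of failures/incidents (a lower bound only).
    """
    total_known = detected_online + found_by_rca
    return detected_online / total_known if total_known else float("nan")

# Hypothetical counts from a test campaign:
print(f"Detected fraction: {odd_detection_fraction(47, 3):.1%}")
```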

Coverage of the ODD is important, but an equally important question is: how good is the ODD description itself? If your ODD description is missing many things that happen every day in your actual operational domain (the real world), then you’re going to have some problems.

A higher-level metric is ODD description quality. That is likely to be tied to other metrics already mentioned in this and other segments. Here are some examples. The frequency of ODD violations can help inform a coverage metric of the ODD against the operational domain. The frequency of motion failures could be related to motion system problems, but could also be due to environmental characteristics missing from your ODD. For example, cobblestone pavers have significantly different surface dynamics than a smooth concrete surface and might come as a surprise when they are encountered.

The frequency of perception failures could be due to training issues, but could also point to something missing from the ODD object taxonomy, such as a new aggressive clothing style or new types of vehicles. The frequency of planning failures could be due to planning bugs, but could also be due to the ODD missing descriptions of informal local traffic conventions.

The frequency of prediction failures could be due to prediction issues, but could also be due to a missing class of actors. For example, groups of 10 to 20 runners in formation near a military base might present a challenge if formation runners aren’t in the training data. It might be okay to have an incomplete ODD so long as you can always tell when something is happening that has forced you out of the ODD. But it’s important to consider that metric issues in various areas might be due to an unintentionally restricted ODD rather than an actual failure of the system design itself.
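As a rough illustration of that triage mindset, the sketch below maps failure categories to the candidate ODD-gap hypotheses from the preceding paragraphs. The categories and hypothesis strings are examples, not a definitive checklist:

```python
# Illustrative triage table linking failure categories to the ODD-gap
# hypotheses discussed above; entries are examples only.
ODD_GAP_HYPOTHESES = {
    "motion":     "environment characteristic missing (e.g., cobblestone pavers)",
    "perception": "object taxonomy gap (e.g., new clothing style or vehicle type)",
    "planning":   "informal local traffic convention not described",
    "prediction": "actor class missing (e.g., formation runners)",
}

def odd_gap_hint(failure_category):
    """Return the ODD-description question to ask during root cause analysis."""
    hint = ODD_GAP_HYPOTHESES.get(failure_category)
    return hint or "no standard ODD-gap hypothesis; investigate the design itself"

for category in ("perception", "prediction", "steering"):
    print(f"{category}: {odd_gap_hint(category)}")
```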

Summing up, ODD metrics should address how well validation covers the whole ODD and how well the system detects ODD violations. It’s also useful to consider that a cause of poor metrics in other aspects of the design might in fact be that the ODD description is missing something important compared to what happens in the real world.

For the podcast version of this posting, see: https://archive.org/details/metrics-12-odd-metrics

Thanks to podcast producer Jackie Erickson.

