Friday, December 4, 2020

Road Testing Safety Metrics (Metrics Episode 3)

To build trust, self-driving car companies should be transparent about operational safety metrics for road testing, such as how often the safety driver fails to react to an issue. Road testing won’t be perfect, but it ought to be at least as safe as having normal human drivers on public roads.

Right now, when you see what looks like a self-driving car on the road, it’s not really a production self-driving car -- it’s a self-driving car technology test platform. That usually means that some human, whether in the car or supervising remotely, is keeping an eye on it and making sure things are safe. In turn, if you ask the question, “How safe is that car?”, you’re not really asking about the safety of the self-driving technology at all. What you’re asking is whether that human is able to properly supervise the safety and do the right thing when something goes wrong. So if you care about the safety of this public on-road testing of a maturing technology, what you care about is the human safety driver’s performance. To understand that, let’s talk about the types of things that safety driver has to do.

The safety driver has to build and maintain situational awareness and know what’s supposed to happen next. The safety driver has to notice something is going wrong and intervene at exactly the right time. While this is happening, the safety driver is under some pressure to balance getting operational data with maintaining safety. After all, sitting in the garage with the car turned off doesn’t get the data they’re out on the public roads to get. As they’re operating, the safety driver has to figure out how to react when other drivers do things that are weird, illegal, dangerous, or just plain crazy. And sometimes it’s the car itself that misbehaves due to a software defect in the self-driving technology. When something bad happens, the safety driver has to execute a takeover maneuver and make sure they don’t make things worse.

Now, that sounds like a lot to do, but the fact is that when things are going well it’s an exceptionally boring job. The car just keeps doing the thing it’s supposed to be doing, and the driver sits there watching and waiting. Every once in a while something will go wrong, and the safety driver has to make sure that not only do they have situational awareness and an idea of what to do next, but also that they’re not startled into doing the wrong thing. Rolling all that up, supervising an autonomous test platform is exceptionally difficult work and requires the highest of driving skills. And yes, we do eventually get to the part where the car itself has to behave properly, but it has nothing to do with the self-driving technology -- it has to do with the emergency override. If the human safety driver wants to override the vehicle, they have to be able to turn off the self-driving feature and get control of the vehicle.

It turns out getting that mostly right is straightforward, but getting it 100% right takes a lot of care and attention. Taking all this into account, you start realizing that disengagements, the metric we use now, are actually the wrong metric. In fact, they’re exactly the wrong metric. That’s because disengagements aren’t what make you dangerous; the thing that makes you the most dangerous when you’re testing this technology is non-disengagements: failures to disengage when you should have. In other words, the hazards that the human safety driver did not catch are what matter.

Now, measuring something that doesn’t happen is not straightforward, but if you want metrics for safety that’s what you have to go after. The starting point isn’t really metrics at all but rather processes, and what you hear is that the more thoughtful companies are starting to emphasize things like good driver qualifications: drivers are trained and have good reaction times. Driver testing: go out on a test track, make sure the driver can handle surprises, and make sure the driver can maintain situational awareness. Yes, the disengagement mechanism has to work. The proverbial big red button needs to work all the time, not just most of the time. Fortunately there’s a safety standard, ISO 26262, which tells you exactly how to build that kind of system, so these companies should be following that standard for their disengagement mechanism.

Now, there’s an issue about whether training once is enough. It’s not; you need refresher training at regular intervals. Many of the companies stop there. But the problem with doing only training is that you may think you’re good enough, but you don’t know you’re actually good enough unless you take operational data.

Sure, the driver can pay attention on test day and maintain engagement. But can a safety driver maintain that same peak performance through six months of day-in, day-out testing that’s really boring because mostly things work pretty well? Humans aren’t perfect, and that means even really good safety drivers aren’t perfect. That’s just the way it is. So what you really want is not just training but also operational metrics to make sure the safety drivers are able to be as good as they need to be to reach your safety goal.

Here are some metrics that could work internally for an engineering effort (a rough sketch of how they might be tracked follows the list):

  • You could measure the attention of the driver throughout a shift. Maybe you’ll find out, as might be expected, that after many hours the driver has trouble maintaining attention. That wouldn’t be a surprise. Some companies are doing two-hour shifts. While that might be a good shift length for active driving, there are several decades of data suggesting that for supervising autonomy you need shorter shifts, maybe only 30 minutes. So do you have data showing that your two-hour shifts are okay? Or do you need a shorter shift? Without data you won’t know.
  • You might also want to measure intervention accuracy. Did the driver actually know their blind spot was clear when they made a lane change, or did they just get lucky when they did a takeover and a lane change? Knowing the fraction of the time that drivers do the right thing is important. It’s not going to be 100%, but you need to make sure it’s good enough to achieve your safety goal.
  • And every once in a while, if the driver tries to do a takeover and the vehicle doesn’t respond, you really need to know that happened.
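
As a concrete illustration, here is a minimal Python sketch of how those three internal metrics might be rolled up per shift. All of the field names, event definitions, and example numbers are hypothetical, not drawn from any real company’s tooling; a real program would define them from its own safety case.

```python
from dataclasses import dataclass

@dataclass
class ShiftLog:
    """Hypothetical per-shift event counts for one safety driver."""
    driver_id: str
    shift_minutes: float
    attention_lapses: int        # e.g., gaze-off-road events longer than a couple seconds
    takeovers_attempted: int
    takeovers_correct: int       # takeover done safely (e.g., blind spot actually checked)
    takeovers_unresponsive: int  # driver commanded a takeover but the vehicle did not respond

def shift_metrics(log: ShiftLog) -> dict:
    return {
        # Attention lapses per hour: expect this to climb as shifts get longer.
        "lapses_per_hour": log.attention_lapses / (log.shift_minutes / 60.0),
        # Intervention accuracy: fraction of takeovers done correctly, not luckily.
        "intervention_accuracy": (log.takeovers_correct / log.takeovers_attempted
                                  if log.takeovers_attempted else None),
        # Any failed disengagement is a serious event worth counting individually.
        "failed_takeovers": log.takeovers_unresponsive,
    }

# Example: a two-hour shift with three attention lapses and one sloppy takeover.
print(shift_metrics(ShiftLog("driver_42", 120.0, 3, 5, 4, 0)))
```

Trending these per driver and per shift length is what would tell you, with data rather than hope, whether a two-hour shift is actually okay.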

Now, those metrics are somewhat detailed. They might not be the kind of thing that a government agency or the public would really be in a position to process well. So you want some roll-up metrics as well. But these should mostly be things that have to do with non-disengagements rather than disengagements.

To really deal with road testing safety, the number one concern has got to be the times when the vehicle did something dangerous and the safety driver did not take the right compensating action. Your drivers aren’t going to be perfect. Are you planning on just blaming them when there’s a mishap, or are you actually taking measurements to find out if you’re hitting performance targets? If you have a target for how much the driver has to pay attention, and for how much losing attention for a second or two is okay, then you need to know whether you’re hitting that target, and that target is not going to be perfection.
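
To make that concrete, here is a minimal sketch of that number-one roll-up metric, assuming a made-up target value; a real target would be derived from the program’s overall safety goal rather than picked out of the air.

```python
def missed_intervention_rate(dangerous_events: int, missed: int) -> float:
    """Fraction of dangerous events the safety driver failed to catch."""
    return missed / dangerous_events if dangerous_events else 0.0

# Illustrative target only: tolerate at most 1 missed catch per 100 dangerous events.
TARGET_MISS_RATE = 0.01

rate = missed_intervention_rate(dangerous_events=250, missed=2)
print(f"miss rate {rate:.3f} -> {'meeting target' if rate <= TARGET_MISS_RATE else 'NOT meeting target'}")
```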

You probably also want some metrics for rule violations. 

  • For example, running a stop sign is still a bad thing even if nothing bad happened. The number of times that happens is not going to be zero, but it should be so low that it’s dramatically better than normal human drivers.
  • You probably want to measure how often you unreasonably encroach on a bike lane, even if there are no bicyclists there.
  • You might want to measure how often you fail to yield to a pedestrian. Most pedestrians aren’t going to jump out in front of a car to get hit, but the fact that one had to back off and let you pass means you weren’t behaving the way you were supposed to.
  • Probably the most important thing to measure is near misses (or, as some people call them, near hits), where the margin was just too small even if you got lucky and nothing bad actually happened. For example, maybe you’re supposed to leave three feet of clearance to a bicyclist. (At high speeds I would hope you leave at least that, but if you’re down at two or three miles an hour crawling through a dense urban area and you’re two feet away from a bicyclist, maybe that’s okay.) You need a definition of near misses that responds to the particular situation, but ultimately you want to know how often you had a near miss, meaning you were in a situation more dangerous than it was supposed to be. If your safety driver catches it, great. But if your safety driver does not, that means you’re taking chances you didn’t intend to take, and knowing that is super important. (A sketch of a speed-dependent near-miss check follows this list.)
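
Here is an illustrative version of that bicyclist-clearance example. The speed breakpoint and the required margins are assumptions made up for the sketch, not numbers from any published standard; the point is only that the near-miss definition responds to the situation.

```python
def is_near_miss(speed_mph: float, clearance_ft: float) -> bool:
    """Situation-dependent near-miss check for passing a bicyclist."""
    if speed_mph <= 3.0:
        required_ft = 2.0   # crawling through a dense urban area: smaller margin may be okay
    else:
        required_ft = 3.0   # normal speeds: leave at least three feet
    return clearance_ft < required_ft

print(is_near_miss(speed_mph=25.0, clearance_ft=2.5))  # True: margin too small at speed
print(is_near_miss(speed_mph=2.0, clearance_ft=2.0))   # False: maybe okay at a crawl
```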

No safety program and no safety metric is going to be perfect. It’s unreasonable to expect self-driving car testing to be absolutely perfect, with not even a fender bender. Surely mishaps will happen, especially when interacting with human drivers. But there has to be some sort of strategy.

A viable strategy might be that you use above-average safety drivers who have to deal with the extra complexity of the test platforms but come out, overall, at least as good as normal human drivers. It’s a big ask to have these safety drivers be better than normal human drivers and compensate for all the difficulties of this technology. It probably can be done, but you really need to know you got there instead of training the drivers, sending them out on the roads, and simply hoping that you’re safe enough. You really want some metrics.
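
As a rough sketch, that “at least as good as normal human drivers” check might look like the following. Both the human baseline rate and the fleet numbers here are placeholders, not real statistics, and a raw rate comparison like this is naive: a real comparison would need a baseline matched to the same road types and conditions, plus enough miles for statistical confidence.

```python
# Placeholder baseline mishap rate per million miles -- NOT a real statistic.
HUMAN_BASELINE_PER_MILLION_MILES = 4.2

def mishaps_per_million_miles(mishaps: int, miles: float) -> float:
    """Normalize a fleet's mishap count to a per-million-mile rate."""
    return mishaps / (miles / 1_000_000.0)

fleet_rate = mishaps_per_million_miles(mishaps=3, miles=900_000.0)
verdict = "at least as good" if fleet_rate <= HUMAN_BASELINE_PER_MILLION_MILES else "worse"
print(f"fleet {fleet_rate:.2f} vs baseline {HUMAN_BASELINE_PER_MILLION_MILES}: {verdict}")
```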

To build trust, companies should be exposing at least the high-level roll-up metrics to the public. These have nothing to do with the secret sauce behind the self-driving car, but they have everything to do with the safety of the general public as this technology is tested on public roads.

For the podcast version of this posting, see: https://archive.org/details/metrics-04-road-testing-metrics

Thanks to podcast producer Jackie Erickson.


