Wednesday, November 25, 2020

Disengagements as a progress metric is a bad idea (Metrics Episode 2)

We should be worried about road testing safety metrics, not disengagements.

A disengagement happens when the autonomy in a self driving car detects an internal problem, or when a human test driver takes over control of a self driving car test platform because of safety concerns. Self driving car developers have to report these disengagements, for example, to California. The apparent rationale for requiring these reports is that, all things being equal, disengagements per mile might be expected to decrease over time as the technology matures. Along those lines, once disengagements reach zero, you might think it’s time to deploy the vehicle without a human test driver. The problem is that this model is much too simplistic and, more importantly, not all things are equal.

Let’s start with some basics. Not all miles are equal. If you wanted to game disengagements, you could do so by driving around an empty block in beautiful weather at 4:00 AM with no traffic, no pedestrians, nothing on the road, around and around in circles. You get a lot of miles. You wouldn’t learn much, but you get a lot of miles. That’s not at all the same as, for example, trying to drive across all 446 bridges in Pittsburgh during a blizzard. Those miles are just not the same. Another potential problem is that not all safety drivers are equal. Some safety drivers will be more prone to be cautious and others less cautious. Hopefully there is rigorous driver screening so that the safety drivers are the right amount of cautious, but in fact, this is still an area the industry is working on. So even with the best intentions, not all disengagements are equal.

Now what happens next? After this disengagement data is collected, the metrics get published, and that leads the media to turn those published metrics into the great disengagement metric horse race. Pundits opine about which company is in the lead, and companies who are ahead say, "yeah, look at our low disengagement rate" and so on. Now it’s hard to blame people for doing this, because the developers operate in such great secrecy that disengagements are really the only progress metric out there. But it’s not a good metric. In fact, it’s probably a harmful metric. A big concern is that using disengagements as a metric provides strong incentives for behavior that makes things worse instead of better, especially if you’re being judged on it for progress and maybe your next funding round depends on your disengagement metric.

Here’s a problem: the disengagement metric penalizes companies who tell their safety drivers to be extra safe by being extra quick to disengage. So, that means there’s an incentive to tell drivers to give the vehicles a little more slack, which might or might not be as safe as it should be. Now, I’m not saying that people necessarily do this intentionally, but in a very competitive environment, there’s going to be natural pressure to say, well, if it’s on the borderline, let it go to make our numbers look better and probably it’s safe enough. And people might convince themselves of that, even though they’re operating unsafely. 

Another problem is the metric penalizes companies who are working on difficult operational design domains and incentivizes them to chase easy miles. Now again, I’m not saying companies are doing this on purpose, but certainly the incentive is there. In fact, there are good reasons why a company making excellent progress would actually see their disengagements increase rather than decrease. Maybe the company’s expanded its operational design domain to handle more challenging situations. The week that they decide to start operating in rain, I’d imagine the disengagement rate would go up instead of down.
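To make the arithmetic behind that concrete, here is a minimal sketch. Every number in it is made up for illustration and does not come from any real company's report; the names Segment and rate_per_1000_miles are likewise hypothetical. It shows a fleet whose clear-weather disengagement rate improves while its overall rate more than quadruples, simply because half the miles moved into rain.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Miles driven and disengagements logged under one operating condition."""
    condition: str
    miles: float
    disengagements: int


def rate_per_1000_miles(segments):
    """Overall disengagements per 1,000 miles across all segments."""
    total_miles = sum(s.miles for s in segments)
    total_events = sum(s.disengagements for s in segments)
    return 1000.0 * total_events / total_miles


# HYPOTHETICAL data: before the expansion, the fleet drives only in clear weather.
before = [Segment("clear", 10_000, 10)]

# HYPOTHETICAL data: after the expansion, half the miles are in rain.
# The clear-weather rate improved (4 per 5,000 miles, down from 10 per 10,000),
# but rain miles are much harder.
after = [Segment("clear", 5_000, 4), Segment("rain", 5_000, 40)]

print(f"before: {rate_per_1000_miles(before):.1f} per 1,000 miles")  # 1.0
print(f"after:  {rate_per_1000_miles(after):.1f} per 1,000 miles")   # 4.4
```

Judged by the single headline number, this made-up company looks like it went backwards, even though it got better at what it was already doing and took on a harder problem.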

Another reason is maybe safety driver training has been improved and the policies have been changed to improve road test safety at the expense of increased disengagements. I’d love to see that kind of outcome, but it makes the metric look bad. 

Some companies filter their disengagements to say, okay, well, we’re only going to report the disengagements that count. The problem is that filtering is a double-edged sword. Sure, it makes sense not to report planned disengagements. If it’s the end of the testing day and you’re going to take the car back to the garage and you turn off autonomy, surely that disengagement should not count.

But because companies are being judged on disengagements, there’s also an incentive to game them a bit. For example, the driver might take over control, and maybe it was a dangerous situation, maybe it wasn’t. But because of the pressure of metrics, the company decides to round down and attribute it to something else, when in fact the car probably should have been doing better and the disengagement should have at least partially counted. You might end up with under-reporting of disengagements, and that should be a cause for concern.

Let me give a couple of hypothetical examples of the kind of situations which could lead to this kind of bad outcome. For example, let’s say a company only reports a disengagement if an after-the-fact simulation says yes, the car would have hit something. And then the car goes by a pedestrian and the safety driver disengages because it looks like it’s going to be kind of close to the pedestrian, and they don’t want to take a chance. So far, so good. Now let’s say the car would’ve missed the pedestrian by 10 to 20 feet. Okay, fine. That disengagement probably should not count. But what if the car was only going to miss the pedestrian by one inch? Well, it didn’t hit the pedestrian. You could say, well, that one doesn’t count because we didn’t hit anything. But I’m going to say missing a pedestrian by one inch, that one ought to count.

And so without details about how exactly this reporting has been done, we don’t really know what the numbers mean. 
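As a rough sketch of why the filtering rule matters, the snippet below counts the same set of driver takeovers under three different reporting thresholds. The miss distances are invented for illustration, and reported_count is a hypothetical helper, not any company's or regulator's actual reporting rule.

```python
# HYPOTHETICAL after-the-fact simulation results: the closest approach, in feet,
# that the car would have made to a pedestrian if the driver had not taken over.
# 0.0 means a simulated collision; 1 / 12 of a foot is one inch.
simulated_miss_feet = [0.0, 1 / 12, 1.5, 4.0, 15.0, 20.0]


def reported_count(miss_distances_feet, threshold_feet):
    """Count takeovers whose simulated miss distance is at or below the threshold."""
    return sum(1 for d in miss_distances_feet if d <= threshold_feet)


# Same six takeovers, three different (assumed) reporting policies.
for threshold in (0.0, 3.0, 10.0):
    count = reported_count(simulated_miss_feet, threshold)
    print(f"report if miss <= {threshold:4.1f} ft: {count} disengagements")
```

With a "collisions only" rule (threshold 0.0) this made-up log reports one disengagement and the one-inch miss disappears; with a 3-foot rule it reports three, and with a 10-foot rule it reports four. The driving is identical in all three cases, and only the reporting policy changed.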

Here’s another hypothetical example. Let’s say a test vehicle runs a red light, but it’s late at night and no one’s around. The driver, who has unfortunately been told to make sure the disengagement numbers look good, looks around. There’s no cross traffic. The driver says, you know what? I’m just going to let it run the red light because no harm will be done. There’s a situation where the disengagement doesn’t happen and the number looks good, but in that hypothetical scenario, the driver’s been incentivized to do unsafe things. That made-up example brings to a head the real issue here.

Disengagements might be useful input to some parts of an engineering process, but in a hyper competitive market, they provide all the wrong incentives for road test safety. And is progress really what we should be worried about?

Do we really want the Departments of Transportation measuring progress of companies? Their job really has to be keeping people safe on the road. And so if the publicly reported data actually provides incentives to do road testing unsafely to make progress look good, that’s a problem. Historically, those kinds of incentives, they lead to a story that ends badly for everyone. 

Sure, it’s interesting to know what progress might be made by the industry. But if you really care about safety, the thing you ought to be worried about right now is road testing safety. Now, some of the reported data actually does help with that. For example, crashes being reported. Sure, that actually directly measures road testing safety. But all the disengagement metrics and all the buzz really don’t help road testing safety, and in fact might be hurting it. They might be putting pressure on the companies to undermine road testing safety just to make their numbers look better.

I would recommend that California and any other government that’s doing this should stop forcing disengagement reporting and instead encourage the reporting of more productive metrics that are about road testing safety, not about the horse race to get a self driving car deployed. This is not a simple ask. This is actually a hard thing to do. But the industry should step up and propose metrics that have to do with road testing safety. That’s going to help them build trust with the public and it’s going to help the government agencies fulfill their responsibility to ensure that the road testing and eventual deployment is done in a safe, responsible manner.

To learn more, we recommend a paper our team published for SAE World Congress titled “Safety Argument Considerations For Road Testing of Autonomous Vehicles.” This paper gives guidance on a safety case for human supervision of road testing.

For the podcast version of this posting, see:

Thanks to podcast producer Jackie Erickson.
