Saturday, April 30, 2022

OTA updates won't save buggy autonomous vehicle software

There is a feeling that it's OK for software to ship with questionable quality if you have the ability to send out updates quickly. You might be able to get away with this for human-driven vehicles, but for autonomous vehicles (no human driver responsible for safety) this strategy might collapse.


Right now, companies are all pushing hard to do quick-turn Over The Air (OTA) software updates, with Tesla being the poster child of both shipping dodgy software and pushing out quick updates (not all of which actually solve the problem as intended). There is a moral hazard that comes with the ability to do quick OTAs in that you might not spend much time on quality since you know you can just send another update if the first one doesn't turn out as you hoped.

"There's definitely the mindset that you can fix fast so you can take a higher risk," said Florian Rohde, a former Tesla validation manager. (https://www.reuters.com/article/tesla-recalls-idTRNIKBN2KN171)

So far, companies across an increasing number of industries have been getting away with shipping lower-quality software, and the ability to do internet-based updates has let them get by with that strategy. The practice is so prevalent that the typical troubleshooting step for any software, right after "is the power turned on," has become "have you downloaded the latest updates?"

But the reason this approach works (after a fashion) is that there is a human user or vehicle operator present to recognize something is wrong, work around the problem, and participate in the troubleshooting. In a fully automated vehicle, that human isn't going to be there to save the day.

What happens when there is no human to counteract defective -- and potentially fatally dangerous -- software behavior? The biggest weakness of any automation is typically that it is not "smart" enough to know when something is going wrong that is not supposed to happen. People are pretty good at this, which is why even for very serious software defects in cars we often see huge numbers of complaints compared to few actual instances of harm -- because human drivers have compensated for the bad software behavior.

Here's a concrete example of a surprising software defect pulled from my extensive list of problematic automotive software defects: NHTSA Recall 14V-204:

  • Due to a software calibration error, the vehicle may be in (and display) "drive" but engage "reverse" for 1.5 seconds.

If a human driver notices the vehicle going the wrong direction they'll stop accelerating pretty quickly. They might hit something at slow speed during the reaction time, but they'll realize something is wrong without having explicit instructions for that particular failure scenario. In contrast, a computer-based system that has been taught that the car always moves in the direction of the transmission display might not even realize something is wrong, and might accelerate into a collision.

Obviously a computer can be programmed to deal with such a situation if it has been thought of at design time. But the whole point here is that this is something that isn't supposed to happen -- so why would you spend time programming a computer to handle an "impossible" event? Safety engineering uses hazard analysis to mitigate even low-probability risks, but even that often overlooks "impossible" events until after they've occurred. Sure, you can send an OTA update after the crash -- but that doesn't bring crash victims back to life.
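
To make that concrete, here is a minimal sketch (in Python, purely illustrative) of the kind of plausibility check that would catch this class of "impossible" event -- but only if a designer thought to write it. The gear names, signals, and threshold below are assumptions for illustration, not anything from a real vehicle:

    # Hypothetical sketch -- gear names, signal names, and thresholds are invented
    # for illustration and are not from any real vehicle's software.
    from enum import Enum

    class Gear(Enum):
        PARK = "P"
        REVERSE = "R"
        NEUTRAL = "N"
        DRIVE = "D"

    def motion_plausible(selected_gear: Gear, wheel_speed_mps: float,
                         tolerance_mps: float = 0.3) -> bool:
        """Return False if measured motion contradicts the selected/displayed gear.

        wheel_speed_mps is signed: positive = forward, negative = rearward.
        The tolerance absorbs sensor noise and brief rollback on hills.
        """
        if selected_gear is Gear.DRIVE and wheel_speed_mps < -tolerance_mps:
            return False  # displays "drive" but the vehicle is moving backward
        if selected_gear is Gear.REVERSE and wheel_speed_mps > tolerance_mps:
            return False  # displays "reverse" but the vehicle is moving forward
        return True

    # The recall scenario: the display says "drive" while the car backs up.
    if not motion_plausible(Gear.DRIVE, wheel_speed_mps=-1.2):
        print("Plausibility fault: motion direction contradicts selected gear")

A monitor like this is trivial to write once someone has imagined the failure; the hard part is imagining it.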

In practice the justification that it is OK to ship out less-than-perfect automotive software has been that human drivers can compensate for problems. (In the ISO 26262 functional safety standard one takes credit for "controllability" in reducing the risk of a potential defect.) When there is no human driver that credit goes away, and shipping defective software is more likely to result in harm to a vehicle occupant or other road user before anyone even notices there is a problem for an OTA update to correct.
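
As a rough illustration of how that controllability credit works, ISO 26262-3 assigns an ASIL from Severity, Exposure, and Controllability classes via a lookup table; the simple sum heuristic below reproduces that table for illustration only (the standard's table is the authoritative source, and the hazard classification values chosen here are hypothetical):

    # Illustration only: ASIL from Severity (S1-S3), Exposure (E1-E4), and
    # Controllability (C1-C3). This sum heuristic reproduces the ISO 26262-3
    # lookup table for the S>=1, E>=1, C>=1 cases.
    def asil(severity: int, exposure: int, controllability: int) -> str:
        total = severity + exposure + controllability
        return {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}.get(total, "QM")

    # Same hypothetical hazard, same severity (S3) and exposure (E4); only the
    # controllability credit changes when the human driver is removed.
    print(asil(3, 4, 1))  # ASIL B -- a driver is assumed able to compensate (C1)
    print(asil(3, 4, 3))  # ASIL D -- no driver available to compensate (C3)

Losing the controllability credit pushes the same hazard to a higher integrity level, which means more engineering rigor is required, not less.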

Right now, a significant challenge to OTA updates is the moral hazard that software will be a bit more dangerous than it should be due to pushing the boundaries of human drivers' ability to compensate for defects. With fully automated vehicles there will be a huge cliff in that compensating ability, and even small OTA update defects could result in large numbers of crashes across a fleet before there is time to correct the problem. (If you push a bad update to millions of cars, you can have a lot of crashes even in a single day for a defect that affects a common driving situation.)
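
To put a rough number on that fleet-scale exposure, here is a back-of-envelope calculation in which every number is invented purely for illustration:

    # Back-of-envelope illustration only -- all numbers are made up to show the
    # scale effect of a fleet-wide OTA defect, not a claim about any real fleet.
    fleet_size = 1_000_000              # vehicles that received the bad update
    exposures_per_vehicle_per_day = 5   # times per day each vehicle hits the triggering situation
    p_crash_per_exposure = 1e-6         # chance the defect causes a crash at each exposure

    expected_crashes_per_day = fleet_size * exposures_per_vehicle_per_day * p_crash_per_exposure
    print(expected_crashes_per_day)     # 5.0 -- several crashes per day from a "one in a million" defect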

The industry is going all-in on fast and loose OTAs to be more "agile" and iterate software changes more quickly without worrying as much about quality. But I think they're heading straight for a proverbial brick wall that they will hit when human drivers are taken out of the loop. Getting software quality right will become more important than ever for fully autonomous vehicles.


 

Sunday, April 24, 2022

Maturity Levels for Autonomous Vehicle Safety

I've been on a personal journey to understand what safety really means for autonomous vehicles. As part of this I repeatedly find myself in conversations in which participants have wildly different notions of what it means to be "safe."  Here is an attempt to put some structure around the discussion:

[Pyramid figure, top to bottom: just culture / system safety / SOTIF / functional safety / hazard analysis / defensive driving / basic driving]

An inspiration for this idea is Maslow's famous hierarchy of needs. The idea is that organizations developing autonomous vehicles have to take care of the lower levels before they can afford to invest in the higher levels. For example, if your vehicle crashes every 100 meters because it struggles to detect obstacles in ideal conditions, worrying about the nuances of lifecycle support won't get you your next funding round.

To succeed as a viable at-scale company, you need to address all the levels in the AV maturity hierarchy. But in reality companies will likely climb the levels like rungs in a ladder. To draw the parallel to Maslow's needs hierarchy, if a company is starving for cash to run its operations, it is going to care more about getting to the next demo or funding milestone than about lifecycle safety considerations. That will only change when venture funders bake the higher levels of this safety maturity hierarchy into their milestones.

  • Basic driving functionality: the vehicle works in a defined environment without hitting objects or other road users, at least on the scale of a funding milestone demo. When people say their vehicle is safe because it has high crash safety ratings, that claim aligns with this level. (I personally prefer my safety to happen without the part where the vehicle crashes.)
  • Defensive driving: vehicle has expert driving skills, actively avoiding driving situations that present increased risk. This is analogous to sending a human driver to defensive driving school. At some point the automated driver becomes expert in terms of being able to drive in failure-free situations.
  • Systematic hazard analysis: engineering effort has been spent analyzing and mitigating risks not just from driving functions, but also from potential technical malfunctions, forced exits from the intended operational design domain, etc. (For example, HARA from ISO 26262.) Common hazards that aren't easy or inexpensive to mitigate might well be pushed onto the driver (e.g., incomplete redundancy to handle component failure, or required driver intervention to mitigate risks).
  • Functional safety: analysis and redundancy have been added, and a principled approach (e.g., based on safety integrity levels) has been taken to ensure risks from technical faults in the system have been mitigated (e.g., ISO 26262 conformance).
  • Safety of the Intended Function (SOTIF): ensuring "unknowns" have been addressed, dealing with environmental influences (e.g., not all radar pings will be returned), closing gaps in requirements, and accounting for aspects of machine learning (e.g., ISO 21448 conformance).
  • System level safety: accounting for things beyond just the driving task, including lifecycle considerations. Ensuring that hazard analysis and mitigation extends to process aspects, and a safety case has been used to ensure acceptable safety. (e.g., ANSI/UL 4600)
    Cybersecurity needs to be addressed to achieve system safety, but should not wait to get started until reaching this level.
  • Just Safety Culture: operating and continuously improving the organization and execution of other levels of the hierarchy according to Just Culture principles rather than blame.
    A specific common anti-pattern for Just Culture relevant to autonomous vehicles is blaming individual "human error" rather than fixing the process that allowed the error to matter.

As with the Maslow hierarchy, the levels are not exclusive. Rather, all levels need to operate concurrently, with the highest concurrently active level indicating progress toward safety maturity.

You might see this differently, see some things I've missed, etc. Comments welcome!

Tuesday, April 12, 2022

Cruise Stopped by Police for Headlights Off -- Why Is This a Big Deal?

In April 2022 San Francisco police pulled over an uncrewed Cruise autonomous test vehicle for not having its headlights on. Much fun was had on social media about the perplexed officer having to chase the car a few meters after it repositioned during the traffic stop. Cruise said it was somehow intentional behavior. They also said their vehicle "did not have its headlights on because of a human error." (Source: https://www.theguardian.com/technology/2022/apr/11/self-driving-car-police-pull-over-san-francisco)

[Photo: police officer stops Cruise vehicle in San Francisco]

The traffic stop behavior indicates that Cruise needs to do a better job of making it easier for local police to conduct traffic stops -- but that's not the main event. The real issue here is the headlights being off.

Cruise said in a public statement: "we have fixed the issue that led to this." Forgive me if I'm not reassured. A purportedly completely autonomous car (the ones that are supposed to be nearly perfect compared to those oh-so-flawed human drivers) that always drives at night didn't know to turn its headlights on when driving in the dark. Seriously? Remember that having headlights on at night is not just for the AV itself (which might not need them in good city lighting with a lidar, etc.) but also to make the vehicle visible to other road users. So headlights off at night is a major safety problem. (Headlights on during the day are also a good idea -- at least white forward-facing running lights, which this vehicle also did not seem to have on.)

Cruise says these vehicles are autonomous -- no human driver required. And they only test at night, so needing headlights can hardly be a surprise. But they got it wrong. How can that be? 

This can't just be hand-waved away. Something really fundamental seems broken, and it is just a question of what it is.

The entire AV industry, including Cruise, has a history of extreme secrecy, and in particular a lack of transparency about safety. So we can only speculate. Here are the possibilities I can think of. None of them inspire confidence in safety, and they tend to get worse as we go down the list.

  1. The autonomy software had a defect that didn't turn the headlights on at night. Perhaps, but this seems unlikely. That would (one assumes) affect the entire fleet. And there should be checks built into the software to make sure the headlights actually turn on. If true, this never should have slipped through quality control, let alone a rigorous safety critical software design process, and it indicates significant concerns with overall software safety.
  2. The headlight switch is supposed to be on at all times. Many (most?) vehicles have "smart" lights. You can turn them off if you want, but in practice you just turn the switch to "on" and leave it there for months or years, and the headlights just do the right thing, switching from daytime running lights to full-on automatically, and turning off when the vehicle does. If you're in urban San Francisco, high beams are unlikely to be relevant. So the autonomy doesn't mess with the lights at all. Except -- why does the software not check that the vehicle condition is safe in terms of headlights (see the sketch after this list)? Seems like a design oversight. How did a hazard like this get missed in hazard analysis? If this is the situation, it really calls into question whether the hazard analysis was done properly for the software. Even if this particular issue was fixed, what else did the hazard analysis miss?
    • 2a) A passenger in the vehicle turned the headlights off as a prank. If this is possible, it is even more important for this to be called out for software monitoring in the hazard analysis. But the check for headlights off obviously isn't there now.
  3. The software is completely ignorant of headlight state, and there is a maintenance tech who is supposed to turn the lights "on" as part of the check-out process each day to prepare the vehicle to run. This manual headlight-on check didn't get done. This is a huge issue with the Safety Management System (SMS), because if they missed that check, what other checks did they miss? There are plenty of things to check on these vehicles for safety (e.g., sensor calibration, sensor cleaning/cleaner systems, correct software configuration). If they forgot to turn on the headlight switch, what else did they forget? While this might be a "within normal tolerance" deviation, given the lack of safety transparency Cruise doesn't get the benefit of the doubt. A broken SMS puts all road users at risk. This is a big deal unless proven otherwise. Firing, training, or having a stern talk with the human who made the "error" won't stop it from happening again. Blaming the driver/pilot/human never really does.
  4. There is no SMS at all. That's basically how Uber ATG ended up killing a pedestrian in Tempe, Arizona. If this is the case, it is even scarier.
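
For scenarios #1 and #2, here is a minimal sketch of the kind of automated lighting check being described. It is hypothetical Python; the signal names and conditions are assumptions for illustration, not anything from Cruise's actual software:

    # Hypothetical readiness check -- signal names and logic are invented for
    # illustration; no claim is made about how Cruise's software actually works.
    from dataclasses import dataclass

    @dataclass
    class LightingStatus:
        is_dark_outside: bool             # e.g., ambient light sensor or civil twilight tables
        low_beams_confirmed_on: bool      # feedback from the lamp driver, not the switch position
        running_lights_confirmed_on: bool

    def lighting_ok_for_driverless_trip(status: LightingStatus) -> bool:
        """Refuse to dispatch (or continue) a driverless trip with unsafe lighting."""
        if status.is_dark_outside and not status.low_beams_confirmed_on:
            return False  # headlights must be confirmed on at night
        if not status.is_dark_outside and not (
            status.running_lights_confirmed_on or status.low_beams_confirmed_on
        ):
            return False  # forward-facing running lights expected in daytime too
        return True

Run before dispatch and periodically while driving, a check like this would also catch a missed manual step (scenario #3) or a prank (scenario #2a).
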
Again, we don't really know the situation because Cruise is trying the "nothing to see here ... move along" response to this incident. But none of these scenarios is comforting. If I had to guess, my money would be on #3, simply because #4 would be too irresponsible to have to contemplate. But really, we have no way to know what's going on. And it might be another alternative I have not considered.

Cruise should take this as a wake-up call to get their safety house in order before they have a big event. Blaming safety critical failures on "human error" is generally indicative of a poor safety culture. They have a chance here to turn the ship around -- before there is harm to a road user.

Is there a scenario I missed that is less of a concern? Maybe Cruise will give us a substantive explanation. If they do I'll be happy to post it here as a follow-up.

-----------------------------------------------------------------------------------

Kyle Vogt at Cruise sent me a response on LinkedIn on 4/16/2022:

Kyle Vogt
CEO at Cruise

I’d like to respectfully disagree with your characterization of this event. Allow me to provide some context.

We apply SMS rigorously across the company, which as you probably know includes estimating the safety risk for known hazards and having a process to continuously surface new ones.

We then apply resources accordingly to attempt to reduce the overall safety risk of our service as effectively as possible. This is a continuous process and we will continue even though we’ve passed the point where we can operate without backup drivers.

Rarely would this direct us to focus on things that are extremely unlikely to result in injury, such as lack of headlights on a single vehicle for a short duration of time. We have a process in place to ensure they are functioning properly, but the safety impact of that process failing is low.

Our use of SMS directs the majority of our resources towards higher risk areas like pedestrian and cyclist interactions. These are far more complex and of higher potential severity.


I firmly believe that this is the right approach, even if it means our vehicles occasionally do something that seems obviously wrong to a human.


-----------------------------------------------------------------------------------

@Kyle Vogt: Thank you for responding to my post. I am glad to hear that you have a Safety Management System in place now. This is excellent news.

Hopefully your SMS folks are telling you that the headlight-off incident should be treated as a serious safety process issue rather than being sloughed off as "no big deal." A healthy safety culture would acknowledge this is a process failure that needs to be corrected rather than making excuses because the "safety impact of that process failing is low." Process failures in safety critical system operations are a problem -- period. An attitude that less critical processes don't matter can easily become toxic for safety culture. Safety culture is how you do business, and if disregarding procedures that are seen as less than the most critical is how you do business, eventually that will catch up with you and result in loss events. More simply: discounting a low severity process failure is at odds with your statement that you "apply SMS rigorously across the company."

I would say that headlights off has substantive potential severity for other road users, because headlights are signaling/warning devices that help people know to get out of the way if your vehicle should malfunction. For example, with headlights off you cannot take controllability credit for another road user jumping out of your way if you fail to detect them, because they won't see headlights as an indicator of your approach at night. Moreover, having headlights on at night is the law.

Regardless of your statement that you are focusing on higher severity issues, a process failure for something as simple and obvious as making sure headlights are on at night is very disconcerting, and it raises concerns about your safety process quality in general. A more transparent analysis of how that process failed, and of whether other higher severity process failures are occurring that have not been made public, would be the right thing to do here to restore public trust. If this is the only time the headlights were off and your other pre-mission procedures have a high compliance rate, then OK, stuff happens. However, if pre-mission procedures are hit or miss and you don't even keep track of procedural errors that don't result in a scary pedestrian near-miss, that is quite another matter. If you don't have the procedural metrics to show you are operating safely, then probably you aren't operating safely. Which is the case?

Your GM-issued VSSA from 2018 seems outdated and has no mention of an SMS. So it is difficult for the public to know what your plan is for safety. More transparency and communication regarding your safety process are essential to help build public trust.
-----------------------------------------------------------------------------------