Safe Autonomy: safety culture


Friday, October 27, 2023

The Cruise Safety Stand-Down -- What Happens Next?

Cruise has announced a fleet-wide safety stand-down. This involves suspending driverless operations in all cities, reverting to operation only with in-vehicle safety drivers.

I'm glad to see this step taken. But it is crucial to realize that this is the first step in what is likely to prove a long journey. The question is, what should happen next?

Loss of public trust is an issue, as they say. And perhaps an imminent ban in another state forced their hand to be proactive. But the core issues almost certainly run deeper than mismanaging disclosure of the details of a tragic mishap and doing damage control on regulatory trust.

The real issues will be found to have their roots in the culture of the company. Earnest, smart employees with the best of intentions can be ineffective at achieving acceptable safety if the corporate culture undermines their official company slogan of "safety first, always."

This is the time to ask the hard questions. The answers might be even harder, but they need to be understood for Cruise to survive long term. It is questionable whether they could survive a ban in another state. But escaping that via a stand-down only to implement a quick fix won't be enough. If we see business as usual restored in the next few weeks, that is almost certainly a shallow fix. It will simply be a matter of time before a future adverse situation happens from which there will be no recovery.

This is the moment for Cruise to decide to lean into safety.

The details are too much for a post like this, but the topics alone indicate the scope of what has to be considered:

Safety engineering -- Have they effectively identified and mitigated risks?
Operational safety -- Safety procedures, inspections, maintenance, management of Operational Design Domain limits responsive to known issues, field data feedback, etc. This includes ensuring their Safety Management System (SMS) is effective.
System engineering -- Do the different pieces work together effectively? This includes everything from software in third-party components to vehicle integration to the ability of remote operators to effectively manage gaps in capabilities, and more.
Public messaging and regulatory interface -- Building genuine trust, starting with more transparency. Stop the blame game; accept accountability. Own it.
Investor expectations -- Determine a scaling plan that is sustainable, and figure out how to fund it in the likely case it takes longer than previously promised.
Definition of acceptable safety -- Something more concrete than seeing how it turns out based on crash data, with continual measurement of predictive metrics.
Safety culture -- Which underlies all of the above, and needs to start at the top.

And I'm sure there are more; this is just a start.

Near-term, the point of a safety stand-down is to stabilize a situation during uncertainty. The even more important part comes next: the plan to move forward. It will take weeks to take stock and create a plan, with the first days simply used to organize how that is going to happen. And months to execute that plan. Fortunately for Cruise there is an existing playbook that can be adapted from Uber ATG's experience with their testing fatality in 2018. Cruise should already have someone digging into that for initial ideas.

An NTSB-style investigation into this mishap could be productive. I think such an investigation would likely bring to light new issues, involving expectations for defensive driving behaviors and post-crash safety, that will be a challenge to the whole industry. If the NTSB is unable to take that on, Cruise should find an independent organization that can do something close. But such an investigation is not the fix, and cultural improvements at Cruise should not wait for one to conclude. However, an independent investigation can be the focal point for deeper understanding of the problems that need to be addressed.

------------------------------------------------

Philip Koopman is a professor at Carnegie Mellon University in Pittsburgh, Pennsylvania, USA, who has been working on self-driving car safety for more than 25 years. https://users.ece.cmu.edu/~koopman/

Friday, October 14, 2022

The Software Defined Vehicle Is Still More Wish Than Reality

Here is a Software Defined Vehicle video that covers a lot of ground. Car companies are all talking a big game about adding software to their vehicles, including big data, software updates, connectivity, and more. The possibilities are exciting, but you only have to read the news to know that the road to get there is proving bumpier than they'd like. (See this story too.)

Getting the mix of Silicon Valley software + automotive system integration + vehicle automation technology right is still a big challenge. This video talks about the possibilities. But to get there, OEMs still have a lot of work to do achieving a viable culture that addresses inherent tensions:

Cutting edge cloud software vs. life critical embedded systems
Role of automation vs. realistic expectations of human drivers
A shift from "recall" mentality to continuous improvement processes
Fast updates vs. assured safety integrity
Role of suppliers vs. OEM, especially for autonomous vehicle functions
Monetizing data vs. consumer rights
OEMs stepping up to the system integration challenges
Getting a regulatory approach that balances risks and benefits across all stakeholders

(Sadly, the video includes an incorrect statement that "95% to 96% of the accidents happen because of distracted driving" in the context of fatalities. Drivers are not perfect, but distracted driving only contributes to about 9% of fatalities per US DOT, about one-tenth of what was stated.)

YouTube: https://youtu.be/T4wEe2bNSSk

Tuesday, April 12, 2022

Cruise Stopped by Police for Headlights Off -- Why Is This a Big Deal?

In April 2022 San Francisco police pulled over an uncrewed Cruise autonomous test vehicle for not having its headlights on. Much fun was had on social media about the perplexed officer having to chase the car a few meters after it repositioned during the traffic stop. Cruise claimed the repositioning was intentional behavior. They also said their vehicle "did not have its headlights on because of a human error" (Source: https://www.theguardian.com/technology/2022/apr/11/self-driving-car-police-pull-over-san-francisco)

Police officer stops Cruise vehicle in San Francisco

The traffic behavior indicates that Cruise needs to do better at making it easier for local police to do traffic stops -- but that's not the main event. The real issue here is the headlights being off.

Cruise said in a public statement: "we have fixed the issue that led to this." Forgive me if I'm not reassured. A purportedly completely autonomous car (the ones that are supposed to be nearly perfect compared to those oh-so-flawed human drivers) that always drives at night didn't know to turn its headlights on when driving in the dark. Seriously? Remember that headlights on at night is not just for the AV itself (which might not need them in good city lighting with a lidar etc.) but also to be visible to other road users. So headlights off at night is a major safety problem. (Headlights on in day is also a good idea -- at least white forward-facing running lights, which this vehicle also did not seem to have on.)

Cruise says these vehicles are autonomous -- no human driver required. And they only test at night, so needing headlights can hardly be a surprise. But they got it wrong. How can that be?

This can't just be hand-waved away. Something really fundamental seems broken, and it is just a question of what it is.

The entire AV industry, including Cruise, has a history of extreme secrecy and in particular lack of transparency with safety. So we can only speculate. Here are the possibilities I can think of. None of them inspire confidence in safety, and they tend to get worse as we go down the list.

1) The autonomy software had a defect that didn't turn headlights on at night. Perhaps, but this seems unlikely. Such a defect would (one assumes) affect the entire fleet. And the software should include a safety check that confirms the headlights are actually on. If this is what happened, it never should have slipped through quality control, let alone a rigorous safety critical software design process, and it indicates significant concerns with overall software safety.
2) The headlight switch is supposed to be on at all times. Many (most?) vehicles have "smart" lights. You can turn them off if you want, but in practice you just turn the switch to "on" and leave it there for months or years, and the headlights just do the right thing, switching from daytime running lights to full on automatically, and turning off when the vehicle does. If you're in urban San Francisco, high beams are unlikely to be relevant. So the autonomy doesn't mess with the lights at all. Except -- why does the software not check that the vehicle is in a safe condition in terms of headlights? That seems like a design oversight. How did a hazard like this get missed in hazard analysis? If this is the situation, it really calls into question whether the hazard analysis was done properly for the software. Even if this one was fixed, what else did hazard analysis miss?

2a) A passenger in the vehicle turned the headlights off as a prank. If this is possible, it is even more important for this to be called out for software monitoring in the hazard analysis. But a check for headlights off obviously isn't there now.

3) The software is completely ignorant of headlight state, and a maintenance tech is supposed to turn the lights "on" as part of the check-out process each day to prepare the vehicle to run. This manual headlight-on check didn't get done. This is a huge issue with the Safety Management System (SMS), because if they missed that check, what other checks did they miss? There are plenty of things to check on these vehicles for safety (e.g., sensor calibration, sensor cleaning/cleaner systems, correct software configuration). If they forget to turn on the headlight switch, what else did they forget? While this might be a "within normal tolerance" deviation, given the lack of safety transparency Cruise doesn't get the benefit of the doubt. A broken SMS puts all road users at risk. This is a big deal unless proven otherwise. Firing, training, or having a stern talk with the human who made the "error" won't stop it from happening again. Blaming the driver/pilot/human never really does.
4) There is no SMS. That's basically how Uber ATG ended up killing a pedestrian in Tempe, Arizona. If this is the case, it is even scarier.
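Several of the possibilities above, including a switch left off at check-out and a passenger prank, point to the same missing safeguard: a software check that the exterior lights are actually in a safe state. The following is a minimal sketch, assuming a hypothetical vehicle status snapshot that reports an ambient light reading and the current light mode. The names `VehicleStatus`, `LightState`, `check_lights`, `may_proceed`, and `DARK_LUX_THRESHOLD` are illustrative inventions, not Cruise's design, and the threshold value is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class LightState(Enum):
    OFF = "off"
    RUNNING = "running"    # daytime running lights only
    LOW_BEAM = "low_beam"  # full headlights


@dataclass
class VehicleStatus:
    # Hypothetical snapshot of vehicle state; field names are illustrative only.
    ambient_lux: float       # reading from an assumed ambient light sensor
    light_state: LightState  # light mode as reported by the vehicle


# Placeholder threshold below which headlights are required. A real value would
# come from local law and the system's own hazard analysis, not from this sketch.
DARK_LUX_THRESHOLD = 1000.0


def check_lights(status: VehicleStatus) -> list[str]:
    """Return a list of exterior-lighting faults (empty list means no fault)."""
    faults = []
    if status.light_state is LightState.OFF:
        # Require at least forward-facing running lights at all times.
        faults.append("no forward-facing lights on")
    if status.ambient_lux < DARK_LUX_THRESHOLD and status.light_state is not LightState.LOW_BEAM:
        # In dark conditions, running lights alone are not enough.
        faults.append("dark conditions but headlights not on")
    return faults


def may_proceed(status: VehicleStatus) -> bool:
    """Hypothetical gate checked before starting or continuing a driverless mission."""
    return not check_lights(status)
```

Because the check looks at the reported light state rather than at who set the switch, it would flag a software defect, a skipped check-out step, or a prank alike. A real monitor would also have to decide what to do on a fault (for example, refuse to start a mission or execute a minimal risk maneuver), which this sketch leaves out.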

Again, we don't really know the situation because Cruise is trying the "nothing to see here ... move along" response to this incident. But none of these scenarios is comforting. If I had to guess, my money would be on #3, simply because #4 would be too irresponsible to contemplate. But really, we have no way to know what's going on. And it might be another alternative I have not considered.

Cruise should take this as a wake-up call to get their safety house in order before they have a big event. Blaming safety critical failures on "human error" is generally indicative of a poor safety culture. They have a chance here to turn the ship around -- before there is harm to a road user.

Is there a scenario I missed that is less of a concern? Maybe Cruise will give us a substantive explanation. If they do I'll be happy to post it here as a follow-up.

-----------------------------------------------------------------------------------

Kyle Vogt at Cruise sent me a response on LinkedIn on 4/16/2022:

Kyle Vogt
CEO at Cruise

I’d like to respectfully disagree with your characterization of this event. Allow me to provide some context.

We apply SMS rigorously across the company, which as you probably know includes estimating the safety risk for known hazards and having a process to continuously surface new ones.

We then apply resources accordingly to attempt to reduce the overall safety risk of our service as effectively as possible. This is a continuous process and we will continue even though we’ve passed the point where we can operate without backup drivers.

Rarely would this direct us to focus on things that are extremely unlikely to result in injury, such as lack of headlights on a single vehicle for a short duration of time. We have a process in place to ensure they are functioning properly, but the safety impact of that process failing is low.

Our use of SMS directs the majority of our resources towards higher risk areas like pedestrian and cyclist interactions. These are far more complex and of higher potential severity.

I firmly believe that’s this is the right approach, even if it means our vehicles occasionally do something that seems obviously wrong to a human.

-----------------------------------------------------------------------------------

My response on 4/17/2022:

@Kyle Vogt: Thank you for responding to my post. I am glad to hear that you have a Safety Management System in place now. This is excellent news.
Hopefully your SMS folks are telling you that the headlight-off incident should be treated as a serious safety process issue rather than being sloughed off as a "no big deal." A healthy safety culture would acknowledge this is a process failure that needs to be corrected rather than making excuses based on "safety impact of that process failing is low." Process failures in safety critical system operations are a problem -- period. An attitude that less critical processes don't matter can easily become toxic to safety culture. Safety culture is how you do business, and if disregarding procedures that are seen as less than the most critical is how you do business, eventually that will catch up with you, resulting in loss events. More simply: discounting a low severity process failure is at odds with your statement that you "apply SMS rigorously across the company."
I would say that headlights off has a substantive potential severity for other road users because they are signaling/warning devices to help people know to get out of the way if your vehicle should malfunction. For example, with headlights off you cannot take controllability credit for the other road user jumping out of your way if you fail to detect them, because they won't see headlights as an indicator of your approach at night. Moreover, having headlights on is the law.
Regardless of your statement that you are focusing on higher severity issues, a process failure for something as simple and obvious as making sure headlights are on at night is very disconcerting, and raises concerns over your safety process quality in general. A more transparent analysis of how that process failed and whether other higher severity process failures are occurring that have not been made public would be the right thing to do here to restore public trust. If this is the only time the headlights were off and your other pre-mission procedures have a high compliance rate, then OK, stuff happens. However -- if pre-mission procedures are hit or miss and you don't even keep track of procedural errors that don't result in a scary pedestrian near-miss, that is quite another. If you don't have the procedural metrics to show you are operating safely, then probably you aren't operating safely. Which is the case?
Your GM-issued VSSA from 2018 seems outdated and has no mention of an SMS. So it is difficult for the public to know what your plan is for safety. More transparency and communication regarding your safety process is essential to help build public trust.
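Whether pre-mission procedures are "hit or miss" can only be answered if procedural compliance is measured for every check, not just the ones tied to a near-miss. Here is a minimal sketch of that kind of procedural metric, assuming a hypothetical log of checklist entries. `CheckRecord`, its fields, and the 99.9% target are illustrative assumptions chosen for the example, not anything Cruise has described.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CheckRecord:
    # One hypothetical pre-mission checklist entry for one vehicle on one day.
    vehicle_id: str
    check_name: str  # e.g., "headlight switch on", "sensor cleaning"
    completed: bool


def compliance_by_check(records: list[CheckRecord]) -> dict[str, float]:
    """Fraction of logged entries in which each named check was completed."""
    done: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for r in records:
        total[r.check_name] += 1
        if r.completed:
            done[r.check_name] += 1
    return {name: done[name] / total[name] for name in total}


def below_target(rates: dict[str, float], target: float = 0.999) -> list[str]:
    """Names of checks whose compliance rate falls short of an assumed target."""
    return sorted(name for name, rate in rates.items() if rate < target)
```

With a log like this, a single missed headlight check shows up either as an isolated blip against a high compliance rate or as one symptom of a check that is routinely skipped, which is exactly the distinction the question above is asking about.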

-----------------------------------------------------------------------------------