Friday, June 21, 2024

Time to Formally Define Level 2+ Vehicle Automation

We should formally define SAE Level 2+ to be a feature that includes not only Level 2 abilities but also the ability to change its travel path via intersections and/or interchanges. Level 2+ should be regulated in the same bin as SAE Level 3 systems.

There is a lot to unpack here, but ultimately doing this matters for road safety, with much higher stakes over the next 10 years than regulating completely driverless (Level 4/5) robotaxi and robotruck safety. Because Level 2+ is already on the roads, doing real harm to real people today.

First, to address the definition folks who are losing it over me uttering the term "2+" right now: I am very well aware that SAE J3016 outlaws notation like "Level 2+". My suggestion is to change things to make it a defined term, since it is happening with or without SAE's blessing, and we urgently need a consistently defined term for the things that everyone else calls Level 2+ or Level 2++. (Description and analysis of SAE Levels here. Myth 5 talks about Level 2+ in particular.)

From a safety point of view, we've known for decades that when you take away steering responsibility the human driver will drop out, suffering from automation complacency. There have been enough fatalities from features claimed to be plain Level 2 (automated lane keeping + automated speed), such as cars under-running crossing big rigs, that we know this is an issue. But we also have ways of trying to address this by requiring a combination of operational design domain enforcement and camera-based driver monitoring. This will take a while to play out, but the process has started. Maybe regulatory intervention will eventually resolve the worst of those issues. Maybe not -- but let's leave that for another day.

What's left is the middle ground between next-gen-cruise-control features (lane centering + automated speed) and vehicles that aspire to be robotaxis or robotrucks but aren't quite there. That middle ground includes a human driver so the designers can keep the driver in the loop to avoid crashes and/or take the blame for them. If you thought plain Level 2 had problems with automation complacency, Level 2+ says “hold my beer.” (Have a look at the concept of the moral crumple zone. And do not involve beer in driving in any way whatsoever.)

Expecting a normal human being to pay continuous hawk-like attention for hours while a car drives itself almost perfectly is beyond credibility. And dangerous, because things might seem fine for lots and lots of miles — until the crash comes out of the blue and the driver is blamed for not preventing it. Telling people to pay attention isn’t going to cut it. And I really have my doubts that driver monitoring will work well enough to ensure quick reaction time after hours of monotony.

People just suck at paying attention to boring tasks and reacting quickly to sudden life-threatening failures. And blaming them for sucking won’t stop the next crash. I think the car is going to have to be able to actively manage the human rather than the human managing the car, and the car will have to ensure safety until the human driver has time to re-engage with the driving task (10 seconds, 30 seconds, maybe longer sometimes). That sounds more like a Level 3 feature than a Level 2 feature from a regulatory point of view.

Tesla FSD is the poster child for Level 2+, but over the next 5 years we will see a lot more companies testing these waters as they give up on their robotaxi dreams and settle for something that almost drives itself -- but not quite.

The definition I propose: Level 2+ is a feature that meets the requirements for Level 2 but is also capable of changing roadways at an intersection and/or interchange.

Put simply, if it drives you down a single road, it's Level 2. But if it can make turns or use an exit/entrance ramp, it is Level 2+.

One might pick different criteria, but this has the advantage of being simple and relatively unambiguous. Lane changing on the same roadway is still Level 2. But you are at Level 2+ once you start doing intersections, or go down the road (ha!) of recognizing traffic lights, looking at traffic for unprotected left turns, and so on. In other words, almost a robotaxi -- but with a human trying to guess when the computer driver will make a mistake and then potentially getting blamed for a crash.
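
To make the litmus test concrete, here is a toy sketch of the decision rule in Python (illustrative only; the capability names are my invention, not SAE J3016 terms):

    # Toy sketch of the proposed litmus test. Capability names are
    # hypothetical illustrations, not SAE J3016 terminology.
    def classify_feature(lane_centering: bool, speed_control: bool,
                         changes_roadways: bool) -> str:
        if not (lane_centering and speed_control):
            return "below Level 2"  # not sustained combined lateral + longitudinal control
        if changes_roadways:        # turns at intersections, exit/entrance ramps
            return "Level 2+"       # regulate in the same bin as Level 3
        return "Level 2"            # single roadway, even with lane changes

    assert classify_feature(True, True, False) == "Level 2"   # lane changes stay Level 2
    assert classify_feature(True, True, True) == "Level 2+"   # intersections/ramps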

No doubt there will be minor edge cases to be clarified, probably having to do with the exact definition of “roadway”. Or someone can propose a good definition for that word that takes care of the edge cases. The point here is not to write detailed legal wording, but rather to get the idea across of making turns at an intersection being the litmus test for Level 2+.  

From a regulatory point of view, Level 2+ vehicles should be regulated the same as Level 3 vehicles. I realize Level 2+ is not necessarily a strict subset of Level 3, but the levels were never intended to be a deployment path, despite the use of a numbering system. I think they both share a concern of adequate driver engagement when needed in a system that is essentially guaranteed to create driver complacency and slow reaction times due to loss of situational awareness.

How does this look in practice? The various bills floating around federal and state legislatures right now should include a definition of Level 2+ (Level 2 plus intersection/interchange capability) and group it with Level 3 for whatever regulatory strategy they propose. Simple as that.

If SAE ORAD wants to take up this proposal for SAE J3016 that's fine too. (Bet some committee members are reading this — happy to discuss at the next meeting if you’re willing to entertain it.) But that document disclaims safety as being out of its scope, so what I care about a lot more are the regulatory frameworks that are currently near-toothless for the not-quite-robotaxi Level 2+ features already being driven on public roads.

Note: Based on proposed legislation I've seen, pulling Level 2+ into the Level 3 bin is the most urgent and viable path to improving regulatory oversight of this technology in the near to mid term. If you really want to do away with the levels I have a detailed way to do that, noting that its cut-line for Supervisory is at Level 2 rather than Level 2+, but it is otherwise compatible with this essay. If you want to use the modes but change the cut line, let’s talk about how to do that without breaking anything.

Note: Tesla fans can react unfavorably to my essays and social media posts. To head off some of the “debate” — yes, navigate-on-autopilot counts as Level 2+ in my view. And we have the crashes to prove it. And no, Teslas are not dramatically safer than other cars by any credible analysis I’ve ever seen.

Monday, June 17, 2024

Perspective on Waymo's Safety Progress

I am frequently asked what I think of Waymo's progress on safety. Here are some thoughts on progress, mishaps, and whether we know they are acceptably safe. At current deployment rates it will take Waymo about 20 years with ZERO fatalities to show they are net as safe as the average human driver fatality rate (a baseline that includes old cars, impaired drivers, etc.). Their current statements that they are already saving lives are hype.

Safety at scale remains the biggest question. And even with reasonable growth rates that question will remain open for many years for robotaxi technology. With Waymo currently in the lead, there are even more question marks for the other players with regard to safety.


Waymo has made impressive progress in scaling up operations. Some had previously criticized their ramp-up for being slower than that of other companies, but they are looking a lot smarter these days for having done that.

We've seen some recent incidents (for example the utility pole crash) and an investigation from NHTSA. I hope those are not signs that they have started scaling up faster than they should due to funding pressure.

This piece in Forbes notes that Waymo is now doing more than 50,000 paid rides a week across three cities and plans to do more launches.  

Sounds like a lot! But from a safety point of view it is not enough to really know how things will turn out.

Waymo is disingenuously messaging that they are already saving lives, but the truth is nobody knows how that will turn out yet. At this rate they will need perhaps 20 years without a single fatality (see math check below) to show they are no worse than an average US human driver. And that is under some wildly favorable assumptions (e.g., that software updates never create a new defect -- which is not how things work in the real world). So for practical purposes the bar is set at perfection right now. We'll have to see how things turn out.
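
Math check (a rough sketch; the miles-per-ride figure is my assumption, and the threshold of roughly 3 expected events for 95% confidence is the standard statistical rule of three):

    # Rough math check. Assumptions: ~5 miles per paid ride (my guess), and
    # zero fatalities in N miles bounds the fatality rate below ~3/N at 95%
    # confidence (the "rule of three").
    rides_per_week = 50_000          # from the Forbes piece
    miles_per_ride = 5               # assumption
    human_fatal_rate = 1 / 100e6     # ballpark 1 fatal crash per 100M miles

    miles_per_year = rides_per_week * miles_per_ride * 52   # ~13M miles/year
    miles_needed = 3 / human_fatal_rate                     # ~300M zero-fatality miles
    years_needed = miles_needed / miles_per_year            # ~23 years
    print(f"~{miles_per_year / 1e6:.0f}M miles/year -> ~{years_needed:.0f} years")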

It certainly feels like Waymo has been more aggressive lately, perhaps because they are feeling pressure to show progress to justify further investment with a good news story. The danger is that if Alphabet puts too much pressure on Waymo to expand too fast, it could generate a bad news story instead of a good one. What happened at Cruise provides a strong cautionary tale for the whole industry. Let's hope Waymo is not pushed into making the same mistakes.

Sunday, June 16, 2024

Truths & Myths About Automated Vehicle Safety -- Video Series

The past year has seen both peak hype and significant issues for the automated vehicle industry. In this talk we recap general trends and summarize the current situation for autonomous vehicles such as robotaxis, as well as conventional vehicles that have automated steering features. Many of the issues the industry faces are self-inflicted, stemming from a combination of inflated promises, attempts to scale immature technology too aggressively, and an overly narrow view of safety. Overall, the companies deploying the technology have failed to address legitimate concerns of a wide variety of stakeholders. There are a number of different aspects that still need to be addressed including: legislation, regulation, liability, insurance, driver skills, traffic enforcement, emergency services, vulnerable road users, engineering standards, business models, public messaging, investor pressure, cultural change, ethical/equity concerns, and local oversight. We concentrate on how all these pieces need to fit together to create sustainable automated vehicle technology approaches.

Truths & Myths About Automated Vehicle Safety

All Released Videos: YouTube Playlist | Archive.org big video

Slide deck (Acrobat)

Saturday, June 15, 2024

The Waymo Utility Pole Crash

Waymo vs. utility pole smackdown: the utility pole won. No apparent extenuating circumstances.
Nobody was injured; the vehicle was empty. The pole suffered a minor dent but is still in service.

This video has an interview with the passenger who was waiting for pickup in Phoenix: https://www.youtube.com/watch?v=HAZP-RNSr0s Waymo did not provide a comment for the story.


Now the Waymo utility pole safety recall report is out (https://static.nhtsa.gov/odi/rcl/2024/RCLRPT-24E049-1733.PDF). Interesting that the vehicle was executing a pullover maneuver at the time it hit the pole. From a validation point of view I'll bet it could go down the center of the alleyway just fine in normal driving, but what bit them was the combination of pulling to the side of the road and a pole happening to be in what the vehicle thought was a safe pullover area due to the map.

Still not addressed is how utility poles being assigned a "low damage score" could have made it all the way through peer reviews, simulation, and road testing -- and needed to be found via a crash which might have been worse in other circumstances.

This serves as a stark reminder that these vehicles lack common sense, in this case "thinking" that running smack into a utility pole was no big deal. They are subject to software defects as are all computer-based systems. We still don't know if/when they will be better than human drivers at reducing fatalities. But we know for sure they will make unforced errors in driving, hype notwithstanding.
This is also a good reminder that safety validation needs to consider all operational modes, and it is common for the problems to crop up in unusual or failure recovery modes. While there is no indication of an equipment malfunction in this particular case, safety in abnormal mission termination modes is notoriously difficult because there might also be malfunctioning equipment that triggered the system mode change.

Description of defect: "Prior to the Waymo ADS receiving the remedy described in this report, a collision could occur if the Waymo ADS encountered a pole or pole-like permanent object and all of the following were true: 1) the object was within the boundaries of the road and the map did not include a hard road edge between the object and the driveable surface; 2) the Waymo ADS’s perception system assigned a low damage score to the object; 3) the object was located within the Waymo ADS’s intended path (e.g. when executing a pullover near the object); and 4) there were no other objects near the pole that the ADS would react to and avoid."

Since I've had a number of questions here is my best shot at clarifying the collision mechanism:
  • The alleyway is marked as drivable, because the entire alley road surface is in fact all drivable (no curb; mostly garage entranceways) -- except for utility poles once in a while 
  • The robotaxi's computer driver saw the utility pole in question, and correctly classified it as "utility pole".
  • A human driver would know "hitting utility pole == bad". However, some data structure somewhere in the computer driver was set so that "hitting utility pole == OK". This applies to ALL utility poles EVERYWHERE, not just this particular utility pole.
  • So the computer driver drove smack into the utility pole thinking that was OK, when in fact it was not.

There was no mapping error involved in the collision. Changing the map is a workaround only.

I can speculate that somewhere there is an object classification system, and somehow (probably manually or semi-manually) each object type has an attribute of "OK to hit?" The global utility pole one was set incorrectly. There are other possibilities, but this is the simplest one.
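
To make that speculation concrete, here is a purely hypothetical sketch of such a data structure (every name here is invented by me; nothing is known about Waymo's actual implementation):

    # Purely hypothetical sketch -- all names invented; Waymo's actual design
    # is unknown. The point: a single global per-class attribute can make
    # EVERY instance of that class "acceptable" to hit.
    DAMAGE_SCORE = {
        "pedestrian":   1.0,   # contact never acceptable
        "vehicle":      0.9,
        "traffic_cone": 0.1,
        "utility_pole": 0.1,   # the mis-set global value in this speculation
    }

    def pullover_path_clear(object_types, threshold=0.5):
        # Pullover proceeds if every object in the path has a "low damage score"
        return all(DAMAGE_SCORE[t] < threshold for t in object_types)

    # The pole was correctly classified, but the class-wide score said "OK to hit":
    assert pullover_path_clear(["utility_pole"])   # collision follows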

What is shocking is that such a mistake could make it through quality, safety validation, and testing processes.


Academic Publishing, AI, and Broken Incentives

Since I've been a part of the academic publication ecosystem for decades (as have some of my readers), I'll share some thoughts on the latest publication quality scandals:

  • Junior faculty are suffering from an ever-increasing squeeze for publication and citation metrics. This has been an issue of growing severity for as long as I've been at a university.
  • Some specialty areas find it easier to publish than others for many legitimate reasons.
  • ChatGPT is just the latest form of shady authoring technique.
  • The academic paper mills are symptoms, not the root cause. Effectively they are exploiting the suffering of the doctoral student => postdoc => junior faculty pipeline for profit.
  • Shutting down paper mills does not fix the problem. It just forces it to manifest some other way.
  • All these pressures combined with MUCH more difficulty in raising research funding mean that those same faculty are pressed to peer review more papers of lower quality with far less time -- and effectively no professional compensation for doing so.
  • On-line venues that exist to increase the number of publication slots available are just the latest form of publication metric arms races.
  • The top-tier conferences in my field still have excellent peer review. But it is no surprise that other publication venues do not. And the situation is ripe for scam artists to set up what amount to fake journals with fake reviews to make money from low-cost web publications. Sometimes established publication names are duped. Other times it is worse.
  • The academic publication cash grab has been a thing for a long, long time. (I recall $100 page charges for professional society journals when I was in grad school.) It has simply been weaponized along with the greater erosion of the publication industry we've been seeing overall.

It does feel like this problem has "gone exponential" lately. But it is a long-standing trend (look up Beall's List: https://en.wikipedia.org/wiki/Beall%27s_List). The problem won't be fixed just by killing off the most abusive journals, because the root causes run much deeper.

Ultimately this is difficult to solve because interdisciplinary evaluation committees find it easiest to count the number of "scholarly" publications and things like an H-index. "Impact" is hard to measure, and likely takes longer to demonstrate than the assistant=>associate professor timeline allows.
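
For what it's worth, the H-index those committees lean on is trivially easy to compute, which is exactly why it gets counted (a minimal sketch):

    # Minimal H-index: the largest h such that h papers each have >= h citations.
    def h_index(citations):
        ranked = sorted(citations, reverse=True)
        return sum(1 for i, c in enumerate(ranked) if c >= i + 1)

    assert h_index([10, 8, 5, 4, 3]) == 4   # four papers with at least 4 citations each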

This is a systemic problem that will require senior faculty across the university ecosystem to resolve via deep cultural change. The energy spent clutching our pearls over ChatGPT showing up in papers would be better directed at calling attention to the root problems (hence this post).

This is my personal perspective as an academic who, in the worst case, might be forced to retire as already planned in a few months. I hope that the many junior faculty who do not feel they can speak up feel seen, even if I can't offer them a solution.

Saturday, June 8, 2024

SAE J3018 for operational safety

Any company testing on public roads should conform to the industry standard for road testing safety: SAE J3018.


When AV road testing first started, it was common for testers to claim that they were safe because they had a "safety driver." However, as was tragically demonstrated in the Tempe AZ testing fatality in 2018, not all approaches to safety driving are created equal.  Much more is required. Fortunately, there is an SAE standard that addresses this topic.
SAE J3018_202012 "Safety-Relevant Guidance for On-Road Testing of Prototype Automated Driving System (ADS)-Operated Vehicles" (https://www.sae.org/standards/content/j3018_202012/ -- be sure to get the 2020 revision) provides safety relevant guidance for road testing. It concentrates on guidance for the "in-vehicle fallback test driver" (also known informally as the safety driver).

The scope of J3018 includes:

  • Fallback test driver training (classroom, simulation, track, on-road)
  • Workload management
  • Selection of test routes
  • Pre-trip protocol (checklist, inspection)
  • Test driver monitoring
  • Test driver post-test debrief
  • Incident response protocol

AV testers should conform to J3018 to ensure that they are following identified best practices for safety driver training and effectiveness. Endorsing this standard will avoid a DOT having to create its own driver qualification and training requirements.


Taking a deeper look at J3018, it seems a bit light on measuring whether the safety driver is actually providing effective risk mitigation. Rather, it seems to implicitly assume that training will necessarily result in acceptable road testing safety. While training and qualification of safety drivers is essential, it is prudent to also monitor safety driver effectiveness, and testers should be asked to address this issue. Nonetheless, J3018 is an excellent starting point for testing safety. Testers should be doing at least what is in J3018, and probably more.


J3018 does cost money to read, and the free preview is not particularly informative. However, there is a free copy of a precursor document available here: https://avsc.sae-itc.org/principles-01-5471WV-42925L3.html  that will give a flavor of what is involved. That having been said, any DOT guidance or requirement should follow J3018, and not the AVSC precursor document.
In addition to following J3018, the safety-critical mechanisms for testing should be designed to conform to the widely used ISO 26262 functional safety standard. (This is not to say that the entire test vehicle -- which is still a work in progress -- needs to conform to 26262 during testing. Rather, that the "Big Red Button" and any driver takeover functions need to conform to 26262 to make sure that the safety driver can really take over when necessary.)


For cargo vehicles that will deploy without drivers, J3018 can still be used by installing a temporary safety driver seat in the vehicle. Or the autonomy equipment can be mounted on a conventional vehicle in a geometry that mimics the cargo vehicle geometry. When the time comes to deploy without a driver physically in the system, you are really testing an autonomous vehicle with a chase car or remote safety supervisor, covered in a following section on testing without a driver.

Wednesday, June 5, 2024

Live Talk: Challenges in Autonomous Vehicle Safety Assessment

Challenges in Autonomous Vehicle Safety Assessment

Recorded live at a US DOT workshop on May 29, 2024.

Topics:
- Brute force testing is impracticable
- Robot error
- Machine learning & the Vee model
- Industry standards adoption
- Beyond statistical net safety

Tuesday, June 4, 2024

Waymo's Misleading Claim of Saving Lives

Waymo claims they are "already saving lives" so often, and people are so taken in by that misleading claim, that I'm going to take a moment to explain why it is misleading. And especially harmful when used as justification for loose regulatory policies as it so often is.

The claim: "The data to date indicates the Waymo Driver is already reducing traffic injuries and fatalities." That sentence has been at the top of the Waymo Safety landing page for quite a while now (https://waymo.com/safety/ as of June 4, 2024).


Having had high school English, I would interpret that sentence as also including an unproven claim of "already reducing fatalities" being supported by data. And I would expect that anyone authoring this sentence would reasonably expect a reader or listener to conclude "already reducing fatalities." Those readers and listeners include federal and state regulators and legislators.

This claim is absurd for a simple reason. US driving data shows human-driven vehicles have ballpark 1 fatal crash per 100M miles (varies by year, zip code, etc. -- for more nuance see this narrated video slide, which is in terms of fatal crashes, noting that some such crashes have multiple fatalities). But their latest study covers only 7.1 million miles. They need something like 40 times more data to prove they are actually saving lives with statistical confidence (likely even more).

What is really going on here seems to be some sort of word game that is essentially guaranteed to mislead readers. Their 7.1 million mile study reports a bin of "any-injury-reported" crashes that were lower than for human-driven vehicles, and fatalities are a subset of that bin. So the claim being made is (apparently) that the bin containing fatalities is better than human drivers -- without mentioning that the sample size is far too small for valid conclusions about fatalities. So maybe they have saved about 0.07 or perhaps even 0.10 lives depending on the baseline you use for human drivers -- and maybe not.
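
To put rough numbers on the sample-size problem (a sketch; the 95% upper bound uses the standard statistical rule of three, and the baseline is the ballpark rate above):

    # Why 7.1M miles says nearly nothing about fatalities (rough sketch).
    miles = 7.1e6
    human_rate = 1 / 100e6             # ballpark 1 fatal crash per 100M miles

    expected = miles * human_rate      # ~0.07 fatal crashes expected at the human rate
    # With zero fatalities observed, the 95% upper confidence bound on the rate
    # is roughly 3/miles (rule of three):
    upper = 3 / miles                  # ~42 per 100M miles, ~40x the baseline
    print(f"expected ~{expected:.2f}; 95% bound ~{upper * 100e6:.0f} per 100M miles")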

But don't just take my word for it; see for yourself this excerpt from Waymo's own paper: "Serious injury and fatalities are a subset of this any-injury-reported benchmark, but no statement on these outcome levels can be made at this time based on this retrospective data." In other words, Waymo does not have enough data to know how fatalities will turn out. That's the truth. Waymo's safety landing page claim is something other than the full truth.

Waymo paper:  "Comparison of Waymo Rider-Only Crash Data to Human Benchmarks at 7.1 Million Miles"  https://arxiv.org/pdf/2312.12675  (top of page 15; highlight added)

Monday, June 3, 2024

Four views into the Cruise Robotaxi Pedestrian Dragging Mishap

On October 2, 2023, a Cruise robotaxi dragged a woman 20 feet underneath the vehicle in San Francisco. The circumstances of the mishap and its aftermath are complex. But the robotaxi industry was profoundly shaken.


Here are four descriptions of the events and what might be learned from them. Each is in a different style, intended for a different audience.

Additional content:
  • A video podcast where I walk through the mishap events with Junko Yoshida & Bolaji Ojo: https://youtu.be/OaF6IbYoVHQ
  • Cruise also maintains a safety page. At the time of this writing, big fonts are used to say "transparent about safety" and "continuous improvement". So far not a lot of detail about what has changed or what transparency will mean as they get back on the road.