Poor architectural cohesion and high coupling might be a critical factor in SDV failures.
We’re reading about high-profile software fiascos at car companies, and how they might be handling them. For example: the $5B VW bet on Rivian; Volvo refunding car owners over poor software. And don’t forget a steady stream of recalls over infotainment screen failures related to vehicle status indication and rear-view cameras.
There are business forces at play here to be sure, such as a mad rush to catch up to Tesla on EVs. But I think there might be a system architecture issue that is also playing an outsized role, one that is both technical and business.
The technical side of this issue is directly related to the move from a bunch of boxes sourced from a tiered supplier system to a single big computer, which is a key aspect of so-called Software Defined Vehicles (SDVs).
Architectural coupling and cohesion
Two key architectural principles that differentiate a good architecture from a bad one are cohesion and coupling. High cohesion is good; low coupling is good. The opposite can easily kill a system by drowning it in complexity.
Here are some definitions:
Cohesion: how well all the functions in a particular hardware or software module are related. Are they all cousins (high cohesion)? Or is it miscellaneous cats and dogs, with a hamster tossed in for good measure (low cohesion)? As an analogy, an apartment building (all apartments) has higher cohesion than a mixed-use building (shops+dining+offices+apartments+garage+metro station). Low cohesion might have some advantages, but it is more complex to maintain and operate.
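To make that concrete, here is a minimal sketch (hypothetical class and function names, not from any real vehicle codebase) contrasting a cohesive module with a grab-bag one:

    # High cohesion: every function in this module is about one job,
    # battery state.
    class BatteryMonitor:
        def __init__(self):
            self.cell_voltages = []

        def record_cell_voltage(self, volts):
            self.cell_voltages.append(volts)

        def pack_voltage(self):
            return sum(self.cell_voltages)

    # Low cohesion: cats and dogs (and a hamster) sharing one module.
    class VehicleMiscServices:
        def pack_voltage(self): ...              # battery domain
        def set_cabin_temperature(self, c): ...  # HVAC domain
        def draw_splash_screen(self): ...        # infotainment domain
        def unlock_doors(self): ...              # body control domain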
Coupling: how many data connections there are into/out of each module. Low coupling is good (a few cleanly defined data types); high coupling is bad. High coupling amounts to data flow spaghetti. Not the tasty kind of spaghetti, but the kind that causes system design failures analogous to spaghetti code, only in the middleware universe. As a more strained analogy, think of an apartment with a dozen exit doors (a different one for going to the shops, the office, a neighbor, the garage, the metro, the sidewalk, the cafeteria, your patio, and so on) and what it means to check that all the exit doors are locked at bed time.
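Again as a hedged sketch with invented function names: the low-coupling version takes one cleanly defined input, while the high-coupling version is the apartment with a dozen exit doors:

    # Low coupling: one clearly defined data item crosses the boundary.
    def limit_speed(current_speed_kph: float) -> float:
        return min(current_speed_kph, 130.0)

    # High coupling: the same function now depends on data from half the
    # vehicle, so a change anywhere can ripple into it (data flow spaghetti).
    def limit_speed_spaghetti(current_speed_kph, gear_state, hvac_load,
                              infotainment_mode, door_status, nav_route,
                              driver_profile, trailer_attached):
        ...  # every one of these inputs is another exit door to check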
The old days: technology & supplier approaches incentivized low coupling and high cohesion
In traditional vehicles, the use of wires to connect different Electronic Control Units (ECUs) placed an artificial limit on coupling. In the old days you only got so many wires in a wire bundle before it would no longer fit in the available space in the car. And with the transition to networks, you only got so many messages per second on a comparatively slow CAN bus (250K or 500K bits/sec in the bad old days).
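As a rough back-of-the-envelope calculation (assuming classic CAN frames of about 111 bits including interframe space, and ignoring bit stuffing), the message-rate ceiling was quite modest:

    # Approximate upper bound on classic CAN frame rate at 100% bus load.
    # Real designs stay well below full load, so the practical limit is lower.
    BITS_PER_FRAME = 111  # 8-byte payload, 11-bit ID, plus interframe space

    for bus_bits_per_sec in (250_000, 500_000):
        frames_per_sec = bus_bits_per_sec / BITS_PER_FRAME
        print(f"{bus_bits_per_sec} bits/sec -> ~{frames_per_sec:.0f} frames/sec")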
Moreover, in the old days each ECU more or less did a single function created by a single supplier. This was due in large part to a functional architecture approach in which OEMs could mix-and-match functions inside dedicated boxes from different suppliers.
Sure, there was duplication and potentially wasteful use of compute resources. But a single box doing a single function, hung off a low-bandwidth network cable, had no choice but to end up with high cohesion and low coupling.
New technology removes previous architectural incentives
Now we have much higher speed networks (perhaps hundreds of megabits/sec, with the sky being the limit). And if all the software is on the same computer, communication is dramatically faster than that.
We also have middleware that is pushing software architectures away from passing data via procedure calls that follow the flow of control, and toward publish/subscribe (pub/sub) broadcast models. Sure, that trend started with CAN, but it has gotten a lot more aggressive with the advent of higher speed interconnects and middleware frameworks such as the one provided by AUTOSAR.
The combination of higher connection bandwidth between modules and pub/sub middleware has effectively removed the technical cost of high coupling.
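To see why, consider a toy pub/sub bus (a sketch, not any particular middleware's API). The Nth subscription costs the same single line as the first, so nothing in the design pushes back as coupling grows:

    from collections import defaultdict

    class Bus:
        """Minimal publish/subscribe hub."""
        def __init__(self):
            self.subscribers = defaultdict(list)

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, message):
            for callback in self.subscribers[topic]:
                callback(message)

    bus = Bus()
    # Adding another listener needs no new wire, no bandwidth budget, and
    # no interface review: the cost of coupling has become invisible.
    bus.subscribe("wheel_speed", lambda m: print("ABS got", m))
    bus.subscribe("wheel_speed", lambda m: print("infotainment got", m))
    bus.publish("wheel_speed", {"kph": 88})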
Now we are promised that Software Defined Vehicles will let us aggregate all the functions into a single big box (or perhaps a set of medium-big boxes). With high bandwidth networks. And all sorts of functions all competing for resources on shared hardware.
High bandwidth communications, pub/sub models, centralized hardware, and centralized software implicitly incentivize approaches with high coupling and low cohesion.
SDV architectures destroy all the incentives that pushed toward low coupling and high cohesion in older system designs. You should expect to get what you incentivize. Or in this case, stop getting what you have stopped incentivizing.
Any student of system architecture should find it no surprise that we’re seeing systems with high coupling (and likely low cohesion) in SDVs, accompanied by spectacular failures.
I’m not saying it is impossible to design a good architecture with current system design approaches. And I’m not saying the only solution is to go back to slow CAN networks. Or go back 50 years in software/system architectures. And I’m only arguing a tiny bit that painful constraints build character. What I’m saying is that the incentives that used to push designers to better architectures have evaporated.
Business architecture imitating technical architecture
Consider Conway’s law: organizations and the systems they create tend to have similar structures. In my experience this is a two-way street. We can easily get organizations that evolve to match the architecture of the system being built. It is possible that the software/system architecture itself is the tail, and the dog is that organizations have, over time, aligned themselves with low cohesion/high coupling system approaches, and are therefore suffering from these same architectural problems.
So it might not be the middleware-centric architectural trends that are as much of a problem as the way those reflect back into the organizations that create them.
Despite the headline, I don’t think the SDV is actually dead. But the failures we’re seeing will not be resolved simply by hiring more and smarter people. There are some fundamental issues in architecture that need to be addressed, and the incentives to address them are strategic rather than tactical, which makes the return on investment harder to explain. Beyond that, there are some serious issues with how software engineering practices have evolved and their suitability for life-critical systems.
I think the spectacular failures are just getting started. It will take some really fundamental changes to get things back on track. And probably more corporate fiascos.