Tuesday, March 1, 2022

Maturation path for safety & security practices

Brief informal notes from a quick wrap-up position statement talk I gave at a workshop today.

Safety and security have a lot in common in how they mature over time. Without getting into a religious debate about the difference between them, I note that their trajectories seem to include the following steps, especially for autonomous systems. I'd argue that each step is in a sense more mature than the previous one.

  1. Get the system to work. Safety/security can come later.
  2. Get the system to work almost all the time. Conflate this with safety/security even though you're still really just getting it to work in the common cases (safety for a vehicle is "doesn't hit stuff" while security is "doesn't get taken down by the usual continuous stream of automated attacks")
  3. Brute force problem fixes: fly/crash/fix/fly (air) and drive/crash/fix/drive (ground)
  4. Create a set of best practices in the nature of a building code ("build your system this way")
    • Create a useful fiction that you have completely characterized the requirements and operational environment and that your building code will always work.
    • Any failure is an embarrassing piece of bad news that violates the fiction of complete understanding.
  5. As the system matures, complain about false alarm safety/security shutdowns
    • It might feel like this means your system has problems, but in fact you're a lot safer and more secure than systems that operate oblivious to their vulnerabilities.
  6. Start permitting exceptions to the standard building code rules by arguing that the exceptions still result in equivalent safety/security
  7. Evolve to full-up deductive assurance cases to argue safety/security beyond building codes
    • Still maintain the fiction of complete knowledge of requirements and environment
  8. Start operating in more open environments and admit you didn't really understand the requirements or the environment
    • Spend a lot of time chasing down problems that reveal defects in your safety case (safety case does not match environmental assumptions, or might not even match deployed system)
  9. Switch to an inductive safety case approach:
    • Account for risk from epistemic uncertainty (unknown unknowns)
    • Instrument system for failure precursors (e.g., safety performance indicators tied to safety case claims)
    • Treat incidents as an opportunity to fix problems before there is a loss event.
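
To make the instrumentation idea in step 9 a bit more concrete, here is a minimal Python sketch of safety performance indicators tied to safety case claims. Everything in it is an illustrative assumption rather than anything from the talk: the class name, the claim IDs ("G2.1", "G3.4"), the indicator names, and the threshold values are all hypothetical. The point is only that each indicator names the claim it supports, so an exceeded threshold tells you which part of the safety case to revisit before there is a loss event.

```python
from dataclasses import dataclass


@dataclass
class SafetyPerformanceIndicator:
    """Hypothetical SPI: a measured event rate tied to one safety case claim."""
    name: str
    claim_id: str          # illustrative ID of the safety case claim it supports
    max_rate: float        # highest event rate per exposure unit the claim tolerates
    events: int = 0
    exposure: float = 0.0  # e.g., operating hours

    def record(self, events: int, exposure: float) -> None:
        # Accumulate field observations (failure precursors, not loss events).
        self.events += events
        self.exposure += exposure

    def observed_rate(self) -> float:
        # With no exposure yet there is no evidence either way; report 0.0.
        return self.events / self.exposure if self.exposure > 0 else 0.0

    def violated(self) -> bool:
        return self.observed_rate() > self.max_rate


def claims_to_revisit(spis):
    """Return the sorted claim IDs whose indicators exceed their thresholds."""
    return sorted({s.claim_id for s in spis if s.violated()})


# Made-up indicators and thresholds, purely for illustration.
spis = [
    SafetyPerformanceIndicator("near_miss_clearance", "G2.1", max_rate=0.01),
    SafetyPerformanceIndicator("false_alarm_shutdown", "G3.4", max_rate=0.05),
]
spis[0].record(events=3, exposure=100.0)  # 0.03 per hour, above the 0.01 limit
spis[1].record(events=2, exposure=100.0)  # 0.02 per hour, below the 0.05 limit

print(claims_to_revisit(spis))  # ['G2.1']
```

In this sketch a violated indicator is not itself a failure. It is a prompt to go back and check whether the named claim, or the assumptions behind it, still holds for the deployed system.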
