Brute force mileage accumulation to fix all the problems you see on the road won't get you all the way to being safe. For so very many reasons...
This A400M crash was caused by corrupted software calibration parameters.
It's a moving target: A significant problem with argumentation based on fly-fix-fly
occurs when designers try to take credit for previous field testing despite a
major change. When arguing field testing safety, it is generally inappropriate
to take credit for field experience accumulated before the last change to the
system that can affect safety-relevant behaviour. (The “small change” pitfall
has already been discussed in another blog post.)
Discounting Field Failures: There is no denying the intuitive appeal of an
argument that the system is being tested until all bugs have been fixed.
However, this is a fundamentally flawed argument. That is because this amounts
to saying that no matter how many failures are seen during field testing, none
of them “count” if a story can be concocted as to how they were
non-reproducible, fixed via bug patches, and so on.
To the degree such an argument could be credible, it would have to find and fix the root cause of essentially all field failures. It would further have to demonstrate (somehow) that the field failure root cause had been correctly diagnosed, which is no small thing, especially if the fix involves retraining a machine-learning based system.
Heavy Tail: Additionally, the argument would have to show that a sufficiently high fraction of field failure types has already been encountered, so that the probability of encountering novel failures during deployment is sufficiently low. (Such an argument would probably be incorrect if, as expected, field failure types have a heavy-tailed distribution.)
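To see why a heavy tail undermines this kind of argument, here is a minimal sketch of a hypothetical model (it is not taken from the paper). It assumes a fixed set of independent failure types whose arrival rates follow a Zipf-like law (type i has a rate proportional to 1/i), that each type arrives as a Poisson process, and that every type seen at least once during testing is fixed perfectly. The function names, the number of types, and the total rate are all illustrative assumptions.

```python
import math

def zipf_rates(num_types, total_rate):
    """Hypothetical heavy-tailed rates: type i has rate proportional to 1/i."""
    weights = [1.0 / i for i in range(1, num_types + 1)]
    scale = total_rate / sum(weights)
    return [w * scale for w in weights]

def residual_failure_rate(rates, test_hours):
    """Expected failure rate from types never seen in test_hours of testing.

    Assumes independent Poisson arrivals per type, so a type with rate r goes
    unseen with probability exp(-r * test_hours). Every type that was seen is
    assumed to be fixed perfectly, which is an optimistic assumption.
    """
    return sum(r * math.exp(-r * test_hours) for r in rates)

# Illustrative numbers only: 100,000 failure types, 1 failure per 1000 hours overall.
rates = zipf_rates(num_types=100_000, total_rate=1e-3)
for hours in (1e3, 1e4, 1e5, 1e6):
    remaining = residual_failure_rate(rates, hours)
    print(f"{hours:>8.0e} test hours -> residual rate {remaining:.2e} per hour")
```

Under these assumed rates, a thousandfold increase in test hours cuts the residual failure rate by roughly half rather than by a factor of a thousand, because most of the remaining risk sits in rare failure types that testing has not yet triggered. A different tail shape would give different numbers, but the qualitative point carries over to any sufficiently heavy tail.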
Fault Reinjection: A fly-fix-fly argument must also address the
fault reinjection problem. Fault reinjection occurs when a bug fix introduces a
new bug as a side effect of that fix. Ensuring that this has not happened via
field testing alone requires resetting the field testing clock to zero after
every bug fix. (Hybrid arguments that include rigorous engineering analysis are
possible, but simply assuming no fault reinjection without any supporting
evidence is not credible.)
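The clock-reset rule is simple enough to write down. The sketch below is a hypothetical bookkeeping helper, not something from the paper: it assumes field exposure is measured in hours and that each safety-relevant change is logged with the hour mark at which it was deployed. The same rule applies to any change that can affect safety-relevant behaviour, not only to bug fixes.

```python
def creditable_hours(total_hours, change_times):
    """Field hours that count toward a testing-only safety argument.

    total_hours: field hours accumulated so far.
    change_times: hour marks at which a safety-relevant change was deployed.
    Only exposure after the most recent deployed change is credited.
    """
    deployed = [t for t in change_times if 0 <= t <= total_hours]
    last_change = max(deployed, default=0.0)
    return total_hours - last_change

# Hypothetical campaign: 10,000 hours accumulated, fixes deployed at three points.
print(creditable_hours(10_000, [1_200, 4_500, 9_300]))  # prints 700
```

In this made-up campaign only 700 of the 10,000 hours can be credited, which is why a steady stream of fixes keeps a pure field testing argument from ever accumulating much evidence.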
It Takes Too Long: It is difficult to believe an argument that claims
that a fly-fix-fly process alone (without any rigorous engineering analysis to
back it up) will identify and fix all safety-relevant bugs. If there is a large
population of bugs that activate infrequently compared to the amount of field
testing exposure, such a claim would clearly be incorrect. Generally speaking,
fly-fix-fly requires an infeasible amount of operation to achieve the ultra-dependable
results required for life-critical systems, and typically makes unrealistic
assumptions, such as that no new faults are injected by fixing a fault identified in
testing (Littlewood and Strigini 1993). A specific issue is the matter of edge
cases, discussed in an upcoming blog post.
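A rough calculation shows the scale of the problem. The sketch below uses a standard statistical bound rather than anything specific to the paper: it assumes failures arrive as a Poisson process with a constant rate, so the probability of seeing zero failures in t hours is exp(-rate * t). The target rates and the 95% confidence level are illustrative assumptions.

```python
import math

def failure_free_hours_needed(target_rate_per_hour, confidence=0.95):
    """Failure-free hours needed to claim the rate is below the target.

    Assumes a constant-rate Poisson failure process. If the true rate were at
    the target, zero failures in t hours would occur with probability
    exp(-rate * t); solving exp(-rate * t) = 1 - confidence gives t.
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour

for rate in (1e-5, 1e-7, 1e-9):
    hours = failure_free_hours_needed(rate)
    years = hours / (24 * 365)
    print(f"target {rate:.0e}/h -> {hours:.2e} failure-free hours (~{years:,.0f} unit-years)")
```

At 95% confidence the requirement works out to about three times the reciprocal of the target rate, with no failures and no changes to the system during that entire period. Any fix that resets the clock restarts the count.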
(This is an excerpt of our SSS 2019 paper: Koopman, P., Kane, A. & Black, J., "Credible Autonomy Safety Argumentation," Safety-Critical Systems Symposium, Bristol UK, Feb. 2019. Read the full text here)
- Littlewood, B., Strigini, L. (1993) “Validation of Ultra-High Dependability for Software-Based Systems,” Communications of the ACM, 36(11):69-80, November 1993.