publicmvclockwork

Complexity vs Reliability and Mission-worthiness

Updated: 5 days ago

I have an unusual job or field... "technical decision making in the face of unknown/unknowable/very-low-probability risk" (wonder why I never get invited to parties?). The design of things is a technical decision-making process. I used to live on planes... 2M miles... going around and mostly fixing designs with statistically infrequent but likely high-consequence (catastrophic) failures. Not individual machines... the design from which the copies of the machines are created. I've done a pretty substantial number of talks on the intersection of design and system reliability, and THE villain through all of that is system complexity. I'm used to the questions that come up, particularly from people who don't work in design or stochastic risk and solution, so here is a quick blog entry presenting an analogy I usually use to illustrate the point. It encapsulates a ~90 min lecture on the toxic effect of complexity on reliability and mission-worthiness.


We had an elevator in a new building in our department at the university where I used to teach electrical engineering. It had a 4x4 matrix of quartz lights in the ceiling, and the janitor, with whom I would frequently chat when we ran into each other, was always ticked off that he was forever replacing lights. You were close to certain to see a burned-out light or two any time you rode in it. I remember making bets with myself as I walked up to it on whether it was going to be "perfect" or "imperfect". Never bet on perfect :) Time is the ultimate stressor and you will definitely lose. The problem was made more obvious (especially to the janitor) by other elevators in the campus inventory with far fewer bulbs, which were, statistically, fully lit much more often.


That difference came from toxic, pointless, just-because-I-can complexity designed in by someone who didn't actually have any training or experience at design. (Rant warning: Design IS a real thing, and I'm not talking about the flowery use of the word you see in some aesthetics-over-engineering fields. I'm talking about actual hard problems where the requirement that the system/product/device not fail is at a level of real that even the vast majority of engineers never encounter; most never have anything to do with design. One of my chief maxims going on 40 years now is... If you don't design every day, don't design on any day. At least if the downside/consequence is significant. End Rant, for now.)


Back to toxic complexity. The more parts in your system, the more MTBF random variables (i.e. how long each part goes between failures, which is not a set specific time interval... but a statistical distribution) you have all counting down on you at the same time. With more RVs timing out side by side, you have "more lottery tickets being drawn tonight" and thus more chances to "win". A state of system imperfection arrives more frequently because it usually takes just one failure. You want to make a design take longer? Throw more designers at it. You want your system to blow up more frequently? Make it more complex.
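To put rough numbers on the lottery-ticket picture, here is a minimal Python sketch. It assumes every part fails independently, with an exponentially distributed life (constant failure rate) and the same MTBF. Those assumptions, the function names, and the 1000-hour MTBF are illustrative choices, not measurements from the elevator.

```python
import math


def prob_all_working(n_parts: int, hours: float, mtbf_hours: float) -> float:
    """Chance that all n_parts are still working after `hours`.

    Assumes independent parts with exponential lifetimes and a shared MTBF
    (an illustrative model, not data from any real elevator).
    """
    single_part = math.exp(-hours / mtbf_hours)  # one part survives
    return single_part ** n_parts                # every part must survive


def mean_time_to_first_failure(n_parts: int, mtbf_hours: float) -> float:
    """Average wait until the first of n_parts fails, same assumptions."""
    return mtbf_hours / n_parts


# Hypothetical 1000-hour bulbs, checked after 100 hours of use.
print(prob_all_working(1, 100.0, 1000.0))       # about 0.90 for a single bulb
print(prob_all_working(16, 100.0, 1000.0))      # about 0.20 for all 16 bulbs
print(mean_time_to_first_failure(16, 1000.0))   # 62.5 hours to the first dead bulb
```

Under these assumptions the bulbs themselves are no worse in the 16-bulb ceiling. There are simply 16 of them, so the first failure shows up 16 times sooner on average.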


A boat is a collection of thousands of things, each with its own failure statistics/MTBF. They're ALL ticking down on you. Same for your car. Same for everything. Entropy is a PITA, and the less useless crap you have in your system/boat/life, and especially the less crap you have in critical places in your system/boat/life, the less unhappy you will remain. And this doesn't even consider any special vulnerabilities of the thing that will break and cripple the entire system. Like say... electronic engine management systems in... say, lightning storms, or marinas with leakage currents from someone on the next boat over who utterly half-a$$ed his electrical system because he read a forum post written by another low-resolution boater who got lucky the first time he tried something, or... or... or just plain old entropy. My requirement from before I bought Clock Work was a fully mechanical system because I hate when stuff breaks. It pays the bills, but I hate it when my stuff breaks.


Rest assured... I KNOW you can't have this discussion with most. The ratio of sunsets-and-sail-covers types to technical/systems thinkers is many many many to one, and that's awesome! Thank GOD, for the sake of the party planning industry, that not everyone is an engineer. I decided to write this out in the hopes of meeting another hyper-mechanical/technical/systems/stochastic thinker (roughly the same order of magnitude of challenge Big Foot males have during their mating season, although... I have a blog), and/or dragging into the fold someone who doesn't have the systems or engineering background to have articulated this on their own but who might appreciate it.


If you want to see more math (seriously... why do I not get invited to parties???), I'll submit the following from an old FB post.




I don't know the projected service life of the light bulbs in the janitor's closet, but my recollection of the failure frequency tells me he probably got them on sale and that they were not 1000-hr bulbs.


Added 8/26

Allow me to try and further reify the preceding example. I've been using that example for well more than 25 years, but recently, in trying to figure out how to get through to a guy just LOCKED IN to a halo-effect-driven belief (extremely common) that his higher-complexity engines were as reliable as a MUCH simpler one, I came up with this modification to the "elevator problem" and a statement afterward that I believe blows away any remaining fog...


Addition to the elevator problem - Suppose the elevator company makes 3 versions of the elevator. The fancy one.. the AM-Gee.. has to have all 16 lit in order to move at all. That is.. for that model, all 16 are in the show-stopper category. The math above most closely matches this model elevator.


The next model down is the HLux. This model only needs 1 specific bulb to be lit. The other 15 can be dead, but the HLux moves if the one critical bulb works.


The final model is the Marathon. It doesn't depend on any bulbs to move.


Assuming the rest of the elevator system for each model is identical, which one is most likely to be in a CAN'T-MOVE condition? How about least likely? No asking the janitor.


BTW, the HLux is more reliable than you might think.. if the critical bulb blows, you have 15 spares.


BTW#2 - Notice that adding just a single show-stopper part to the simplest system makes that system less reliable. If it's not there, it can't break.
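For anyone who wants to check their answer, here is a rough Python sketch of the three models. It assumes each bulb is independently lit with the same probability at the moment you press the button. The 0.9 figure, the `rest_ok` parameter, and the function name are made up for illustration.

```python
def prob_cannot_move(critical_bulbs: int, bulb_ok: float, rest_ok: float = 1.0) -> float:
    """Chance the elevator is stuck.

    critical_bulbs: how many show-stopper bulbs must all be lit.
    bulb_ok: assumed probability that any one bulb is lit (independent bulbs).
    rest_ok: assumed probability that everything else works, same for all models.
    """
    return 1.0 - rest_ok * bulb_ok ** critical_bulbs


BULB_OK = 0.9  # hypothetical value, not measured

# Number of show-stopper bulbs per model, as described above.
models = {"AM-Gee": 16, "HLux": 1, "Marathon": 0}

for name, critical in models.items():
    print(name, round(prob_cannot_move(critical, BULB_OK), 3))
```

Whatever value you pick for `bulb_ok` below 1, the ranking comes out the same: more show-stopper parts means stuck more often, and zero show-stopper bulbs means the bulbs can never be the reason you are stuck.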


