Pragmatic 11: Cause And Effect

3 February, 2014


John and Ben discuss the process of analyzing failure, from broken alternator belts to plane crashes. Starting with Toyota's five (sometimes six!) whys, John explains some different approaches, including fault tree analysis and root cause analysis.

A weekly discussion show contemplating the practical application of technology, exploring the real-world trade-offs. We look at how great ideas are transformed into products and services that can change our lives. Nothing is as simple as it seems. I'm Ben Alexander, and my co-host is John Chidgey. How are you doing, John?
I'm doing very well, Ben. How are you? Not too bad. Sorry, I still haven't quite kicked it. [inaudible] Our turn is coming. Anyway, hello to everyone in the chat room, and thanks for stopping by and listening live.
So, starting off again with a general thank you to everyone on Twitter and elsewhere who has been continuing to give some great feedback about the show. It is all greatly appreciated, so thank you to all those people. Also, a quick apology to Michael: I mispronounced his name in episode four. He actually kind of begged me on Twitter not to correct it, but it's bugging me, so I can't get it out of my system. Special thanks also to the listeners who have emailed me directly. I have read every email that has come in and I have responded to some, but there is a bit of a backlog. I'm working on a more efficient way to deal with them, because everything is building up. So to those of you whose emails I have in the bank: I will get back to you shortly. Thanks again for those.
Three more iTunes reviews. We still get them, which is fantastic, and from a different mix of countries: one from my own country, one from my home away from home, Canada, and also one from Kazakhstan. So a big thanks to [inaudible], Jason Clark and Nikita Pushkin respectively. I'm really hoping that last one is his real name, because it's a cool name. Just a side note: I'm actually thrilled to have a listener from Kazakhstan. When I was living in Calgary as an international student, interning at Nortel back in '97, I was sharing a four-room unit on campus at the University of Calgary, and one of my roommates was Kazakhstani. He was really annoyed when I beat him at chess, which did happen from time to time. Apparently that is not allowed: they're supposed to be chess geniuses, and Australians aren't supposed to be any good. Anyway, go Kazakhstan!
Also, thank you to Tim Radke, who recently changed his website to a site using his own name and wrote some nice things about the show. I actually missed that originally because it was in German, but I eventually found it, so thank you very much; it's much appreciated.
And quickly, an apology regarding the feedback for episode two. We actually have more; that's just the episode that keeps on giving. Unfortunately Ben was busy this week with other things and was unable to get it sorted out, but you'll be seeing it shortly. I've told some people about it, so don't worry, it's coming, and it's a good one.
Also, for those reading TechDistortion, my blog: please note that I'm in the process of migrating from WordPress to Statamic, for a whole bunch of reasons. Those following the RSS feed will see all of that. I'm following in the footsteps of [inaudible] and Harry Marks, and in the next week or two you should see it go live. When it does happen, please let me know what you think; I'm always interested in any and all feedback.
That was pretty good, yeah. Yeah, it was, and so was Harry's; his was actually some time before that, in early 2013. I've just reached my limit with WordPress at this point. I've been an advocate for quite some time, and I don't hate it, but I'd like to wipe the slate clean, and Statamic is my new shiny toy right now, so I've got that giddy excitement of playing with a new toy.
Okay. The topic today may sound odd at first, but I actually want to talk about cause and effect. I realise on the surface it sounds a little bit weird, but it's something that's fascinated me: the idea of a domino effect, of a chain of events leading to an outcome. When you start at the beginning of the chain you can't see the outcome, but at the end you look back and you're amazed at the connections between the separate events that led to that one last event. Everyone has their own way of thinking about this, but it happens every day, in all sorts of things, and there are all sorts of scenarios you can play out in your head: if this hadn't happened, then that wouldn't have happened. I find it fascinating, and I also find it fascinating from an engineering point of view, because a very big part of what I did in the first few years of my career was reliability engineering, where we did a lot of the things I'll talk about today. So we'll start with the analytical side of cause and effect, and how people can use it in their daily lives if they want to. It should be an interesting one.
I'd like to start with Toyota, who came up with the concept of the five whys. I'm not sure if you've heard of the five whys or not? It sounds familiar. Toyota developed the methodology many years ago to help improve their manufacturing processes and the quality of their vehicles, and the idea is simply to ask, just like the title says, why, five times in a row. The example from the Wikipedia article is very simplistic, but it gets the idea across, so I'll read it for you. The vehicle won't start. Why? The battery is dead. Why is it dead? The alternator is not functioning. Why? The alternator belt has broken. Why? The alternator belt was well beyond its useful service life and wasn't replaced. And why was that? The vehicle was not maintained according to the recommended service schedule. At that point you've reached five whys, the idea is that you've reached your conclusion, and the final recommendation would be: follow the proper maintenance schedule for your vehicle.
The problem is that you can keep asking why and potentially get further than that. Ask why again: why was the vehicle not maintained according to the service schedule? It turns out the alternator belt in question hasn't been made for years and isn't available anymore. So it wasn't that they intentionally didn't maintain it; they were unable to maintain it, which completely changes the complexion of your conclusion. Now the problem is: why didn't Toyota keep manufacturing this stupid belt? Anyway, the five whys is a great place to start, because it gets you systematically going through a questioning process until you're satisfied that you've reached the true cause of the problem. But why stop at five? What if it takes eight questions, or ten, to get there? In many respects most people in reliability don't take it seriously, because it's simply a thought experiment: something to get you thinking about how you should analyse a failure and how to get started. It's one approach, but it's simplistic because the depth is arbitrarily set at five. Why five and not four, or six, or ten? At what point do you stop? You could keep going down to the molecular level: this part is made of steel, the steel contains carbon, the carbon has this many atoms. The point of ridiculousness is also difficult to define. The other issue is that it follows a single path: you go from one answer to the one before it and the one before that, and it essentially assumes there is only one way to get there. That's not always the case. Normally there is a predominant reason, but what about the less dominant reasons that were contributing factors? The five whys doesn't address any of that, so in reliability engineering it's generally not used any more and is considered more of a thought experiment than anything.
That leads to the next evolution of that kind of thinking, something called root cause analysis. Root cause analysis is essentially a method of taking a failure event and breaking it down into the series of sub-events that led up to it, and you repeat that for each level down until you reach a point where you can't break it down any further. It's the exercise you do when a failure has already occurred and you're trying to work out why, so it's not so much a design tool. It's very similar to the next method I'll talk about, except that it isn't statistically analysable: no probabilities are used, and there's no calculation of which path is most likely. It's simply: we had a failure, here are the possible causes, and we investigate each of them until we figure out what caused it. So it's an investigative tool more than anything else, the sort of thing used in accident investigations, and we'll get to that.
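To make the contrast concrete, here is a minimal Python sketch, assuming causes can be written down as a simple mapping from each effect to its possible causes. The car chain is the Wikipedia example plus the sixth why mentioned above; the "headlights were left on overnight" branch is a made-up contributing factor, added only so there is a second path. The helper names `five_whys` and `all_cause_paths` are hypothetical: the first follows one path to an arbitrary depth, the second walks every branch the way a root cause investigation would, with no probabilities attached.

```python
# Hypothetical cause map: each effect lists the possible causes found for it.
CAUSES: dict[str, list[str]] = {
    "vehicle will not start": ["battery is dead"],
    "battery is dead": [
        "alternator is not functioning",
        "headlights were left on overnight",  # invented contributing factor
    ],
    "alternator is not functioning": ["alternator belt has broken"],
    "alternator belt has broken": ["belt was beyond its service life and not replaced"],
    "belt was beyond its service life and not replaced": [
        "vehicle was not maintained to the recommended schedule"
    ],
    "vehicle was not maintained to the recommended schedule": [
        "replacement belt is no longer manufactured"
    ],
}


def five_whys(causes: dict[str, list[str]], problem: str, depth: int = 5) -> list[str]:
    """Ask 'why?' up to `depth` times, always taking the first answer (a single path)."""
    chain = [problem]
    for _ in range(depth):
        answers = causes.get(chain[-1])
        if not answers:
            break  # nothing further is known about this cause
        chain.append(answers[0])
    return chain


def all_cause_paths(causes: dict[str, list[str]], problem: str) -> list[list[str]]:
    """Follow every branch depth-first down to a cause with no further known cause."""
    paths: list[list[str]] = []

    def walk(path: list[str]) -> None:
        answers = causes.get(path[-1], [])
        if not answers:
            paths.append(path)
            return
        for cause in answers:
            walk(path + [cause])

    walk([problem])
    return paths


if __name__ == "__main__":
    print(five_whys(CAUSES, "vehicle will not start")[-1])
    print(five_whys(CAUSES, "vehicle will not start", depth=6)[-1])
    for path in all_cause_paths(CAUSES, "vehicle will not start"):
        print(" <- ".join(reversed(path)))
```

With the default depth of five the chain stops at the maintenance schedule; one more why reaches the discontinued belt, which is the arbitrary-depth problem in miniature. The sketch does not guard against cycles, which a real cause map could contain.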
So the most calculated, rigorous method of fault analysis, and a top-down approach, is something called fault tree analysis, or FTA for short. Fault tree analysis was developed by a bloke called H. A. Watson (I don't know his first name, sorry Mr Watson) while he was working at Bell Labs in 1962. He was one of a group of engineers trying to understand which areas of a design to focus on to ensure the best possible outcome, so it was meant to be a design tool. It's top-down again: you start by defining an undesirable state at the top of the tree, and it ends up looking like a tree, albeit an upside-down one, unless you think of the branches as the roots, I guess. Each of the individual causes is broken out, much like in root cause analysis, but here's the difference. Say my undesirable state is the failure of a computer. What could cause that? It could be the hard drive, a software problem, the CPU, or the memory, to name a few possible reasons. Each of those has a failure rate associated with it: what's the probability it's going to fail in the next thousand hours, ten thousand hours, hundred thousand hours, whatever it is. By doing that you can figure out which branch of the fault tree is the most critical, and that critical path, that critical series of events, is the one you should spend the most time working on. In essence it guides the way you design for improved reliability, and the simplest improvement is adding redundancy. Statistically, the most common thing to fail on a server computer is the power supply, therefore you have redundant power supplies. It's that simple.
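Here is a minimal Python sketch of that server example, assuming independent components with constant failure rates, so the probability of failing within t hours is 1 - exp(-t / MTBF). The MTBF figures are invented for illustration, with the power supply deliberately given the lowest one to match the statement that it is statistically the most common failure. An OR gate fires if any input event occurs; an AND gate fires only if all of them do, which is how a pair of redundant power supplies is modelled here. The sketch ignores repair during the mission time, which is where Markov models come in.

```python
import math

# Invented MTBF figures in hours, for illustration only.
MTBF_HOURS = {
    "hard drive": 300_000,
    "software": 400_000,
    "cpu": 1_000_000,
    "memory": 800_000,
    "power supply": 100_000,
}


def p_fail_within(hours: float, mtbf_hours: float) -> float:
    """Probability of at least one failure within `hours`, assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def or_gate(*probs: float) -> float:
    """Output event occurs if ANY independent input event occurs."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


def and_gate(*probs: float) -> float:
    """Output event occurs only if ALL independent input events occur."""
    p_all = 1.0
    for p in probs:
        p_all *= p
    return p_all


def branch_probabilities(hours: float, redundant_psu: bool) -> dict[str, float]:
    """Probability of each first-level branch under the top event 'server fails'."""
    p = {name: p_fail_within(hours, mtbf) for name, mtbf in MTBF_HOURS.items()}
    if redundant_psu:
        # Two identical, independent supplies: the branch fails only if both fail.
        p["power supply"] = and_gate(p["power supply"], p["power supply"])
    return p


def top_event(hours: float, redundant_psu: bool) -> float:
    """Probability that the server fails at all within `hours` (OR of every branch)."""
    return or_gate(*branch_probabilities(hours, redundant_psu).values())


def critical_branch(hours: float, redundant_psu: bool) -> str:
    """The branch most likely to fail: where design effort should go first."""
    branches = branch_probabilities(hours, redundant_psu)
    return max(branches, key=branches.get)


if __name__ == "__main__":
    for hours in (1_000, 10_000, 100_000):
        for redundant in (False, True):
            print(f"{hours:>7} h  redundant PSU={redundant!s:<5}  "
                  f"P(fail)={top_event(hours, redundant):.4f}  "
                  f"critical={critical_branch(hours, redundant)}")
```

With these made-up numbers, adding the second supply at 10,000 hours moves the critical branch from the power supply to the hard drive, which is the "work hardest on the critical path" idea in miniature.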
Originally, that analysis would have been done decades ago as a fault tree covering all the components, and someone would have figured out: okay, we should have redundancy in our power supplies. There's a whole other discussion about Markov models and reliability prediction that I don't want to get too deep into; don't get me started, I'll go on forever. So for each of these events you've got to figure out the failure rate, and failure rates are often expressed as the mean time between failures, or MTBF. A lot of products quote one: you'll buy a hard drive with an MTBF of, say, 200,000 hours. I'm pulling that number out of my ear, but the number's not the point. One of the interesting things is that you can arrive at the failure rate theoretically by using a different kind of failure analysis. You can take an individual circuit board and do what's referred to as an FMEA, a failure modes and effects analysis, or an FMECA, which stands for failure modes, effects and criticality analysis. You take the failure rates of each of the components on the board, you look at the results of different pins being stuck high or low or in a transient state, and you look at the end result: what's corrupted and what isn't, and so on. All of that can be rolled up into a theoretically predicted failure rate for a product that doesn't exist yet, and NASA, for instance, would feed that into their fault tree analysis. I did a lot of FMEA when I worked at Nortel in '97, and again in '99 and 2000; it was bread and butter in R&D. We did have real returns coming back from the field because of failures, and we tried to map the predictions we'd made previously to the field returns in the real world, and they had reasonably good alignment. But in any case, I'm going off topic.
These fault tree analyses are quite detailed. You've got a bunch of event types: basic events, undeveloped events, external events, intermediate events and so on. The Wikipedia article goes through all of them, so I won't go into the specifics, but the interconnections between them are simple logic gates, AND gates, OR gates and so on, so if you know your basic Boolean logic gates it'll look very familiar. It accounts for cases where you need multiple failures of certain things in order to actually have a failure, or an undesired event. Anyway, feel free to read up on it; the Wikipedia article is not too bad.
So the question, however, is this: we've got all these methods developed for failure analysis now, fault tree analysis and root cause analysis, and it all starts with the five whys as a thought experiment. But how is it actually useful? Maybe it's obvious, maybe it isn't, but I find it fascinating and also scary at the same time, because there are a couple of documentary TV series that explore this stuff. Two in particular. One of them is called Seconds From Disaster, on the National Geographic Channel. It's very hyped and very sensationalised, with the big dramatic voice.
But I found there's another one called Mayday, and it's got five different names. In Australia, and in the UK and Ireland, it's called Air Crash Investigation, and in some parts of the world it's called Air Emergency or Air Disasters. I don't know why they do that, but anyway. It's a Canadian documentary series produced by Cineflix, and it's far more, how should I say, straightforward; far more clinical in its delivery, and less hyped. Both shows are good. It's just that Seconds From Disaster covers a broad spectrum of topics rather than focusing on one thing: it's covered the Piper Alpha oil platform explosion, a tunnel fire under the Alps, and plane crashes. Mayday, Air Crash Investigation, Air Disasters, Air Emergency, whatever name you call it, focuses purely on air crashes, and aviation is of course an area where all this fault analysis and root cause analysis is critical, because if you have a problem up there in the sky you're on your own. If something goes wrong you're going to fall out of the sky, and that's generally a bad thing. Mayday is on Discovery here. It's still dramatised: they walk you through the events that led up to the accident, and then through the investigation afterwards, often with 20- or 30-second segments of dramatisation. But they often have the actual cockpit audio as well. Sometimes they can't legally release the audio, for all sorts of reasons: the investigation is still ongoing, or it was requested that the recordings be sealed. So you can't always hear it, but that's okay. It's on YouTube; there's a lot of it on there.
I've watched a lot of it over the years, and I find it fascinating and disturbing at the same time, because you realise that a lot of people died in some of these accidents. When I look at what I do day to day, I don't work on aircraft. Even when I worked at Boeing I didn't work on aircraft; I worked on other military systems, but not commercial aircraft. With most of the stuff I deal with, it is technically possible for me to kill somebody, but they'd have to be trying very hard. Even so, you still have to be very careful with what you do and what you design. It's not a joke and it's not a game; it just doesn't have an immediate, obvious effect like a plane falling out of the sky does.
There's one episode in particular that has stuck in my mind, and I can't really explain why; it's just lodged itself there. I've watched lots of episodes, and this one has stayed with me for years. I think the reason is that I see parallels with my own experience. So it's a bit of a tangent, but it illustrates the point.
This one is Aeroperú Flight 603. On 2 October 1996, just after midnight, 70 people were on board a Boeing 757 that had just departed Lima, Peru, on a flight that had originated in Miami and was bound for Santiago. It's important to set the scene. Moonrise was at about 10:47 PM that day, with about 79% illumination, so it wasn't pitch black by any means. But after leaving Lima the flight path was predominantly over water, and when you're flying at night over water, with no mountain ranges or anything else you might pick out in the moonlight, you're essentially relying on your instrumentation. Without instruments, you've got a problem.
A little more background first. Most of the instruments on an aircraft require sampling of the outside atmosphere, so the plane can't be completely sealed. To get a speed indication and an altitude you need air pressure, and that comes from what they call static ports. The static ports are usually near the front of the plane, toward the nose; they're small, very smooth ports, and exactly where they sit varies from aircraft to aircraft. The idea is that they're up front, away from any turbulence, so they have nice clean air to work with and give the most accurate reading. These small ports are critical for the instrumentation to work, so keep that in mind for a second.
They'd been in the air for a few minutes when a bunch of alarms started from their instruments, all very conflicting: an overspeed alarm, an underspeed alarm, a "too low, terrain" alarm, and the primary altimeter and the airspeed indicators were not indicating correctly at all. They were essentially flying without their primary instruments. Now, I am not a pilot, and I suck at Microsoft Flight Simulator, but based on my understanding, the bottom line is that the primary instruments were not functional and they were getting a flood of alarms, audible alarms as well as visual flashing alarms. The 757 was an older plane even at the time, an older generation of cockpit than a modern Airbus or a Dreamliner. They did the right thing: they don't know how high they are, they don't know how fast they're going, something is clearly not right, so they turn around and head back to the airport, or at least they try to, getting what guidance they can from air traffic control. They couldn't see land from where they were, they had no idea how fast they were going and no idea how high they were, and as a result they tried to descend and slow down to get back and land. Essentially, the plane starts to stall. They had several stalls and lost a lot of altitude very quickly, and eventually they crashed into the water and everybody died.
The National Transportation Safety Board, the NTSB, working alongside other investigators, looks at incidents like this and tries to figure out what went wrong. They do root cause analysis to understand the sequence of events that led to the accident, so that people can learn from it and not make the same mistakes. They found several causes. The first one, funnily enough, well, it's not funny at all, oddly enough, is something we've talked about on the show before: information overload, or abnormal situation management. The user interface was old school, not a digital touchscreen or anything like that, but audible alarms, flashing annunciator panels and indicator lights. No matter how information is displayed, it's still a user interface. All these alarms were going off at the same time, and there was no grouping of the alarms and no logical rationalisation. A modern system will try to give you a summary alarm: I've got this set of alarms, and the most likely cause is such-and-such. Or it should give you a list of possible causes, with the most likely on top and each subsequent one less likely. These pilots had logged tens of thousands of hours. They were not amateurs and they weren't wet behind the ears; they were very experienced pilots, and they still got information overload, to the point where they missed this: there was a radar altimeter, and a radar altimeter doesn't rely on the static ports, so it was still giving them a reading. My understanding is that it's not as accurate a reading, or maybe it wasn't on the 757, but it was still a reading of altitude.
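The "summary alarm" idea can be sketched as ranking candidate causes by how many of the active alarms each one would explain. This is a minimal illustration under stated assumptions, not how any real flight deck works: the alarm names loosely follow the ones described above, and the cause-to-alarm signatures in `SIGNATURES` are invented. The design choice worth noting is that any alarm the leading cause does not explain is reported separately rather than buried.

```python
# Invented signatures: the alarms each hypothetical root cause would be expected to raise.
SIGNATURES: dict[str, set[str]] = {
    "static ports blocked": {"overspeed", "underspeed", "altimeter disagree", "airspeed disagree"},
    "pitot tube blocked": {"airspeed disagree", "overspeed"},
    "aircraft actually too low": {"too low terrain"},
}


def rank_causes(active: set[str], signatures: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Return (cause, number of active alarms it explains), most explanatory first."""
    scored = [(cause, len(active & expected)) for cause, expected in signatures.items()]
    scored = [item for item in scored if item[1] > 0]
    # Highest score first; ties broken by name so the order is stable.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def summarise(active: set[str], signatures: dict[str, set[str]]) -> str:
    """One summary line for the top cause, then the alternatives, then anything unexplained."""
    ranked = rank_causes(active, signatures)
    if not ranked:
        return "no known cause for: " + ", ".join(sorted(active))
    top_cause, top_score = ranked[0]
    lines = [f"most likely: {top_cause} ({top_score} of {len(active)} alarms)"]
    lines += [f"also consider: {cause} ({score})" for cause, score in ranked[1:]]
    unexplained = sorted(active - signatures[top_cause])
    if unexplained:
        lines.append("not explained by top cause: " + ", ".join(unexplained))
    return "\n".join(lines)


if __name__ == "__main__":
    alarms = {"overspeed", "underspeed", "altimeter disagree", "airspeed disagree", "too low terrain"}
    print(summarise(alarms, SIGNATURES))
```

For that set of alarms the summary leads with the blocked static ports but still lists the terrain alarm on its own line, the kind of independent reading that an alarm storm can hide.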
So had they checked it, they would have known that this backup instrument, a redundant instrument, was actually working, well enough to tell them roughly how high they were. At the very least that could have kept them from crashing into the water and given them time to get over land, where they might have been able to navigate to the airport visually using the city lights. Unfortunately, because of the information overload, they did not see it. It sounds crazy looking back afterwards, but it's true. So that was the first cause: operator confusion, pilot confusion.
However, the most direct cause was that the static ports had adhesive tape placed over them. You might say: why the hell would anyone tape over the ports, when the aircraft's instruments need them? On this particular model of plane, it was the standard method of cleaning. You don't want water, high-pressure water, or any other grit getting into the static ports, so you cover them with tape, wash the plane down, and then you're supposed to take the tape off before the plane goes on its merry way. Someone forgot to take the tape off, and because of that, in conjunction with the multiple conflicting alarms, essentially an alarm storm in the cockpit, 70 people died because of a piece of tape.
So you keep asking why, right? How was this possible? The maintenance technician who cleaned the plane, and who was responsible for putting the tape on in the first place and not removing it, Eleuterio Chacaliaza (I hope I'm pronouncing that right), went to jail for two years for negligent homicide. But there are questions I didn't get answers to. I don't have the full report; I only have the Wikipedia article and the episode itself. They don't release the full details of the report, or if they do I don't know where to get it, so I don't have answers to these questions. But the questions that occur to me are: how is it that he just forgot? If there is a procedure that says you tape the ports up, surely there is a step in that procedure, in a manual or on a checklist, that says remove the tape. If that's the case, did he skip that step for some reason? Was he distracted? Did someone distract him at a critical moment, so he forgot to go back to it? Was he overtired? Had he been pulling double shifts, or working two jobs because he needed the money? Or was this symptomatic of his work ethic? What if he had a long history of forgetting to put a nut and bolt back on, someone who lacked the attention to detail to be doing the job properly? Then what about employee screening, or supervision, or someone to double-check his work? These are all procedural things that could have been put in place. I don't have answers to any of those questions, but they're the questions I would want answered. Of course, we'll probably never know the final details, even if there is a further report, because Aeroperú was not in a good way as a business before the accident, and the accident was the last nail in the coffin: they were out of business within a few years, and that was it.
The other interesting part of this is that Boeing actually got sued. Apart from being the last party standing with any money to sue, they were shown to have been negligent in their procedure, well, the training procedure, sorry, for cleaning the aeroplane. So it was put down to a training issue, and they sued the hell out of Boeing, and Boeing coughed up massive amounts of money. Part of why planes cost so much isn't just the cost to build them, it's the insurance, right? Although I'm sure they cost a lot to build too, insurance isn't cheap. In any case, Boeing took the majority of the blame. But the question I've got is: why wasn't a more foolproof method employed on the 757 at that point? Had it never happened before? Was it something they should have been able to predict? Did they do their fault tree analysis correctly when they went through the design? Surely the thought occurred to them that this was kind of important. Then again, there are probably a thousand, maybe ten thousand, things on an aeroplane where a failure puts you in deep trouble. Maybe they figured the backup altimeter was enough. Maybe they figured that having two people doing the checks after the plane had been cleaned was statistically enough. I don't know how many other planes of that era had a proper system, but apparently, whether as a result of this or around the same time, a lot of planes now use physical plugs that fit in those ports, with a dangling bright orange or red flag, so it's visually very obvious. If you see a plane about to leave with a bunch of red flags dangling under it, you might suspect that you should take a look. So the questions are: was it a standard procedure? Was it required by law? What happened? I don't know.
That sounds dramatic, maybe. I guess the point is that it's a terrible thing, but from my own experience, I can't claim that I've been involved in any fault analysis where someone died.
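The "was a second person checking enough?" question maps onto an AND gate: the ports only remain covered at departure if the tape is left on and every check misses it. Here is a minimal sketch with entirely made-up probabilities, assuming each check either misses independently or, in the worst case, shares one blind spot with the others (for example, the same tired person checking their own work). The function name is hypothetical.

```python
def p_ports_covered_at_departure(p_tape_left_on: float, p_check_misses: float,
                                 n_checks: int, independent: bool = True) -> float:
    """AND gate: tape is left on AND every check misses it.

    independent=True  -> each check misses on its own, so the miss probabilities multiply.
    independent=False -> all checks share one blind spot, so extra checks add nothing.
    """
    if n_checks == 0:
        return p_tape_left_on
    p_all_miss = p_check_misses ** n_checks if independent else p_check_misses
    return p_tape_left_on * p_all_miss


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    p_left, p_miss = 1e-3, 0.1
    for n in range(4):
        ind = p_ports_covered_at_departure(p_left, p_miss, n)
        dep = p_ports_covered_at_departure(p_left, p_miss, n, independent=False)
        print(f"{n} checks: independent {ind:.0e}, shared blind spot {dep:.0e}")
```

Each independent check cuts the risk by another factor of ten with these numbers, while checks that share a common cause stop helping after the first one. That is why a bright flag that anyone walking past will notice is a different kind of safeguard from a second signature on the same checklist.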
I've been involved in ones where people were injured, and I've worked at a place where there was a massive explosion, so I'll talk about that one next. I did some work at Stanwell Power Station as an intern back in 1996, which was really, really great; I really enjoyed it. I wasn't there when the incident happened, and neither were most people, because it happened at about two in the morning. There was a 6.6 kV switch room full of massive switchboards and circuit breakers, and one of the circuit breakers had a phase-to-phase fault and exploded. The explosion was so big it blew a hole in the wall, and that was a concrete block wall about half a foot thick, so there was significant blast energy. It seemed to happen completely randomly: bang. And this power station was new. When I say new, that particular unit and its switchboards had been there for about a year, tops, and operational for maybe nine or ten months. They had followed all of their maintenance requirements.
Circuit breakers are serviced periodically: even if they don't operate, they get service-checked, where they go through the mechanism and all the insides. There's also a limit on the number of operations: after a certain number of operations of the breaker, you check it out and make sure it's still good. They'd done all of that correctly. Fortunately, no one was in the room at the time of the explosion, at two in the morning. Why would you hang out in a switch room? People do normally walk through the switch rooms, but they advise you not to linger, and that explosion is a good reason why. Still, it's very rare, very uncommon.
So what happened? A bit of background: Stanwell is a 1.44 GW coal-fired power station with four turbines. When they built it, they tried to cut costs on the ancillary buildings, as you always do, and what they did was build the ancillary buildings in the adjoining spaces between the different turbine generator sets. For people who know the basic architecture of a power plant: you tend to break it up by generator size, because the bigger the generators get, the more unwieldy they get, and you hit diminishing returns. To keep them a manageable, maintainable size, you limit their size and have several of them, so rather than one massive 1.4 GW generator you'll have four smaller 350 MW generators, and that makes everything easier in a lot of ways. They're also spaced apart, because you don't want an incident in one to cause an incident in the others, so you get a bit of independence. So they figured they'd build the ancillary buildings in the connections between the turbine halls and save the cost of however many separate buildings would otherwise have been required. But there was a subtle change to the way the concrete foundations were poured, such that the pulverisers, which were directly beneath the offices, were on the same slab as the circuit breakers in the 6.6 kV switch room.
The same slab means no isolation, which means vibration gets carried from one to the other and back again. Vibration was actually a common issue in the buildings between the turbine halls. There were hot spots where you'd stand on one spot on the floor and feel the vibration, and see a pencil nearby dancing on the desk, and you'd think: is that normal, or is the place possessed? The first time I saw it I thought poltergeist or something crazy, but when it was explained to me it made sense. Don't worry, this one has a happy ending; no one gets hurt. But still.
So, for people who might not understand the way the whole thing works, what does a pulveriser do? Coal is basically little black rocks, and Stanwell is where it is because coal was plentiful in that part of Queensland. The coal comes in on the conveyors and splits across the different boiler units. I can't remember exactly how many pulverisers there are per turbine generator set; my memory's rusty, sorry. Anyway, the coal falls into the pulveriser, which is a bunch of enormous ball bearings sitting in a race. You drop coal in, the bearings spin around and crush, crush, crush, and what you get is fine coal dust. That is then extracted, by vacuum or by positive pressure, I forget which, and blown or sucked into the boiler, where it's burned. That generates heat, the heat makes steam, the steam spins the turbine, the turbine spins the generator, and you've got electricity. So a pulveriser, as you can imagine, is massive ball bearings going round and round and round, and what do you get? Vibration. The problem was that the circuit breakers were being shaken to pieces.
All the previous plants were built on exactly the same design; they changed just this one thing, a common slab, and suddenly it caused a massive problem. It's the sort of thing where I picture someone in a meeting saying, "How can we save 250 grand on a bunch of, you know..." Or maybe it wasn't that; maybe the decision was made because someone thought there was a better way of doing it: we'll just pour a common slab. Anyway, the solution, the happy ending, was that they simply separated the slabs, which you can do by cutting and digging out a separation between them, and that cut back the vibration immensely. You still get vibration conducted through the soil, obviously, but nowhere near that level. I believe they also stepped up the frequency and thoroughness of the circuit breaker inspections, and to the best of my knowledge there hasn't been a problem since.
But I find that whole thing fascinating: you had a functional design, and one tweak to it. And I say a functional design because three plants were built around the same time: Tarong, followed by Callide B, followed by Stanwell, all from the same blueprint, with the same control systems (which are probably ancient now, goodness). There are differences in the generators and so on, but the details don't matter; the point is they were essentially carbon copies except for that one detail, and it's remarkable how that one detail became the problem.
So, when you do this for a while, you ultimately reach the conclusion that all problems are caused by people. I realise it's a bit like saying the school would run perfectly if it weren't for the students getting in the way. [inaudible] But obviously, maybe
But most problems come from people in one form or another, maybe not all of them, because we make mistakes. As I've said before, we all have bad days; even the best of us get things wrong. So what we try to do is put in controls: we create procedures, we create safety measures, and we put some structure and rigidity around what we do, to ensure that in higher-risk activities we reduce the risk of there being a problem. You can never eliminate it, though; that's the ultimate problem.

So anyway, one of the more interesting modern developments, I think, is the concept of fatigue management. For the longest time, working extra hours was a sign of commitment to the company: you're doing a great job, way to go, you're in line for a raise or whatever. Here's the thing: that leads to fatigue, fatigue leads to mistakes, and mistakes in certain jobs can cost people's lives. One big push in the construction industry that I've witnessed in the last 10 years, in Australia in particular (and my understanding is that it's a global thing, not just Australia), is the focus on fatigue management and safety. Safety is a topic for another show; that's another very long topic that maybe we'll get to one day. But honestly, people now have enforced maximum work hours. You can't just rack up overtime: you're restricted to 12 hours a day, and that has to include travel from your place of primary residence to the place you're working and back. So if you've got a two-hour commute each way, you're stuck working an eight-hour day, and that's it. That includes a lunch break, and all of that time is time you are consciously engaged on site. This came in because too many mistakes were happening: people were fatigued, they worked really long hours to meet deadlines, and it was causing people to get hurt or die.

On a non-industrial, non-engineering front, I'm led to believe that modern vehicles, Mercedes and BMWs for example, are starting to get systems that actually track how alert the driver is. That sort of stuff is amazing, and every car should have it. Driving fatigued is like driving drunk; it's terrible, and yet people do it. Why? Because there's no definitive test for it.

Not really. I suppose there's the other roadside test, where they make you walk a straight line; that's a meme in the movies, right? All I know is that nowadays, if I get pulled over, they give me a little white tube to blow into. So where's the test for fatigue? There really isn't a good one. The problem is that fatigue causes so many of these issues. Take that aeroplane disaster: who's to say that wasn't the reason? What if the guy had been up all night with a baby or something, which is something you could probably personally testify to? You just don't know. And the rules matter so much when you're dealing with something this important, like aircraft maintenance. Surely they would have fatigue clauses now? Back then they probably wouldn't have; these days it's becoming more and more common, because people are realising how dangerous it is.

The problem is that you can't force someone to switch off; you can't force someone to have a good night's sleep. I'll restrict you to 12 hours door to door, but what you do with the other 12 hours is your business. I can't stop you spending those 12 hours partying hard at the local pub or nightclub or whatever.
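The door-to-door rule is simple arithmetic: because travel in both directions counts against the daily cap, a longer commute directly shortens the time available on site. A minimal Python sketch, assuming a flat 12-hour cap and the same commute length each way; `max_site_hours` is a hypothetical helper for illustration, not part of any real rostering system:

```python
def max_site_hours(commute_each_way: float, daily_cap: float = 12.0) -> float:
    """Hours left for on-site time (lunch break included) under a door-to-door cap.

    Assumption: the cap covers everything from leaving home to getting back
    home, and the commute takes the same time in each direction.
    """
    if commute_each_way < 0:
        raise ValueError("commute cannot be negative")
    travel = 2 * commute_each_way          # there and back
    return max(0.0, daily_cap - travel)    # a very long commute leaves no site time


# The example from the episode: a two-hour commute each way leaves an eight-hour day.
print(max_site_hours(2))   # 8.0
```

Note that the eight hours still include the lunch break, so the productive time on site is shorter again.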
And therein lies the problem: people still have to be responsible up to a certain point. But fatigue rules are tightening up. I've actually been debating whether to admit this, but I may as well, since we're on the subject. When my third child was still a baby, he was keeping me awake a lot. The place I was working at the time had a very big safety culture, and it became obvious that I was not getting enough sleep. I was pulled into my manager's office and told quite succinctly that they'd noticed the quality of some of my work over the last two or three weeks had not been as good as it generally was. I like to think I have a high standard; some people say I do, depending on who you talk to. In the end, they noticed a slight drop in the quality of my work, and they knew I was looking very tired all the time: the bags under the eyes, all the warning signs. I was told in no uncertain terms: "John, get your fatigue under control or we will get it under control for you." That was, of course, a nice way of saying that, because it's in the employment contract, if it remains unresolved they can actually fire you. They expected you to show up to work well rested and ready to work, which is fair enough; they're paying me to show up. So I had no choice but to seek out alternative arrangements. The details don't matter; we were still in the same house. Fortunately for me, in the weeks following that, just as luck would have it, my son started sleeping through, and the problem was solved through indirect means. But temporary measures were taken, and this is becoming more and more serious as it becomes more common.

Anyway, all right: this is all well and good, this whole domino-effect, cause-and-effect chain of events, and I find it fascinating, but it's the thought process that is really useful.
A lot of people do this without thinking about it, but a lot of people don't go as deep as they should. They look at something that's gone wrong in their life, go down a step or two, and say, "Ah, it's because of that," when the truth is that if they'd kept digging, they would have found other reasons, ones they were either too afraid to admit or not able to see, because they were happy to accept some higher-level reason for what went wrong.

So I'll do a little bit of retelling, I suppose, of what happened to me when I was in Calgary. I worked in Calgary for two and a half years all up, in two segments: first as an intern, and second as a full-time employee of Nortel Networks. It was wonderful; I loved every minute of it, except the minute where I was laid off. I didn't like that minute. What happened was that Nortel's stock went into the tank and they were cutting budgets (I've talked about this before), and the PCS project I was working on got canned, along with me and, I think, 60,000 other people worldwide. When that happens to you, you really need to ask yourself the hard questions as to why.

So let me walk you through my thought process on this, to illustrate how I think it can be a useful mechanism for people to keep digging until they get down to what they really want. If you try anything, I recommend the original root cause analysis: ask why. But you've got to be honest with yourself, and that can be painful. Some of the answers you get might not be what you hoped to find, but if you're honest about it, it will help you a lot.

So for me: okay, I'm an engineer. I always have been, even if I didn't realise it early on. I wanted to be a theoretical physicist, or an inventor, and only realised later on that I was an engineer, and that came out through soul-searching too. As an engineer, when it comes to soul-searching, I think of it as root cause analysis. Anyway: why was I laid off? I was laid off because the stock went into the tank, they ran out of money, and they cancelled a lot of R&D projects that weren't key, and I was on a project that got canned. Why was I on it? I'd moved to that department only four months previously; I used to work in a reliability department. Why did I leave? Because I was chasing a dream. At that point in my life I wanted to be an RF hardware design engineer, with all my background in amateur radio; I loved that, and I wanted to do more electronics. So the opportunity came up to do RF hardware design: fantastic, can't wait, brilliant. So why? I was chasing my dream.

At this point you can't just keep asking why, because it splits into two branches. The first branch is: would I have been laid off if I'd stayed in the reliability team? Yes, because they were cut back as well: fewer projects to support, fewer reliability requirements, and last in, first out. Statistically, I would have been laid off there too, so staying in reliability wouldn't have mattered, and that's the end of that path. Switch to the other path: should I still chase my dream job if I could lose it at any time, through no fault of my own, and have the rug pulled out from under me? Was there still a point in chasing a dream job if I was sacrificing everything else for it? That's where things get interesting.

The truth was, I love Canada; it's a beautiful country and I miss it, and my time in the US was great too. But I was homesick, that is true. I missed the beaches. Calgary is, what, about 1,200 km inland from the nearest ocean, quite a way inland. You've got a handful of lakes nearby and the Bow River, but it's not the same as the ocean, not for me anyway. I don't surf, and I'm not a great swimmer either, but I love the sound of the ocean, and I missed it. I missed my family: all my family were in Australia, and I was over there by myself. So I reached that point after two and a half years: I'd been laid off.
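John's point that the questioning "splits into two branches" means a why-analysis is really a tree rather than a single chain. A minimal Python sketch, assuming a simple tree of answers; the `Why` class and `paths` helper are hypothetical illustrations (not a standard root cause analysis tool), and the node text only paraphrases the story above:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Why:
    """One answer in a chain of 'why?' questions; children dig one level deeper."""
    answer: str
    children: list[Why] = field(default_factory=list)


def paths(node: Why, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every root-to-leaf chain of answers, one tuple per branch."""
    chain = prefix + (node.answer,)
    if not node.children:          # a leaf: this branch has been dug to the bottom
        return [chain]
    result: list[tuple[str, ...]] = []
    for child in node.children:    # the chain splits: follow each branch separately
        result.extend(paths(child, chain))
    return result


# Hypothetical paraphrase of the chain in the transcript; wording is illustrative.
laid_off = Why("I was laid off", [
    Why("The stock tanked and non-key R&D projects were cancelled", [
        Why("My project was cut; I had joined it only four months earlier", [
            Why("I left the reliability team to chase an RF design dream job", [
                Why("Would staying in reliability have saved me?", [
                    Why("No: reliability was cut too, last in, first out"),
                ]),
                Why("Is a dream job worth it if it can vanish at any time?", [
                    Why("Keep digging: what would actually make me happy?"),
                ]),
            ]),
        ]),
    ]),
])

for chain in paths(laid_off):
    print(" -> ".join(chain))
```

Each printed line is one branch; the first branch dead-ends, while the second is the one that needs more digging.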
So was I actually happy chasing this dream of mine? I wanted to be happy; was that actually going to make me happy? As I kept digging deeper and deeper, I came to understand that I just wanted to be happy, the simplest thing that everyone wants, and for me that meant living where I wanted to live; my job and career were simply going to have to be whatever they turned out to be. Chasing that dream around the world, trying to find it, ultimately would not make me happy. As far as my root cause analysis went, it ended there. I packed up my stuff and came back to Australia. I didn't go back to where I grew up, because there were no jobs there; I went to Brisbane instead. So this is how I see it as a useful exercise. I do know how that sounds; it may sound strange to you, like it's just soul-searching, but at the same time there is a process to it.

I went through the same thing when my startup folded, and in a way it actually relates back to the discussion of fatigue, because it was one of those stereotypical death marches, right? One mistake after another; none of us really knew what we were doing or what we'd got ourselves into. It's something I've done a number of failure analyses on: I could go through and point out the moments we made serious mistakes and the moments we made little ones. But for me, the big one was the same thing: never really asking myself whether this is what I wanted to do, and how it fit into my goals and my plan for what was actually important to me. And when you're working 12 to 16 hours a day, every day of the week, you get that deep fatigue, right, that's beyond just being tired: some parts of your brain just switch off, and you end up not only doing crappy work but being a crappy person. That's what it did to me.
Now, sometimes I wonder if it's the kind of thing you really can figure out ahead of time, right? I think you have to be pretty self-aware and really have it together to never make a mistake like that, because it cost me a couple of years of my life, you know, and I wasn't healthy physically, mentally, emotionally, all these things. On the other hand, there were good things about the experience, so it's not like it was a complete waste. Going through a process not unlike what you're talking about, I could unpack it all. It took months, but I was able to, and today I could lay it out: point to the good things and point to the bad things. But I could not recommend doing it that way.

There's a thing I want to discuss. One of the things I found fascinating about my experience was that I went over there with my eyes closed, essentially. And like you said, you've got to go; you don't know until you try. I suppose that's the way to think about it. I went over there chasing a dream.
And the truth was, I learnt root cause analysis when I was working in a reliability team over there. I didn't want to do reliability, but I found it interesting up to a point, despite the fact that I nearly failed statistics at uni. So what happens is you learn about this technique of root cause analysis, and then you use it to basically realise that you shouldn't be there. The irony of that is not lost on me: I went somewhere to learn about something that eventually taught me I didn't need to be there. I find it funny, amusing; years later I can look back and sort of laugh. But you know, like you said, there were good things too. I got to live in another country for years, and it changed my view on the world, on life and on the people in it. It was wonderful. If you haven't done it, I highly recommend it if you're in a position to, because staying in one country and seeing things through one country's perspective is always going to be limiting, and there is so much more out there. I consider Canada my home away from home, and I miss the Rocky Mountains. I miss the snow, crazy as that sounds. I don't miss de-icing my car in the morning, I will admit, and you guys have just gone through the polar vortex, as they called it, and it's still cold over there, I hear. But even so, I still miss it, and there were good bits; there really were. I also miss Tim Hortons, which is not good for me, I guess. Anyway.

I'll just add one small thing to that, which I made a note of when we were talking about people being the source of failure. Are you familiar with the original source of Murphy's Law? Edward Murphy was the guy it's named after. He was working as an Air Force engineer, I forget exactly where, on a crash-test setup with a dummy, measuring how much deceleration it could take. The actual story is that one day, after finding a transducer wired wrong, he cursed the technician responsible, saying that if there was any way to do it wrong, he'd find it. I read that a few years ago, and it has stuck with me that it's actually been misquoted: in most people's minds it's "whatever can go wrong, will go wrong", and people find it so depressing when you look at it that way. But the point is that if I'm designing a system, I have to assume that if there's any way to do it wrong, someone will find it.

I think we're done. To wrap up: if you want to talk more about this, you can find John on Twitter at @johnchidgey. After that, you should check out John's site, techdistortion.com, and if you'd like to send him an email, you can send it to John at techdistortion.com. I'm Ben Alexander, and you can reach me on Twitter. You can also follow Pragmatic on Twitter for show announcements and other related materials. Thank you, Ben. And thanks to everyone listening live.
Duration: 58 minutes and 8 seconds

Show Notes


Premium supporters have access to ad-free, early-release episodes and a full back catalogue of previous episodes.


Ben Alexander


Ben created and runs and Fiat Lux

John Chidgey


John is an Electrical, Instrumentation and Control Systems Engineer, software developer, podcaster, vocal actor and runs TechDistortion and the Engineered Network. John is a Chartered Professional Engineer in both Electrical Engineering and Information, Telecommunications and Electronics Engineering (ITEE) and a semi-regular conference speaker.

John has produced and appeared on many podcasts including Pragmatic and Causality and is available for hire for Vocal Acting or advertising. He has experience and interest in HMI Design, Alarm Management, Cyber-security and Root Cause Analysis.

You can find him on the Fediverse and on Twitter.