Root Cause Analysis

What is Root Cause Analysis? Root Cause Analysis (RCA) is a management process that seeks to locate the ultimate cause, or the few causes that account for most of the trouble (the 80/20 rule), behind performance or process-related problems in a business or engineering environment, and then to resolve the problem by treating these underlying causes.
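As a rough illustration of the 80/20 idea mentioned above, the following Python sketch ranks causes by how often they appear in a failure log and keeps the smallest group that accounts for a chosen share of incidents. The function name `vital_few`, the 0.8 threshold, and the incident counts are all made up for illustration; they are not part of any standard RCA tool.

```python
from collections import Counter


def vital_few(cause_counts, threshold=0.8):
    """Return the top causes whose combined share of incidents
    first reaches `threshold` (0.8 for the 80/20 rule)."""
    total = sum(cause_counts.values())
    if total == 0:
        return []
    selected, running = [], 0
    # most_common() yields causes from most to least frequent
    for cause, count in Counter(cause_counts).most_common():
        selected.append(cause)
        running += count
        if running / total >= threshold:
            break
    return selected


# Hypothetical failure log: cause -> number of incidents
incidents = {
    "worn seal": 42,
    "operator error": 31,
    "power dip": 12,
    "sensor drift": 9,
    "other": 6,
}
print(vital_few(incidents))
```

A ranking like this only suggests where to look first; each cause it surfaces still has to be investigated for what lies behind it.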

The advantage of Root Cause Analysis as a failure-management method over troubleshooting, for example, is that troubleshooting is a knee-jerk reaction to some critical problem or failure. Some fire-fighting is carried out in order to recover immediately. Since this expeditious approach only patches up symptoms, the problem seems solved for a while. Over time, however, the problem is likely to recur, triggering the same knee-jerk troubleshooting and racking up huge costs along the way.

The benefit of Root Cause Analysis, by contrast, is its deeper investigation into why the problem occurred in the first place. The root cause or causes might lie much deeper than outward symptoms reveal, and several layers may have to be pushed aside to reach the "root" cause. So, the focus is on analysis of this fabled "root cause" that propagated forward and manifested in the form of the problem at hand, rather than on exclusively treating the symptoms, as troubleshooting does.

Having identified the root cause, we then proceed to treat the cause(s) within the organizational perspective, thereby eliminating or reducing the adverse impact, such as maintenance cost. The critical importance of Root Cause Analysis is this prevention of recurring failures.

Summarized, the goals of Root Cause Analysis are:

  1. Failure identification: what exactly went wrong
  2. Failure analysis: why it happened, discover the root cause
  3. Failure resolution: provide a solution that prevents recurrence

Practical Applications

Applications of this method can be found in diverse fields and industries. Industries with positive results from Root Cause Analysis include chemical, petroleum, power generation, transportation, healthcare, and construction, among others.

Training

While training to perform Root Cause Analysis, the analyst should learn to identify the following causes during their investigations:

  • Causal Factor - This is a condition or an event that caused some effect to take place, or that shaped or influenced the outcome in some way. For example, an overhead pipe leaking oil onto the factory floor is a causal factor that may lead to, perhaps, a fire in that area.
  • Direct Cause - This is the cause that directly resulted in the occurrence that came to management's attention. For instance, in the case of the overhead pipe that oozed oil onto the factory floor, the actual leakage is the direct cause.
  • Contributing Cause - This is a cause that indirectly affected the outcome or occurrence. On its own, the cause might not be sufficient to bring about the event. In our example of the overhead pipe leakage, the purchase manager's selection of a supplier of low-quality pipes is a contributing cause.
  • Causal Factor Chain - This is simply a chain of events, one leading to the other. Some specific action creates some condition that results in an event. This event in turn creates yet another set of conditions, which lead to another event, and the chain of cause and effect continues. In this sequence or chain, the earlier events or conditions are known as Upstream Factors. The Cause and Effect Diagram is often used to assist in this process.
  • Root Cause - This is, finally, the cause that, if corrected, would prevent the occurrence of the particular event or phenomenon. It is the most fundamental aspect of the causal chain that can be logically identified.
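The terms above can be made concrete with a small Python sketch that records the leaking-pipe example as a causal factor chain and walks it upstream. This is a simplification under stated assumptions: real chains usually branch, while this one is strictly linear, and the `Factor` class, its fields, and the helper names are hypothetical. The deepest factor it returns is only the deepest one recorded so far; whether that is the true root cause remains a judgment for the analyst.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Factor:
    description: str
    kind: str  # e.g. "direct cause", "contributing cause"
    caused_by: Optional["Factor"] = None  # the upstream factor, if known


def upstream_chain(factor):
    """Walk from the observed event back through its upstream factors."""
    chain = []
    while factor is not None:
        chain.append(factor)
        factor = factor.caused_by
    return chain


def deepest_known_cause(event):
    """Return the most upstream factor recorded for this event."""
    return upstream_chain(event)[-1]


# The overhead-pipe example, entered from the most upstream factor down
supplier = Factor("supplier of low-quality pipes selected", "contributing cause")
leak = Factor("overhead pipe leaks oil", "direct cause", caused_by=supplier)
oil = Factor("oil on the factory floor", "causal factor", caused_by=leak)
fire = Factor("fire in the area", "event", caused_by=oil)

for step in upstream_chain(fire):
    print(f"{step.kind}: {step.description}")
```

Printing the chain from the event upward mirrors how an investigation proceeds: start with what was observed and keep asking what led to it.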

Root Cause Phases

Root Cause Analysis training teaches us to divide the analysis process into the following phases, which closely match the goals identified above:

  1. Collection of data - Phase I
  2. Event Investigation - Phase II
  3. Resolution of occurrence - Phase III

Root Cause Analysis Phase I, Data collection, should ideally begin as soon as possible after the occurrence of the event or phenomenon. This ensures that no data is lost. If possible, data may be collected even while the event or phenomenon progresses. All information pertaining to the occurrence should be noted, including conditions before, during, and after the occurrence; the actions taken by the personnel involved; environmental factors, if any; and so on.

While collecting data, it is critical to investigate what actually happened, rather than focusing on what could have happened. To this end, data collection should be a fact-finding investigation, and not a fault-finding mission. Objectivity, as opposed to subjectivity, is critical.

Data collection techniques include interviewing personnel most familiar with and directly or indirectly involved in the incident. The first contact with them may be restricted to hearing their perspective on the failure. Records pertaining to the incident are another excellent source for data collection. These may include correspondence between the key players, minutes of meetings, operation logs, maintenance records, equipment history records, and the like. As is obvious from this list, data collection methods may be as varied as the scenarios where the analysis is being performed.

Root Cause Analysis Phase II, Event / phenomenon investigation, involves an objective evaluation of the data collected, in order to identify any causal factor chain that may have led to the occurrence of the failure. Usually, one or several of the following categories of causes are involved:

  • Failures related to malfunctioning equipment or material. This could also be due to non-availability of such equipment or to sub-standard material.
  • Failures related to procedural issues. Either the procedures have been short-circuited by personnel, or new circumstances have made established procedures inadequate or obtrusive.
  • Failures caused by personnel. This could stem from improper training, or from distraction caused by environmental factors while operating equipment.
  • Failures related to equipment design. Perhaps some ergonomic factor was overlooked when designing the equipment, or one component fails to align with the rest of the equipment.
  • Failures related to management policies. Perhaps management shortsightedness is one of the root causes.
  • Failures related to external phenomena. Perhaps some external or uncharacteristic events caused an unforeseen occurrence.
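One way to put these categories to work is to tag each finding from the investigation with a category and count how the findings are distributed. The Python sketch below does this; the enum member names, the `tally_by_category` helper, and the sample findings are all hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter
from enum import Enum


class CauseCategory(Enum):
    EQUIPMENT_OR_MATERIAL = "equipment or material"
    PROCEDURE = "procedure"
    PERSONNEL = "personnel"
    DESIGN = "equipment design"
    MANAGEMENT = "management policy"
    EXTERNAL = "external phenomenon"


def tally_by_category(findings):
    """Count findings per category.

    `findings` is a list of (description, CauseCategory) pairs.
    """
    return Counter(category for _, category in findings)


# Hypothetical findings from an investigation
findings = [
    ("gasket below specification", CauseCategory.EQUIPMENT_OR_MATERIAL),
    ("inspection step skipped", CauseCategory.PROCEDURE),
    ("operator not trained on new valve", CauseCategory.PERSONNEL),
    ("checklist outdated after retrofit", CauseCategory.PROCEDURE),
]

counts = tally_by_category(findings)
for category, n in counts.most_common():
    print(f"{category.value}: {n}")
```

A category that collects many findings is a hint about where the causal chain is concentrated, not a conclusion in itself.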

Root Cause Analysis Methods

Depending upon the Root Cause Analysis training path you follow, there are a number of methods available at this stage of analysis. The ultimate Root Cause Analysis training would provide in-depth knowledge and awareness of all root cause analysis methods.  This rounded training is critical, so that determination of the root of the failure is quite thorough, leading to the right conclusions being reached. A few popular methods are discussed below:

  • Events and Cause and Effect Analysis: This method is used when the data collected in the investigation phase points to a long chain of causal factors, or when the failure at hand apparently has several dimensions.
  • Change Analysis: This method is a simple process of six steps, and is especially useful for evaluation of failure of equipment. The method, due to its superficial nature, may not be able to identify all the root causes of the occurrence, and can at best be used as a supplement to a larger investigation activity.
  • Barrier Analysis: This method provides a systematic approach to identifying failures of equipment and / or any procedural or administrative failures. It is quite powerful in the hands of someone who is familiar with the details of the processes involved.
  • MORT: Management Oversight and Risk Tree, or MORT, is a method that can be deployed when there are few experts who know the right kind of questions to ask, and the failure is a recurring one, with no letup. Visually oriented, this method involves drawing a tree with the left side listing all factors that are relevant to the occurrence, and the right side listing deficiencies in management that led those factors to come into existence. For each factor, a set of questions is included that needs to be addressed. This method helps prevent oversight, and ensures that all causal factors that have the potential to be part of the chain leading to the occurrence are considered.
  • Human Performance Evaluation (HPE): This method comes into play when the data collection phase clearly points to the role of personnel as a contributory node in the causal chain within a system. Thus, its focus is on man-machine interface studies, and on system operability and work environment. Psychological insight on the part of the analyst, along with training in ergonomics, is required to carry out HPE effectively.
  • Kepner-Tregoe Method: This is a highly structured method that looks into all the aspects of the occurrence. It provides a systematic framework for gathering, organizing, and evaluating data. Formal training in the K-T method may be required in order to adopt this approach.

Root Cause Analysis Phase III, Occurrence Resolution, is a realistic assessment of the viability of the corrective action that the previous phase has revealed, followed by application of said corrective action. The phenomenon must then be monitored periodically to verify resolution and effective recurrence prevention.

Applying RCA

Determining the root causes of the Space Shuttle Columbia disaster, which caused the death of seven astronauts, is a classic application of the Root Cause Analysis process.

Treating only the symptoms, such as the falling foam (so make the foam stronger) or the missing tiles on the left wing (so count the tiles before taking them inside the shuttle), would be suicidal from NASA's point of view, besides risking the lives of more astronauts.

The cause undoubtedly lies in systemic failures that were part of a long and multi-faceted causal factor chain, which ultimately led to the crash. Investigators working on reaching the root cause of the failure are perhaps deploying the Events and Causal Factor method or the Kepner-Tregoe method, or software that delivers a variant of the two, to get to the bottom of the problem.


© Copyright 2007, Envision Software