What it takes to manage a failure reporting, analysis, and corrective action system.
As a lifetime reliability geek, I am a firm believer in creating continuous improvement loops. Once an organization has attained stability and control of its assets and has developed a maintenance program, it is only logical to measure the success of the program and improve it continuously. One thing I admire about a world-class organization is its ability to stay driven and focused on improvements. I have been in world-class operations and found that if you compliment them on the state of the operation, they tend to quickly move to the “Yes but…” conversation. Yes, we won that award, but we still can improve in this area. They have tasted success and like the flavour, and they understand the objective is a program, not a project.
It is organizations from the bottom quartile that tend to think they are done. You will often hear statements like, “We tried vibration analysis; it didn’t work here. We were going to do reliability centred maintenance (RCM), but the vendor gave us a great program to follow. We contract out everything; we just need better contractors.”
When I find bottom-quartile operations, it is clear they don’t know what they don’t know. Not only are they unaware, but they also have little motivation to change. If you believe there is no room for improvement, the market and your competitors will soon prove you wrong.
Reliability initiatives are a heavy lift; it takes a lot of effort to build a foundation. This effort must include high-level executive sponsorship and visible felt leadership throughout the program. At some point, you will have the foundational elements in place.
Foundational elements for reliability include, but are not limited to, the following:
• Operational envelopes established for process reliability;
• Operational interfaces and responses consistently executed;
• Automation and control systems optimized;
• Lubrication best practices from selection to disposal;
• Competency-based learning programs for all personnel;
• An integrated CBM program that utilizes multiple data sources; and
• A failure mode-driven maintenance strategy.
At this point, an organization tends to be comfortable with the direction reliability is taking. It has seen multiple good wins, and KPIs are starting to trend upward. Management is happy with the ROI from the reliability initiatives and moves on to the next focus area. Often the wins are short-lived, as no reliability strategy is perfect, and incidents happen that lead back to the reactive life. It is unlikely that organizations at this point will regress to the starting point, but they will not continue with the step changes that lead to redefining what is considered world class. Managing the issues that slipped through the cracks of the program design is where a failure reporting, analysis, and corrective action system (FRACAS) comes into play.
FRACAS was developed by the U.S. government and introduced in Navy operations; the original application was to test missiles and address any deficiencies that surfaced during testing. The original military standard is still available as MIL-STD-2155(AS).
FRACAS is conducted by following a process of reporting failures, classifying those failures into logical groupings, analyzing the failures to understand cause, effect, and impact on your operational objectives, then planning corrective actions to eliminate the failures or minimize their impact. This process records the issues related to a failure (equipment, process, product, etc.) and its associated causes. This information is analyzed through various methodologies to develop effective solutions. The solutions are then applied, and their success is shown by the non-recurrence (successful elimination) or recurrence (unsuccessful solution) of the failure.
To further explore this, let’s break down the acronym.
Failure – A failure can be expressed as a malfunction; something did not deliver on the value proposition. Quite often, FRACAS is applied to equipment failures, though it can also be applied to process and business process failures.
Failures and their impact on organizational goals can be categorized by frequency and magnitude. A failure will have some notable level of function loss, a functional failure. A failure will also have some defect that causes the loss of function. The output of an RCM study will list functional failures and can provide an input into the failure codes required to enable categorization. It is worth noting that the RCM output requires conversion to create readable codes that can be easily identified by personnel who physically interact with the assets (maintain, operate, clean).
FRACAS uses a predefined list of codes that identify the failures the system can have and a second set of codes noting the defects found upon repair. The listing of codes should be filtered by the component that has suffered the loss of function. For example, one would not expect to see engine codes on an electric motor. When structured properly, the failure codes will be a focused list that is relevant to the asset or component selected, as will the associated defect codes.
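One way to picture component-filtered code lists is a small lookup keyed by component type. The sketch below is only an illustration: the component types, codes, and descriptions are hypothetical and do not come from any standard or CMMS.

```python
# Hypothetical catalogs: every component type, code, and description is illustrative.
FAILURE_CODES = {
    "electric_motor": {
        "NO_START": "Fails to start",
        "OVERHEAT": "Runs hot",
        "VIB": "Excessive vibration",
    },
    "diesel_engine": {
        "NO_START": "Fails to start",
        "LOW_PWR": "Low power",
        "SMOKE": "Excessive smoke",
    },
}

DEFECT_CODES = {
    "electric_motor": {
        "BRG_WEAR": "Bearing wear",
        "WIND_SHORT": "Winding short",
    },
    "diesel_engine": {
        "INJ_FOUL": "Fouled injector",
        "TURBO_FAIL": "Turbocharger failure",
    },
}


def codes_for(component_type):
    """Return the (failure codes, defect codes) that apply to one component type."""
    if component_type not in FAILURE_CODES:
        raise KeyError(f"No code list defined for {component_type!r}")
    return FAILURE_CODES[component_type], DEFECT_CODES[component_type]


failure_list, defect_list = codes_for("electric_motor")
print(sorted(failure_list))  # only motor-relevant failure codes
print(sorted(defect_list))   # only motor-relevant defect codes
```

Keying both catalogs by component type means an engine defect such as a fouled injector can never be offered to someone reporting on an electric motor.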
Reporting – With lists of failure and defect codes created, reporting must be integrated with the workflow process. Normally, there is a function loss (or potential function loss) associated with the work request or inspection exception report. The point when work is requested aligns with the failure codes, as the failure, or potential failure, is known. This field should be selectable at this point. The true defect is not known until post repair.
The defect code should be assigned upon completion of the work order. If you use “other” as a stopgap to ensure nothing is missed in reporting, it should be the last selection on the list. Even so, it should prompt a free-form text box, as using “other” for a failure or defect code identifies a deficiency in the list. This approach ensures consistent failure reporting, which enables analysis.
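The reporting rules above can be sketched as a simple work order record: the failure code is captured when work is requested, the defect code is captured at completion, and “other” is rejected unless a description accompanies it. The class, function names, and codes below are hypothetical, not taken from any particular CMMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkOrder:
    asset_id: str
    failure_code: str                  # known when the work is requested
    failure_note: str = ""
    defect_code: Optional[str] = None  # unknown until the repair is complete
    defect_note: str = ""
    closed: bool = False


def _require_note_for_other(code, note):
    """'OTHER' signals a gap in the code list, so insist on a description."""
    if code == "OTHER" and not note.strip():
        raise ValueError("'OTHER' requires a free-form description")


def open_request(asset_id, failure_code, note=""):
    """Create the work request with the failure code selected up front."""
    _require_note_for_other(failure_code, note)
    return WorkOrder(asset_id=asset_id, failure_code=failure_code, failure_note=note)


def close_order(order, defect_code, note=""):
    """Record the defect actually found and close the work order."""
    _require_note_for_other(defect_code, note)
    order.defect_code = defect_code
    order.defect_note = note
    order.closed = True
    return order


order = open_request("P-101", "VIB")
close_order(order, "BRG_WEAR")
print(order)
```

The free-form notes collected under “other” can then be reviewed periodically and turned into proper codes, so the list improves over time.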
Analysis – Once the failures are reported in a queryable format, multi-level analysis becomes possible. If this is conducted within your CMMS, it ties costs and effort to the lists, which enables a clear understanding of failure impact.
On a base level, your analysis will give a living Pareto chart of your failures. The Pareto provides a focusing tool to select the highest-value failures to analyze further with tools like root cause analysis. Further analysis of the data can reveal correlations between failures and support more advanced analytics.
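As a minimal sketch of the Pareto idea, the function below totals cost by failure code, ranks the codes from highest to lowest, and adds a cumulative percentage. The record layout, field names, and dollar values are assumptions made for illustration; a real CMMS export will look different.

```python
from collections import defaultdict


def pareto(records, key="failure_code", value="cost"):
    """Rank codes by total value and attach the cumulative percentage of the total."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[value]

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for code, total in ranked:
        running += total
        rows.append((code, total, round(100 * running / grand_total, 1)))
    return rows


# Illustrative work order history (hypothetical codes and costs).
records = [
    {"failure_code": "VIB", "cost": 4000.0},
    {"failure_code": "OVERHEAT", "cost": 1000.0},
    {"failure_code": "VIB", "cost": 3000.0},
    {"failure_code": "NO_START", "cost": 2000.0},
]

for code, total, cumulative in pareto(records):
    print(f"{code:10s} {total:10.2f} {cumulative:6.1f}%")
```

With this sample data, the vibration code alone accounts for 70 per cent of the cost, which is the kind of signal that points a root cause analysis at the right target.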
Some outputs enabled by FRACAS inputs include the following:
• MTBF – mean time between failures on the component level and individual defect level;
• MTTR – mean time to repair correlated to failures and/or defects;
• Parts consumption and comparative analysis between parts;
• Reliability growth analysis;
• Failure incident distribution by failure code and/or defect code;
• Comparative analysis of asset types and life cycle stages from rolled-up information; and
• Reliability engineering.
As listed, there are multiple levels of analytics that can be correlated with the searchable fields produced within the failure/defect registry.
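Two of the outputs listed above can be sketched with simple arithmetic. The example assumes the most basic definitions, MTBF as operating hours divided by the number of failures and MTTR as the average repair duration per defect code; many organizations refine these definitions, and the function names and figures here are hypothetical.

```python
from collections import defaultdict


def mtbf_hours(operating_hours, failure_count):
    """Mean time between failures: operating time divided by failure count."""
    if failure_count == 0:
        return None  # no failures observed, so MTBF is undefined for this period
    return operating_hours / failure_count


def mttr_by_defect(repairs):
    """Mean time to repair per defect code.

    repairs: iterable of (defect_code, repair_hours) pairs.
    """
    grouped = defaultdict(list)
    for code, hours in repairs:
        grouped[code].append(hours)
    return {code: sum(hours) / len(hours) for code, hours in grouped.items()}


# Illustrative numbers only.
repairs = [("BRG_WEAR", 4.0), ("BRG_WEAR", 6.0), ("WIND_SHORT", 12.0)]
print(mtbf_hours(8000.0, 4))
print(mttr_by_defect(repairs))
```

Because every repair already carries a defect code, the same grouping works for parts consumption or failure counts by swapping the value being averaged or summed.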
Corrective action – The failure analysis processes identify the causal factors of the failures. When effective solutions are discovered, they are deployed and implemented. The deployment stage includes identifying all assets the solutions apply to and conveying the requirements of the solution to the stakeholders. Implementation includes the creation of all standard jobs, standard operating procedures, purchasing of storage requirements, or whatever solution will resolve the failure.
System – FRACAS is a system, as it naturally produces feedback loops. If I have a listing of failure codes and have implemented an effective solution, the code should not re-emerge. One of the issues with most maintenance program development methodologies is they tend to drive linear thinking; there is one failure mode and one solution. In many cases, it may take several solutions to address one failure mode, and one solution may address several failure modes. If your primary solution resolves 70 per cent of the occurrences, it may be acceptable, or you may require additional solutions to resolve the complete issue.
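The feedback loop can be checked with a before-and-after count for a single failure code. This is a rough sketch that assumes equal observation windows on either side of the implementation date and ignores changes in operating hours; the function name and dates are hypothetical.

```python
from datetime import date, timedelta


def recurrence_summary(event_dates, implemented_on, window_days):
    """Count occurrences of one failure code before and after a corrective action.

    Returns (before, after, reduction), where reduction is the fraction of
    occurrences eliminated, or None if there were none in the earlier window.
    """
    before = sum(
        1 for d in event_dates if 1 <= (implemented_on - d).days <= window_days
    )
    after = sum(
        1 for d in event_dates if 0 <= (d - implemented_on).days < window_days
    )
    reduction = (before - after) / before if before else None
    return before, after, reduction


# Illustrative history: ten occurrences before the fix, three after it.
implemented = date(2024, 7, 1)
events = [implemented - timedelta(days=k) for k in range(10, 110, 10)]
events += [implemented + timedelta(days=k) for k in (5, 60, 120)]

before, after, reduction = recurrence_summary(events, implemented, window_days=180)
print(before, after, reduction)
```

A result like this one, where the code still re-emerges at a lower rate, is the prompt to decide whether the remaining occurrences are acceptable or call for an additional solution.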
FRACAS is a disciplined, repeatable process that provides query criteria to link failure information. This enables utilization of the information for informed decision-making and continuously improving your programs and processes.
If the information is looped back to the structured work process, RCM for example, you develop living programs.
Jeff Smith is a reliability subject matter expert and the owner of 4TG Industrial. His work spans a cross-section of industries, including oil sands, mining, pulp and paper, packaging, petrochemical, marine, brewing, transportation, synfuels, and others. Reach him at email@example.com or visit www.4tg-industrial.com.