MRO Magazine

The establishment of an effective failure analysis program

Failure analysis programs are in place at production and manufacturing plants in many industries, but they tend to stop at a certain stage.

January 26, 2022 | By L. (Tex) Leugner

Photo: zapp2photo / Adobe Stock


These programs exist to determine and understand the root causes of component and machine failure, avoid recurrence, reduce costs, and improve equipment reliability. Unfortunately, the root cause failure analysis process frequently stops once the physical causes of a component or machine failure have been identified.

Root causes can be categorized as physical, human, or latent (organizational and managerial). When the failure analysis team stops at the physical causes, it has not fully investigated why the failure occurred in the first place. Digging deeper to identify human or latent causes reveals incorrect or inadequate human actions, or corporate policies that unknowingly or unintentionally permit those actions to occur.

A common example of a human cause is a lack of training in maintenance or operational functions. A very common latent cause is the philosophy among some plant executives and managers that maintenance is a cost rather than an investment. This leads to mistakes, because deferring scheduled preventive maintenance (PM) tasks in favour of production results in unexpected stoppages or failures.

What are the most common types of failures in your particular facility?
LOGIC: Several types of failure are very common. They include overload or over-speed (a common result of a latent cause); fatigue, of bearings for example, which can be considered normal if the bearing has reached the end of its life cycle; corrosion, which causes material loss in a component; and elevated temperatures, which result in lubrication failure or changes in the metallurgical condition of the component.


What methodology of failure analysis is used in your facility?
LOGIC: There are six typical steps recommended in the failure analysis process. First is “diagnosis”: inspect the component carefully, using high-magnification photos to determine whether the failure is one of the frequently occurring types, such as corrosion, temperature, lubrication-related or fatigue.

Next, “collect background data”: failures are frequently the direct result of an inadequate repair just completed, so review the machine’s complete maintenance history in detail.

Then “inspect the component (or pieces of it)” with a good-quality microscope, develop a logic tree and list every symptom of the failure. Now “complete a detailed chemical, scanning electron microscopy or metallurgical analysis” to determine the condition of the component or its pieces. Then “determine the physical failure mechanisms and arrive at a conclusion.” Finally, “determine human and latent root causes” that may have contributed to the failure.
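The six steps above lend themselves to a simple checklist. The following Python sketch is illustrative only and is not part of any published methodology: the names `Step`, `remaining_steps` and `stopped_at_physical` are hypothetical. It shows one way to flag an analysis that ended once the physical mechanism was identified, without reaching the human and latent causes.

```python
from enum import IntEnum
from typing import Iterable, List


class Step(IntEnum):
    """The six steps, in the order given in the text (names are hypothetical)."""
    DIAGNOSIS = 1
    BACKGROUND_DATA = 2
    COMPONENT_INSPECTION = 3
    LABORATORY_ANALYSIS = 4
    PHYSICAL_MECHANISM = 5
    HUMAN_AND_LATENT_CAUSES = 6


def remaining_steps(completed: Iterable[Step]) -> List[Step]:
    """Return the steps not yet completed, in their original order."""
    done = set(completed)
    return [step for step in Step if step not in done]


def stopped_at_physical(completed: Iterable[Step]) -> bool:
    """True if the physical mechanism was found but step six was skipped."""
    done = set(completed)
    return (Step.PHYSICAL_MECHANISM in done
            and Step.HUMAN_AND_LATENT_CAUSES not in done)


if __name__ == "__main__":
    # Hypothetical analysis that ended once the physical cause was known.
    completed = [Step.DIAGNOSIS, Step.BACKGROUND_DATA,
                 Step.COMPONENT_INSPECTION, Step.LABORATORY_ANALYSIS,
                 Step.PHYSICAL_MECHANISM]
    print([step.name for step in remaining_steps(completed)])
    print(stopped_at_physical(completed))
```

A check like this could sit in a maintenance work-order system so that a failure report cannot be closed until the final step has been recorded.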

Does your organization fully understand that failures can occur throughout a machine’s life cycle?
LOGIC: Without exception, failures stem from one or more of these seven causes: faulty design, material defect, manufacturing or processing deficiencies, assembly or installation defects, unintended service applications, maintenance neglect or procedural deficiencies, and improper operation.
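Recording each failure against these seven causes makes it possible to see which ones dominate over time. The Python sketch below is one hypothetical way to do that; the names `Cause` and `tally_causes` and the sample log are assumptions for illustration, not data from any plant.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Cause(Enum):
    """The seven causes listed in the text."""
    FAULTY_DESIGN = "faulty design"
    MATERIAL_DEFECT = "material defect"
    MANUFACTURING = "manufacturing or processing deficiency"
    ASSEMBLY = "assembly or installation defect"
    UNINTENDED_SERVICE = "unintended service application"
    MAINTENANCE = "maintenance neglect or procedural deficiency"
    IMPROPER_OPERATION = "improper operation"


def tally_causes(failure_log: Iterable[Iterable[Cause]]) -> Counter:
    """Count how many failures involve each cause.

    Each entry in failure_log is the collection of causes assigned to one
    failure; a cause listed twice for the same failure is counted once.
    """
    counts: Counter = Counter()
    for causes in failure_log:
        counts.update(set(causes))
    return counts


if __name__ == "__main__":
    # Invented sample log: three failures, some with more than one cause.
    log = [
        [Cause.ASSEMBLY, Cause.MAINTENANCE],
        [Cause.MAINTENANCE],
        [Cause.FAULTY_DESIGN],
    ]
    for cause, count in tally_causes(log).most_common():
        print(f"{cause.value}: {count}")
```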

How does your organization prepare and complete improvement projects?
LOGIC: Typical projects include changes to machinery to increase production. If a project does not consider proper material selection, accurate dimensioning and any operating condition that changes the operational result, failures will occur. One recent production increase project involved a conveyor system in a crushing plant. To feed a newly acquired crusher, the conveyor system was expanded to carry more material. Wider, stronger belts were installed, along with an improved support roller and bearing system. The gear drive mechanism that operated the conveyor failed catastrophically six weeks later. No thought had been given to whether the drive system could support the increased capacity.
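The conveyor example comes down to checking every component in the drive train against the new duty, not only the parts being upgraded. A minimal Python sketch of that check follows. The names `Component`, `system_capacity` and `under_rated`, and every figure used, are hypothetical; the article gives no ratings for the actual plant.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    rated_tph: float  # rated throughput in tonnes per hour (invented figure)


def system_capacity(components: List[Component]) -> float:
    """A series system can deliver no more than its weakest component allows."""
    return min(c.rated_tph for c in components)


def under_rated(components: List[Component], required_tph: float) -> List[Component]:
    """Return the components whose rating falls short of the required throughput."""
    return [c for c in components if c.rated_tph < required_tph]


if __name__ == "__main__":
    # All figures below are invented for illustration.
    conveyor = [
        Component("belt", 900.0),
        Component("support rollers and bearings", 900.0),
        Component("gear drive", 600.0),  # not upgraded in the project
    ]
    print(system_capacity(conveyor))
    print([c.name for c in under_rated(conveyor, required_tph=850.0)])
```

Running a check of this kind at the design stage would list the gear drive as the limiting item before the upgraded conveyor went into service.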

How does the organization manage and facilitate increased customer demand?
LOGIC: Often customer demand necessitates increased production that may call for modifications to machinery. If the organization doesn’t give serious thought to how and why modifications are managed, failures will occur.

In one such case, plant management simply ordered an increase in the speed of production machinery. Within three weeks, bearing failures began to occur. If a typical bearing load is doubled, the life of the bearing may be reduced by as much as 90 per cent. Doubling the rated speed of a bearing can also reduce its life by as much as 50 per cent.

These engineering guidelines must be kept in mind whenever production increases are demanded by management unaware of them, or whenever machine modifications are considered. The obvious lesson is that every machine or mechanical drive system is only as strong as its weakest component. It is important to remember that about 80 per cent of bearing failures are a symptom of a much larger problem, such as excessive loads or speeds, extreme vibration conditions, poor lubrication practices, extreme temperatures, improper replacement bearing selection and/or poor installation.
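The bearing figures quoted above are in line with the basic rating life relationship widely used for rolling bearings, in which life in revolutions varies as (C/P)^p, where C is the dynamic load rating, P is the equivalent load, and p is 3 for ball bearings and 10/3 for roller bearings. The article does not state this formula, so treating it as the source of the figures is an assumption, and the function name `life_fraction` is hypothetical. A short Python sketch of the arithmetic:

```python
BALL_EXPONENT = 3.0
ROLLER_EXPONENT = 10.0 / 3.0


def life_fraction(load_factor: float, speed_factor: float = 1.0,
                  exponent: float = BALL_EXPONENT) -> float:
    """Fraction of the original operating-hour life left after scaling load and speed.

    Life in revolutions scales as (1 / load_factor) ** exponent. For a fixed
    number of revolutions, operating hours scale as 1 / speed_factor.
    """
    if load_factor <= 0 or speed_factor <= 0:
        raise ValueError("load_factor and speed_factor must be positive")
    return (1.0 / load_factor) ** exponent / speed_factor


if __name__ == "__main__":
    for label, p in (("ball", BALL_EXPONENT), ("roller", ROLLER_EXPONENT)):
        reduction = 1.0 - life_fraction(2.0, exponent=p)
        print(f"{label}: doubling load cuts life by {reduction:.1%}")
    speed_reduction = 1.0 - life_fraction(1.0, speed_factor=2.0)
    print(f"doubling speed cuts life in hours by {speed_reduction:.1%}")
```

Under these assumptions, doubling the load cuts rated life by 87.5 per cent for a ball bearing and by roughly 90 per cent for a roller bearing, while doubling the speed halves the life in operating hours. The relationship ignores lubrication, temperature and contamination, which can shorten life further.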

After a failure has occurred, does your organization review its operational procedures and practices to address human root causes?
LOGIC: As discussed above, failures are often related to human and latent causes. A typical example is poor operating and maintenance practices. These may be among the most common causes of machine failure and, in the opinion of many equipment reliability specialists, contribute directly to a two-to-five per cent reduction in plant productivity in North America. Poor operating practices are the direct result of two conditions prevalent in present-day society: an uncaring attitude among workers, and an absence of appropriate training (in fact, the absence of adequate training may be directly related to worker attitudes).

Regardless of whether it’s physical, human, or latent causation, root cause failure analysis should be carried out after every incident or failure, no matter how insignificant or unimportant it may appear at the time. This is the only way an organization has any chance to effectively eliminate recurrences entirely while continually improving its operational and human resources.

To remain competitive, the goals of industrial plant facilities must include high levels of machine reliability, to reduce downtime, extend equipment life, reduce repair costs, improve equipment efficiency, reduce capital costs, increase productivity, and maintain employee morale and satisfaction. Maximum equipment and process reliability cannot be achieved or maintained if the plant is continually subjected to breakdowns, inadequate or incorrect repair procedures or recurring failures. MRO
L. (Tex) Leugner, the author of Practical Handbook of Machinery Lubrication, is a 15-year veteran of the Royal Canadian Electrical Mechanical Engineers, where he served as a technical specialist. He was the founder and operations manager of Maintenance Technology International Inc. for 30 years. Tex holds an STLE lubricant specialist certification and is a millwright and heavy-duty mechanic. He can be reached at

