MRO Magazine


Anticipate failure through condition monitoring

"In most cases we won’t be monitoring our equipment continuously. We’ll do it periodically on a fixed frequency – say once per shift, daily, weekly or monthly. How do we choose the frequency? This is where condition monitoring involves a bit of informed guesswork." —James Reyes-Picknell

Equipment and systems are designed to perform some function at predetermined levels. They are designed so that their components are not overloaded in normal operation, so they should operate reliably for a long time. However, there are times when something goes wrong. That might happen after a certain amount of use or time has elapsed, or it might happen at any time (that is, randomly).

If the problem arises after a period of time or usage, and that period is fairly consistent every time, then you can act to avoid the problem just before that time or level of usage is reached. That is what we do with preventive maintenance: we restore or replace the item at a known, fixed interval. If the problem can arise at any time, however, its occurrence is random and fixed-interval intervention won’t help to avoid it. After all, the problem could arise at any time, including right after you “fix” it.

In those cases where failure is random, we may be able to use condition monitoring. We monitor the equipment’s performance, or some other signal that indicates its condition, on a regular basis.

Monitoring signals in our car, for example, goes on over a period of time, sometimes for a long time or many cycles of monitoring, before we detect anything abnormal. That long period of monitoring and finding nothing wrong is what you should expect. Keep in mind that the failure is random; it could happen at any time, so we need to remain vigilant. When we detect the signal indicating that we have a problem, we must act expeditiously to correct it. Acting on it entails doing some corrective maintenance. The combination of condition monitoring with condition-triggered corrective maintenance is known as Condition Based Maintenance (CBM).

If we act right away we catch the failure before it progresses too far and causes additional damage or a sudden loss of capability. Using our car example, we hear a noise, arrange to have the car checked by a mechanic and plan our day around it. The mechanic recognizes the noise, tells us we have a loose valve and fixes it. The problem goes away with minimal effort and cost. If we didn’t act on the noise, the valve might break; we would lose engine power, possibly damage a cylinder and piston, damage other components such as the valve lifters or timing chain, and perhaps cause the engine to seize. We would lose the use of the car for much longer, maybe even to the point of replacing it. That repair would cost far more, and the breakdown would happen while we were using the car, which is also very inconvenient. Condition monitoring doesn’t avoid the failure (in both cases the valve needed repair), but it does avoid the unwanted, excessive consequences.

Anticipate failure

In most cases we won’t be monitoring our equipment continuously. We’ll do it periodically on a fixed frequency – say once per shift, daily, weekly or monthly. How do we choose the frequency?

This is where condition monitoring involves a bit of informed guesswork. We need to estimate how long the failure will take to progress to an intolerable state from the point where the problem first becomes detectable. Usually there are no statistics for this. We depend on the experience of those who maintain and operate the equipment and who may have seen the problem before. They will likely have a “feeling” for how long the deterioration may take. This is no place to take chances: go with the most conservative gut feel, the shortest time you get from your experienced workers’ memories. That deterioration time is known as the “Potential Failure to Failure” interval, or P-F interval, and once we have it we can determine the task frequency. The checks must be frequent enough that we don’t miss the signs of deterioration, and that when we do find a problem there is still enough time left to act before the equipment fails. In the worst case, the problem becomes detectable just after a check and is found one check interval later, leaving only the P-F interval minus the check interval. A good rule of thumb is to take the P-F interval and divide it by two. If that time, half of P-F, is long enough to arrange for taking the equipment down in an orderly fashion and working on it before it fails, then we have a good task interval. The task frequency is 1 divided by that interval: 1 per month, 1 per shift, etc.
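
The half-P-F rule of thumb can be written out as a small calculation. The sketch below is a minimal illustration in Python, not a standard tool; the function name `plan_checks`, the planning lead time and the example numbers are hypothetical. It tests the worst case, in which the problem becomes detectable just after one check and is only found at the next, leaving P-F minus the check interval in which to act.

```python
from dataclasses import dataclass


@dataclass
class CheckPlan:
    interval_days: float            # time between condition-monitoring checks
    checks_per_day: float           # task frequency = 1 / interval
    worst_case_warning_days: float  # time left to act after detection, worst case
    adequate: bool                  # True if that warning covers the planning lead time


def plan_checks(pf_interval_days: float, lead_time_days: float) -> CheckPlan:
    """Apply the half-P-F rule of thumb to a P-F estimate.

    pf_interval_days: shortest credible estimate of the P-F interval.
    lead_time_days: time needed to arrange an orderly shutdown and repair
    (a hypothetical planning figure supplied by the user).
    """
    if pf_interval_days <= 0:
        raise ValueError("P-F interval must be positive")
    interval = pf_interval_days / 2
    # Worst case: the problem becomes detectable just after a check and is
    # found one full interval later, leaving P-F minus the interval to act.
    warning = pf_interval_days - interval
    return CheckPlan(interval, 1 / interval, warning, warning >= lead_time_days)


print(plan_checks(60, 14))  # 30-day checks with 30 days of worst-case warning
print(plan_checks(2, 3))    # 1-day checks, but only 1 day of warning: not adequate
```

For a P-F interval estimated at 60 days and 14 days needed to plan the work, checks every 30 days leave enough warning. A 2-day P-F interval with a 3-day planning lead time does not, so in that case adjusting the check interval alone can’t give enough time to act.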

The time for that deterioration to occur, P-F, is often much shorter than the statistical average (or mean) time between failures (MTBF). However, because the failure is triggered randomly, we don’t know when the deterioration will start, and so we don’t know when it will become detectable. That is why we must monitor quite frequently, and why most of the time we can expect to monitor a lot before ever finding a problem.
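
A rough calculation shows why most checks find nothing. The sketch below assumes, hypothetically, a failure mode with an MTBF of about 1,800 days and a P-F interval of 60 days, checked at half the P-F interval; the function name is illustrative only. If failures arrive randomly at an average rate of one per MTBF, then roughly MTBF divided by the check interval routine checks pass for each problem actually found.

```python
def checks_per_finding(mtbf_days: float, pf_interval_days: float) -> float:
    """Approximate number of routine checks carried out for each problem found.

    Assumes failures occur randomly, on average once per MTBF, and that checks
    are spaced at half the P-F interval (the rule of thumb in the text).
    """
    check_interval_days = pf_interval_days / 2
    return mtbf_days / check_interval_days


print(checks_per_finding(1800, 60))  # 60.0
```

Under these assumptions, about 60 checks in a row will turn up nothing before one finds the problem, which is the normal, expected outcome rather than a sign that the monitoring is wasted.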

An important factor to consider is that we can monitor for signs of failure in a variety of ways. In the car example we are using our hearing, which detects signals in the audible range of sound frequencies, roughly 20 Hz to 20 kHz. If we could detect sound outside that range we might find the problem sooner. If we have some other signal we can monitor, we may also find the problem sooner, or later, depending on what we monitor. The signal we monitor and the technology we use to monitor it will affect our P-F interval: some technologies detect certain failures sooner than others, which lengthens P-F. If we are introducing new technologies and our experienced people can’t forecast P-F using them, then we need advice from people who are more experienced with those technologies.

Techniques and devices

There is a fairly broad range of technologies available to choose from. Some of the more commonly used are ultrasound, infrared thermography, vibration analysis, oil analysis and non-destructive testing (NDT).

·       Vibration Analysis

Vibration analysis works on the principle that moving machinery will vibrate to some degree, and more if it is in distress. We can detect minute changes in displacement (position), velocity of the movement or acceleration (g-forces); displacement is most useful at low frequencies, velocity in the middle range and acceleration at high frequencies. The higher the frequency at which the defect is likely to be detected, the more likely we are to use velocity or acceleration to detect it.
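
The reason for that choice shows up in the simplest case, a pure sinusoidal vibration at frequency f, where peak velocity is 2πf times peak displacement and peak acceleration is 2πf times peak velocity. The sketch below assumes that idealized single-frequency motion and holds peak velocity at a hypothetical 5 mm/s; the function name is illustrative. As frequency rises, displacement shrinks toward something hard to measure while acceleration grows large.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def from_peak_velocity(v_mm_s: float, freq_hz: float) -> tuple[float, float]:
    """Return (peak displacement in micrometres, peak acceleration in g)
    for a pure sinusoid with the given peak velocity and frequency."""
    omega = 2 * math.pi * freq_hz             # angular frequency, rad/s
    displacement_um = v_mm_s / omega * 1000   # X = V / omega, mm -> um
    accel_g = v_mm_s * omega / 1000 / G       # A = omega * V, mm/s^2 -> m/s^2 -> g
    return displacement_um, accel_g


for f in (10, 100, 1000, 5000):
    d, a = from_peak_velocity(5.0, f)
    print(f"{f:>5} Hz: displacement {d:9.3f} um, acceleration {a:7.3f} g")
```

Real machine vibration mixes many frequencies, so this only illustrates how the three measures scale, not how an analyzer processes a signal.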

Vibration monitoring equipment can be very simple, providing a single readout of overall vibration level or energy, or very complex, providing full vibrational spectral displays over time in what we call a waterfall display. Whatever we use, we are generally looking for two things – the magnitude of the vibration (bigger is worse) and the changes to it over time. If it changes rapidly over time, then we have a rapidly developing problem.
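
Watching both the magnitude and how fast it is changing can be sketched as follows. This is a minimal illustration, not any vendor’s method: the weekly overall readings, the alert level of 4.0 mm/s and the allowed rise of 0.05 mm/s per day are hypothetical, and the rate of change is estimated with an ordinary least-squares slope over the recent readings.

```python
def slope_per_day(days: list[float], readings: list[float]) -> float:
    """Least-squares slope of readings against time (reading units per day)."""
    n = len(days)
    if n < 2:
        raise ValueError("need at least two readings to estimate a trend")
    mean_t = sum(days) / n
    mean_y = sum(readings) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, readings))
    den = sum((t - mean_t) ** 2 for t in days)
    return num / den


def assess(days: list[float], readings: list[float],
           alert_level: float, max_rise_per_day: float) -> list[str]:
    """Flag a high latest reading and/or a rapidly rising trend."""
    flags = []
    if readings[-1] > alert_level:
        flags.append("magnitude")  # latest reading is too high
    if slope_per_day(days, readings) > max_rise_per_day:
        flags.append("trend")      # condition is deteriorating quickly
    return flags


days = [0, 7, 14, 21]            # weekly readings
steady = [2.0, 2.1, 2.0, 2.2]    # overall velocity, mm/s (hypothetical)
rising = [2.0, 2.4, 3.1, 4.5]
print(assess(days, steady, 4.0, 0.05))  # []
print(assess(days, rising, 4.0, 0.05))  # ['magnitude', 'trend']
```

Checking the slope separately means a rapidly rising reading can be flagged even before it crosses the alert level.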

Sensitive vibration equipment will detect far more than we can with our hands, which, like our hearing, respond only to a limited range of mostly lower frequencies. Using this technology lets us find a wider array of problems and find them sooner.

·       Ultrasound

Ultrasound can also be used to detect problems outside our audible range. If we have problems that give off sound at frequencies below 20 Hz (lower than we can hear), then we have “infrasound.” If the frequencies are above 20 kHz, then we have ultrasound. So long as sound can propagate through some medium (air, water, steel), we can detect it. Sound propagates through molecular impact (like vibration), dissipating energy as it travels, so the further we get from the source, the more difficult it is to detect. With both vibration and ultrasound it is important to get as close to the source as practical and to minimize the number of media interfaces (such as bearing-to-housing contact points) to minimize the energy lost.

Vibration analysis usually ignores the material through which the signal travels because we are generally measuring at the same point each time, so comparisons are valid regardless. Sound measurements are different, though. Sound waves travel at different speeds in different media, and we need to take that into account, as well as the dispersion of the signal with distance through the media. For example, if we take ultrasound readings of the same signal at two different distances, say 10 cm and 80 cm, we’ll get readings of 60 and 42 dBµV respectively.
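
The two readings in that example are consistent with simple spherical spreading, in which the level falls by 20·log10(d2/d1) dB as the distance grows from d1 to d2, about 6 dB for every doubling. Going from 10 cm to 80 cm is three doublings, roughly 18 dB, and 60 − 18 ≈ 42 dBµV. The sketch below assumes that free-field model, treats dBµV as a voltage level proportional to sound pressure, and ignores absorption in the medium, reflections and interfaces, which add further losses in practice. The function name is illustrative.

```python
import math


def reading_at_distance(ref_db: float, ref_distance: float, distance: float) -> float:
    """Predict a reading at another distance from the source.

    Assumes free-field spherical spreading (inverse-square law for intensity),
    so a 20*log10 amplitude level falls by 20*log10(distance / ref_distance) dB.
    Distances may be in any unit, as long as both use the same one.
    """
    return ref_db - 20 * math.log10(distance / ref_distance)


print(round(reading_at_distance(60, 0.10, 0.80), 1))  # 41.9
```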

·       Ultrasonic detectors

Ultrasonic detectors tend to be directional. They can be rigged to detect sounds over an area using horn-shaped housings to gather the sound, but the sensor itself is unidirectional. For this reason we can first find the general area of a problem and then get closer to pinpoint its precise location. There are also contact sensors that, like vibration sensors, rely on sound transmission through metals and other materials. The sensors used to create ultrasonic images of a baby in utero are of this kind. They use a gel couplant to provide a better transfer of energy between the mother’s body and the sensor.

·       Thermography

Another useful technology for CBM is infrared thermography. The human eye detects visible light but misses a large part of the total electromagnetic spectrum. Infrared energy is invisible to the human eye, although surfaces hot enough to glow also give off some visible light. For anything cooler than that, we need infrared detectors (cameras) that are sensitive to infrared energy, which has longer wavelengths and lower frequencies than the light we can see.

Infrared technology produces either a temperature reading, determined from the energy the detector receives, or an image showing temperatures and temperature gradients. Of course it isn’t quite so easy, because different surface materials emit energy at different rates (their emissivity). We need to know the characteristics of the surface we are looking at and its condition, the wavelength band of the device we are using to measure the temperature and the geometry of the area being measured. Like ultrasound, infrared can get complicated to use.
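
A simplified view of why emissivity matters: if the instrument is left set for a perfect emitter (emissivity 1.0), part of what it receives is energy reflected from the surroundings, and the energy the surface itself emits is understated. The sketch below assumes a total-radiation (Stefan-Boltzmann) model with a known emissivity and reflected background temperature, and ignores atmospheric absorption and the instrument’s limited wavelength band. It illustrates the correction rather than reproducing any camera’s algorithm; the function name and example values are hypothetical.

```python
def corrected_temperature_c(apparent_c: float, emissivity: float,
                            reflected_c: float) -> float:
    """Estimate true surface temperature from a reading taken with emissivity set to 1.0.

    Simplified total-radiation balance (the Stefan-Boltzmann constant cancels):
        T_app^4 = e * T_obj^4 + (1 - e) * T_refl^4
    Ignores atmospheric absorption and the instrument's limited waveband.
    """
    if not 0 < emissivity <= 1:
        raise ValueError("emissivity must be in (0, 1]")
    t_app = apparent_c + 273.15    # kelvin
    t_refl = reflected_c + 273.15  # kelvin
    t_obj4 = (t_app ** 4 - (1 - emissivity) * t_refl ** 4) / emissivity
    if t_obj4 <= 0:
        raise ValueError("inputs are inconsistent with this simple model")
    return t_obj4 ** 0.25 - 273.15


print(round(corrected_temperature_c(60.0, 0.3, 20.0), 1))
```

Under these assumptions, a surface with an emissivity of 0.3 that reads 60 °C against a 20 °C background works out to roughly 120 °C, which shows how large the error from a wrong emissivity setting can be.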

Infrared can be used to detect electrical problems, connection problems, mechanical problems that generate heat, insulation or refractory problems, moisture problems and even fluid flow problems.

·       Oil analysis

Oil analysis is another tool in the arsenal to help us spot problems in equipment that is lubricated or cooled by the oil, or to detect problems with the oil itself that could ultimately lead to equipment problems. Oil analysis is probably one of the more widely used methods. There are simple checks we can do on the spot (we can see emulsified oil due to excessive water content, or we can see when oil is excessively dirty) or we can take samples for analysis. The analysis can determine oil condition and oil contents (wear particles from machinery). Sampling must be done carefully and consistently for good results and the lab doing the analysis should know what sort of problems you are looking for so that they perform the right tests.
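
Because the lab reports concentrations for each sample, one common way to use the results is to compare each new sample with the previous one from the same sampling point, which is also why consistent sampling matters. The sketch below is a hypothetical illustration: the element names, ppm values and allowed increases are made up, and real limits depend on the machine, the oil and the lab’s guidance.

```python
def rising_elements(previous_ppm: dict[str, float],
                    current_ppm: dict[str, float],
                    max_rise_ppm: dict[str, float]) -> list[str]:
    """Return the elements whose concentration rose by more than the allowed
    amount since the previous sample from the same sampling point."""
    flagged = []
    for element, allowed in max_rise_ppm.items():
        rise = current_ppm.get(element, 0.0) - previous_ppm.get(element, 0.0)
        if rise > allowed:
            flagged.append(element)
    return flagged


# Hypothetical results for iron and copper (wear) and silicon (dirt ingress).
previous = {"Fe": 12.0, "Cu": 4.0, "Si": 6.0}
current = {"Fe": 31.0, "Cu": 5.0, "Si": 7.0}
limits = {"Fe": 10.0, "Cu": 5.0, "Si": 5.0}
print(rising_elements(previous, current, limits))  # ['Fe']
```

A comparison like this is only meaningful if both samples were drawn the same way, from the same point, under similar operating conditions.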

·       Non-Destructive Testing

Finally, we have Non-Destructive Testing (NDT), which can be used where the other methods won’t work. It is a CBM technique, but unlike the others, which don’t require the machinery to be shut down, it often does. In NDT we are looking at material properties to detect corrosion, erosion, cracks or other defects. We have an array of techniques, including visual inspection, magnetic particle testing for surface and near-surface cracks in ferromagnetic materials, liquid penetrants for surface cracks, radiography (x-rays) for thickness, material density variations and voids, and pulse-echo ultrasound to measure thickness or detect material anomalies. There are even more advanced methods, several of which make very creative use of ultrasonics with both active and passive components. Needless to say, NDT is a field for specialists.
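
Pulse-echo thickness measurement rests on a simple relationship: the pulse travels to the back wall and returns, so thickness equals sound velocity times round-trip time divided by two. The sketch below assumes a typical longitudinal velocity in carbon steel of about 5,920 m/s; in practice the instrument is calibrated on a reference block of known thickness in the same material. The function names, the measured time and the 10 mm nominal wall are hypothetical.

```python
# Typical longitudinal sound velocity in carbon steel (assumption; calibrate in practice).
STEEL_VELOCITY_M_S = 5920.0


def thickness_mm(round_trip_us: float, velocity_m_s: float = STEEL_VELOCITY_M_S) -> float:
    """Wall thickness from the round-trip time of an ultrasonic pulse."""
    one_way_s = round_trip_us * 1e-6 / 2    # pulse goes to the back wall and returns
    return velocity_m_s * one_way_s * 1000  # metres -> millimetres


def wall_loss_percent(measured_mm: float, nominal_mm: float) -> float:
    """Percentage of the nominal wall lost to corrosion or erosion."""
    return (nominal_mm - measured_mm) / nominal_mm * 100


t = thickness_mm(2.7)
print(f"thickness {t:.2f} mm, wall loss {wall_loss_percent(t, 10.0):.1f}%")
```

Trending such readings at the same marked locations over successive inspections gives a corrosion or erosion rate, in the same way that vibration readings are trended.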

Comprehensive program

Condition monitoring is the first part of a CBM program. It takes advantage of a wide array of technologies ranging from the human senses to highly sophisticated sensors with computerized signal manipulation and interpretation. It is a powerful tool in dealing with failures proactively, so we avoid the consequences of those failures through early detection and timely follow-up corrective action.

James Reyes-Picknell is president of Conscious Asset and the author of Uptime – Strategies for Excellence in Maintenance Management (Productivity Press, 2015). Reach him by phone at 705-719-4945, email him at or visit