Data and RCM
By James Reyes-Picknell
Data, or more specifically a lack of it, is one of the more common reasons people give for delaying or forgoing reliability centred maintenance.
Is a lack of data really a problem? Or is it just an excuse to sustain the status quo?
Digging deeply into reliability requires math, which implies a need for numeric data and for precision. After all, one cannot do precise calculations without data. Many stop there, arguing that the data they have is ill-suited for the purpose, that there is not enough of it to be statistically valid, or that it is simply not there.
However, is precision needed for reliability centred maintenance (RCM)? Is a lot of data needed? Does the data need to come from your computerized maintenance management system (CMMS)? The answer to all three questions is “no”, and here is why.
RCM is used to make decisions about which tasks to perform and, primarily, how often to perform them. Decisions are based on information, and some of that information can be based on data. Task intervals are usually specified in standard periods such as daily, weekly, monthly, or quarterly.
Often during RCM, task frequencies are chosen to fit the standard frequencies already in use. If a properly calculated task interval comes out to 35.7 days, it will be rounded to “monthly”. If it comes out to 70 days, some might round up to “quarterly”. No matter how precise the calculation, the specified result is inevitably rounded up or down depending on how conservative the team is, and more cautious teams usually round down to the shorter interval.
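The rounding step can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the day counts assigned to each standard period (30 for monthly, 90 for quarterly, and so on) and the function name `round_to_standard` are assumptions chosen for the example.

```python
# Hypothetical standard schedule periods, in days (assumed values for illustration).
STANDARD_INTERVALS = {
    "daily": 1,
    "weekly": 7,
    "monthly": 30,
    "quarterly": 90,
    "semi-annual": 182,
    "annual": 365,
}


def round_to_standard(calculated_days: float, cautious: bool = True) -> str:
    """Map a calculated task interval onto a standard schedule frequency.

    cautious=True rounds down (shorter interval, task done more often);
    cautious=False rounds up to the next standard period.
    """
    if calculated_days <= 0:
        raise ValueError("interval must be positive")
    ordered = sorted(STANDARD_INTERVALS.items(), key=lambda kv: kv[1])
    if cautious:
        # Largest standard period that does not exceed the calculated interval;
        # fall back to the shortest period if the interval is below all of them.
        candidates = [name for name, days in ordered if days <= calculated_days]
        return candidates[-1] if candidates else ordered[0][0]
    # Smallest standard period at or above the calculated interval;
    # fall back to the longest period if the interval exceeds all of them.
    candidates = [name for name, days in ordered if days >= calculated_days]
    return candidates[0] if candidates else ordered[-1][0]


for calc in (35.7, 70):
    print(calc, round_to_standard(calc), round_to_standard(calc, cautious=False))
```

With these assumed periods, both 35.7 and 70 days become “monthly” for a cautious team and “quarterly” for one that rounds up; the precision of the calculated figure is lost either way.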
Even highly precise calculated frequencies, like the interval for testing a safety device, depend on inputs that are usually estimated. For example, one input is the “tolerable mean time between multiple failures.” That is invariably based on gut feel and estimates of past event frequencies. Whenever you work with estimates, your results will be approximate. If your estimates are reasonable and not just randomly selected numbers, the results will be similarly reasonable.
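The article does not give the safety-device calculation itself. A widely cited approximation in the RCM literature (for example, John Moubray's RCM II) is FFI = 2 × MTIVE × MTED / MMF, where MTIVE is the mean time between failures of the protective device, MTED is the mean time between demands on it, and MMF is the tolerable mean time between multiple failures. The sketch below assumes that formula, and all input values are hypothetical. Its point is that the answer scales directly with the estimated MMF.

```python
def failure_finding_interval(mtive: float, mted: float, mmf: float) -> float:
    """Approximate failure-finding (test) interval for a protective device.

    mtive: mean time between failures of the protective device (years)
    mted:  mean time between demands on the device (years)
    mmf:   tolerable mean time between multiple failures (years)

    Uses the simplified approximation FFI = 2 * MTIVE * MTED / MMF, which is
    only reasonable when the resulting interval is short relative to MTIVE.
    """
    for value in (mtive, mted, mmf):
        if value <= 0:
            raise ValueError("all inputs must be positive")
    return 2.0 * mtive * mted / mmf


# Hypothetical estimates: MMF is a judgement call, so try a low, middle,
# and high value and see how far the calculated interval moves.
for mmf in (500, 1000, 2000):
    ffi_years = failure_finding_interval(mtive=10, mted=5, mmf=mmf)
    print(f"MMF {mmf:>5} yr -> test roughly every {ffi_years * 365:.0f} days")
```

Doubling the estimated MMF halves the interval, so however many decimal places the formula produces, the result is only as good as the gut-feel inputs behind it.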
Data also informs our understanding of failure characteristics, such as the Weibull shape parameter (beta) and characteristic life. Even without data from a CMMS, however, you can still get decent estimates of these parameters.
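The article does not say how such estimates are obtained. One possible approach, offered here as an assumption rather than as the author's method, is to ask experienced people two questions such as “by what age has about 10% of this population failed?” and “by what age has about half failed?”, then fit a two-parameter Weibull curve through those two points. The function name and the elicited answers below are hypothetical.

```python
import math


def weibull_from_two_points(t1: float, f1: float, t2: float, f2: float):
    """Fit Weibull shape (beta) and characteristic life (eta) through two
    elicited points: a fraction f1 has failed by age t1, f2 by age t2."""
    if not (0 < f1 < f2 < 1) or not (0 < t1 < t2):
        raise ValueError("need 0 < f1 < f2 < 1 and 0 < t1 < t2")
    # The Weibull CDF F(t) = 1 - exp(-(t/eta)**beta) linearises to
    # ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta).
    y1 = math.log(-math.log(1 - f1))
    y2 = math.log(-math.log(1 - f2))
    beta = (y2 - y1) / (math.log(t2) - math.log(t1))
    eta = t1 * math.exp(-y1 / beta)
    return beta, eta


# Hypothetical elicitation: "about 10% are gone by year 8, about half by year 20".
beta, eta = weibull_from_two_points(8, 0.10, 20, 0.50)
print(f"beta ~ {beta:.2f}, eta ~ {eta:.1f} years")
```

A beta near 1 suggests failures that are random with respect to age, while a beta well above 1 suggests wear-out. With answers this rough, the result indicates the failure pattern rather than a precise value.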
In one situation, there was a field supervisor with roughly 20 years of experience. He could recall three incidents of a specific catastrophic failure that he had dealt with over that period. There were three supervisors in all; one wasn’t available, but the other was, and he said that he had a similar experience. A few questions about those incidents revealed that they were indeed describing different events. That meant there were six, and possibly nine, events over the period. The number of devices was known (nearly 800 transformers used in underground vaults), giving roughly 16,000 operating years and six to nine events.
MTBF was therefore in the range of roughly 1,778 to 2,667 years for that particular event. None of the data came from the CMMS. In fact, while the supervisors’ memories covered 20 years, the installed CMMS had been in place for only six. The CMMS records for those transformers didn’t capture any of those incidents, even though both supervisors remembered several incidents in that six-year time frame. We also wondered whether the events were truly random or related to age. Since the incidents were spread fairly evenly over time, we assumed they were random.
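The arithmetic behind that range is simple enough to sketch. The function name is hypothetical, and the calculation assumes that every transformer was in service for the whole 20 years and that the failures occur randomly, as the team assumed.

```python
def population_mtbf(units: int, years_observed: float, events: int) -> float:
    """Crude MTBF estimate: total operating years divided by observed events.

    Assumes every unit was in service for the whole observation period and
    that failures occur at a roughly constant (random) rate.
    """
    if events <= 0:
        raise ValueError("need at least one event to estimate MTBF this way")
    return units * years_observed / events


operating_years = 800 * 20          # nearly 800 transformers over about 20 years
low = population_mtbf(800, 20, 9)   # if the third supervisor also saw three events
high = population_mtbf(800, 20, 6)  # only the two supervisors interviewed
print(f"{operating_years} operating years, MTBF about {low:.0f} to {high:.0f} years")
# prints: 16000 operating years, MTBF about 1778 to 2667 years
```

The uncertainty about whether there were six or nine events dominates the result, which is why a range, not a single figure, is the honest answer.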
The supervisors remembered that those events were associated with severe weather and flooding. We looked at historical replacements to identify which coincided with known weather events. Although it was difficult to tell which ones the supervisors remembered, we found that those replacements had taken place in older devices of a specific model. Finding relevant information required far more than what the CMMS had stored, and it relied heavily on the memories of experienced field personnel.
Doing RCM requires working with teams of experienced maintainers and operators. In my experience doing this work since the mid-1980s, what is in their heads is usually far more valuable than the volumes of data stored in today’s maintenance management systems. Those systems help in managing the processes, but are not particularly helpful in gathering relevant reliability-related information.
When dealing with reliability, we base decisions on information, not just data. While the latter feeds the former, the data often lacks the details that tell us what actually happened. One big reason is that we don’t ask anyone to gather the data. When implementing these systems, we rarely ask reliability engineers what data they need, and if we did, we’d find that it is very difficult to capture failure-mode data in a system designed to capture work order transactions. They are simply not the same.
Field maintainers rarely fill in all the details unless the data fields are “mandatory”. Even then, if those fields have default values, the defaults are often left untouched. Maintainers rarely see value in collecting data. They don’t use it themselves, and they rarely see any maintenance or reliability engineers using it. To them, collecting and recording it is a waste of effort. As we install more IIoT devices, they argue that we could be collecting data automatically.
The problems with field-gathered CMMS data that existed over 25 years ago persist in today’s systems. That data cannot be relied on without a serious, concerted effort to do better. And even if we do improve data gathering, it may not help much.
Another problem with data for reliability is that if you do your job right as a design engineer or a reliability engineer, you will have very few failures. Even if the design is only “so-so”, it won’t fail all that often. Reliable equipment isn’t failing, so there is little failure data to collect.
Not long after Stan Nowlan and Howard Heap published their report, Reliability-Centered Maintenance (1978), Howard L. Resnikoff argued, “One of the most important contributions of the reliability-centred maintenance program is its explicit recognition that certain types of information heretofore actively sought as a product of maintenance activities are, in principle, as well as in practice, unobtainable.”
He was observing that in doing RCM analysis, we must work with assumptions and information that can rarely be substantiated with field observation prior to analysis. In fact, if we are doing work at the design stage, there is no directly relevant field data. However, there is often indirectly related data from similar systems, and invariably, we need to be cautious using it because of differences in how the old and new systems will be operated.
New aircraft designs are subjected to RCM analysis before they are put into service. Their maintenance programs are well thought through, and many of the calculated aspects rest on assumptions drawn from data on other systems. Those aircraft usually operate quite reliably and safely. Had designers waited for data before deciding what maintenance to do, they would have had to rely on a form of root cause analysis after the fact, allowing many failures to occur in order to gather relevant data. Few would fly in aircraft whose maintenance programs were designed that way, and the world’s various aircraft regulators would never allow it.
If we want good data from the field, we need technicians in the field who are trained in and understand some basics of reliability. They’ll spot problems, and their minds will make connections from memory that a database simply can’t make with data that is often gathered but unfit for this purpose.
A lack of failure data is no reason to delay or avoid performing RCM on your critical systems. That situation is the norm for any critical system subjected to RCM analysis, and RCM has been proven to work over the 44 years since it was “invented”. It’s just an excuse, and a poor one at that. If you are among those avoiding RCM, please tell us: what’s holding you back now? MRO
James Reyes-Picknell is President of Conscious Asset and the author of Uptime – Strategies for Excellence in Maintenance Management (Productivity Press, 2015). Reach him by phone at 705-719-4945, email him at email@example.com or visit www.consciousasset.com.