MRO Magazine

Shutdown Work: What’s needed, why and when?


December 14, 2001
By PEM Magazine

Major cost items come in three flavours — capital investments, catastrophic failures and deliberate plant shutdowns. The first is subject to intense scrutiny/justification and, with the exception of the patchy adoption of life cycle costing, is pretty much understood. We are trying desperately to avoid the catastrophic events — and have been grappling with systematic and quantitative analysis methods for many years (HAZOP, QRA techniques, risk-based inspection etc.). The third area, planned shutdowns, is still an enigma for many organizations.

Much effort has gone into the efficient planning and delivery of the work involved, but relatively little guidance exists for determining what work is worth doing in the first place, and how this should be clustered into appropriate packages to share shutdown opportunities. A surprising number of organizations (particularly in the utilities and service areas of operation) still do not even know how much a shutdown costs them.

This article examines some recent advances in quantitative evaluation of shutdown programs. It looks at the bundling of tasks — the logistics of delaying some activities to coincide with others, and the compromise economics of shared downtime costs versus the performance and risk impact of premature or deferred work.

Origins of the new approach
This methodology has been developed by the European MACRO project, a recently completed five-year collaborative program sponsored by the British government, Halliburton Brown & Root, Yorkshire Electricity, The National Grid Company and The Woodhouse Partnership. MACRO has yielded a suite of methods for cost/risk/performance trade-off decisions — such as optimal maintenance or inspection intervals, equipment renewal or upgrade justification, shutdown strategy, spares requirements, etc. In each of these areas, a blend of innovative, risk-based evaluation techniques has been developed alongside structured guidance "rules". These have been developed and proven in the field by those faced with the decisions (i.e. not by academic theoreticians).


What work is needed, why?
The first step is the systematic determination of the tasks that might warrant a shutdown in the first place. Here the methodology distinguishes a "greenfield" from a "brownfield" environment. If there is an existing regime of shutdowns, inspection cycles, etc., it is somewhat wasteful to rebuild the task list from scratch. However, even in such cases, a "zero-based" maintenance program (e.g. FMECA and RBI/RCM combinations) can be a good stimulus to challenge existing habits and preconceptions.

Reasons for tasks
The FMECA stage is fairly well evolved — albeit with some variations depending upon the existence of local historical data. One minor advance in this area, emerging from the MACRO program, is the observation that, for greenfield projects (with no operational experience), it is often easier to populate the list of potential degradation and failure modes in reverse — i.e. by mapping intended functions first, then listing functional failure consequences and finally brainstorming the failure modes that could result in such effects.

Where maintenance history exists, on the other hand, known failure modes comprise the "seed" information, from which to extrapolate and consider other potential (not yet observed) modes. Generic libraries or templates can also act as such seed material, provided that local conditions and potential failure modes are also considered.

The criticality (the C in FMECA) assignment to failure modes is a subject in its own right — a source of confusion or clarity, depending on where you stand. It is certainly needed, and in shutdown studies, we have found that the main decisions are determined by just five to 10 dominant failure modes and the tasks designed to address them. Identifying these critical items, however, is not easy. The American Petroleum Institute (API) Recommended Practice (580/581) on Risk Based Inspection (RBI) is predominantly a criticality assessment and prioritization of failure risks. Structured risk-ranking workshops, involving operators, engineers and maintainers, offer less rigour but are, in many cases, just as effective in identifying the key drivers, often at a fraction of the cost.

Types of tasks
RCM is the most widely accepted set of rules for relating individual threats (failure modes) to the best preventive, predictive, corrective or detective tasks. The method is particularly suited to complex plant with many different types of failure modes. Static equipment holds less variety — most maintenance is condition-based and the predominant concerns are "what inspection method, and how often?" API RP 580/581 was developed specifically to provide such guidance. Both RCM and RBI can be exhaustive (and exhausting!), but various criticality-streamlined versions have emerged to focus on the bits that matter most.

Whatever the identification method, individual tasks fall into two groups for our purposes — cyclic activities (such as preventive maintenance, inspections and periodic replacements) and one-off tasks (such as modifications, capacity upgrades or other changes). The one-off tasks are generally subject to the same evaluation and justification as other projects or capital investments, and their timing is a matter of cashflow/payback/NPV/IRR calculations. The disadvantages of delay represent continued levels of risk, inefficiency or constrained performance, diluted to some degree by the advantage of deferring major expenditure.
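The cashflow arithmetic behind such a deferral decision can be sketched in a few lines. This is an illustrative model only: the capital cost, carried risk cost and discount rate below are invented numbers, not figures from any MACRO study.

```python
# Sketch: NPV arithmetic for deferring a one-off task.
# All figures are invented for illustration.

def npv_of_deferral(capex, annual_risk_cost, discount_rate, defer_years):
    """Net benefit (positive = worthwhile) of delaying a one-off task.

    Deferral earns a discounting benefit on the capital spend, but the
    risk/inefficiency cost continues to be carried in the meantime.
    """
    # Present value of the capital spend if made after the deferral
    deferred_capex = capex / (1 + discount_rate) ** defer_years
    # Present value of the risk cost carried during each deferred year
    carried_risk = sum(
        annual_risk_cost / (1 + discount_rate) ** t
        for t in range(1, defer_years + 1)
    )
    # Benefit of deferring the spend, less the risk carried meanwhile
    return (capex - deferred_capex) - carried_risk

# Defer a $500k upgrade by 3 years at 8%/year, carrying $60k/year of risk;
# a negative result means the carried risk outweighs the deferral benefit
print(round(npv_of_deferral(500_000, 60_000, 0.08, 3)))
```

Under these assumed figures the carried risk outweighs the discounting benefit, so the task would be scheduled sooner rather than later.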

Cyclic tasks, on the other hand, are much more complex to evaluate and optimize. They exist because of (actual or potential) deterioration, and risks or performance that change with time. This topic is covered extensively in the relevant MACRO modules — how to build a model of the cost/risk/performance trade-off and determine the optimal interval, the impact of premature or delayed work, and the sensitivities to any key data assumptions. In summary this involves:

1. Structured, quantified description of the degradation process, using range estimates wherever hard data is not available. This description is built around distinct families of quantification techniques:
– Reliability & risk (failure modes, probability patterns and consequences);
– Operational efficiency (energy, consumables, output volumes and quality);
– Lifespan effects (life extension, capital deferment etc.);
– Regulatory compliance (safety, environmental);
– "Shine" factors (public and customer impressions, employee morale, etc.).

2. Cost/risk/performance calculations for alternative intervals — putting numbers to the familiar cost/risk trade-off curves.

3. Sensitivity testing to the extremes of possible data uncertainty (often variations by factors of 10 or more for the speculative elements).

4. Identification of key decision "drivers" (which assumptions have the greatest effect upon the optimal decision).

If justified, more detailed investigation of these key assumptions then follows to confirm the correct strategy. In most cases, range estimates are enough to identify the optimal interval, and only when the "cost of uncertainty" is high will the additional research be justified.
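A minimal sketch of steps 2 to 4 might look like the following. The Weibull-style wear-out model and every number in it are assumptions for illustration, not the actual MACRO calculations.

```python
# Sketch of steps 2-4: annualized cost/risk for alternative intervals,
# a scan for the optimum, and a crude factor-of-10 sensitivity test.
# The wear-out model and all numbers are illustrative assumptions.

def total_impact_per_year(interval, pm_cost, failure_cost, mtbf, beta=2.0):
    """Planned cost spread over the interval, plus a risk cost that
    grows with the interval (simple Weibull-like wear-out)."""
    failures_per_year = (interval / mtbf) ** beta / interval
    return pm_cost / interval + failure_cost * failures_per_year

def optimal_interval(pm_cost, failure_cost, mtbf, beta=2.0):
    """Step 2: scan candidate intervals and return the cheapest."""
    candidates = [i / 10 for i in range(5, 101)]  # 0.5 to 10 years
    return min(candidates, key=lambda t: total_impact_per_year(
        t, pm_cost, failure_cost, mtbf, beta))

# Step 3: swing the speculative failure cost by a factor of 10
base = optimal_interval(pm_cost=50_000, failure_cost=400_000, mtbf=8)
pessimistic = optimal_interval(pm_cost=50_000, failure_cost=4_000_000, mtbf=8)
# Step 4: a wide gap between the two answers flags a key decision driver
print(base, pessimistic)
```

If the two answers sit close together, the range estimate is good enough; if they diverge widely, that assumption is a decision driver worth the extra research.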

The trade-off calculations vary with the components involved — in many cases there are several interacting failure modes, efficiency profiles and effects upon life expectancy all in the same evaluation. For example, an overhaul of a heat exchanger will consider tube leaks and blockages, performance effects of fouling and cumulative damage to the bundles due to cleaning. The analysis results reveal which factors drive the maintenance strategy, and how that strategy varies with equipment usage, operational criticality, fouling rates, etc.

In the case of inspection intervals, there is a further split in the modeling methods required. Predictive/condition-monitoring inspections, which identify and track vessel and pipework corrosion or cracking, dominate major process industry shutdowns. Functional testing or detective inspections, on the other hand, are designed to reveal existing "hidden" failures — typical of protective or standby equipment. The MACRO procedures for quantifying and evaluating these two families of tasks differ in the questions that need to be asked, but then calculate the same cost/risk trade-offs for various task intervals.

Combining tasks — compromise decisions
The shutdown strategy is a compromise. Some tasks will be performed ahead of their ideal timing, others will be delayed to share the downtime opportunity. The risks and performance impact of delayed tasks, and the additional costs of deliberate "over maintenance" in others, both contribute to the price paid for a particular shutdown packaging. The degree of advantage, on the other hand, is controlled by the costs that can be shared as a result. The downtime impact (lost opportunity costs) often dominates such sharing advantage, but the direct costs (planning, facilities, labour, etc.) of shutting down and starting up again must also be considered. The critical path of component tasks will determine the bundle’s total downtime impact — and this will vary with the degree of sequential or parallel working that is possible (as well as the discovery of defects that need corrective work, task overruns etc.). Uncertainty is often high but, like component task justifications, these bundle characteristics can be explored in "what if?" mode to determine if, and which, assumptions make a difference to the final outcome.
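The downtime arithmetic described above can be illustrated with a toy calculation. The task durations and the lost-margin figure are invented for the example.

```python
# Toy illustration: the bundle's downtime is set by its critical path.
# Durations and the lost-margin figure are invented.

TASK_DAYS = {"column inspection": 9, "reactor relining": 14,
             "piping replacement": 6}

def bundle_downtime(task_days, parallel=True):
    """Days of outage: the longest task if work runs in parallel,
    the sum of all tasks if work is fully sequential."""
    return max(task_days.values()) if parallel else sum(task_days.values())

LOST_MARGIN_PER_DAY = 120_000  # assumed downtime opportunity cost

for parallel in (True, False):
    days = bundle_downtime(TASK_DAYS, parallel)
    print(parallel, days, days * LOST_MARGIN_PER_DAY)
```

Real shutdowns sit between the two extremes, with partial parallel working, discovered defects and overruns stretching the critical path, which is why the "what if?" exploration matters.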

External constraints exist at both the individual task and shutdown bundle levels. For example, an inspection may be required at least every three years, or a maximum acceptable risk of 10⁻⁶ may apply for a certain failure mode. This limits the range of allowable intervals for that task. At the bundle level, logistical, safety or resource restrictions might constrain the grouping of certain tasks. Such bottlenecks force a greater cost of compromise: a sub-optimal combination and timing for the work.

Another form of bottleneck is that introduced by the need for a task at short intervals while all other tasks can be performed substantially less often. This introduces the option of nested cycles (the other tasks being performed every two, three or more cycles of the short interval work). It also reveals the scope for design changes to de-bottleneck the requirements — eliminating the frequent shutdowns and extending run lengths. The analysis process itself calculates the net payback for such modifications or de-bottlenecking.
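The nested-cycle option can be sketched as a simple rounding rule, aligning each task's ideal interval to a whole multiple of the shortest cycle. The task names and intervals here are hypothetical.

```python
# Sketch of the nested-cycle rule: align each task's ideal interval
# to a whole multiple of the shortest cycle.  Names/intervals invented.

def nest_tasks(base_interval, ideal_intervals):
    """Round each ideal interval (years) to the nearest whole multiple
    of the base cycle, never below one cycle."""
    return {task: max(1, round(ideal / base_interval)) * base_interval
            for task, ideal in ideal_intervals.items()}

# Shortest task runs every 6 months; the others nest onto that cycle
print(nest_tasks(0.5, {"valve inspection": 1.2,
                       "exchanger clean": 2.3,
                       "vessel internal": 4.9}))
```

The gap between a task's ideal and nested interval is the compromise cost; where that gap is large, it quantifies the payback available from de-bottlenecking the short-interval task.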

The grouping and regrouping of tasks, and "what if?" exploration of de-bottlenecking, can be manual (combining tasks in different bundles and moving the bundles to shorter or longer intervals) or semi-automatic. The MACRO R&D work has researched a number of methods for the latter — including Artificial Intelligence techniques such as neural networks, genetic algorithms and simulated annealing. The final combination is still being refined — but the various prototypes have yielded some astonishing results. In short, the scope for re-bundling tasks and timings is much greater than expected, with corresponding substantial impact on costs, performance and risk exposures.

The National Grid Company in the UK did some early work, using a genetic algorithm approach, and revealed scope for 21 percent improvement in system availability, at the same time as a 23 percent reduction in total cost/risk impact. Since then, Eutech, the consulting arm of the European chemical firm ICI, has been using the methods to evaluate shutdown intervals for chemical manufacturing plants (revealing the equivalent of $5 million in savings), and my team have been re-bundling the maintenance and inspection tasks on process plant, railways and water utilities. In one such case, shutdown intervals were extended from two years to four years, releasing an estimated $10 million per year in net improvement.
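For illustration, here is a toy version of such a semi-automatic search, using simulated annealing, one of the AI techniques mentioned above. The task list, cost model and cooling schedule are all invented; the real MACRO tools are far richer than this sketch.

```python
# Toy re-bundling search via simulated annealing.  Each task has an
# ideal timing; the search trades the fixed cost of each shutdown held
# against a penalty for doing work early or late.  All data invented.
import math
import random

TASKS = {"pump overhaul": 2.0, "vessel inspection": 3.0,
         "relief valve test": 2.5, "exchanger clean": 1.5}
SLOTS = [1.5, 2.0, 2.5, 3.0]   # candidate shutdown dates (years)
SHUTDOWN_COST = 100.0          # fixed cost per shutdown actually held
PENALTY = 80.0                 # cost per year of early/late work

def cost(assignment):
    """Fixed cost of each shutdown used, plus a linear penalty for
    moving each task away from its ideal timing."""
    used = set(assignment.values())
    drift = sum(abs(slot - TASKS[t]) for t, slot in assignment.items())
    return SHUTDOWN_COST * len(used) + PENALTY * drift

def anneal(steps=5000, temp=50.0, cooling=0.999, seed=1):
    """Randomly move one task at a time, accepting uphill moves with a
    probability that shrinks as the 'temperature' cools."""
    random.seed(seed)
    state = {t: random.choice(SLOTS) for t in TASKS}
    best = dict(state)
    for _ in range(steps):
        trial = dict(state)
        trial[random.choice(list(TASKS))] = random.choice(SLOTS)
        delta = cost(trial) - cost(state)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            state = trial
            if cost(state) < cost(best):
                best = dict(state)
        temp *= cooling
    return best, cost(best)

bundle, total = anneal()
print(bundle, total)
```

Even at this scale, the search discovers that clustering work into fewer shutdowns usually beats honouring every task's ideal date, which is the compromise economics of the preceding section in miniature.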

CASE STUDY: Power distribution circuit
Here, the shutdowns (or "outages") comprise a variety of tasks on the connected assets of a critical supply route. These could involve up to 30 or 40 discrete items of equipment in the circuit, and each item (for example, the circuit breakers at each end) may have several tasks assigned to it, with optimal intervals that vary from short (six to 12 months) to long (some only every 12-15 years). The circuit outage program is a complex blend of small-and-frequent and larger-but-rarer tasks, with a vast number of permutations possible. Some tasks are required by law; others can be brought forward or delayed. The cost/risk impact of delay varies greatly with the deterioration rates — some items have critical timing and others have fairly ‘flat’ curves of total impact.

The analysis process calculated the net present value (NPV) of all future costs, risks and outage timings and, in this case, the optimal regime involved bringing forward several of the ‘next maintenance due’ dates to create a better alignment. The subsequent avoidance of multiple outages more than paid for the earlier initial expenditure.

CASE STUDY: Chemical production unit
In May 2000, the above-mentioned ICI Eutech presented a paper to the MACRO results seminar on results achieved in studying a bulk chemical manufacturing plant. An existing two-yearly shutdown typically cost approximately $700,000 and involved 21 days of downtime. The criticality analysis revealed which units were the main drivers for the shutdown — the HCl stripping column, the reactor unit manway lining, some sacrificial iron packing in a column and some of the smaller piping. It was noticeable that these items were NOT the biggest, most expensive items to inspect or maintain, but were deterioration-rate limiting — the component tasks necessary to inspect or maintain them had the shortest intervals.

The elimination of some of these run-length constraints (bottlenecks) involved, for example, using high-performance alloys (Monel) to achieve longer life. The payback for such additional periodic cost was revealed to be measurable in months. The study revealed that a shutdown could be achieved once every four years, with NPV savings of more than $5 million.

CASE STUDY: Conversion reactor and condenser
In 1999, the Woodhouse Partnership was involved in a similar study, looking at the possible extension of run-lengths for a specialized reactor/condenser process. The initial criticality assessment took three days, using a combination of structured interview techniques and a survey of existing FMEA and QRA studies. This revealed a potential ‘decision driver’ list of about 30 items, each with a number of inspection and/or maintenance tasks required. In addition, there were a few one-off tasks that were accumulating — technology upgrades and mandatory modifications that needed to be scheduled into the program. The following items were identified as the most influential in the shutdown decisions:

– Reactor vessel: internal support beams, injector nozzles, shell integrity
– Quench tower: internal supports, nozzles, shell integrity, relief valves
– Re-circulation pumping system: gate valves, seals, cooler
– Product chiller: cleaning cycle, bypass unit
– Separator unit: relief valves

Working from the shortest cyclic tasks outwards, we created individual cost/risk/performance models by interviewing operations, maintenance and engineering staff, recording their experience, opinions and extrapolations (how the equipment would behave if we extended the intervals). The resulting range-estimates were explored for all sensitivities, so that the recommendations included the future data requirements for further refining the strategy. Over 75 optimization studies of component tasks were performed to create the necessary raw material for the shutdown optimization. This took three weeks for a team of four individuals (two full-time and two part-time).

The component task studies themselves revealed the scope for substantial cost/risk/performance improvement. Around $2 million per year in savings were identified from a number of minor changes in work scope, in timing or design/operations changes. These included, among several other recommendations:
– Upgrading materials for the reactor support beams;
– Changing the cleaning process for the product chiller;
– Installing dual pilots on the relief valves (allowing on-line maintenance);
– Stainless steel lagging of injector nozzles.

The big prize, however, was the extended interval between major shutdowns. The design changes and de-bottlenecking allowed a doubling of the shutdown interval, with a net total impact worth a further $8 million/year across the six units. This figure comprises the net effect of increased availability, reduced maintenance costs, all changes to risk exposures, performance impact and even projected changes to equipment replacement requirements. It is the conservative sum of the ‘pessimistic’ projections, so we can be confident that:
a) the real benefits are substantially higher than this, and
b) the proposed strategy is appropriate even at the extremes of the projected risk assumptions.


These studies are fairly typical — a combination of some hard facts, a lot of range-estimated speculation, a long list of potential influences but relatively few that really matter, and complex interactions between failure modes, deterioration assumptions, design options and maintenance tasks. This experience has confirmed, however, that a structured approach, combined with modern "what if?" optimization tools, holds substantial scope for increased performance and cost/risk improvement.

John Woodhouse has 20 years’ experience in cost/risk optimization. His activities include designing and implementing change control procedures, optimal maintenance reviews, inspection strategies and company-wide training initiatives. John can be reached through his UK-based company, The Woodhouse Partnership, at