MRO Magazine

Feature

Domtar’s award-winning Windsor mill examines the sources of error


(Photo: Carroll McCormick)

A worker is tasked with cutting some failed bolts on a pressure vessel door. In the process, he cuts his finger and needs five stitches. An investigation determines that this accident was just waiting to happen. Domtar’s pulp and paper mill in Windsor, Que., is improving its safety programs by taking a new approach to the cause of incidents like this, one built on a more accurate understanding of the nature of errors.

Adapting an approach developed to reduce the number of incidents in the nuclear power industry, called Human Performance Improvement (HPI), the Windsor mill is focusing less on how an incident happens, and more on why. Behind this tactic lies the dawning awareness that individual errors contribute far less to incidents than do organizational weaknesses.

Examine an event, incident, or unwanted outcome – whatever name you prefer. A traditional belief, now considered false, is that human errors arise only at the level of the individual. The truth is more like this: 20% of events are due to equipment failures and 80% to human error. But only 30% of that 80% stems from individual mistakes. The other 70% stems from organizational weaknesses.
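The two-level split above can be restated as shares of all events. The short sketch below only multiplies out the percentages quoted in the article; the variable names are illustrative and do not come from Domtar’s HPI material.

```python
from fractions import Fraction

# Shares quoted in the article.
equipment = Fraction(20, 100)             # events due to equipment failures
human = Fraction(80, 100)                 # events due to human error
individual_share = Fraction(30, 100)      # part of human error: individual mistakes
organizational_share = Fraction(70, 100)  # part of human error: organizational weaknesses

# Convert the nested percentages into shares of all events.
individual = human * individual_share
organizational = human * organizational_share

breakdown = {
    "equipment failures": equipment,
    "individual mistakes": individual,
    "organizational weaknesses": organizational,
}

for cause, share in breakdown.items():
    print(f"{cause}: {float(share):.0%}")
```

On these figures, individual mistakes account for 24% of all events and organizational weaknesses for 56%, which is the arithmetic behind the claim that individual errors contribute far less than organizational ones.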

So how did that worker cut himself? “In the old days we asked the worker why he screwed up. But when we analysed this one, we found a lot of organizational weaknesses,” says Eric Ashby, general manager, Domtar Windsor Mill.

First, the bolts had been improperly torqued. Second, the inside door was bigger than the outside door. The worker had to wrestle with it to remove it. Third, his supervisor warned, “don’t drop the bolts in the tank.” The only way to do all this and remove the bolts was to grind with one hand. He lost control of the grinder and cut himself.

The worker just happened to be the last one to show up at someone else’s party.

Additionally, although there were protocols in place for the task, there wasn’t one for what to do if the job could only be done with one hand. Among the lessons learned during the post-event investigation, Ashby says, “As part of the corrective action, what do you do if you have to use one hand? Stop.”

This example reveals several bold changes in the approach to safety at the Windsor mill. One, it is an example of what is called a just culture. Simply put, in a just culture workers are not punished for actions that are in line with their training and experience. “You separate individual and organizational culpability. We have a just culture decision tree that tells us if it is an individual or organization that is culpable,” Ashby says.

Two, rather than simply blaming the worker, the mill looks to management for the cause of events. For example, says Ashby, “If a lockout system is too complex, there is a higher probability that an employee will make an error. This is an organizational weakness.” Or, a tricky startup sequence may be begging for an event to happen.

Three, the mill is moving away from what is called the circle of despair: an incident, panic, fixing it, and then waiting for the next incident. “Part of HPI is how to move from reactive to proactive. Can we do better analyses of existing incidents and put in SMART+ER (Specific, Measurable, Actionable, Realistic, Timely + Effective Reviews) corrective actions? The issue we see now is that when we analyzed our investigations in the past, we were very good at identifying what happened, but not why it happened,” Ashby explains.

He continues. “The big shift is moving away from Root Cause Analysis. In [HPI], we don’t focus on identifying the root cause. We believe we have causal factors. You want to eliminate all causal factors you can identify. Every contributing factor is a weakness in your barriers.”

The barriers Ashby refers to are measures set up to reduce the consequences of errors, while granting that people will always make them. Ashby lists four types. Cultural: norms that a group of people generally agree on; for example, people in one region might interpret a yellow traffic light as a cue to speed up, while those in another understand it as a cue to slow down. Administrative: lockout sheets, checklists or procedures. Engineering: guards on equipment, or seatbelts. Management/oversight: a radar and a sign displaying your speed form an oversight barrier designed to get you to slow down.

Barriers are imperfect. Each has weaknesses, or holes, which is why Ashby refers to the “Swiss cheese model.” The more holes that line up, the greater the likelihood of a significant event linked to an error. “Our job is to identify all the holes. You have to figure out a way that there are the best barriers possible. If you are only protected by one barrier, the probability of an incident is 100%,” Ashby says. “Seventy per cent of our work is to identify latent organizational weaknesses and put better controls in place.”

Ashby cites a formula that captures this process: “RE + MC → zero significant events.” Written out, it reads, “reducing the probability of making a human error, plus instituting the proper management control in the plant, leads to zero significant events.”

The Windsor mill has replaced the term “incident” with “significant event.” “The flaw of Occupational Health and Safety Act (OHSA) incident rate is that it does not show its severity. One fatality equals one cut on the OHSA scale. We have four levels of significant event: Level 4: near misses, or minimum first aid; Level 3: Minor recordable (requires diagnosis and treatment); Level 2: hospitalization of more than one person; Level 1: permanent impairment; for example, amputation or long lost time,” Ashby says. This extra detail lets Domtar distinguish between, say, three cuts with two stitches, and losing a leg.

There are many moving parts to this version of HPI that Domtar has adapted to the Windsor mill, all designed to give a face to errors, quantify them and reduce the number of events. “We are pushing for zero significant events. We know that the company that survives is [the one that is] controlling significant events,” Ashby says.

Ashby says, “We have defined 14 tools to lower the probability of human error. For example, there is the concurrent verification tool, where one person does a lockout and someone else verifies it. This decreases the probability of error.” Then there are the 32 error precursors Domtar had already identified, such as frustration, fatigue, rushing and complacency, which are expanded in HPI. A dirty area, for instance, will increase the probability of making an error.

Domtar is also capitalizing on the concept of the three performance modes that every human supposedly has, and their associated error types and rates: skill-based, rule-based and knowledge-based. The main error mode for skill-based performance is inattention, at a rate of one error per 10,000 actions. For rule-based behaviour it is misinterpretation. “The person doesn’t know what to do, but follows step-by-step processes, following a set of instructions. When you work in rule mode, your error rate is 1/1,000,” Ashby says. The main error mode for knowledge-based behaviour is an inaccurate mental model of, say, a system or process. The error rate here is 50%.

Ashby illustrates what to do with this information. “Imagine two rule-based workers doing concurrent verification. The probability becomes 1,000 times 1,000, or one chance in one million of an error.”
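Ashby’s one-in-a-million figure follows from multiplying the two workers’ error rates. The sketch below reproduces that arithmetic with the rates quoted in the article. It assumes the two workers’ errors are independent, which the quote implies but the article does not state. The names ERROR_RATES and combined_error_rate are hypothetical, chosen for illustration.

```python
from fractions import Fraction

# Error rates per action for each performance mode, as quoted in the article.
ERROR_RATES = {
    "skill": Fraction(1, 10_000),
    "rule": Fraction(1, 1_000),
    "knowledge": Fraction(1, 2),
}

def combined_error_rate(*modes):
    """Probability that every person involved errs on the same action.

    Assumption of this sketch: each person's error is independent of the others'.
    """
    result = Fraction(1)
    for mode in modes:
        result *= ERROR_RATES[mode]
    return result

# Two rule-based workers doing concurrent verification.
two_rule_based = combined_error_rate("rule", "rule")
print(two_rule_based)  # 1/1000000
```

If the two workers share the same misreading of a procedure, their errors are no longer independent and the combined rate would be higher than this product.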

Domtar uses 14 HPI human intervention tools to minimize the probability of errors. “The supervisor will work with his workers, have the proper tools; e.g., risk assessment, circle of danger, etc. This is the RE,” Ashby says.

For now, HPI at the Windsor mill is being driven at the leadership level, where 70% of human errors reside. It begins with the mill manager, then managers, superintendents and then supervisors. As the plant evolves its version of HPI it will get more involvement from employees.

Initially, HPI was applied after there was an error. But doing audits is taking it past reaction to proaction. There are also continuous improvement mandates; e.g., improve an area as a result of several near misses, or the perception of a risk. Here, employees get a lot of involvement in audits.

Ultimately, Domtar wants to bring its Windsor mill employees to a team mindset, or, for those familiar with the DuPont Bradley Curve of a maturing safety culture, an interdependent stage where zero injuries is believed to be an attainable goal. Ashby says, “The key way to approach a just culture is how to approach human error. It is a shift, a continuous evolution. We are very proud that our mill and community have moved in that direction.”


A Proud Safety Record

Domtar’s Windsor pulp and paper mill has long sought to drive down its incident rate. Between 2008 and 2012 the rate was 1 to 1.2 incidents per 200,000 hours worked. With the addition in 2012 of techniques such as reporting and analyzing near misses, Domtar was awarded the 2012 Paper Week Canada Safety Leadership Award, with just five recordable incidents in 1,483,056 hours worked.
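For comparison with the per-200,000-hour rates, the award figures can be converted to the same base. The article does not say how the mill calculates its rate, so this sketch assumes the conventional formula: recordable incidents multiplied by 200,000 and divided by hours worked. The function name incident_rate is hypothetical.

```python
HOURS_BASE = 200_000  # the "per 200,000 hours" base quoted in the article

def incident_rate(recordable_incidents, hours_worked, base=HOURS_BASE):
    """Recordable incidents normalized to a fixed number of hours worked.

    Assumed formula: incidents * base / hours_worked.
    """
    return recordable_incidents * base / hours_worked

# Figures cited for the 2012 award: five recordable incidents in 1,483,056 hours.
award_rate = incident_rate(5, 1_483_056)
print(f"{award_rate:.2f}")  # 0.67
```

Under that assumption, the award figures work out to roughly 0.67 incidents per 200,000 hours. The hours counted for the award may not cover the same period as the annual rates quoted elsewhere in the article.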

Combining these with tools such as the safety sphere (an imaginary, three-metre-diameter bubble that workers constantly analyze for danger), a notebook called “J’analyse la tâche” (I am analyzing the task) with a checklist of possible risks and room for noting problems, and risk assessments that, for example, led to a plant-wide ban on utility knives, the mill pushed the incident rate below one: 0.8 in 2013.

“The question was if we could do better: 0.5,” says Eric Ashby, general manager, Domtar Windsor Mill. In what would signal a move from individual responsibility to a team approach, a just culture and a better understanding of the source of errors, Domtar began to adapt an approach to safety from the nuclear power industry called Human Performance Improvement. “Can we catch the errors, human or organizational, before we get a consequence?” Ashby asks.

Domtar’s efforts have paid off: the Windsor mill had its best ever safety performance in 2014, and 2015 was the third best year for safety performance.

The mill’s safety approach has gone through a major evolution. “[It has] brought us to a level which has a lot of differences from 2013,” Ashby says. “We are very proud to have moved in that direction.”

This article ran in the February 2016 issue of Machinery and Equipment MRO magazine.

Montreal-based Carroll McCormick is the award-winning senior contributing editor for Machinery and Equipment MRO.