Failure is often a dirty word in maintenance. Your department gets chewed out when equipment breaks down. And being measured on downtime metrics makes any equipment issue the ultimate sin.
That’s unproductive. Worse, it’s unfair. There are some things you just can’t control, like asset age, design, or user error. And failure can actually be a valuable resource.
“You need failure to improve,” says Thibaut Drevet, a solutions engineer at Fiix and former industrial and maintenance engineer. “Failure helps you understand the systems you are maintaining, how they operate, and how you can maintain them.”
This article explores how you can use a FRACAS to harness the power of failure and help your business increase output.
What is FRACAS?
FRACAS stands for failure reporting, analysis, and corrective action system. It’s a closed-loop reporting system for controlling and eliminating equipment failure. It has three main components:
- Failure reporting (identifying asset failure)
- Failure analysis (learning from the failure)
- Failure correction (taking steps to fix the failure and stop it from happening again)
A FRACAS takes the history of equipment performance into account to find common failures and determine the best way to handle future breakdowns. It also informs your reliability maintenance strategy, from design to scheduling.
How to create a FRACAS using the FRACAS loop
The FRACAS loop is a set of processes that help you report, analyze, and correct failure. These processes are always repeating so you can continually find, learn from, and correct failure.
The FRACAS loop has five main activities:
- Failure modes and effects analysis
- Failure code creation
- Work order analysis
- Root cause analysis
- Strategy adjustment
Failure modes and effects analysis
A failure modes and effects analysis (FMEA) is a plan for when the worst happens. It’s a list of all the ways equipment can fail, the impact of each failure, and what to do about it.
An FMEA is made up of 10 main elements:
- Asset component
- Potential failure modes
- Potential failure effects
- Severity of failure
- Potential causes
- Expected frequency of failure
- Current processes to detect and prevent the failure
- How detectable the failure is
- The total risk of the failure (often calculated as a risk priority number, or RPN)
- Recommended action
An FMEA is a baseline for failure. It lays out every scenario so you can prioritize action based on asset criticality, impact, frequency, and resources needed. It’s also a living document. As you learn more about failure at your facility and how to eliminate it, update your FMEA to reflect what you’ve found. That’s why the FRACAS loop always comes back to this step.
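A common way to turn the severity, frequency, and detectability columns into a single “total risk” figure is the risk priority number: each is scored from 1 to 10 and the three scores are multiplied. The sketch below assumes that convention. The `FmeaRow` class, the scales, and the example rows are illustrative only, not taken from any particular FMEA template.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One line of a hypothetical FMEA worksheet."""
    component: str
    failure_mode: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (almost certain to be detected) to 10 (undetectable)

    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError(f"score out of range: {score}")
        return self.severity * self.occurrence * self.detection


def prioritize(rows: list[FmeaRow]) -> list[tuple[str, int]]:
    """Return (description, RPN) pairs, highest risk first."""
    ranked = sorted(rows, key=lambda r: r.rpn(), reverse=True)
    return [(f"{r.component}: {r.failure_mode}", r.rpn()) for r in ranked]


# Illustrative scores, not real data.
rows = [
    FmeaRow("Bearing", "Seizure", severity=8, occurrence=6, detection=4),
    FmeaRow("Drive belt", "Slippage", severity=4, occurrence=5, detection=2),
    FmeaRow("Motor", "Overheating", severity=9, occurrence=2, detection=3),
]

for description, score in prioritize(rows):
    print(f"{score:4d}  {description}")
```

Because the three scores are multiplied, a rare but catastrophic failure can end up with a modest RPN, so many teams also review any row with a high severity score on its own.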
Failure code creation
Failure codes translate equipment issues into a very short description that identifies the part, defect, and cause. For example, a failure code on a variable speed transfer conveyor might be: Bearing, wear, lack of lubrication.
We did a full rundown of failure codes a while back, but here are a few quick best practices:
- Give each part a distinct name that follows a consistent naming convention. If two similar components are confused, it could lead to even bigger problems.
- Group defects into a few categories to keep things simple yet clear (e.g., by condition, such as wear or overheating).
- If you’re using preloaded codes in a CMMS, use only the most common ones. More than 10 options is usually too many and leads technicians to pick ‘other’ rather than spend time finding the right code.
- Use your FMEA to create an initial list of key failure codes. Validate this list with technicians.
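One way to keep codes short and consistent is to store each one as a structured part, defect, and cause triple and validate it against a controlled vocabulary seeded from your FMEA. This is a minimal sketch: the vocabularies and the `FailureCode` class are hypothetical, and in practice your CMMS would usually hold these lists.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies, seeded from the FMEA and
# validated with technicians. Keep each list short.
PARTS = {"Bearing", "Shaft", "Drive belt", "Motor"}
DEFECTS = {"Wear", "Seizure", "Overheating", "Misalignment"}
CAUSES = {"Lack of lubrication", "Misalignment", "Improper assembly", "Other"}


@dataclass(frozen=True)
class FailureCode:
    part: str
    defect: str
    cause: str

    def __post_init__(self) -> None:
        # Reject values outside the agreed vocabulary so typos and
        # free-text variants don't split one failure into several codes.
        for value, allowed, name in (
            (self.part, PARTS, "part"),
            (self.defect, DEFECTS, "defect"),
            (self.cause, CAUSES, "cause"),
        ):
            if value not in allowed:
                raise ValueError(f"unknown {name}: {value!r}")

    def __str__(self) -> str:
        return f"{self.part}, {self.defect.lower()}, {self.cause.lower()}"


code = FailureCode("Bearing", "Wear", "Lack of lubrication")
print(code)  # Bearing, wear, lack of lubrication
```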
Tracking failure codes will help you see trends in failures over time. It allows you to pinpoint which ones happen most often and cause the most loss for your company so you can build a plan to prioritize and wipe them out.
Work order analysis
One failure is a nuisance. A dozen failures of the same type is a trend that’s costing your team a ton of money, interrupting your schedule, and getting you on the bad side of production. Work order analysis is the step in the FRACAS loop that helps you spot these trends and resolve them.
One of the easiest ways to analyze failure data in work orders is to look at failure codes and how often they appear in completion notes. For example, let’s say four identical machines have experienced 12 failures in total over six months. Because these machines are money makers and take a long time to fix, those failures are going to jump off the page.
When you look at these failures, you see that the most common failure code (10 out of 12 instances) is bearing seizure from misalignment. Now you know what problem to focus on. You also have a baseline to measure your response against. If this failure drops to two or three instances over the next six months, whatever you did is working.
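The arithmetic behind this example (four machines, 12 failures over six months, 10 of them bearing seizures) can be sketched with a simple frequency count. The work order records, asset IDs, and the two minor failure codes are hypothetical, and the follow-up count of three is just the kind of drop the example describes.

```python
from collections import Counter

# Hypothetical closed work orders: four identical machines, six months.
# Each entry is (asset_id, failure_code); codes follow part, defect, cause.
work_orders = (
    [("M1", "Bearing, seizure, misalignment")] * 3
    + [("M2", "Bearing, seizure, misalignment")] * 3
    + [("M3", "Bearing, seizure, misalignment")] * 2
    + [("M4", "Bearing, seizure, misalignment")] * 2
    + [("M2", "Belt, wear, lack of tension")]
    + [("M4", "Motor, overheating, blocked vent")]
)
ASSETS = 4
MONTHS = 6

counts = Counter(code for _, code in work_orders)
total = sum(counts.values())
top_code, top_count = counts.most_common(1)[0]

print(f"Failures: {total}")                                        # Failures: 12
print(f"Rate: {total / (ASSETS * MONTHS):.2f} per machine-month")  # Rate: 0.50 per machine-month
print(f"Top code: {top_code} ({top_count} of {total})")


def percent_change(before: int, after: int) -> float:
    """Change against the baseline, in percent (negative means fewer failures)."""
    return (after - before) * 100 / before


# Hypothetical follow-up: three seizures in the next six months.
print(f"{percent_change(top_count, 3):.0f}%")  # -70%
```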
There are dozens of other ways to use failure data from work orders. We’ll cover a few of them below, but you can also check out this short guide on finding and using work order data.
Root cause analysis
Root cause analysis (RCA) is not a troubleshooting tool. It’s a tool to get value from troubleshooting. You and your team can fix a misaligned bearing without an RCA. But you’re going to have to fix it more than once. And that means spending time, budget, and parts more than once.
A FRACAS is only valuable when it’s making long-lasting improvements that put money back in your pocket and time back on your schedule. That’s what an RCA is there for.
We covered strategies for doing a root cause analysis and built an entire root cause analysis template you can download, so this article won’t cover the finer points of conducting an RCA. But here is one example of how to integrate an RCA into a FRACAS using the misaligned bearing from the previous section:
- Why was the bearing misaligned? Because the shaft was misaligned.
- Why was the shaft misaligned? Because the machine was improperly assembled.
- Why was the machine improperly assembled? Because the technician rushed to assemble it.
- Why did the technician rush to assemble it? Because they weren’t given the proper amount of time for the job.
- Why wasn’t there sufficient time allotted for the job? Because the window for routine maintenance before production was too small.
The most important thing to remember when doing an RCA is not to jump to conclusions and cut your investigation short, says Thibaut.
“It’s easy to assume that the simple cause is the reason an asset broke down,” he says. “That’s why you need a diverse set of people to contribute to the RCA so you have different viewpoints and ideas, and so you avoid these assumptions.”
Strategy adjustment
All the insights you collect with the FRACAS loop won’t amount to much if you don’t act on them. Taking action isn’t always about huge changes. It can be as small as adding more specific instructions for applying lubrication to a work order. But big adjustments are sometimes necessary, like hiring a contractor to do specialized tasks your team wasn’t trained for.
While each response will be different, there are some common strategies that will help you correct and prevent failure in the long term:
- Include technicians in the process: Technicians might offer solutions you didn’t think of. Tell them exactly why you’re making a change and how it benefits them, which increases buy-in. Then show them the results of the change. If a modified process led to a 40% drop in after-hours call-ins, let them know. It shows appreciation for their work and builds support for future changes.
- Monitor the outcomes: If a strategy isn’t working, you can catch it early and keep adjusting until you get it right. Keep an eye out for the domino effect. A change might be good for one area of your operation but hurt another. Lastly, track your success stories to get buy-in and budget from your manager when you need it.
- Start small and expand slowly: If big changes are necessary, don’t make them all at once. Focus on one piece of the overall change. For example, if you’re trying to get a few extra hours of maintenance time on equipment (at the expense of production), start with one machine. Not only does this make your plan easier to implement, it also gives people time to adjust to the change.
Closing the loop
After adjusting your strategy, the FRACAS loop starts all over again. Here are a few ways to bring your strategy full-circle so you can keep finding and correcting failure:
- Update your FMEA to reflect any new failures you’ve discovered and the impact of the changes you’ve made. Maybe a failure is happening less frequently or there’s a new procedure for handling a certain failure based on the work of your FRACAS.
- Audit your failure codes. Add any new, common failure codes you’ve discovered and retire codes that are now rarely used. Make sure the codes you have are still relevant, clear, and useful.
- Create reports to track the impact of the changes you’ve made. Is failure happening less often in the areas you’ve addressed? What does this mean for costs, scheduling, etc.?
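A failure code audit can start from simple usage counts: codes that almost never appear are candidates for retirement, and a high share of ‘other’ suggests the list is missing a common failure or is too long to browse. This is a sketch under stated assumptions: the 1% and 15% thresholds, the `audit_codes` function, and the sample data are all hypothetical.

```python
from collections import Counter


def audit_codes(
    used_codes: list[str],
    catalog: set[str],
    min_share: float = 0.01,        # assumed cut-off for "rarely used"
    max_other_share: float = 0.15,  # assumed tolerance for "Other"
) -> dict[str, object]:
    """Summarize how a failure code list is actually being used.

    used_codes: the failure code recorded on each closed work order.
    catalog:    the codes currently offered to technicians.
    """
    if not used_codes:
        raise ValueError("no work orders to audit")
    counts = Counter(used_codes)
    total = len(used_codes)
    rarely_used = sorted(
        code for code in catalog
        if code != "Other" and counts[code] / total < min_share
    )
    other_share = counts["Other"] / total
    return {
        "rarely_used": rarely_used,
        "other_share": other_share,
        # A high share means the notes on "Other" orders likely hide new codes.
        "review_other_entries": other_share > max_other_share,
    }


catalog = {"Bearing wear", "Belt slip", "Seal leak", "Other"}
used = ["Bearing wear"] * 60 + ["Belt slip"] * 20 + ["Other"] * 20
print(audit_codes(used, catalog))
# {'rarely_used': ['Seal leak'], 'other_share': 0.2, 'review_other_entries': True}
```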
How to get good data for a FRACAS
Data guides you through every step of a FRACAS. And like any good guide, your data needs to be trustworthy, which we all know isn’t always the case. Your numbers might never be bulletproof, but you can improve the quality of information with a few key actions.
Create a culture where the value of maintenance is understood
Most data errors occur when technicians are rushed, says Thibaut. They’re barely given time to complete a job before hurrying to the next one. Rather than face the ire of production in this situation, technicians leave data input to the end of the day when their memories aren’t as good. Or they skip it altogether.
A healthy culture, where everyone at a plant understands the value of maintenance, helps to counter this.
“Everyone needs to understand that maintenance is not the enemy of production,” says Thibaut.
“When everyone understands that maintenance is necessary and beneficial, it allows technicians to take their time and log data properly.”
Build clear, easy-to-fill work orders
It’s easy to blame bad data on human error. But human error always has a deeper cause. One of the most common is unclear, overwhelming work orders.
For example, without pictures, diagrams, or proper naming conventions, it’s easy to misidentify a component. This could throw off future failure analysis and reporting for that asset and similar ones. And not having a clear process for reporting and following up on failure will usually result in no action being taken at all.
Here’s a great starter pack for creating world-class work orders that support an effective FRACAS:
- Mastering the fundamentals: Maintenance work orders
- A short guide to designing work orders that help you crush your goals
- Maintenance work order template
- Equipment maintenance log template
- Preventive maintenance checklist
Automate and integrate
Building great work orders won’t totally eliminate human error. Everyone makes mistakes. But technology makes fewer of them. Condition-monitoring sensors and software can replace manual data entry with automated measurement.
It’s easy to get a meter reading on a broken asset wrong when you log it manually. Maybe it took you five minutes to get to the machine. In that five minutes, the meter reading changed. Now you’re associating failure with the wrong measurement.
Having software that logs meter readings in real time removes this risk. It records the exact reading at the time of failure, so you know it’s right. By integrating this system with your maintenance software, you can capture and analyze all this information in one place. There’s also the added perk of being able to trigger maintenance immediately based on meter readings.
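Tying a failure to the reading logged at that moment, rather than whatever a technician reads a few minutes later, comes down to a lookup in a timestamped series. This minimal sketch assumes the readings arrive sorted by time; the vibration values, the 7.1 alarm threshold, and the function names are hypothetical, and a real condition-monitoring integration would supply the data through its own interface.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical vibration readings logged automatically, sorted by time.
readings = [
    (datetime(2024, 3, 1, 8, 0), 2.1),
    (datetime(2024, 3, 1, 8, 5), 2.4),
    (datetime(2024, 3, 1, 8, 10), 7.9),
    (datetime(2024, 3, 1, 8, 15), 3.0),  # logged after the failure
]
VIBRATION_LIMIT = 7.1  # assumed alarm threshold


def reading_at(failure_time: datetime) -> float:
    """Return the last logged value at or before failure_time."""
    times = [t for t, _ in readings]
    i = bisect_right(times, failure_time)
    if i == 0:
        raise LookupError("no reading logged before the failure")
    return readings[i - 1][1]


def needs_work_order(value: float) -> bool:
    """Trigger maintenance as soon as a reading crosses the limit."""
    return value >= VIBRATION_LIMIT


failure_time = datetime(2024, 3, 1, 8, 12)
value = reading_at(failure_time)
print(value, needs_work_order(value))  # 7.9 True
```

With these sample values, a manual reading taken at 8:15 would have recorded 3.0 and tied the failure to a normal-looking value.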
Audit your data frequently
Set aside time every month to check your data and make sure it’s accurate. That doesn’t mean combing through every single work order and number to verify them. Conduct spot checks, look for red flags, and talk to technicians to identify where pencil whipping (signing off on tasks or records without actually doing or checking the work) may be a concern. Avoid finger pointing. Pencil whipping is often more about external obstacles than the character or skill level of technicians. Some good questions to ask include:
- Is there any inspection or task that feels unnecessary? Remove this task, reduce its frequency, or explain why it’s important.
- Are you clear on what data to log and why it’s important? Get everyone on the same page about what to measure and how (e.g., measure in minutes, not hours).
- Is data easy to log? If not, why? Uncover processes that made sense on paper but don’t work in practice (e.g., a long list of failure codes or measurements that are hard to quantify).
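Spot checks can be partly automated: sample a handful of closed work orders at random and flag the ones that deserve a conversation. The red-flag rules, the 10-minute threshold, and the `WorkOrder` fields below are hypothetical. Treat a flag as a reason to talk to the technician, not as proof of pencil whipping.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkOrder:
    wo_id: str
    failure_code: str
    labor_minutes: int
    notes: str


def red_flags(wo: WorkOrder, min_minutes: int = 10) -> list[str]:
    """Return reasons a work order deserves a closer look (hypothetical rules)."""
    flags = []
    if not wo.failure_code:
        flags.append("missing failure code")
    elif wo.failure_code == "Other":
        flags.append("'Other' code used")
    if wo.labor_minutes < min_minutes:
        flags.append("implausibly short labor time")
    if not wo.notes.strip():
        flags.append("empty completion notes")
    return flags


def spot_check(orders: list[WorkOrder], sample_size: int, seed: int = 0) -> dict[str, list[str]]:
    """Randomly sample work orders and keep only the flagged ones."""
    rng = random.Random(seed)
    sample = rng.sample(orders, min(sample_size, len(orders)))
    return {wo.wo_id: flags for wo in sample if (flags := red_flags(wo))}


orders = [
    WorkOrder("WO-101", "Bearing, wear, lack of lubrication", 95, "Replaced bearing, regreased."),
    WorkOrder("WO-102", "Other", 5, ""),
    WorkOrder("WO-103", "", 40, "Adjusted belt tension."),
]
print(spot_check(orders, sample_size=3))
```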
How to use a FRACAS: 5 maintenance reports to help you drive results
Finding and correcting a failure is great. Finding and correcting a failure that’s holding your company back from producing more things and making more money is even better. To do that, you need reports that find this kind of asset failure. Here are five to get started with:
- Failures after start-up: Failures that stop production before it gets started put operations way behind. This report helps you pick out these damaging failures and prevent them.
- Maintenance costs by failure code: Tally up the cost of labor and parts for each failure code on closed work orders to identify which failures cost you the most, and prioritize them.
- Maintenance hours by failure code: When you spend time fixing the same failure again and again, it robs you of time for tasks that could prevent downtime elsewhere.
- Failures found through scheduled vs. unscheduled maintenance: This report helps you prioritize recurring failures that cause costly reactive maintenance.
- Failures by shift or site: This report helps you uncover process or training problems that could lead to huge gains once addressed. If a shift or site has a lower failure rate, find out what it’s doing differently and replicate that across all shifts or sites.
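All five reports are simple aggregations over closed work orders. This sketch computes them in one pass. The `ClosedWorkOrder` fields, the labor rate, and the two-hour “after start-up” window are assumptions for illustration; in practice these fields would come from your CMMS export.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClosedWorkOrder:
    failure_code: str
    labor_hours: float
    labor_rate: float           # cost per labor hour
    parts_cost: float
    shift: str
    scheduled: bool             # found during scheduled maintenance?
    hours_since_startup: float  # run time before the failure occurred


def report(orders: list[ClosedWorkOrder], startup_window: float = 2.0) -> dict[str, object]:
    cost: dict[str, float] = defaultdict(float)
    hours: dict[str, float] = defaultdict(float)
    by_shift: dict[str, int] = defaultdict(int)
    found = {"scheduled": 0, "unscheduled": 0}
    after_startup = []
    for wo in orders:
        cost[wo.failure_code] += wo.labor_hours * wo.labor_rate + wo.parts_cost
        hours[wo.failure_code] += wo.labor_hours
        by_shift[wo.shift] += 1
        found["scheduled" if wo.scheduled else "unscheduled"] += 1
        if wo.hours_since_startup <= startup_window:
            after_startup.append(wo.failure_code)
    return {
        "cost_by_code": sorted(cost.items(), key=lambda kv: kv[1], reverse=True),
        "hours_by_code": sorted(hours.items(), key=lambda kv: kv[1], reverse=True),
        "failures_by_shift": dict(by_shift),
        "scheduled_vs_unscheduled": found,
        "failures_after_startup": after_startup,
    }


# Illustrative records, not real data.
orders = [
    ClosedWorkOrder("Bearing seizure", 4.0, 60.0, 250.0, "Night", False, 0.5),
    ClosedWorkOrder("Bearing seizure", 3.0, 60.0, 250.0, "Day", False, 6.0),
    ClosedWorkOrder("Belt wear", 1.0, 60.0, 40.0, "Day", True, 12.0),
]
summary = report(orders)
print(summary["cost_by_code"])  # [('Bearing seizure', 920.0), ('Belt wear', 100.0)]
```

Raw counts by shift can mislead if shifts run for different lengths of time, so divide each count by that shift’s operating hours before comparing them.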
How to use a FRACAS: Real-world examples and use cases for your business
A FRACAS is always at risk of becoming just another file on your computer. That’s because it changes the way you and your team work, which isn’t easy. Understanding what problems a FRACAS solves helps ease these growing pains. Here are some examples of how a FRACAS can help you target some of your maintenance team’s biggest pain points:
- Through your FRACAS, you’ve discovered that equipment is breaking down most often when old parts are used for repairs or replacements. You can also see how much these failures are costing in total maintenance and lost production. You can make a case to get a higher inventory budget to eliminate these failures.
- An asset that rarely broke down before is failing more often, and you don’t know why. A FRACAS analysis reveals the failures are happening to one component and started three months ago. That’s when the line began using different product specs that maintenance wasn’t aware of, which affected the machine setup. You develop a new process for communicating line modifications, which decreases downtime at several sites.
- A review of failure codes identifies three common types of failures. You only have the resources to tackle one this quarter. You dive into your FMEA, cost reports, and root cause analyses to find the failure with the biggest impact. After this success, you secure the budget to hire more technicians to fix the other failures you found.
Building a FRACAS takes three ingredients: data, time, and commitment. You need a lot of data about failure to find its root cause and address it. You need the time to get this data. And you need the long-term commitment to capture accurate data and apply its lessons. It takes a while to master these elements, so start small, track your wins, and don’t give up if you don’t see immediate results. The effort is worth the long-term return on investment.