Effective troubleshooting is vital for driving down unplanned downtime and minimizing Mean Time to Repair (MTTR). Often, machine troubleshooting is regarded as an art rather than an essential maintenance technician skill. This leads to inconsistency in how equipment failure is addressed, which impacts operational efficiency.
This blog outlines a structured approach to maintenance troubleshooting. After exploring the importance of troubleshooting and its relationship to equipment maintenance, it delves into diagnostic techniques and ways of preventing common problems. On completion, the reader will have a clear view of the steps necessary for improving MTTR and availability.
Introduction to maintenance troubleshooting
Most industrial manufacturers use preventive maintenance techniques to ensure production assets perform as required. Despite their best efforts, though, problems still arise. These range from unusual noises or smells to excessive variation, occasional jamming that forces a speed reduction, or complete functional failure.
When any of these occur, a maintenance request is raised and a Work Order is assigned to a technician. The technician attends to the machine, investigates the problem and, if possible, carries out repairs. This process of determining what type of equipment repair is needed is referred to as troubleshooting.
Troubleshooting differs from other aspects of asset management and maintenance in that at the outset it’s unclear what work is needed. In contrast, preventive maintenance tasks are clearly defined in advance, and where predictive maintenance strategies are in place the technician knows what type of problem to anticipate.
Good troubleshooting skills are essential for resolving problems and returning equipment to normal operation quickly. However, it’s an area where skill levels can vary throughout the maintenance team.
Understanding common maintenance issues
As every experienced maintenance technician will confirm, problems requiring their troubleshooting skills are often of a predictable nature. While specifics will vary by plant and type of equipment used, common issues are:
- Dirt/contamination: Leads to accelerated wear, can restrict motion and increase friction, causes sensor faults.
- Operator errors: Includes misloading of parts, entering wrong settings or values, leaving items where they can jam.
- Poor lubrication: Includes leaving oil too long between changes, blocked filters and worn or ineffective pumps, all of which lead to accelerated wear and heat build-up.
- Overloading: Accelerates fatigue and causes premature failure.
- Overheating: Causes problems with electrical and fluid systems, and thermal expansion can lead to seizure.
- Wear/abrasion/fretting: Problems include damaged insulation and inconsistent motion.
Historically, preventing issues like these has come down to routine inspections and requests from production for equipment repair.
A more effective strategy for improving system reliability is to implement fault detection methods. These range from simple poka-yoke (error-proofing) devices to sensors and machine health monitoring technologies that give advance warning of deterioration. Temperature, pressure and flow sensors, along with vibration monitoring and thermal imaging, are some of the most widely used.
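As a rough illustration of the simplest form of sensor-based fault detection, the sketch below compares readings against warning and alarm limits. The sensor names and limit values are hypothetical, chosen only for the example. In practice, limits would come from the equipment manufacturer or from baseline measurements on the machine itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    warning: float  # level at which to schedule an inspection
    alarm: float    # level at which to stop and investigate


# Hypothetical sensors and limits, for illustration only.
LIMITS = {
    "bearing_temp_c": Limit(warning=70.0, alarm=85.0),
    "vibration_mm_s": Limit(warning=4.5, alarm=7.1),
}


def classify(sensor: str, value: float) -> str:
    """Return 'ok', 'warning' or 'alarm' for a single reading."""
    limit = LIMITS[sensor]
    if value >= limit.alarm:
        return "alarm"
    if value >= limit.warning:
        return "warning"
    return "ok"


def check_readings(readings: dict[str, float]) -> dict[str, str]:
    """Classify every reading that has a configured limit; ignore the rest."""
    return {
        name: classify(name, value)
        for name, value in readings.items()
        if name in LIMITS
    }


if __name__ == "__main__":
    print(check_readings({"bearing_temp_c": 72.3, "vibration_mm_s": 2.1}))
```

Using two levels rather than a single alarm gives the advance warning described above: a "warning" result can trigger a planned inspection before the condition becomes a stoppage.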
Step-by-step guide to effective fault diagnosis
Troubleshooting equipment problems in a timely manner requires taking a methodical and systematic approach. In essence, there are four steps: Identify the problem, decide on a solution, test the solution and repeat until the issue is resolved. In many cases you may also add a fifth step: Perform root cause analysis to ensure the issue doesn’t arise again.
However, identifying the problem can be hard to do as the observed behavior could have many causes. This step-by-step guide (specific troubleshooting techniques vary by equipment type) is intended to support a systematic diagnosis:
1. Review historical records
On receiving the Work Order, start by reviewing maintenance reports to determine if the problem has occurred before with this specific machine or similar equipment. If records are brief or incomplete, check with the last technician who attended the machine to see if they have any troubleshooting tips.
2. Retrieve relevant documentation
Operating instructions, wiring diagrams, pneumatic diagrams and manufacturer manuals can all provide valuable information on how the equipment should function. Preventive maintenance checklists calling out recommended inspections and settings will also be useful.
3. Gather information
Talk to the person who requested maintenance support and to whoever first spotted the maintenance issue. (Typically, the section supervisor and a production operator respectively.) Have them describe both the fault and what was happening before the issue occurred. Pay particular attention to what product was being run and any abnormal environmental conditions but also beware of the human tendency towards recency bias.
4. Observe the fault or behavior
If the equipment can run, watch what happens. Depending on the nature of the fault, use instrumentation to monitor characteristics like temperatures and signals coming from sensors and going to actuators.
On high-speed production lines it can help to use a high frame rate video recording system. (Dedicated event recording systems are available for this purpose but tend to be expensive.)
5. Formulate and test a hypothesis
For many problems, having gone through the previous four steps, the technician will have some ideas as to the cause. These can be tested by swapping out components or bypassing or simulating signals. If these steps don’t resolve the problem or if it’s unclear exactly where it originates, a root cause analysis exercise can help.
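The identify, decide, test and repeat cycle described in this section can be sketched as a simple loop. This is a minimal illustration, not a prescribed method: the `Hypothesis` class, the likelihood estimate and the `test` callable are all hypothetical stand-ins for a technician's judgment and for physical checks such as swapping a component or simulating a signal.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    description: str          # suspected cause
    likelihood: float         # technician's estimate, 0.0 to 1.0
    test: Callable[[], bool]  # returns True if the fix resolves the fault


def troubleshoot(hypotheses: list[Hypothesis]) -> Optional[Hypothesis]:
    """Test suspected causes from most to least likely.

    Returns the first hypothesis whose test resolves the fault, or None
    if every candidate is exhausted (a cue to start root cause analysis).
    """
    ranked = sorted(hypotheses, key=lambda h: h.likelihood, reverse=True)
    for hypothesis in ranked:
        if hypothesis.test():
            return hypothesis
    return None


if __name__ == "__main__":
    # Lambdas stand in for real checks carried out at the machine.
    candidates = [
        Hypothesis("Loose sensor connector", 0.3, lambda: True),
        Hypothesis("Worn drive belt", 0.6, lambda: False),
    ]
    result = troubleshoot(candidates)
    print(result.description if result else "Escalate to root cause analysis")
```

Ranking by likelihood is one possible ordering. Some teams prefer to start with the cheapest or quickest checks, which would only require a different sort key.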
Tools and techniques for efficient troubleshooting
While sharp observational skills and a multimeter are two of the most important tools for troubleshooting maintenance issues, other techniques, devices and equipment can help. These include:
- An efficient documentation system: Records of previous work performed are invaluable, along with relevant manuals and diagrams.
- A failure code system: Failure codes let the person raising the maintenance request categorize the fault or problem in a way that tells the technician what to expect. They also simplify searching through historical records.
- Skilled and experienced technicians: While it’s important to avoid the problem of specialist knowledge being restricted to a limited number of individuals, there is no substitute for team members with extensive technical troubleshooting experience.
- Advanced analytical equipment: This can include thermal imaging (to identify hot spots), vibration monitoring (to detect imbalance or bearing wear), acoustic sensors, imaging probes and high-resolution measuring devices (like the ballbar for machine tool condition assessment).
- Instrumentation for machine health monitoring: An alternative to taking equipment to the machine is to instrument the machine with sensors that report operating status and conditions in real-time. When combined with powerful analytical tools, including AI, this enables predictive maintenance strategies that reduce the need for troubleshooting.
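To show how a failure code system can simplify searching through historical records, here is a minimal sketch. The `WorkOrder` fields and the code format (such as "LUB-02") are assumptions made for the example, not the schema of any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WorkOrder:
    asset_id: str
    asset_type: str
    failure_code: str  # hypothetical format, e.g. "LUB-02"
    closed_on: date
    resolution: str


def similar_faults(
    history: list[WorkOrder],
    failure_code: str,
    asset_type: Optional[str] = None,
) -> list[WorkOrder]:
    """Return past Work Orders with the same failure code, newest first.

    If asset_type is given, only matching equipment types are returned.
    """
    matches = [
        wo for wo in history
        if wo.failure_code == failure_code
        and (asset_type is None or wo.asset_type == asset_type)
    ]
    return sorted(matches, key=lambda wo: wo.closed_on, reverse=True)


if __name__ == "__main__":
    history = [
        WorkOrder("PRESS-01", "press", "LUB-02", date(2023, 3, 1), "Replaced filter"),
        WorkOrder("PRESS-02", "press", "LUB-02", date(2024, 1, 15), "Replaced pump"),
        WorkOrder("CONV-07", "conveyor", "LUB-02", date(2023, 9, 9), "Re-greased"),
    ]
    for wo in similar_faults(history, "LUB-02", asset_type="press"):
        print(wo.closed_on, wo.asset_id, wo.resolution)
```

Because the search is by code rather than free text, a technician can find how the same fault was resolved on similar equipment even when the original descriptions were brief or worded differently.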
Best practices for root cause analysis
When troubleshooting problems with production equipment, there isn’t always time to establish the root cause: it’s more important to implement a quick fix that lets production resume. Unless the underlying cause is addressed, though, the problem is likely to recur and cause more downtime in the future.
Root cause analysis (RCA), if conducted with rigor, helps prevent problems from recurring. Ideally, it forms part of the troubleshooting process, but it can also be carried out later. The main tools for RCA are:
- Pareto analysis: Separates the important few causes from the trivial many.
- 5 Whys: A tool for drilling down into the reason a problem occurred.
- Scatter diagrams: Used to explore whether a relationship exists between a suspected cause and an observed effect.
- Fishbone or Ishikawa diagrams: Used with the 6Ms of manufacturing to promote consideration of all possible causes.
- Failure Mode and Effects Analysis (FMEA): A tool for exploring how system components might fail and the consequences that could result.
With all these tools there are some important practices to follow:
1. Gather enough data to allow a detailed understanding of the problem to emerge.
2. Use the tools in an open-minded and objective manner. (Having an experienced facilitator helps.)
3. Plan corrective actions in detail.
4. Follow through to ensure corrective actions are implemented.
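Of the RCA tools listed in this section, Pareto analysis is the most straightforward to illustrate in code. The sketch below counts how often each cause appears in a set of fault records and returns the "important few" that together account for a chosen share of all occurrences. The 80% cutoff follows the common 80/20 convention and is an assumption, as are the cause labels in the example.

```python
from collections import Counter


def pareto(causes: list[str], cutoff: float = 0.8) -> list[tuple[str, int, float]]:
    """Return the most frequent causes that together reach `cutoff`.

    Each result is (cause, count, cumulative share of all occurrences),
    ordered from most to least frequent.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    vital_few = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        share = cumulative / total
        vital_few.append((cause, count, share))
        if share >= cutoff:
            break
    return vital_few


if __name__ == "__main__":
    # Hypothetical cause labels taken from closed Work Orders.
    records = (
        ["poor lubrication"] * 5
        + ["contamination"] * 3
        + ["operator error"]
        + ["overheating"]
    )
    for cause, count, share in pareto(records):
        print(f"{cause}: {count} faults, cumulative {share:.0%}")
```

The result points corrective action at the causes responsible for most of the downtime, which supports the practice of gathering enough data before deciding where to act.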
Finding reliable technical support services
Troubleshooting eats into time that could otherwise be used for preventive maintenance. It’s also difficult to predict how much time will be needed for each troubleshooting request. An effective way of meeting these challenges without increasing the maintenance work backlog is to partner with external support specialists.
Some manufacturers are reluctant to take this step. Their equipment is often highly specialized and extremely complex, and they have concerns over aspects like response time and technical competence. While valid, these points can all be addressed. The key is to identify the most important factors needed in technical support services, usually:
- Familiarity with the type of equipment used. Is a specialist required?
- Scope of support provided. For example, purely quick fixes or root cause analysis and implementation of corrective actions?
- Response time. How long will you wait before help arrives?
- Technical resources. Do they have access to the latest diagnostic tools and are they trained in advanced troubleshooting methods?
When evaluating potential support partners, seek information from the widest possible range of sources. Contact industry associations and professional networks for references and read reviews and case studies to evaluate technical support service reliability and expertise.
As the leader in outsourced industrial maintenance, ATS employs technical specialists with deep expertise in many different types of manufacturing equipment and ensures they have the skills needed to provide high-level support. What’s more, ATS specializes in helping manufacturers implement a wide range of predictive maintenance solutions that reduce unplanned stoppages and the need for troubleshooting.
Leveraging maintenance software for troubleshooting
Technicians often have their own methods of troubleshooting, but those who achieve the shortest MTTR typically follow the steps outlined above. A vital tool for helping them is the Computerized Maintenance Management System (CMMS).
CMMS software comprises an asset database, records of maintenance work scheduled and performed, and a system for raising and prioritizing Work Orders that go to technicians. More extensive systems also support activities such as MRO inventory management and purchasing. An emerging trend in CMMS software is the inclusion of analytical tools for predictive maintenance strategies.
An effective CMMS supports maintenance planning and troubleshooting efforts by giving technicians easy access to equipment histories and documentation. It may also use a failure code system to simplify searching for similar problems on other equipment. Businesses not using a CMMS are at a clear disadvantage to those that do, with a major difference being troubleshooting speed and MTTR.
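Repair records of the kind a CMMS holds are also what make MTTR and availability measurable. The sketch below uses the usual definitions: MTTR is total repair time divided by the number of repairs, and availability is the share of scheduled time the equipment was not down for repair. The record format (start and end timestamps per repair) is an assumption, and organizations differ on details such as whether waiting time counts toward a repair.

```python
from datetime import datetime, timedelta

Repair = tuple[datetime, datetime]  # (repair started, repair finished)


def total_downtime(repairs: list[Repair]) -> timedelta:
    """Sum the duration of every repair."""
    return sum((end - start for start, end in repairs), timedelta())


def mttr(repairs: list[Repair]) -> timedelta:
    """Mean Time to Repair: total repair time / number of repairs."""
    if not repairs:
        raise ValueError("no repairs recorded")
    return total_downtime(repairs) / len(repairs)


def availability(scheduled: timedelta, repairs: list[Repair]) -> float:
    """Fraction of scheduled time not lost to repairs."""
    return (scheduled - total_downtime(repairs)) / scheduled


if __name__ == "__main__":
    # Hypothetical repairs during one 40-hour production week.
    repairs = [
        (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 0)),
        (datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 2, 11, 0)),
    ]
    print("MTTR:", mttr(repairs))
    print("Availability:", availability(timedelta(hours=40), repairs))
```

Tracking these two figures over time gives a simple way to check whether changes to troubleshooting practice are shortening repairs.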
Support for improving maintenance operations effectiveness
Maintenance troubleshooting is often essential for ensuring continuity of plant operations. To keep availability at its peak, manufacturers using strategies like preventive and predictive maintenance should also seek ways of troubleshooting quickly and effectively.
Troubleshooting should always be undertaken in a systematic and methodical manner. Resources, particularly detailed equipment repair histories, must be made available to avoid dependence on the memories of a few individuals. In addition, root cause analysis will drive down recurrence, thereby reducing the amount of troubleshooting needed.
Upgrading troubleshooting effectiveness, and reducing the need to perform it, is not easy, especially in the hectic environment of an industrial maintenance team. This is where ATS has the expertise and resources to help. As a leading provider of technical support services, we help businesses improve maintenance operations effectiveness, thereby lowering costs and strengthening their competitive position. Contact us to learn more.