Thinking Like an Engineer in a Healthcare World: Using Fault Tree Analysis to Understand Complex Failures
When a serious patient safety event occurs, healthcare organizations often search for “the” root cause. Investigators may identify a communication breakdown, a missed assessment, a policy deviation, or a documentation error and conclude that they have found the source of the problem.
Yet many adverse events do not occur because of a single failure. Instead, they emerge when multiple conditions, weaknesses, and human actions align at the same time. This reality is one reason why healthcare organizations continue to struggle with complex safety problems despite conducting extensive root cause analyses (Agency for Healthcare Research and Quality [AHRQ], 2024; Kellogg et al., 2017).
Fault Tree Analysis (FTA) offers a different way of thinking. Originally developed in high-risk engineering industries such as aerospace, nuclear power, and manufacturing, FTA helps investigators understand how combinations of failures interact to produce an unwanted outcome. Rather than asking, “What caused this event?” FTA asks, “What combination of conditions had to exist for this event to occur?” (McElroy et al., 2015).
For healthcare professionals, learning to think like an engineer can reveal why serious events are rarely caused by a single point of failure.
What Is Fault Tree Analysis?
Fault Tree Analysis is a deductive method that starts with an undesirable event, known as the “top event,” and works backward to identify the contributing failures that could have led to it (McElroy et al., 2015).
For example, the top event might be:
- Wrong-site surgery
- Central line-associated bloodstream infection
- Medication overdose
- Patient fall with serious injury
Investigators then build a diagram showing all the possible contributing factors and how they relate to one another. The resulting “fault tree” visually maps the pathways through which the event could occur. Unlike many traditional healthcare investigation tools that focus on a linear chain of events, FTA emphasizes the interaction between failures across the system (Abecassis et al., 2014).
Understanding AND and OR Logic
The power of Fault Tree Analysis comes from its use of logical relationships, particularly AND gates and OR gates.
OR Logic: Any One Failure Can Cause the Problem
An OR gate means that any one of several failures can produce the event above it.
For example, a medication may be administered incorrectly because:
- The wrong drug was selected.
- The wrong dose was entered.
- The wrong patient was identified.
Any one of these failures could potentially lead to a medication error.
In engineering terms, OR gates identify vulnerabilities where a single failure can lead directly to harm. These represent weak points in the system because there is little redundancy protecting against failure (Abecassis et al., 2014).
AND Logic: Multiple Failures Must Occur Together
An AND gate means that several conditions must occur simultaneously before the event can happen. For example, a retained surgical item may require:
- A counting process failure,
- An interruption during the procedure,
- And a missed final verification.
None of these failures is sufficient on its own to cause the event; harm occurs only when all of them happen together. AND gates reflect the fact that many healthcare systems contain layers of protection, so harm requires multiple defenses to fail at the same time (McElroy et al., 2015).
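The two gate types can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: the `Gate` class and `occurs` function are hypothetical names introduced here, and the two trees simply reuse the medication and retained-item examples from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Gate:
    """An intermediate event whose state depends on its inputs (hypothetical helper)."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", str]] = field(default_factory=list)


def occurs(node: Union[Gate, str], failed: Dict[str, bool]) -> bool:
    """Return True if the event at `node` occurs, given which basic events failed."""
    if isinstance(node, str):
        # Basic event: look up whether it failed (default: it did not).
        return failed.get(node, False)
    states = [occurs(child, failed) for child in node.inputs]
    if node.kind == "AND":
        return all(states)   # every input must fail
    if node.kind == "OR":
        return any(states)   # one failing input is enough
    raise ValueError(f"unknown gate kind: {node.kind}")


# OR gate: any one failure is enough (medication example from the text).
medication_error = Gate("medication error", "OR",
                        ["wrong drug", "wrong dose", "wrong patient"])

# AND gate: every defense must fail (retained surgical item example).
retained_item = Gate("retained surgical item", "AND",
                     ["count failure", "interruption", "missed final verification"])

# One failure in each tree.
one_failure = {"wrong dose": True, "count failure": True}
print(occurs(medication_error, one_failure))  # True: the OR gate fires
print(occurs(retained_item, one_failure))     # False: two defenses still hold
```

With a single failure in each tree, the OR-gated event occurs while the AND-gated event does not, which is the difference in redundancy the two gate types are meant to capture.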
Why Healthcare Events Are Often Multi-Factorial
Healthcare investigations frequently identify a human error as the apparent cause of an event. A nurse administered the wrong medication. A physician overlooked a critical result. A technician failed to follow a procedure. However, Fault Tree Analysis encourages investigators to ask a deeper question:
Why was the error able to reach the patient?
An engineer examining the same event would rarely stop at the final action. Instead, they would explore the combination of conditions that allowed the error to occur and remain undetected.
For example, a medication overdose may involve:
- Similar drug packaging,
- An electronic prescribing issue,
- Staff fatigue,
- Inadequate double-check processes,
- High workload,
- And ineffective clinical decision support.
The overdose occurs not because one person made a mistake, but because several system weaknesses existed simultaneously. This systems perspective aligns with broader patient safety research, which consistently demonstrates that adverse events arise from interactions among human, organizational, technical, and environmental factors rather than isolated mistakes (Brook et al., 2015; Driesen et al., 2022).
Fault Trees Reveal Hidden System Vulnerabilities
One of the greatest strengths of FTA is its ability to expose latent conditions that may otherwise remain invisible. In a study examining postoperative bloodstream infections, investigators used Fault Tree Analysis to identify more than 100 contributing faults. The analysis revealed numerous interacting failures that were not routinely captured by existing quality reporting systems. Many underlying contributors existed far below the surface event and involved issues such as protocol design, training, workload, and process reliability (McElroy et al., 2015).
Similarly, an FTA of wrong-site surgery demonstrated that many failures occurred before the patient ever entered the operating room. Scheduling errors, documentation issues, patient identification failures, and site-marking problems interacted in complex ways to create risk. The study showed that safety depends not on a single verification step but on the reliability of multiple interconnected processes (Abecassis et al., 2014).
These examples highlight an important lesson: the factors closest to the event are not always the factors most responsible for creating the conditions in which the event became possible.
Moving Beyond the Search for a Single Root Cause
The phrase “root cause” can sometimes unintentionally encourage investigators to search for one primary explanation. Yet many patient safety experts have questioned whether complex healthcare events can truly be reduced to a single root cause. Modern healthcare systems involve thousands of interacting processes, technologies, decisions, and environmental influences. Identifying one cause may oversimplify the reality of how harm occurs (Kellogg et al., 2017).
Fault Tree Analysis challenges this mindset by recognizing that:
- Multiple failures can contribute to the same event.
- Some failures are independent while others interact.
- Events often result from combinations of conditions.
- Improving one component may not eliminate the overall risk.
This perspective encourages organizations to strengthen entire systems rather than focusing narrowly on individual performance.
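The point that improving one component may not eliminate the overall risk can be illustrated numerically. The sketch below assumes the basic events are statistically independent, and every probability in it is invented purely for illustration; none comes from the cited studies. The helper names `p_and` and `p_or` are likewise hypothetical.

```python
from math import prod


def p_and(probs):
    """AND gate: all inputs must fail, so multiply (independence assumed)."""
    return prod(probs)


def p_or(probs):
    """OR gate: 1 minus the probability that none of the inputs fail."""
    return 1 - prod(1 - p for p in probs)


# Hypothetical per-dose failure probabilities, invented for illustration only.
baseline = {"wrong drug": 0.002, "wrong dose": 0.004, "wrong patient": 0.001}

# Suppose an intervention halves the chance of a wrong dose being entered.
improved = {**baseline, "wrong dose": 0.002}

before = p_or(baseline.values())
after = p_or(improved.values())
print(f"before: {before:.5f}  after: {after:.5f}")
```

Under these assumed numbers the top-event probability falls but stays well above zero, because the other two OR branches are untouched. Real events are rarely independent, so a calculation like this is best read as a thinking aid rather than a risk estimate.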
Practical Applications in Healthcare
Healthcare organizations can use Fault Tree Analysis to investigate:
- Serious safety events
- Healthcare-associated infections
- Diagnostic errors
- Medication incidents
- Surgical complications
- Equipment failures
- Delayed treatment events
FTA can also be used proactively to identify vulnerabilities before harm occurs. By mapping potential pathways to failure, teams can determine where additional safeguards, redundancies, or monitoring systems may be needed (Abecassis et al., 2014; McElroy et al., 2015). Perhaps most importantly, the method helps teams visualize how individual failures connect to broader system weaknesses.
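One way to map potential pathways to failure is to list a tree's minimal cut sets, the smallest combinations of basic failures that are enough to produce the top event. The sketch below is a simple recursive version. The `cut_sets` function and the tuple representation are assumptions made for this example, and the tree structure is hypothetical: it borrows factor names from the wrong-site surgery discussion but does not reproduce the published fault tree.

```python
from itertools import product


def cut_sets(node):
    """Return the minimal cut sets (frozensets of basic events) for `node`.

    A node is either a basic-event name (str) or a tuple
    (kind, [children]) where kind is "AND" or "OR".
    """
    if isinstance(node, str):
        return {frozenset([node])}
    kind, children = node
    child_sets = [cut_sets(c) for c in children]
    if kind == "OR":
        # Any child's cut set is also a cut set of the gate.
        combined = set().union(*child_sets)
    elif kind == "AND":
        # Take one cut set from every child and merge them.
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate kind: {kind}")
    # Keep only minimal sets: drop any set that strictly contains another.
    return {s for s in combined if not any(other < s for other in combined)}


# Hypothetical tree structure, for illustration only.
wrong_site = ("OR", [
    "patient identification failure",
    ("AND", [
        ("OR", ["scheduling error", "documentation issue"]),
        "site-marking problem",
    ]),
])

result = cut_sets(wrong_site)
for s in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))

# Cut sets of size one are single points of failure with no backup defense.
single_points = sorted(next(iter(s)) for s in result if len(s) == 1)
print("single points of failure:", single_points)
```

In this made-up tree, one basic event appears alone in a cut set, which flags it as a place where an additional safeguard would add redundancy. The larger cut sets show which combinations of weaker failures would have to coincide.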
The Real Lesson from Fault Tree Analysis
Fault Tree Analysis teaches investigators to think less like detectives searching for a culprit and more like engineers studying system reliability.
Instead of asking, “Who made the mistake?” the analysis asks:
- What conditions existed?
- Which safeguards failed?
- How did multiple weaknesses interact?
- What combinations of failures made the event possible?
This shift in thinking is particularly valuable in healthcare, where serious events rarely result from a single breakdown. More often, they emerge when several vulnerabilities align at the wrong moment. Understanding those interactions allows organizations to move beyond blame, strengthen system resilience, and design safer care processes.
References
Abecassis, Z. A., McElroy, L. M., Patel, R. M., Khorzad, R., Carroll, C., & Mehrotra, S. (2014). Applying fault tree analysis to the prevention of wrong-site surgery. Journal of Surgical Research, 193(2), 627-632.
Agency for Healthcare Research and Quality. (2024). Root cause analysis. PSNet.
Brook, O. R., Kruskal, J. B., Eisenberg, R. L., & Larson, D. B. (2015). Root cause analysis: Learning from adverse safety events. RadioGraphics, 35(6), 1655-1667.
Driesen, B. E. J. M., Verboom, L., Weesie, Y. M., van der Velde, N., van der Schaaf, T. W., & Wagner, C. (2022). Root cause analysis using the Prevention and Recovery Information System for Monitoring and Analysis method in healthcare facilities: A systematic literature review. Journal of Patient Safety, 18(4), 342-350.
Kellogg, K. M., Hettinger, Z., Shah, M., Wears, R. L., Sellers, C. R., Squires, M., Fairbanks, R. J., & Huecker, M. R. (2017). Our current approach to root cause analysis: Is it contributing to our failure to improve patient safety? BMJ Quality & Safety, 26(5), 381-387.
McElroy, L. M., Khorzad, R., Rowe, T. A., Abecassis, Z. A., Apley, D. W., Barnard, C., & Holl, J. L. (2015). Fault tree analysis: Assessing the adequacy of reporting efforts to reduce postoperative bloodstream infection. American Journal of Medical Quality, 32(1), 79-85.