# Root Cause Analysis Methodologies

Decision criteria, templates, and implementation guidance for RCA techniques.

---

## Table of Contents

- [Method Selection Matrix](#method-selection-matrix)
- [5 Why Analysis](#5-why-analysis)
- [Fishbone Diagram](#fishbone-diagram)
- [Fault Tree Analysis](#fault-tree-analysis)
- [Human Factors Analysis](#human-factors-analysis)
- [Failure Mode and Effects Analysis](#failure-mode-and-effects-analysis)
- [Selecting the Right Method](#selecting-the-right-method)

---

## Method Selection Matrix

### When to Use Each Method

| Method | Use When | Problem Type | Team Size | Time Required |
|--------|----------|--------------|-----------|---------------|
| 5 Why | Single-cause issues, process deviations | Linear causation | 1-3 people | 30-60 min |
| Fishbone | Multi-factor problems, 3-6 contributing factors | Complex, systemic | 3-8 people | 2-4 hours |
| Fault Tree | Safety-critical failures, reliability issues | System failures | 2-5 people | 4-8 hours |
| Human Factors | Procedure/training-related issues | Human error | 3-6 people | 2-4 hours |
| FMEA | Systematic risk assessment, design review | Potential failures | 4-10 people | 8-16 hours |

### Quick Selection Decision Tree

```
Is the issue safety-critical, or does it involve system reliability?
├── Yes → Use FAULT TREE ANALYSIS
└── No → Is human error the suspected primary cause?
    ├── Yes → Use HUMAN FACTORS ANALYSIS
    └── No → How many potential contributing factors?
        ├── 1-2 factors → Use 5 WHY ANALYSIS
        ├── 3-6 factors → Use FISHBONE DIAGRAM
        └── Unknown/Many → Use FMEA (proactive) or Fishbone (reactive)
```
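The decision tree above can be expressed as a small function. This is a minimal sketch, not a prescribed tool: the function name `select_rca_method` and its parameters are hypothetical, and treating any count above 6 the same as "Unknown/Many" is an assumption.

```python
from typing import Optional


def select_rca_method(
    safety_critical: bool,
    human_error_suspected: bool,
    factor_count: Optional[int],
    proactive: bool = False,
) -> str:
    """Walk the quick selection decision tree from top to bottom."""
    if safety_critical:
        return "Fault Tree Analysis"
    if human_error_suspected:
        return "Human Factors Analysis"
    if factor_count is None or factor_count > 6:
        # Unknown/many factors: FMEA when proactive, Fishbone when reactive.
        return "FMEA" if proactive else "Fishbone Diagram"
    if factor_count <= 2:
        return "5 Why Analysis"
    return "Fishbone Diagram"  # 3-6 factors
```

Passing `factor_count=None` models the "Unknown" branch. The checks run in the same order as the tree, so a safety-critical issue always routes to Fault Tree Analysis regardless of the other answers.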

---

## 5 Why Analysis

### Overview

Simple, iterative technique: ask "why" repeatedly (typically 5 times, though the chain may end sooner or run longer) to drill from symptoms to root cause.

### When to Use

- Single-cause issues with linear causation
- Process deviations with clear failure point
- Quick investigations requiring rapid resolution
- Problems where symptoms clearly link to cause

### When NOT to Use

- Complex multi-factor problems
- Safety-critical incidents requiring comprehensive analysis
- Issues with multiple interacting causes
- When systemic factors are suspected

### 5 Why Template

```
PROBLEM STATEMENT:
[Clear, specific description of what happened, when, where, and impact]

WHY 1: Why did [problem] occur?
BECAUSE: [First-level cause]
EVIDENCE: [Data/observation supporting this cause]

WHY 2: Why did [first-level cause] occur?
BECAUSE: [Second-level cause]
EVIDENCE: [Data/observation supporting this cause]

WHY 3: Why did [second-level cause] occur?
BECAUSE: [Third-level cause]
EVIDENCE: [Data/observation supporting this cause]

WHY 4: Why did [third-level cause] occur?
BECAUSE: [Fourth-level cause]
EVIDENCE: [Data/observation supporting this cause]

WHY 5: Why did [fourth-level cause] occur?
BECAUSE: [Root cause - typically systemic or management system failure]
EVIDENCE: [Data/observation supporting this cause]

ROOT CAUSE VALIDATION:
- [ ] Can the root cause be verified with evidence?
- [ ] If the root cause is eliminated, is recurrence of the problem prevented?
- [ ] Is the root cause within organizational control?
- [ ] Does the root cause explain all symptoms?
```

### Example: Calibration Overdue

```
PROBLEM: pH meter (EQ-042) found 2 months overdue for calibration

WHY 1: Why was calibration overdue?
BECAUSE: The equipment was not on the calibration schedule
EVIDENCE: Calibration schedule reviewed, EQ-042 not listed

WHY 2: Why was it not on the calibration schedule?
BECAUSE: The schedule was not updated when equipment was purchased
EVIDENCE: Purchase date 2023-06-15, schedule dated 2023-01-01

WHY 3: Why was the schedule not updated?
BECAUSE: No process requires schedule update at equipment purchase
EVIDENCE: Equipment procedure SOP-EQ-001 reviewed, no such requirement

WHY 4: Why is there no requirement to update the schedule?
BECAUSE: The procedure was written before equipment tracking was centralized
EVIDENCE: SOP-EQ-001 last revised 2019, equipment system implemented 2021

WHY 5: Why has the procedure not been updated?
BECAUSE: Periodic procedure review did not assess compatibility with new systems
EVIDENCE: No documented review of SOP-EQ-001 against new equipment system

ROOT CAUSE: Procedure review process does not assess compatibility
with organizational systems implemented after original procedure creation
```
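A 5 Why chain like the one above can also be kept as structured data, which makes it easy to spot steps that lack evidence. The sketch below is illustrative only: the class names `WhyStep` and `FiveWhyAnalysis` are hypothetical, and treating the last recorded cause as the candidate root cause is an assumption that still needs the validation checklist.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyStep:
    question: str
    cause: str
    evidence: str = ""  # empty string means no supporting evidence recorded


@dataclass
class FiveWhyAnalysis:
    problem: str
    steps: List[WhyStep] = field(default_factory=list)

    def add(self, question: str, cause: str, evidence: str = "") -> None:
        self.steps.append(WhyStep(question, cause, evidence))

    def unsupported_steps(self) -> List[int]:
        """Return the 1-based WHY numbers that have no evidence."""
        return [i for i, s in enumerate(self.steps, start=1) if not s.evidence.strip()]

    def root_cause(self) -> str:
        """Treat the last recorded cause as the candidate root cause."""
        if not self.steps:
            raise ValueError("no WHY steps recorded")
        return self.steps[-1].cause
```

An analysis is ready for root cause validation only when `unsupported_steps()` returns an empty list, since every BECAUSE line in the template is expected to carry evidence.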

---

## Fishbone Diagram

### Overview

Also called Ishikawa or cause-and-effect diagram. Organizes potential causes into categories branching from the problem statement.

### Standard Categories (6M)

| Category | Focus Areas | Typical Causes |
|----------|-------------|----------------|
| **Man** (People) | Training, competency, workload | Skill gaps, fatigue, communication |
| **Machine** (Equipment) | Calibration, maintenance, age | Wear, malfunction, inadequate capacity |
| **Method** (Process) | Procedures, work instructions | Unclear steps, missing controls |
| **Material** | Specifications, suppliers, storage | Out-of-spec, degradation, contamination |
| **Measurement** | Calibration, methods, interpretation | Instrument error, wrong method |
| **Mother Nature** (Environment) | Temperature, humidity, cleanliness | Environmental excursions |

### Fishbone Template

```
PROBLEM STATEMENT: [Effect being investigated]

                    ┌── Man ────────────────┐
                    │  ├─ [Cause 1]         │
                    │  ├─ [Cause 2]         │
                    │  └─ [Cause 3]         │
                    │                       │
┌── Machine ────────┤                       ├── Method ──────────┐
│  ├─ [Cause 1]     │                       │  ├─ [Cause 1]      │
│  ├─ [Cause 2]     │     PROBLEM           │  ├─ [Cause 2]      │
│  └─ [Cause 3]     ├───────────────────────┤  └─ [Cause 3]      │
│                   │                       │                    │
├── Material ───────┤                       ├── Measurement ─────┤
│  ├─ [Cause 1]     │                       │  ├─ [Cause 1]      │
│  ├─ [Cause 2]     │                       │  ├─ [Cause 2]      │
│  └─ [Cause 3]     │                       │  └─ [Cause 3]      │
                    │                       │
                    └── Environment ────────┘
                       ├─ [Cause 1]
                       ├─ [Cause 2]
                       └─ [Cause 3]

CAUSE PRIORITIZATION:
| Cause | Category | Likelihood | Evidence | Priority |
|-------|----------|------------|----------|----------|
| [Cause A] | Method | High | [Evidence] | 1 |
| [Cause B] | Man | Medium | [Evidence] | 2 |

ROOT CAUSES IDENTIFIED:
1. [Primary root cause with supporting evidence]
2. [Contributing cause with supporting evidence]
```

### Facilitation Guidelines

1. Assemble cross-functional team (3-8 people)
2. Define problem statement clearly before starting
3. Brainstorm causes without judgment first
4. Organize into categories after brainstorming
5. Drill down on each major cause (sub-causes)
6. Prioritize based on evidence and likelihood
7. Validate top causes with data
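Steps 4 and 6 above (organizing causes into categories, then prioritizing by evidence and likelihood) can be sketched as follows. The dictionary keys, the three-level likelihood scale, and the tie-break that favors causes with recorded evidence are assumptions for illustration, not part of the method itself.

```python
from collections import defaultdict

# Assumed three-level scale; a lower rank sorts first.
LIKELIHOOD_RANK = {"High": 0, "Medium": 1, "Low": 2}


def group_by_category(causes):
    """Collect cause names under their fishbone category (one 'bone' each)."""
    bones = defaultdict(list)
    for cause in causes:
        bones[cause["category"]].append(cause["cause"])
    return dict(bones)


def prioritize(causes):
    """Order by likelihood, preferring causes that already have evidence."""
    ordered = sorted(
        causes,
        key=lambda c: (LIKELIHOOD_RANK[c["likelihood"]], not c["evidence"]),
    )
    # Return copies with a 1-based priority, leaving the input untouched.
    return [dict(c, priority=i) for i, c in enumerate(ordered, start=1)]
```

The ranking only orders candidates for follow-up. Step 7 still applies: the top causes need to be validated with data before they are recorded as root causes.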

---

## Fault Tree Analysis

### Overview

Top-down, deductive analysis that starts with an undesired event and systematically identifies all potential causes using Boolean logic (AND/OR gates).

### When to Use

- Safety-critical system failures
- Complex system reliability analysis
- Events with multiple failure pathways
- Regulatory-required investigations (FDA, MDR)

### FTA Symbols

| Symbol | Name | Meaning |
|--------|------|---------|
| Rectangle | Top Event / Intermediate Event | Undesired event or intermediate fault |
| Circle | Basic Event | Primary fault requiring no further analysis |
| Diamond | Undeveloped Event | Event not fully analyzed (data limitation) |
| Gate with flat base | AND Gate | All child events must occur for parent |
| Gate with curved base | OR Gate | Any child event causes parent |

### FTA Template

```
TOP EVENT: [Undesired event under investigation]

LEVEL 1 (Immediate Causes):
[Top Event]
    │
    └── OR GATE ──┬── [Cause 1.1]
                  ├── [Cause 1.2]
                  └── [Cause 1.3]

LEVEL 2 (Contributing Causes):
[Cause 1.1]
    │
    └── AND GATE ──┬── [Cause 2.1]
                   └── [Cause 2.2]

MINIMAL CUT SETS:
(Combinations of basic events that cause top event)
1. {Basic Event A, Basic Event B}  ← Both required (AND)
2. {Basic Event C}  ← Single point failure (OR)
3. {Basic Event D, Basic Event E}  ← Both required (AND)

CRITICAL PATH ANALYSIS:
Most likely failure pathway: [Description]
Single points of failure: [List]

RECOMMENDATIONS:
- Address single points of failure first
- Add redundancy where OR gates expose single-event paths to the top event
- Prioritize controls on highest probability paths
```
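The minimal cut sets in the template can be derived mechanically from the gate structure. The sketch below assumes a hypothetical tree encoding: a basic event is a string, and a gate is a tuple `("AND" | "OR", [children])`. It expands the tree top-down and then discards any cut set that contains a smaller one.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets of a fault tree node as a list of frozensets."""
    if isinstance(node, str):  # basic event
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":  # any child alone causes the parent
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":  # one cut set from every child must occur together
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop duplicates and any cut set that contains a smaller cut set."""
    unique = set(cut_sets(node))
    minimal = [cs for cs in unique if not any(other < cs for other in unique)]
    return sorted(minimal, key=lambda cs: (len(cs), sorted(cs)))


# Tree matching the template: the top event occurs if (A AND B) OR C OR (D AND E).
tree = ("OR", [("AND", ["A", "B"]), "C", ("AND", ["D", "E"])])
for cs in minimal_cut_sets(tree):
    print(sorted(cs))
```

Sorting by size puts single-element cut sets first, which matches the recommendation to address single points of failure before anything else. The full expansion can grow quickly on large trees, so this approach suits small investigation-sized trees rather than full system models.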

### Cut Set Analysis

Minimal cut sets identify the smallest combinations of basic events that cause the top event:

- **Single-element cut sets**: Single points of failure (highest priority)
- **Two-element cut sets**: Dual failure scenarios
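- **Probability calculation**: P(Top Event) = P(union of all minimal cut sets). With independent basic events, each cut set's probability is the product of its event probabilities; when those probabilities are small, their sum is a conservative approximation of P(Top Event)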
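The probability calculation can be sketched in Python under one loudly stated assumption: basic events are statistically independent. The function names and the probabilities in the test data are hypothetical. `top_event_probability` applies inclusion-exclusion over the cut sets, and `rare_event_approximation` sums the individual cut set probabilities.

```python
from itertools import combinations
from math import prod


def cut_set_probability(cut_set, p):
    """Probability that every basic event in the set occurs (independence assumed)."""
    return prod(p[event] for event in cut_set)


def rare_event_approximation(cut_sets, p):
    """Sum of cut set probabilities; never smaller than the exact value."""
    return sum(cut_set_probability(cs, p) for cs in cut_sets)


def top_event_probability(cut_sets, p):
    """Exact P(union of cut sets) by inclusion-exclusion."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1 if k % 2 else -1
        for combo in combinations(cut_sets, k):
            # k cut sets occur jointly when every event in their union occurs.
            events = frozenset().union(*combo)
            total += sign * cut_set_probability(events, p)
    return total
```

Inclusion-exclusion evaluates every non-empty subset of cut sets, so its cost doubles with each additional cut set. For trees with many cut sets, the rare-event sum is the practical choice and errs on the conservative side.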

---

## Human Factors Analysis

### Overview

Systematic analysis of human error focusing on cognitive, physical, and organizational factors contributing to performance failures.

### HFACS Categories

Human Factors Analysis and Classification System:

| Level | Category | Examples |
|-------|----------|----------|
| **Unsafe Acts** | Errors, violations | Skill-based, decision, perceptual errors |
| **Preconditions** | Conditions for unsafe acts | Fatigue, mental state, crew resource management (CRM), physical environment |
| **Unsafe Supervision** | Supervisory failures | Inadequate supervision, planned inappropriate operations |
| **Organizational Influences** | Organizational failures | Resource management, organizational climate |

### Human Error Types

| Type | Description | Example | Mitigation |
|------|-------------|---------|------------|
| Slip | Execution error in routine task | Wrong button pressed | Error-proofing, forcing functions |
| Lapse | Memory failure | Forgot step in procedure | Checklists, reminders |
| Mistake | Planning/decision error | Wrong procedure selected | Training, decision aids |
| Violation | Intentional deviation | Skipped step to save time | Culture change, supervision |

### Human Factors Investigation Template

```
INCIDENT DESCRIPTION:
[What happened, who was involved, when, where]

UNSAFE ACTS ANALYSIS:
Type of Error: [ ] Slip  [ ] Lapse  [ ] Mistake  [ ] Violation
Description: [Specific action or inaction]
Task Being Performed: [Activity at time of error]
Experience Level: [Novice/Intermediate/Expert]

PRECONDITIONS FOR UNSAFE ACTS:
Cognitive Factors:
- [ ] Task complexity exceeded capability
- [ ] Time pressure
- [ ] Distraction/interruption
- [ ] Mental fatigue

Physical Factors:
- [ ] Physical fatigue
- [ ] Inadequate lighting
- [ ] Noise interference
- [ ] Workspace ergonomics

Team Factors:
- [ ] Communication breakdown
- [ ] Coordination failure
- [ ] Inadequate leadership

SUPERVISORY FACTORS:
- [ ] Inadequate supervision
- [ ] Failed to correct known problem
- [ ] Inappropriate staffing
- [ ] Authorized unnecessary risk

ORGANIZATIONAL FACTORS:
- [ ] Resource management deficiency
- [ ] Organizational process issue
- [ ] Organizational culture/climate

ROOT CAUSE(S):
[Human factors root causes identified]

CORRECTIVE ACTIONS:
| Action | Target Factor | Priority |
|--------|---------------|----------|
| [Action 1] | [Factor addressed] | High |
| [Action 2] | [Factor addressed] | Medium |
```

---

## Failure Mode and Effects Analysis

### Overview

Proactive, systematic technique identifying potential failure modes, their causes, and effects before failures occur.

### FMEA Types

| Type | Application | Scope |
|------|-------------|-------|
| Design FMEA (DFMEA) | Product design | Component and system design failures |
| Process FMEA (PFMEA) | Manufacturing process | Process step failures |
| System FMEA | System-level analysis | System interaction failures |

### Risk Priority Number (RPN)

RPN = Severity (S) × Occurrence (O) × Detection (D)

**Severity Scale (1-10):**

| Rating | Effect | Criteria |
|--------|--------|----------|
| 10 | Hazardous | Failure affects safe operation, no warning |
| 8-9 | Very High | Primary function lost, high impact |
| 6-7 | High | Performance degraded, customer dissatisfied |
| 4-5 | Moderate | Some performance loss, moderate impact |
| 2-3 | Low | Minor effect, slight inconvenience |
| 1 | None | No discernible effect |

**Occurrence Scale (1-10):**

| Rating | Likelihood | Failure Rate |
|--------|------------|--------------|
| 10 | Very High | >1 in 10 |
| 7-9 | High | 1 in 20 - 1 in 100 |
| 4-6 | Moderate | 1 in 400 - 1 in 2,000 |
| 2-3 | Low | 1 in 15,000 - 1 in 150,000 |
| 1 | Remote | <1 in 1,500,000 |

**Detection Scale (1-10):**

| Rating | Detection | Criteria |
|--------|-----------|----------|
| 10 | Absolute Uncertainty | No inspection/control, defect will reach customer |
| 7-9 | Very Remote to Remote | Controls unlikely to detect |
| 4-6 | Moderate | Controls may detect |
| 2-3 | High | Controls likely to detect |
| 1 | Almost Certain | Controls will almost certainly detect |

### FMEA Template

```
PROCESS/PRODUCT: [Name]
FMEA TEAM: [Members]
DATE: [Date]

| Item/Step | Failure Mode | Effect | S | Cause | O | Controls | D | RPN | Action |
|-----------|--------------|--------|---|-------|---|----------|---|-----|--------|
| [Item 1] | [How it fails] | [Impact] | 8 | [Why] | 4 | [Current] | 6 | 192 | [Action] |
| [Item 2] | [How it fails] | [Impact] | 6 | [Why] | 3 | [Current] | 4 | 72 | [Action] |

RPN THRESHOLD: Actions required for RPN > [threshold]
HIGH SEVERITY RULE: Actions required for S >= 9 regardless of RPN

ACTION PRIORITIZATION:
1. Address all items with S >= 9 first
2. Address items with highest RPN
3. Focus on reducing Occurrence (prevention)
4. Then improve Detection (inspection)
```
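The RPN arithmetic and the prioritization rules in the template can be checked with a short script. This is a sketch under stated assumptions: the `FailureMode` class and `actions_required` function are hypothetical names, and the RPN threshold is left as a required argument because the template does not fix a value.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    item: str
    severity: int
    occurrence: int
    detection: int

    def __post_init__(self):
        # Each rating must sit on the 1-10 scale used by the tables above.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be 1-10, got {value}")

    @property
    def rpn(self) -> int:
        """RPN = Severity x Occurrence x Detection."""
        return self.severity * self.occurrence * self.detection


def actions_required(modes, rpn_threshold):
    """Flag S >= 9 or RPN above threshold; list S >= 9 first, then by RPN."""
    flagged = [m for m in modes if m.severity >= 9 or m.rpn > rpn_threshold]
    return sorted(flagged, key=lambda m: (m.severity < 9, -m.rpn))
```

With the two template rows, `FailureMode("Item 1", 8, 4, 6).rpn` evaluates to 192 and `FailureMode("Item 2", 6, 3, 4).rpn` to 72. The sort key places high-severity items ahead of higher-RPN items, mirroring the rule that S >= 9 requires action regardless of RPN.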

---

## Selecting the Right Method

### Decision Flowchart

```
START: Investigation Required
    │
    ├── Is this a proactive assessment (no failure yet)?
    │   └── Yes → Use FMEA
    │
    ├── Is the issue safety-critical?
    │   └── Yes → Use FAULT TREE ANALYSIS
    │
    ├── Is human error the primary concern?
    │   └── Yes → Use HUMAN FACTORS ANALYSIS
    │
    ├── Are there multiple contributing factors (3+)?
    │   ├── Yes → Use FISHBONE DIAGRAM
    │   └── No → Use 5 WHY ANALYSIS
    │
    └── Uncertain? → Start with 5 WHY, escalate to FISHBONE if needed
```

### Hybrid Approach

For complex investigations, combine methods:

1. **Initial screening**: 5 Why for quick cause identification
2. **Detailed analysis**: Fishbone to explore all categories
3. **Validation**: Fault Tree for critical failure paths
4. **Systemic factors**: Human Factors for people-related causes
5. **Prevention**: FMEA for future risk mitigation

### Documentation Requirements

| Method | Required Outputs | Retention |
|--------|------------------|-----------|
| 5 Why | Completed template with evidence | CAPA record |
| Fishbone | Diagram + prioritized causes | CAPA record |
| Fault Tree | FTA diagram + cut set analysis | DHF/CAPA record |
| Human Factors | HFACS analysis + actions | CAPA record |
| FMEA | FMEA worksheet + action tracking | Design file |
