Writing for Engineers
FMEA—Failure Mode and Effect Analysis
This exercise will help you:
- Learn about FMEA and how it is used by engineers
- Practice constructing matrices/tables in documents
- Compose engineering documents that integrate text, graphics and tables/matrices
- Prepare for Assignment #D
Quick Links to FAQs:
- What is FMEA?
- How will we use FMEA for our devices?
- How do I compose a FMEA matrix?
- What are the different FMEA categories?
- Example: a pizza FMEA
- What are the FMEA severity classifications?
- What else is covered in a FMEA?
- What should we cover in our subassembly FMEA?
- FMEA links
What is FMEA?
FMEA is an acronym for Failure Mode and Effect Analysis (pronounced fa-MEE-uh, or FEE-muh). Since at least the mid-20th century, FMEA has been used to design and improve everything from wristwatches to satellites, helping to minimize failures while maximizing quality and reliability. FMEA is employed for new designs and whenever existing designs of complex systems are modified. FMEA documents are also often consulted after a catastrophic failure involving the loss of life or property.
How will we use FMEA for our devices?
You will construct a FMEA matrix for your device to assess its potential for failure of any kind. You may build your FMEA matrix using the MS Word "Table" command (or an equivalent feature in another word processing program), or format it as an Excel spreadsheet. While building your FMEA matrix, try to think of all the different ways your device could fail. (You may already see signs of failure in your device.) Possible sources of failure include, for example:
- Mechanical failures due to "wear and tear," metal fatigue, faulty castings or moldings, faulty welds, overloading, etc.
- Electrical failures due to loss of power, short-circuits, battery failure, power or signal noise, crosstalk, fan-out, etc.
- Systems failures and/or cascading failures due to associated failures from connected, neighboring components and subsystems. For example, a faulty brake pedal/shifter interlock switch on an automobile (part of the brake subassembly) may cause the ignition/starter subassembly to fail; on some switches, this will also cause a brake light failure when the brake is applied (part of the lighting and signaling subassembly).
When you are done constructing your FMEA matrix, you will exchange it with your fellow team members in class on the due date.
How do I compose a FMEA matrix?
Technically speaking, a "matrix" differs from a "table" in that the boxes, or "cells," may contain almost anything. (That is, with matrices, you can place numbers, text, photographic images, footnotes, hyperlinks—really, anything—within the boxes.) Tables typically contain text headings, with numbers everywhere else. This is, however, a mere technicality; when working with word processing programs like MS Word, you will use the "Table" or "Insert Table" commands to build both tables and matrices. We'll cover this in class.
FMEA matrices use a fairly standard format, although details vary widely, depending upon the type of device or system you are analyzing. At the bottom of this page, you will find useful links to help you learn more about FMEA matrices.
What are the different FMEA categories?
Four categories are commonly included within most FMEAs; there are others as well, depending upon the complexity of the system, how it functions, and who the customer is. Each category has its own column in the FMEA matrix, for example:
- Failure mode: description of the failure
- Failure effect: how the failure impacts the system
- Severity classification: the severity of the failure's impact on the system, ranked from "10" (most severe) to "1" (least severe)
- Corrective measures: what to do if/when the failure occurs to correct it
The first task is to carefully define what the system is and how it should function when everything is working properly. (Because if you don't know what the system is supposed to do, you can't really know when it's broken or how to fix it.) Depending upon the number of categories, failure modes and effects, there could be many columns and rows in the FMEA matrix. But for now, we'll stick to the four above.
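For readers who like to think in terms of data, the four columns can be sketched as one record per row of the matrix. This is only an illustration in Python, not part of the assignment: the class name `FmeaRow`, its field names, and the sample row (including its severity value) are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FmeaRow:
    """One row of a four-column FMEA matrix (hypothetical structure)."""
    failure_mode: str      # description of the failure
    failure_effect: str    # how the failure impacts the system
    severity: int          # 10 = most severe, 1 = least severe
    corrective_measures: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject severities outside the 1-10 ranking used in this exercise.
        if not 1 <= self.severity <= 10:
            raise ValueError(f"severity must be 1-10, got {self.severity}")


# Sample row; the severity and corrective measure are illustrative only.
row = FmeaRow(
    failure_mode="Faulty brake pedal/shifter interlock switch",
    failure_effect="Ignition/starter subassembly fails",
    severity=8,
    corrective_measures=["Replace interlock switch"],
)
print(row.failure_mode, "->", row.failure_effect, f"(severity {row.severity})")
```

Keeping the corrective measures as a list reflects the fact that one failure mode may have several corrective measures.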
Example: a pizza FMEA
Let's try a simple example: a pizza. If we were to compose an ultra-simple FMEA matrix for a pizza, we would first have to define what "the system" is when everything works properly. Let's try this: a pizza "system" is a tasty meal that satisfies the customer's hunger. Our pizza FMEA might look something like this:
Pizza System FMEA
| Failure mode | Failure effect | Severity classification | Corrective measures |
| Completely burned | Inedible pizza | 10 | Reduce oven temperature |
| | | | Reduce cooking time |
| | | | Apply oil to pizza pan |
| Insufficient cheese | Unsatisfying pizza | 5 | Add more cheese |
| Cold pizza | Unappetizing pizza | 8 | Deliver pizza faster |
| Anchovy topping | Unappetizing pizza | 6 | Remove anchovies |
| Tomato topping | Unappetizing pizza | 3 | Remove tomatoes |
For our pizza FMEA example, notice how important it is to define the system first. For example, almost everyone would agree that a completely burned pizza is inedible—it is a total system failure, no matter which toppings are used. Thus, we have assigned the failure "completely burned" with a severity classification of 10. This part of our FMEA matrix is okay for now; however, the rest of the matrix is vague because we have not defined our system precisely enough. For example, we might imagine a partially burned pizza crust that still allows us to eat some of the pizza by picking off the burned portions. Thus, we could potentially define another failure that is labeled "partially burned" or "scorched" and assign it a severity classification of, say, 7 or 8. If we maintain high manufacturing standards as pizza chefs, however, we may decide in advance that any degree of burning is unacceptable to our "system"—that is, all degrees of burning are given a severity of 10—and if we burn a pizza, we'll either make another one (after applying one or more corrective measures) or we will return our customer's money. So, as you can see, our definitions of failure and the manner in which we define the system are inextricably linked.
When writing FMEAs, the precision with which we define our terms is critical. For example, in the FMEA example above there are actually three separate burn failures, not one, even though the FMEA groups all three together under the umbrella term, "completely burned." The first burn failure mode is corrected with a reduction in oven temperature (the "corrective measure"), but could be caused by any number of sub-failures: e.g., operator error, a faulty thermostat, a faulty temperature readout, or clogged gas burners. Ideally, each of these will receive its own row in the matrix and be analyzed as a separate failure mode. We could even add another column alongside, labeled "occurrence," to reflect the fact that operator error is a more common cause of excessive oven temperatures than a faulty temperature readout.
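The idea of giving each cause its own row and adding an "occurrence" column can be sketched in a few lines of Python. The four causes come from the paragraph above; the occurrence rankings are placeholder values invented for this sketch (higher means more common), chosen only so that operator error ranks above a faulty temperature readout.

```python
# Each cause of excessive oven temperature gets its own row.
# Occurrence rankings are placeholders (an assumption), not measured data.
burn_rows = [
    {"failure_mode": "Operator error", "severity": 10, "occurrence": 7},
    {"failure_mode": "Faulty thermostat", "severity": 10, "occurrence": 3},
    {"failure_mode": "Faulty temperature readout", "severity": 10, "occurrence": 2},
    {"failure_mode": "Clogged gas burners", "severity": 10, "occurrence": 4},
]

# List the most common causes first.
by_occurrence = sorted(burn_rows, key=lambda r: r["occurrence"], reverse=True)

for r in by_occurrence:
    print(f'{r["failure_mode"]:<28} severity={r["severity"]:>2} occurrence={r["occurrence"]}')
```

Because every row here shares the same severity, the occurrence column is what distinguishes the rows and suggests which cause to address first.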
Notice that our failure effects also vary according to how the system is defined. Not having enough cheese on a pizza ("insufficient cheese") may partially fail to satisfy one customer's appetite; however, for another customer this may be less important (in fact, some customers actually request less cheese on their pizzas). Similarly, different temperatures and toppings appeal to different customers in different ways. For some customers, anchovies are completely unacceptable (a severity of 10) and must be withheld or removed from the pizza; for other customers, anchovies are a favorite and would not even appear in the FMEA matrix. To avoid arbitrariness, we must define exactly what our customers need and demand from the system, and our success (in terms of engineering) hinges very heavily upon how well we know our customer. This is also why many engineers find themselves frequently interacting with the sales-marketing department of their company, often in written form.
If we redefine our pizza system to this—A pizza system is a tasty meal that satisfies the customer's hunger and meets his/her expectations based upon what was ordered—we can compose a slightly more refined FMEA matrix, replacing topping and temperature failures, for example, with order failures (i.e., based upon whether the customer received exactly what they ordered). In the real world, pizza chains commonly list pizzas with specific names, lists of toppings, type of crusts, sizes, etc. Most chains uniformly select, purchase, transport and weigh ingredients so that a pizza at one location is made consistently the same way at all other locations. Often, the ovens are the same make, model and size, and they operate at the same temperatures. Many restaurants even include photographs on their menus to remove as much arbitrariness as possible, both for customers and the workers who prepare the food. In the engineering world, specification sheets, contracts, and blueprints serve the same purpose. They are carefully written to ensure that customers and manufacturers agree on the same definitions and expectations. These documents help to define the system and are a regular part of an engineer's responsibilities.
What are the FMEA severity classifications?
In our pizza example above, we assigned severity classifications subjectively. In the engineering world, severity classifications are more objectively defined:
1: no effect
2: very minor failure detectable by discriminating customers (<25%)
3: product functions diminished; minor failure detectable by average customers (50%)
4: product operates for non-primary function; failure detectable by most customers (>75%)
5: product operates at reduced performance; customer is not fully satisfied
6: loss of a non-primary function and reduced primary function; customer worried
7: product barely operates; customer noticeably unsatisfied
8: product does not operate for primary function; all customers notice failure
9: extremely high probability of failure with potential for emergency conditions
10: complete system failure requiring emergency measures
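If you keep your FMEA in a spreadsheet or script, the ten classifications above can be stored as a simple lookup table so that each severity number is always paired with the same wording. The following is a minimal sketch in Python; the names `SEVERITY` and `describe` are assumptions made for this example.

```python
# The 1-10 severity scale listed above, as a lookup table.
SEVERITY = {
    1: "no effect",
    2: "very minor failure detectable by discriminating customers (<25%)",
    3: "product functions diminished; minor failure detectable by average customers (50%)",
    4: "product operates for non-primary function; failure detectable by most customers (>75%)",
    5: "product operates at reduced performance; customer is not fully satisfied",
    6: "loss of a non-primary function and reduced primary function; customer worried",
    7: "product barely operates; customer noticeably unsatisfied",
    8: "product does not operate for primary function; all customers notice failure",
    9: "extremely high probability of failure with potential for emergency conditions",
    10: "complete system failure requiring emergency measures",
}


def describe(severity: int) -> str:
    """Return the scale's wording for a severity number (hypothetical helper)."""
    if severity not in SEVERITY:
        raise ValueError(f"severity must be a whole number from 1 to 10, got {severity!r}")
    return f"{severity}: {SEVERITY[severity]}"


print(describe(10))
```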
Another commonly used severity index (MIL-STD-882) rates failure on a different scale:
I: catastrophic
II: critical
III: marginal
IV: negligible
What else is covered in a FMEA?
A FMEA can include far more than what we will use here in our subassembly studies. For example, we might also include the following in our FMEA:
- Failure rate
- Failure detection
- Failure mode
- Likelihood of occurrence
- Potential causes
- Mechanism of failure
- Occurrence
- Criticality
The number and degree of criteria are determined by the complexity of the system and by the customer's requirements. A toy may have several failure scenarios for just one subassembly; a government-contracted satellite may have hundreds.
What should we cover in our subassembly FMEA?
For our class, we will construct ultra-simple FMEAs (for now) of our subassemblies, using the pizza example above as a template. That is, your FMEA should cover:
- Failure mode (column 1)
- Failure effect (column 2)
- Severity classification (column 3)
- Corrective measures (column 4)
For the severity classification, use the 1–10 scale defined above.
Your FMEA should include at least ten (10) failure modes. Each failure mode should have at least one (1), and possibly more than one, corrective measure.
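Before you exchange your matrix, you may want to check it against the requirements above. The sketch below shows one way to do that in Python, assuming each row is stored as a dictionary; the function name `check_fmea`, the dictionary keys, and the two-row draft are assumptions made for this example, not a required format.

```python
MIN_FAILURE_MODES = 10  # "at least ten (10) failure modes"


def check_fmea(rows):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if len(rows) < MIN_FAILURE_MODES:
        problems.append(
            f"only {len(rows)} failure modes; need at least {MIN_FAILURE_MODES}"
        )
    for i, row in enumerate(rows, start=1):
        # Columns 1 and 2 must be filled in.
        for key in ("failure_mode", "failure_effect"):
            if not row.get(key):
                problems.append(f"row {i}: missing {key}")
        # Column 3 must use the 1-10 scale.
        severity = row.get("severity")
        if not isinstance(severity, int) or not 1 <= severity <= 10:
            problems.append(f"row {i}: severity must be a whole number from 1 to 10")
        # Column 4 needs at least one corrective measure.
        if not row.get("corrective_measures"):
            problems.append(f"row {i}: needs at least one corrective measure")
    return problems


# A deliberately incomplete draft, borrowed from the pizza example.
draft = [
    {"failure_mode": "Cold pizza", "failure_effect": "Unappetizing pizza",
     "severity": 8, "corrective_measures": ["Deliver pizza faster"]},
    {"failure_mode": "Insufficient cheese", "failure_effect": "Unsatisfying pizza",
     "severity": 5, "corrective_measures": []},
]

for problem in check_fmea(draft):
    print(problem)
```

The draft is flagged twice: it has fewer than ten failure modes, and its second row lists no corrective measure.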
FMEA Links
National Aeronautics and Space Administration (NASA)
