A technical postmortem is a retrospective analysis of events that resulted in a technical failure.
The purpose of a technical postmortem is to:
- Find out what went wrong and why
- Identify trouble areas
- Determine what can be done to prevent future failures
- Create best practices for your business
- Inform process improvements, mitigate future risks, and refine best practices over time
This outline is not comprehensive; it is a starting point for your technical postmortems. The questions below are meant to generate discussion about what went well, what the team struggled with during the failure, and what the team would do differently moving forward.
4 questions to ask during a technical postmortem
1. What happened?
You can’t analyze what you don’t understand, so establishing a clear picture of what went wrong is crucial.
2. Why did it happen?
Identify the major events that led to the failure and try to isolate the root cause of each. Determine whether each event was the underlying cause of the failure or whether it set off a process that led to the failure. Low-hanging fruit includes defects in design, flawed processes, and poor maintenance practices. In addition to the strictly technical causes, examine the underlying organizational, management, and team environment. Be aware that some team members may have ignored warning signs of impending failure because of organizational culture, time crunches, or budget pressure.
3. How did we respond and recover?
How your team responds to failure can determine how quickly you identify the root cause and fix it. A major technical failure can have a direct impact on shareholder value, revenue, market share, and brand equity, so a quick recovery is paramount. A useful technical postmortem requires honesty, insight, and cooperation from the organization. The outcome of the postmortem should be to recognize what worked and fix the processes that didn’t. Remember, the idea is to learn from your successes and failures, not just to document them.
4. How can we prevent similar unexpected issues from occurring again?
Unexpected technical issues do arise in mission-critical or complex hardware systems. The key to prevention is technical planning that keeps a problem from affecting the entire system. Each of the causes uncovered in question 2 represents a risk going forward, so schedule regular inspections or system checks for them in your maintenance management software. When a risk is detected, it should immediately trigger defined actions to prevent a similar failure. Planning must also cover the business processes and management responses the team initiates when a failure occurs. A complete postmortem addresses both technical and management issues.
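As a rough illustration of turning root causes into recurring checks, here is a minimal Python sketch. It is not tied to any particular maintenance management product: the `Risk` class, the `inspection_dates` helper, and the sample risks and intervals are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Risk:
    """A root cause from question 2, tracked as an ongoing risk (hypothetical structure)."""
    description: str
    inspection_interval_days: int  # how often this risk should be checked


def inspection_dates(risk: Risk, start: date, count: int) -> list[date]:
    """Return the next `count` inspection dates, the first one interval after `start`."""
    step = timedelta(days=risk.inspection_interval_days)
    return [start + step * i for i in range(1, count + 1)]


# Made-up example risks; real intervals would come out of the postmortem itself.
risks = [
    Risk("Worn bearing on conveyor motor", inspection_interval_days=30),
    Risk("Lubrication step skipped in procedure", inspection_interval_days=7),
]

for risk in risks:
    dates = inspection_dates(risk, start=date(2024, 1, 1), count=3)
    print(risk.description, [d.isoformat() for d in dates])
```

In practice the dates would be entered as recurring work orders or inspections in whatever scheduling feature your maintenance software provides.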
Sadly, technical postmortems have a habit of turning into a blame game. A bad postmortem can create dissension and institutionalize mistakes. If you want honest postmortems, management has to develop a reputation for listening openly to input and not punishing people for being honest. A well-run postmortem can help a maintenance team create a culture of continuous improvement.
- A solid postmortem can help your organization become more effective by learning from mistakes and focusing on what worked best, but it’s up to you to structure the meeting to get the most out of it.
- Ensure your technical postmortem is successful by carefully preparing in advance, analyzing the failure systematically, producing actionable findings, and actively sharing the results.
- Don’t schedule the postmortem so late that memories fade; a technical postmortem should occur within 1-2 weeks of the technical failure.
- Store your postmortems in the asset record in your CMMS so they can be found easily in the future and used to prevent similar failures.
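One way to keep postmortems consistent and easy to file is to capture each one as a structured record built around the four questions. The Python sketch below is only an assumption about what such a record could look like: `PostmortemRecord`, its field names, and the 14-day limit (the upper end of the 1-2 week window) are illustrative and do not correspond to any specific CMMS.

```python
from dataclasses import asdict, dataclass, field
from datetime import date, timedelta

MAX_DELAY = timedelta(days=14)  # upper end of the 1-2 week window


@dataclass
class PostmortemRecord:
    """One postmortem, organized around the four questions (hypothetical structure)."""
    asset_id: str
    failure_date: date
    meeting_date: date
    what_happened: str
    why_it_happened: str
    response_and_recovery: str
    prevention_actions: list[str] = field(default_factory=list)

    def held_on_time(self) -> bool:
        """True if the meeting took place within two weeks after the failure."""
        delay = self.meeting_date - self.failure_date
        return timedelta(0) <= delay <= MAX_DELAY


# Made-up example values for illustration only.
record = PostmortemRecord(
    asset_id="PUMP-001",
    failure_date=date(2024, 3, 1),
    meeting_date=date(2024, 3, 11),
    what_happened="Pump seized during a production run.",
    why_it_happened="Seal wear went unnoticed between inspections.",
    response_and_recovery="Switched to the backup pump and replaced the seal.",
    prevention_actions=["Add a monthly seal inspection"],
)

print(record.held_on_time())
print(asdict(record))  # plain dict, ready to attach to an asset record
```

The dictionary produced by `asdict` is a convenient shape for pasting into or uploading to an asset record, whatever import format your system accepts.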