Postmortem — incident examination process

Ofir Elarat
Sep 1, 2020

Hey, in this article I would like to present a topic from the high-tech industry that you may not be familiar with.

Postmortem usually refers to an examination carried out after death in order to determine the cause of death.

Similarly, in the high-tech world the term postmortem refers to an examination carried out after an incident.

This process was born out of risk management, which deals with planning for and handling project risks.

Incidents include bugs, application downtime that affects the customer’s experience, something that went wrong in the project development flow, or anything else we can learn from and improve.

So what exactly is a postmortem? A postmortem is a written record of an incident.

It includes:

· The incident impact

· The actions taken to mitigate or resolve the incident

· The incident root cause

· Follow-up actions taken to prevent the incident from happening again

The goals of a postmortem are:

· Understand all contributing root causes

· Document the incident for future reference and pattern discovery

· Enact effective preventative actions to reduce the likelihood or impact of recurrence

When:

· We carry out a postmortem for every incident (bug, issue, or downtime) that affected customers. Otherwise, it’s optional

· During or shortly after resolving the issue, the postmortem owner creates the postmortem document

5 Whys:

We use the 5 whys technique to understand the root cause of the incident.

Five whys (or 5 whys) is an iterative interrogative technique used to explore the cause-and-effect relationships underlying a particular problem. The primary goal of the technique is to determine the root cause of a defect or problem by repeating the question “Why?” Each answer forms the basis of the next question. The “five” in the name derives from an anecdotal observation on the number of iterations needed to resolve the problem.
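To make the "each answer forms the basis of the next question" idea concrete, here is a minimal Python sketch. The function name `five_whys`, the question wording, and the example incident are all hypothetical and chosen only for illustration; the answers themselves always come from the people investigating the incident.

```python
from typing import List, Tuple


def five_whys(problem: str, answers: List[str]) -> Tuple[List[Tuple[str, str]], str]:
    """Pair every answer with the question it responds to.

    Each answer becomes the subject of the next "why" question.
    The last answer is returned as the candidate root cause.
    """
    if not answers:
        raise ValueError("at least one answer is required")

    chain = []
    current = problem
    for answer in answers:
        chain.append((f"Why did this happen: {current}?", answer))
        current = answer  # the answer is the basis of the next question
    return chain, current


if __name__ == "__main__":
    # Hypothetical incident, used only to show the shape of the chain.
    chain, root_cause = five_whys(
        "The checkout page was down for 20 minutes",
        [
            "The API server ran out of memory",
            "A new release leaked memory on every request",
            "The leak was not caught before the release",
            "There is no load test in the release pipeline",
            "Load testing was never added to the release checklist",
        ],
    )
    for question, answer in chain:
        print(question)
        print("  ->", answer)
    print("Candidate root cause:", root_cause)
```

Five is a guideline rather than a rule: the function accepts any number of answers, so the chain can stop earlier or continue longer when the investigation calls for it.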

Blameless:

When things go wrong, looking for someone to blame is a natural human tendency. It’s in our best interests to avoid this, though, so when you’re running a postmortem you need to consciously overcome it.

We assume good intentions on the part of our staff and never blame people for faults.

The postmortem needs to honestly and objectively examine the circumstances that led to the fault so we can find the true root cause(s) and mitigate them.

Some fields we would like to see in our postmortem:
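As an illustration of the fields a postmortem record might carry, here is a minimal Python sketch. It assumes the fields mirror the four components listed earlier (impact, actions taken, root cause, follow-up actions) plus the postmortem owner. The `Postmortem` class, the `title` and `incident_date` fields, and the `missing_fields` helper are hypothetical names chosen for this example, not a standard template.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Postmortem:
    """A written record of one incident (illustrative field set)."""
    title: str                      # assumed field: short incident name
    incident_date: date             # assumed field: when it happened
    owner: str                      # the postmortem owner
    impact: str                     # the incident impact
    actions_taken: List[str]        # actions taken to mitigate or resolve
    root_causes: List[str]          # contributing root cause(s)
    follow_up_actions: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


if __name__ == "__main__":
    # Hypothetical draft written shortly after the incident was resolved.
    draft = Postmortem(
        title="Checkout downtime",
        incident_date=date(2020, 8, 30),
        owner="on-call engineer",
        impact="Customers could not complete purchases for 20 minutes",
        actions_taken=["Rolled back the latest release"],
        root_causes=[],
    )
    print(draft.missing_fields())
```

Note that the record has no field for naming a person at fault, which keeps it in line with the blameless approach. A helper like `missing_fields` lets the owner see at a glance which parts of the draft still need to be filled in.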

Pre-mortem:

A pre-mortem is a managerial strategy in which the team imagines that the project has failed or an incident has happened, and then works backward to determine what could potentially lead to it.

This technique aims to reduce the chances of future failures and lets the team practice the process of handling incidents.

The pre-mortem analysis seeks to identify threats and weaknesses via the hypothetical presumption of near-future failure.

Conclusion:

In a nutshell, a postmortem is an important process that contributes to mitigating future risk. It forces us to discover the root cause of the incident and take action to reduce future impact.

In the postmortem process, we also document all the incident details and related tasks, and this documentation can help us in a future debriefing of the incident.

I hope you will find this process useful in your organization as well. Thanks!

Ofir Elarat

Experienced software engineer, eager to learn more technologies and become a better developer.