Getting started with major incident and problem management

This getting started article helps Technicians and Users who have access to a Ticketing Application in TDNext create, review, update, and report on tickets classified as Major Incident or Problem. The user must be granted access to the appropriate Ticketing Application on their User Record and be given a Technician Security Role in that Ticketing Application.

Overview

Major incident management is a specialized form of incident management for large, production-down issues. Incident management is intended to be a very fast process that addresses break/fix issues by helping users get back up and running, with workarounds if need be. It is the most recently added classification in the Information Technology Infrastructure Library (ITIL).

The ITIL problem management process identifies and investigates the root causes of incidents. Problem management is a slow and thorough process that addresses incidents at their source to prevent them from recurring.

ITIL defines both reactive and proactive problem management. Proactive problem management means looking for problems even if they have not caused any incidents yet. This article addresses mainly reactive problem management, which is the more typical form of problem management. If you are interested in proactive problem management, see also ITIL’s guidance on continual service improvement.

Where to Find This

This feature appears in the TDNext interface as part of a Ticketing Application.

The Ticketing Application in TDNext is where Technicians and Managers can create Major Incidents and Problems, and review, sort, and filter active Tickets by the Major Incident and Problem Classifications.

Navigate to a Ticketing Application by following this path:

  • TDNext
    • [Ticketing Application]
    • Left Panel -> Tickets
      • Major Incidents (if Active)
        • Shows all active (New, Open, In Process) Tickets with the Major Incident Classification
      • Problems
        • Shows all active (New, Open, In Process) Tickets with the Problem Classification
      • The blue +New button lets Technicians manually create a new Ticket using a Major Incident or Problem Form

Problem Management Activities

The following are steps that can be taken as part of the resolution process:

  • Problem identification or logging – Identifying that there is a root cause that could be investigated, and logging the problem. In incident management, this can mean creating problem records or relating incidents to existing problems.
  • Problem selection – Determining which problems are worth investigating. In this step, IT determines whether the effort of investigating the problem is worth the potential value. Please note that ITIL does not call out this activity.
  • Problem investigation – Pulling together a team to investigate the problem and identify the error or issue. This can be very time-intensive depending on the problem.
  • Defining workarounds – At any point, problem workarounds may be identified. These workarounds help incidents be addressed more quickly, even if the root cause of the problem is still unknown.
  • Raising a change – Once an error is identified, a change request can be raised to address the root cause issue. Or, before this is done, a change can be raised to turn off the service or component entirely.
  • Problem closure – In typical problem management, the problem record stays open as long as it is still relevant to the organization. The problem record would only be closed if the root cause issue were resolved or if the relevant component(s) were removed or replaced.

In TeamDynamix, problems are tickets with the Problem classification. Incidents can then be related to the parent problem record. If a change is raised for a problem record, the change record can be set as the parent of the problem record.
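
The parent and child relationships described above can be pictured with a small data model. This is only an illustrative sketch: the `Ticket` class, its fields, and the sample tickets are hypothetical and are not TeamDynamix objects or API calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps comparisons identity-based, which avoids recursive
# comparisons through the parent/children links.
@dataclass(eq=False)
class Ticket:
    """Hypothetical stand-in for a ticket record (not a TeamDynamix API object)."""
    ticket_id: int
    classification: str  # e.g. "Incident", "Problem", "Change"
    title: str
    parent: Optional["Ticket"] = None
    children: List["Ticket"] = field(default_factory=list)

    def set_parent(self, parent: "Ticket") -> None:
        """Link this ticket under a parent, detaching it from any previous parent."""
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = parent
        parent.children.append(self)


def lineage(ticket: Ticket) -> List[str]:
    """Return classifications from the top-most parent down to this ticket."""
    chain = []
    current = ticket
    while current is not None:
        chain.append(current.classification)
        current = current.parent
    return list(reversed(chain))


# Sample data, invented for illustration.
problem = Ticket(200, "Problem", "Printer firmware stops printing after a fax")
incident_a = Ticket(101, "Incident", "Cannot print to network printer")
incident_b = Ticket(102, "Incident", "Print jobs stuck in queue")
change = Ticket(300, "Change", "Upgrade printer firmware")

incident_a.set_parent(problem)   # incidents are children of the problem
incident_b.set_parent(problem)
problem.set_parent(change)       # the change is the parent of the problem

print(lineage(incident_a))  # ['Change', 'Problem', 'Incident']
```

Read from the top down, the change sits above the problem, and the problem sits above the incidents it explains.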

Gotchas & Pitfalls

These factors may affect how quickly you can implement this module:

  • Automated systems sometimes identify major incidents. If TeamDynamix needs to receive these notifications, some level of integration may be required.
  • Some organizations believe they want to implement problem management, but do not have the staff necessary to manage the problem management process.
  • If there is no owner for the problem management process, this module can take longer to implement.

Examples

  • Printing is down for campus.
  • Three users have had issues with printing to a network printer. In each case, the issue has resolved itself. Upon further investigation, it is discovered that, due to a bug in the printer firmware, the network printer stops printing whenever it receives a fax, and only resumes printing when it’s rebooted.
  • Long-running network connections to a server tend to be dropped. This issue is occasionally reported in incident tickets that are immediately closed because the user can close and reopen their session. Upon further investigation, it is discovered that the transmission control protocol (TCP) keepalive timeout on the firewall is set to a shorter value than the value on the server.
  • Reports that used to run quickly now run slowly. Incidents are recorded but closed because the reports eventually return data. Upon further investigation, it is discovered that a department added thousands of values to a key validation table that’s used in many Structured Query Language (SQL) joins.

Major Incident and Problem Management Frequently Asked Questions (FAQs)

Q. What is the difference between major incident and problem management?

A. Problem management is for root-cause investigations, which may take a lot of time. Major incident management is for large production-down issues (e.g., a campus-wide email outage) and focuses on fast recovery. A major incident or outage may be related to a problem record that tracks its root cause.

Q. How do I get started with problem management?

A. Problem management in the tool is relatively straightforward; the real work is in developing a problem management process. To begin, help your organization identify significant problems, for example by checking whether a particular service is generating many incidents. Then, consider modeling the practice by using problem management to address these specific service issues.
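
One way to spot a service that is generating many incidents is to count incidents per service. The sketch below assumes you have exported incident data to a list of (ticket ID, service name) pairs; the sample rows, the `problem_candidates` function, and the threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical export of incidents: (ticket ID, service name).
incidents = [
    (101, "Network Printing"),
    (102, "Email"),
    (103, "Network Printing"),
    (104, "VPN"),
    (105, "Network Printing"),
    (106, "Email"),
]


def problem_candidates(rows, threshold):
    """Return (service, count) pairs with at least `threshold` incidents, busiest first."""
    counts = Counter(service for _, service in rows)
    return [(service, n) for service, n in counts.most_common() if n >= threshold]


print(problem_candidates(incidents, threshold=2))
# [('Network Printing', 3), ('Email', 2)]
```

The threshold is a judgment call for your organization; a high count only suggests that a service is worth reviewing as a possible problem.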

Q. Should problem records use workflows?

A. If you have a formal problem management process, it can be helpful to use a workflow to assist with reviewing and investigating problems. Often the key step in such a workflow is authorization of the problem investigation. Problems can require cross-functional skills from several teams, and people often want a way to approve the time required to investigate the problem.

Q. How do I communicate active problems?

A. You can build a desktop module that reports on active problems and add that module to people's desktops. Or, you can build a service level agreement that attaches through automation rules, and have that service level agreement use notification rules, such as sending a notification at 1% of the response time to a listserv.
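
To see what a notification at 1% of the response time means in practice, the arithmetic can be sketched as follows. The `notification_time` function, the 4-hour response window, and the creation time are assumptions for illustration, and the sketch uses plain clock time rather than any operational-hours calendar your service level agreement may apply.

```python
from datetime import datetime, timedelta


def notification_time(start, response_window, percent):
    """Return when a notification set at `percent` of the response window fires."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Plain clock-time arithmetic; business hours are not considered here.
    return start + response_window * percent / 100


created = datetime(2023, 9, 8, 9, 0)   # hypothetical ticket creation time
window = timedelta(hours=4)            # hypothetical response target

print(notification_time(created, window, 1))  # 2023-09-08 09:02:24
```

With a very low percentage, the notification goes out almost as soon as the service level agreement attaches, which is why it works as an announcement of a new active problem.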

Q. How do I distinguish between major incidents and problems?

A. The major incident classification should be used for large production outages. The problem classification should be used for less urgent root cause investigations.

Q. How do people get time to practice problem management?

A. ITIL does not call it out, but it can be helpful to have a problem investigation approval step at the beginning of your process. Problem management often calls for senior people from many different teams, the same people who are in the critical path for important projects and initiatives. An investigation approval step helps management say yes to problems that really need to be solved and no to ones that are less important, so that technical people can stop worrying about perfecting systems that are already working.

Q. How do problem records and change records work together?

A. Once you've identified the change required to address a problem, you can create a change request as the parent of the problem record. The change is then linked to the problem record, and you can cascade updates from the change to the problem record as needed. We highly recommend that you process any changes resulting from a problem as change requests, rather than using the problem record to track changes.

Details

Article ID: 18249
Created: Thu 11/10/16 12:45 AM
Modified: Fri 9/8/23 9:52 AM