The Invisible N | Netenrich Blog

IT Ops Problem Classification with Resolution Intelligence

Written by Netenrich | Thu, Jun 24, 2021 @ 10:35 AM

See how our classification and context features power an outcomes-driven IT org for you. Netenrich’s Resolution Intelligence Cloud platform uses seminal analytics principles and the power of modern AI to deliver the outcomes of speed, scale, and resilience for your IT. Take the dive.

It used to be that IT’s clock would start ticking the second a user reported a problem. Now the clock runs perpetually, with an endless stream of alerts, logs, and other potential red flags flooding in from a dozen different monitoring solutions.

To solve for what is infamously called alert fatigue, IT has relied on alert and problem classification.

  • Alert classification answers questions like, “Is this really a problem or is it just noise?”

AIOps solutions go a long way in filtering out noise and letting what look like real problems through.

  • Problem classification qualifies these real problems into the type of issue, where it started, why it is happening, its impact to the business, and whether similar issues have been seen in the past.

Ticketing systems provide some rudimentary problem classification. The subject line of a ticket may indicate that you have a site-to-site VPN issue, but it stops there.

What has been missing so far is the coveted c-word: context. While alert and problem classification help rein in the madness of too many alerts from a typical monitoring stack of more than five tools, the number of alerts that filter through still poses daunting challenges to short-staffed, overworked IT teams, who are left asking questions like, “How do I know the problem won’t resolve itself?” or “If it won’t resolve itself, will it turn into a site-wide problem?”

The Netenrich Resolution Intelligence Cloud platform offers complete problem context (what, where, when, why, and how the issue got resolved in the past) with two more kinds of classification that we will break down today.

With a single source of truth, IT's efficiency and credibility improve at every stage.

Known classification types

Classification in IT analytics isn’t new. We even introduced the two knowns above, but they lack meaningful context. Let’s break them down to see how Resolution Intelligence Cloud adds to them to offer actionable insights early on in the alert-incident lifecycle.

Alert classification

The performance monitoring guys appeared on the Ops scene to help IT ensure a great user experience, but created some problems of their own, like disparate tools and solutions that led to a relentless onslaught of alerts. The AIOps guys followed to help IT manage the barrage of said alerts with the promise of finding the proverbial needles in the haystack—distinguishing the real issues that require IT’s attention from the deafening noise.

Now considered table stakes, AIOps solutions aggregate information at the event level, eliminating upwards of 95 percent of useless white noise, but, with few exceptions, their value stops there. Once the machines filter out noise, people (read: you) still need to filter real problems and figure out what to do about them.

Resolution Intelligence Cloud speeds this up by combining AIOps with analyst intelligence. Managed detection and response with 360° visibility and zero-blind-spot monitoring fast-track alert and problem classification. Machines filter out the noise and curate events; people and machines then collaborate to validate, classify, and fully contextualize problems in a fraction of the time it takes now.

And as time allows, IT can take a deeper dive into issues classified as noise to see where the noise is coming from and tune the system to steadily reduce it over time.
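As a rough illustration of that tuning step, here is a minimal Python sketch that ranks monitoring sources by how many of their alerts were classified as noise. The (source, is_noise) pair format and the source names are assumptions made for this example, not the platform's actual data model.

```python
from collections import Counter

def noisiest_sources(alerts, top=3):
    """Rank sources by the number of alerts classified as noise.

    alerts: iterable of (source, is_noise) pairs (hypothetical format).
    Returns a list of (source, noise_count), highest count first.
    """
    counts = Counter(source for source, is_noise in alerts if is_noise)
    return counts.most_common(top)
```

The sources at the top of the list are the first candidates for threshold or rule tuning.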

Problem classification

Resolution Intelligence Cloud adds actionable insight to understand, classify, and solve real issues as efficiently as possible. Some questions it readily answers for both humans and machines are:

  • Where is the issue?
  • Is it related to the network, a VPN, a public cloud service?
  • What caused it?
  • Did it originate with a user, an application, a service provider?

In managing data sources and correlating events, Resolution Intelligence Cloud collapses related alerts into a single, actionable ticket.

For example, your ThousandEyes system alerts you that customers are seeing 20-second delays and abandoning your site without completing purchases. At the same time, a solution from OpsRamp or Datadog may report connection pool errors. Now, you have two teams fired up chasing the same issue from different angles.

Resolution Intelligence Cloud spots the relationship and identifies the best team or individuals to work the problem. The platform provides full context including which problems to address in what order and who should take what actions.
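To make the idea of collapsing related alerts concrete, here is a minimal Python sketch. It assumes each alert carries a service name and a timestamp, and it treats alerts for the same service that arrive within five minutes of each other as one problem. The field names, the time window, and the grouping rule are all assumptions for illustration; the platform's actual correlation logic is not described in this post.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Alert:
    source: str    # monitoring tool that raised the alert
    service: str   # affected service (hypothetical field)
    message: str
    at: datetime

@dataclass
class Ticket:
    service: str
    alerts: list = field(default_factory=list)

def correlate(alerts, window=timedelta(minutes=5)):
    """Group alerts for the same service into one ticket when each
    alert arrives within `window` of the previous one."""
    tickets = []
    latest = {}  # service -> (open ticket, time of its last alert)
    for alert in sorted(alerts, key=lambda a: a.at):
        entry = latest.get(alert.service)
        if entry and alert.at - entry[1] <= window:
            ticket = entry[0]
        else:
            ticket = Ticket(service=alert.service)
            tickets.append(ticket)
        ticket.alerts.append(alert)
        latest[alert.service] = (ticket, alert.at)
    return tickets
```

In the example above, the page-delay alert and the connection pool alert would land in the same ticket as long as both are tagged with the same service, so one team works the problem instead of two.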

Intelligent problem classification includes determining which items require action and which can be observed for the time being, a good segue into activity classification.

 
 

Netenrich's classification types FTW

The two-stage classification above helps with noise reduction but doesn’t quite complete the picture. Let’s dive into the two types that do.

Activity classification

Here are a few questions this type of classification answers, starting with, “What has to be done?”

  • Can we wait to see if this issue resolves itself?
  • Do we need to act right now? If so,
    • Do we need to remediate the problem, and how?
    • Do we need to do something to prevent an issue from escalating into a full-blown problem?

In prescribing the right course of action, our platform runs impact analyses and contextualization algorithms to answer:

  • Is the issue worth the time to investigate?
  • How serious is it? What is its impact on workflows and users?
  • Has it resolved itself in the past? How long do we wait?
  • Is there a proactive fix?

Say one of your API servers is down, as reported by your synthetic monitoring. From historical context, we know this server restarts automatically via Monit scripts, so our AIOps predicts the issue is highly likely to resolve itself in 4 to 5 minutes and does not escalate it to the First Response team.

 

However, if the problem occurs frequently, our AIOps then notes the anomaly in behavior and creates a ticket to look into the issue of more-than-usual restarts and fix it on the server.
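The decision in this example can be sketched in a few lines of Python. This is a simplified illustration only: the History fields, the 90 percent self-heal threshold, and the "three times the usual count" anomaly rule are assumptions picked for the example, not the platform's actual model.

```python
from dataclasses import dataclass

@dataclass
class History:
    self_healed: int   # past occurrences that resolved on their own
    total: int         # all past occurrences of this issue
    recent_count: int  # occurrences in the current observation window
    usual_count: int   # typical occurrences per window

def classify_activity(h, heal_threshold=0.9, anomaly_factor=3):
    """Return "observe", "ticket", or "escalate" for an issue."""
    if h.total == 0:
        return "escalate"      # no history, so a person should look
    if h.self_healed / h.total < heal_threshold:
        return "escalate"      # unlikely to resolve itself
    if h.recent_count > anomaly_factor * h.usual_count:
        return "ticket"        # self-heals, but far too often
    return "observe"           # wait for the automatic restart
```

A server that almost always restarts cleanly falls into "observe", while the same server restarting far more often than usual produces a "ticket" for follow-up rather than an immediate escalation.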

For issues warranting rapid action, machine-human contextualization goes beyond defining the problem to include impact, timeline, history, and process.

Resolution Intelligence Cloud includes capturing and operationalizing information about activities that worked, or didn’t work, to solve a specific problem.

Most network and security operations centers lack the type of granular activity classification described here, or even the ability to quickly or automatically distinguish between self-healing issues and events requiring hands-on remediation. It’s either noise, or not, and reports and analyses of the “or not” category remain skewed by items that don’t ultimately require IT’s focused attention.

Resolution Intelligence Cloud provides enhanced granularity that carries over into the next phase of classification - analytics that drive true resolution.

Resolution beyond response

Closing tickets is a bit like gardening. You keep pulling weeds in the same space every day until you stop them from growing back. So, how do you stop pulling weeds and spend more energy making the garden flourish?

The practiced answer? Analyses such as the one we detail below, but conducted on an infrequent or ad-hoc basis, for example when you’re planning a major hardware upgrade or digitalization initiative. Outages persist and someone calls for a review. Providers consistently under-perform, so “Something has to be done.”

Project and event-driven analysis tends to be reactive in nature and there is no easy way for IT to respond, much less become more forward-looking. Pure-play metrics generated by monitoring platforms mainly talk to your current state. Ticketing systems are a better source of truth but offer little context or actual insight.

Platforms like Jira and ServiceNow optimize activities around workflow processes (which queue something should funnel into, or how long a particular issue took to close), not true resolution. Combining the two sets of outputs, which isn’t easy, still doesn’t get you to the root of the problem.

CIOs can’t immediately use data from these platforms to extrapolate and drive meaningful conclusions toward a clear course of action, and the absence of an aggregate bird’s-eye view creates extra work downstream.

 

Automated problem classification uses AIOps to filter out noise from real alerts. Non-noise must then be classified into problems likely to self-heal and those requiring immediate attention. Resolution Intelligence Cloud combines the two for end-to-end classification, faster time to action, and actionable insight on problems requiring action.

This level of classification and collaborative correlation introduced by Resolution Intelligence Cloud changes the game, making it possible for CIOs and their teams to quickly and continuously answer true value-oriented questions.

  • Which types of problems are consuming the most analyst time to resolve?
  • What are the most frequently occurring—or most costly—network-related issues in our environment during the past three months?
  • What needs to be optimized for higher capacity?
  • What is our real availability and security posture?
  • Which services appear to be engineered better than others?
  • Do we need to rearchitect all or parts of our systems? Drive automation deeper within IT or the company?
  • What actions will you take next to bulletproof your system?
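Once problems are consistently classified, the first question on this list becomes a simple aggregation. Here is a minimal Python sketch, assuming each resolved ticket is available as a (problem_type, analyst_hours) pair. That format is hypothetical and stands in for whatever export your ticketing data provides.

```python
from collections import defaultdict

def analyst_hours_by_type(tickets):
    """Total analyst hours per problem type, largest first.

    tickets: iterable of (problem_type, analyst_hours) pairs
    (hypothetical format).
    """
    totals = defaultdict(float)
    for problem_type, hours in tickets:
        totals[problem_type] += hours
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

The same pattern, grouped by month or by service instead of problem type, covers several of the other questions above.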

 

Reassessing assessments

Being able to draw these conclusions more readily may negate the need for costly consulting or systems integration engagements and third-party assessments, which many companies can’t afford or support anyway. Resolution Intelligence Cloud makes it viable to continuously tune and assess the overall health of ITOps without relegating such activities to point-in-time efforts. This has a significant impact on assessments as a practice and may just, we hope, make ad-hoc assessments redundant.

Armed with an aggregate view of problems facing your company and details of where and why, you need fewer health checks and provider-assisted deep dives.

Modernize as you transform

The global pandemic kicked digital transformation efforts into high gear. Mandates to accelerate depend largely on Ops (it’s make it or break it right now), and IT needs a new approach. Spending the majority of the budget to just keep the lights on won’t cut it.

Resolution Intelligence Cloud helps in three ways.

  • It improves incident response automation (IRA), and thus overall incident response, with better classification and actionable context.
  • It delivers insights needed to tune the system, reduce cost, and improve ops efficiency.
  • It operationalizes data and tribal knowledge throughout your transformation journey.

Modernization, and leveling silos within IT, also promotes closer collaboration between teams and machines.

Resolution Intelligence Cloud changes the game by delivering next-level analytics that save CIOs, CISOs, and operations teams valuable cycles—while adding invaluable insight—at every stage.

 

Learn about the true costs of existing tools and the benefits of consolidating them

Use our Modern IT Ops ROI calculator to see how quickly you can modernize for:

  • Substantial cost avoidance
  • Exponential productivity improvements
  • Faster payback period
  • Higher user satisfaction

 
