O11y Guide, Cloud-Native Observability Pitfalls: Focusing on "The Pillars"

Continuing in this series examining the common pitfalls of cloud-native observability, take a look at how to avoid the trap of focusing on The Pillars.

By Eric D. Schabell · Feb. 06, 24 · Analysis

Are you looking at your organization's efforts to enter or expand into the cloud-native landscape and feeling a bit daunted by the vast expanse of information surrounding cloud-native observability? When you're moving so fast with Agile practices across your DevOps, SREs, and platform engineering teams, it's no wonder this can seem a bit confusing.

Feet with arrows pointing to Pillars and Phases
Unfortunately, the choices being made have such a great impact on your business, your budgets, and the ultimate success of your cloud-native initiatives that hasty decisions upfront lead to big headaches very quickly down the road.

In the previous article, we looked at the problem of controlling cost in cloud-native observability. This article discusses the next pitfall, another common mistake organizations make. By sharing common pitfalls in this series, the hope is that we can learn from them.

After laying the groundwork in the previous article, it's time to tackle a pitfall in which the fix is to stop focusing on something: The Pillars. I've spent some time in the past talking about Three Phases to Better Observability Outcomes and published an initial take on why Cloud Native Observability Needs Phases, but this article will be a more in-depth dive into the topic.

Focusing on "The Pillars"

For a few years now, vendors have been marketing the idea that you need to focus on certain signals, or pillars, to achieve what you desire in the world of cloud-native observability.

If you look more closely at this, they are pushing hard for you to concentrate on three pillars: metrics, logs, and tracing, with a few even sliding in events to make it sound all-encompassing. These are touted as things you can tangibly check a box on in your observability stack. What they end up doing is creating a focus on functionality and technology features while completely ignoring the problem at hand.

It's like we have a very nice and expensive car that we cherish and it's started to make funny sounds while emitting smoke when we are driving. We rush to our favorite garage and the mechanic listens to our issues, then proceeds to drag out their toolboxes to show off all the great tools they have to fix issues just like ours. While this is going on and on, we look out the window and see that our car is now not just smoking, but it's on fire!

When we ask our on-call engineers, who are the front line in the war of keeping our cloud-native business thriving, they describe a process they have to go through to achieve that in all the various areas they consider worth monitoring in our business.

When talking about the process and how it's important to our business goals, we hear the business talking in phrases like:

  • Better business outcomes
  • Faster remediation of problems that occur
  • Easier problem detection
  • Greater revenue generation
  • Engineering teams focused on delivering business value

These are all in a language the business understands, and they describe the process that needs to be designed for rather than the features the tooling needs to have. When we bring this back to cloud-native observability, we want a solution for our on-call engineers that walks them through the following three phases:

  1. Knowing: We start by discovering something is happening as fast as possible, maybe even leading to a quick fix in this phase.
  2. Triaging: If unable to fix immediately, then we start triaging based on specifically targeted information that is directly related to the problem at hand, which then quickly leads to fixing it.
  3. Understanding: Finally, possibly at a later time and at a slower investigative pace, we need to have a very deep understanding of the issues encountered to ensure they never happen again.
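
The three phases above can be sketched as a small state machine. This is only an illustration under stated assumptions, not something taken from any observability tool: the names `Phase`, `Incident`, and `advance` are hypothetical, and the rule that an unfixed problem stays in Triaging until it is fixed is one reading of the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """The three phases named in the article."""
    KNOWING = 1
    TRIAGING = 2
    UNDERSTANDING = 3


@dataclass
class Incident:
    """Hypothetical record of one problem moving through the phases."""
    summary: str
    phase: Phase = Phase.KNOWING
    remediated: bool = False
    history: list = field(default_factory=list)

    def advance(self, fixed: bool) -> Phase:
        """Record the outcome of the current phase and pick the next one."""
        self.history.append((self.phase, fixed))
        if fixed:
            self.remediated = True
        if self.phase is Phase.KNOWING:
            # A quick fix while Knowing skips Triaging entirely.
            self.phase = Phase.UNDERSTANDING if fixed else Phase.TRIAGING
        elif self.phase is Phase.TRIAGING and fixed:
            # Triaging ends only once the problem is fixed (assumption).
            self.phase = Phase.UNDERSTANDING
        # Understanding is the final phase, so it never changes.
        return self.phase
```

An incident that is not fixed on discovery moves from Knowing to Triaging, remains there until a fix lands, and only then reaches Understanding, where the slower investigation takes place.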

We don't want to be confronted with visualizations that have been designed to group information as categorized signals or as The Pillars. For example, here is something that was actually designed without much thought towards the process needed to solve any kind of issue, but it does capture the signals for you:

Visualization of Metrics, Tracings, and Logs

Good luck with this when you are on-call.

We really want to have clean, concise, and effective visualizations that present focused insights and put just enough information at our fingertips to make informed decisions quickly. We don't care if one metric, three labels, one span in a trace, and three log lines are the basis of that exact informational view; we need to solve the reason our beeper went off:

Sharply-focused dashboard

Sharply-focused insights to get you through the phases.
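
To make the idea of a focused view concrete, here is a minimal sketch of a view assembled around a single alert instead of around signal types. Everything in it is an assumption made for illustration: `FocusedView`, its fields, and the `render` method are hypothetical and do not correspond to any particular dashboard product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FocusedView:
    """Hypothetical container for the minimal data behind one alert."""
    alert: str
    metric: str
    labels: dict
    span: str
    log_lines: tuple

    def render(self) -> str:
        """Return a compact text view: alert first, then the evidence."""
        label_text = ", ".join(
            f"{key}={value}" for key, value in sorted(self.labels.items())
        )
        lines = [
            f"ALERT  {self.alert}",
            f"metric {self.metric} {{{label_text}}}",
            f"span   {self.span}",
        ]
        # Only the log lines selected for this alert are shown.
        lines += [f"log    {line}" for line in self.log_lines]
        return "\n".join(lines)
```

The design choice is that the alert, not the signal type, is the organizing key: one metric, a few labels, one span, and a few log lines sit side by side because together they explain why the beeper went off.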

The road to cloud-native success has many pitfalls, and understanding how to avoid The Pillars and focus instead on solutions for the phases of observability will save much wasted time and energy.

Coming Up Next

Another pitfall organizations struggle with in cloud-native observability is underestimating cardinality issues. In the next article in this series, I'll share why this is a pitfall and how we can avoid it wreaking havoc on our cloud-native observability efforts.


Published at DZone with permission of Eric D. Schabell, DZone MVB. See the original article here.
