Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

The Future of Ops Is Platform Engineering

Two years ago I wrote a piece in The New Stack about the Future of Ops Careers. Towards the end, I wrote: I described the second category as “operations engineering minus the infrastructure,” dedicated to evaluating and assembling a production stack of third-party platform providers, enabling software engineers to self-serve their services and own their own code in production. I said: That second category I was describing now has a name. We call those teams "platform engineering.".

Key Observability Scaling Requirements for Your Next Game Launch: Part III

So far in our series on scaling observability for game launches, we’ve discussed ways to 1) quickly analyze large volumes of telemetry data and, 2) ensure high-quality telemetry data for more effective analysis at lower costs. The best practices in these blogs outline best practices for scaling observability during game launch day – which is necessary to ensure high performance across all infrastructure components – to ensure no lag, no glitches, and no bugs.

Observability and Auto-Remediation

Organizations today are under pressure to stay ahead and maintain IT applications and infrastructure optimally. That means their IT teams are tasked to make sure that functions move along smoothly while minimizing downtime. To keep the lights on, enterprises add whatever domain-specific tools they need. However, these tools are often reactive, and not nearly robust enough to handle complex application topologies.

The Complex But Elegant Relationship Between AIOps and Observability

Digital transformation requires organizational evolution. Constant demand for rapid delivery of upgrades and new products forces change. Surely, the old days of managing monolithic applications housed in private servers are over. Applications consist of virtualized, containerized, and serverless code that’s networked via APIs across a hybrid infrastructure of public and private clouds.

What is an Observability Engineer?

What is an observability engineer? Is it your SIEM admin? How about your application performance monitoring admin? Neither? Both? Observability engineering is more than administering a tool. There is more to it than data onboarding, writing parsers, and getting data in. As an observability tool admin, you work with data producers and consumers to get data in a human-readable and searchable format from the source to the analytics system.

Getting Started with OpenTelemetry: Three Companies Check Into OTel Observability

Comprehensive observability starts with good instrumentation. OpenTelemetry, aka “OTel,” sets a unified standard, enabling you to instrument your applications once, then send that data to any backend observability tool of choice. OpenTelemetry’s standard for generating and ingesting telemetry data is slated to become as ubiquitous as current container orchestration standards. Because of this, development teams are increasingly adopting OpenTelemetry to their applications.

Part 6: Observability Maturity Model Summary

For decades, IT operations teams have relied on monitoring for insight into the availability and performance of their systems. But the shift to more advanced IT technologies and practices is driving the need for more than monitoring – and so observability evolved. With infrastructures and applications that span multiple dynamic, distributed and modular IT environments, organizations need a deeper, more precise understanding of everything that happens within these systems.

Demystifying Observability and Making it Work for You

This article is the final installment in a series that demystifies observability. The first three focused on the history of observability, dispelling myths around observability, and what observability is and what it can offer. In this last article of the series (Check out part 1), I want to offer a complete definition of observability.

Sense and Signals

Complex, distributed software systems are chatty things. Because there are many components interoperating amongst themselves and with things outside their bounds like users, those components and the systems themselves emit many information signals. It’s the goal of monitoring, logging, and observability (o11y) tools to help the systems’ “stewards,” those developers and operators tasked with maintaining and supporting them, make sense of those signals.