Latest News

Solve financial services ITOps challenges with AIOps

Mar 4, 2024 By Nathan Bao In BigPanda

The financial services industry is experiencing a profound shift. Customers now demand a flawless experience across all touchpoints, including online platforms, mobile devices, ATMs, and physical branches. Any lapse in performance or reliability in these channels can lead to dissatisfaction. Moreover, competition is intensifying as technology-focused companies, more nimble and innovative than their traditional counterparts, continuously disrupt the market.

DORA vs. DORA!

Mar 4, 2024 By Lee Fredricks In PagerDuty

There was recently some confusion in the office that I thought was worth researching and addressing. Depending on who you are talking to, you may hear the acronym DORA in one of two contexts. (OK, three if you’re talking to a preschooler!) It might be in relation to DORA metrics, that is, a set of metrics associated with DevOps Research and Assessment.

Trade-off Between Reliability and Feature Velocity

Mar 1, 2024 By Anjali Udasi In Zenduty

The pressure to constantly innovate and release new features can often clash with the need for a stable and reliable product. While there might be some temporary cutbacks in testing time to achieve high feature velocity, ensuring reliability doesn't have to be an afterthought. We reached out to industry experts to gather their insights on ensuring reliability during phases that demand high feature velocity. Here's what they had to say.

Navigating the Evolving Landscape: A Deep Dive into REST API Versioning Strategies

Feb 29, 2024 By Vishal Padghan In Squadcast

In the ever-evolving landscape of APIs, ensuring seamless interactions and managing changes becomes crucial. While innovation and adaptability are essential, maintaining backward compatibility is equally important to avoid disruption for existing users. This is where REST API versioning comes into play. Versioning allows you to introduce new features or changes to your API in a controlled manner, while simultaneously keeping older versions running smoothly.
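
A minimal sketch can make the idea concrete. It shows URI-path versioning, which is one common strategy; header-based and query-parameter versioning are alternatives. It uses a toy in-process router rather than any particular web framework. The handler names, routes, and response shapes are all hypothetical. `/v1/users/<id>` keeps the original response shape, while `/v2/users/<id>` introduces a breaking change, and both stay live side by side.

```python
# Hypothetical handlers: v1 returns a single "name" field,
# v2 makes a breaking change by splitting it into two fields.
def get_user_v1(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada Lovelace"}


def get_user_v2(user_id: int) -> dict:
    return {"id": user_id, "first_name": "Ada", "last_name": "Lovelace"}


# Routing table keyed by (version, resource); older versions remain registered
# so existing clients keep working while new clients opt in to v2.
ROUTES = {
    ("v1", "users"): get_user_v1,
    ("v2", "users"): get_user_v2,
}


def dispatch(path: str) -> tuple[int, dict]:
    """Resolve a path like '/v1/users/42' to (status_code, body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        return 404, {"error": "not found"}
    version, resource, raw_id = parts
    handler = ROUTES.get((version, resource))
    if handler is None:
        # Unknown or retired version: fail explicitly instead of guessing.
        return 404, {"error": f"unknown version or resource: {version}/{resource}"}
    if not raw_id.isdigit():
        return 400, {"error": "id must be an integer"}
    return 200, handler(int(raw_id))


if __name__ == "__main__":
    for p in ("/v1/users/42", "/v2/users/42", "/v3/users/42"):
        print(p, dispatch(p))
```

Keeping every supported version in one explicit table makes it easy to see which versions are still served. It also makes a version removal a deliberate change rather than an accident.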

Negotiating Priorities Around Incident Investigations

Feb 29, 2024 By Fred Hebert In Honeycomb

There are countless challenges around incident investigations and reports. Aside from sensitive situations revolving around blame and corrections, tricky problems come up in discussions with multiple stakeholders. The problems I’ll explore in this blog post, from the SRE perspective, are about time pressures (when to ship the investigation) and the type of report people expect.

Combating IT Alert Fatigue

Feb 29, 2024 By StatusCast In StatusCast

With the growing complexity of IT systems, managing alerts and notifications without succumbing to the crippling effects of alert fatigue has never been more challenging. Alert fatigue occurs when the volume of notifications makes it impossible to discern signal from noise, desensitizing recipients to warnings, some of which turn out to signal critical issues.

Finally: alerting and on-call scheduling for how you actually work

Feb 29, 2024 By Robert Ross In FireHydrant

TL;DR You deserve a better alerting and on-call tool. So we built Signals. In our early days, we often used the tagline, “You just got paged. Now what?” It encapsulated how FireHydrant solved for all of the messy bits that come after your alert is fired, from incident declaration all the way through to retrospective. At the time, we saw alerting and on-call scheduling as a solved problem.

Integrating Prometheus AlertManager with PagerDuty in Calico

Feb 29, 2024 By Joao Coutinho In Tigera

In the fast-paced world of Kubernetes, guaranteeing the performance and reliability of the underlying infrastructure, such as container and Kubernetes networking, is crucial. One key aspect of achieving this is effectively managing alerts and notifications. This blog post emphasizes the significance of configuring alerts in a Kubernetes environment, particularly for Calico Enterprise and Calico Cloud, which provide Kubernetes workload networking, security, and observability.

Start Monitoring Third-Party Outages in Opsgenie

Feb 28, 2024 By Nuno Tomas In isDown

In today's digital world, we rely heavily on third-party services. These services are great because they help us grow, be more flexible, and work more efficiently. However, they also add complexity and risk. If a service we depend on stops working, it can cause serious problems. To deal with this, we're excited to introduce a new feature that connects Opsgenie with IsDown.

Balancing Innovation and Reliability: A Guide for SRE Teams

Feb 28, 2024 By Vishal Padghan In Squadcast

In today's rapidly evolving technological landscape, striking a balance between innovation and reliability is a constant challenge for Site Reliability Engineering (SRE) teams. On one hand, businesses and customers crave the constant stream of new features and functionalities that fuel progress. On the other hand, ensuring system stability, minimal downtime, and optimal performance remains paramount for user experience and business continuity.
