
Latest News

How to deal with alert fatigue head-on

Everyone experiences stress at work, and thankfully it’s a topic folks aren’t shying away from anymore. But for on-call engineers, alert fatigue is a phenomenon closer to home. Unfortunately, it can be just as insidious as stress and drastically impact those it affects. First discussed in the context of hospital settings, the term later entered engineering circles.

Delivering innovation at scale: The 3 pillars of successful Azure cloud operations

Around the world, organizations of all sizes rely on Microsoft Azure to bring modern services online—and deliver innovation at scale. Azure provides the flexibility to roll out cloud-based applications at breakneck speed. But running these applications and services in Azure can add complexity for already overworked IT teams, tasked with boosting performance and reducing costs in ever-evolving cloud environments.

How AIOps improves IT service assurance and optimization

ITOps and DevOps teams face many challenges. Their responsibilities are extensive, from navigating complex IT environments at scale to quickly addressing performance issues and minimizing downtime and outages. Enhancing your organization’s IT service assurance requires you to ensure the reliability, performance, and availability of IT services.

Mattermost AI Copilot: Accelerating the conversation with LLMs

Hello, Mattermost community! We’re thrilled to announce the release of the Mattermost AI Copilot beta, a groundbreaking addition to the Mattermost platform. This plugin is not just a tool. It’s a way for organizations to deploy artificial intelligence in mission-critical environments — a true game-changer. With that in mind, let’s explore how this plugin will establish new standards in workplace collaboration for Mattermost Enterprise customers.

How Squadcast's Snooze Incidents Promotes Focussed On Call Shifts

Dealing with a flood of incidents, each with varying degrees of urgency, can be a daily struggle for Incident Response teams. Suppose a low-priority alert pings while you're tackling a critical incident. This pulls your focus away from the urgent issue, and the bombardment of alerts is constant. How do engineers ensure that high-severity issues take precedence? Don't they want to avoid being bothered or bombarded with notifications while addressing critical matters? They sure do.

Advice for building an incident management program

On this week's episode of The Debrief, we chatted with Jeff Forde, an Architect on the Platform Engineering team at Collectors. With a background spanning finance, healthcare, and various product-led startups, Forde has honed his expertise in DevOps, site reliability, and platform engineering. Beyond his professional life, he's also a dedicated volunteer first responder and certified fire instructor in Connecticut, offering him a unique perspective on managing incidents of all types.

SOC 2 Compliance Requirements: Examples, Use Cases + More

SOC 2 (System and Organization Controls 2) compliance requirements ensure that customer data stays private and secure, which is essential for any business that stores or processes sensitive data. In this blog, we’ll explore the specifics of SOC 2 compliance and provide a solution to help you automate and enforce SOC 2 compliance going forward.

Release Roundup March 2024: More ways to discover and test your services

2024 is off to a fast start here at Gremlin. Since our last release roundup, we’ve released new experiment types, new features to improve integration with cloud platforms, and improvements to our auto-detection processes. Now you can push processes to their limits, find dependencies even more easily, limit when tests can be run, and much more. We also introduced a slew of platform changes to improve efficiency, performance, and user experience in the Gremlin web application.