Latest News

How we grew Sentry's monthly active users by rethinking invitations

At its core, Sentry is a tool that alerts you to defects in your production software. But it does more than blast stack traces into your inbox: Sentry provides powerful workflows that help your team determine root cause, triage and assign issues, and keep tabs on ongoing concerns through comments and notifications. These collaborative features can help you resolve problems with your software quickly.

Pro Tip: Instantly Turn Slack Messages into Grafana Annotations with the Memo Tool

I have been a Grafana power user almost since the day it was conceived. During this time, I've gotten acquainted with a few quirks but also many features, some of them rather obscure. One feature that few people know about, but that I absolutely love, is annotations.

Structuring Your Teams for Software Reliability

How well positioned is your team to ship reliable software? What are the different engineering roles that affect reliability, and how do you optimize the ratio of software engineers to SREs to DevOps engineers within teams? These questions can be hard to answer quantitatively, but projecting different scenarios using systems thinking can help. Will Larson's blog post "Modeling Reliability" does just that, and it serves as the inspiration for this article.

Introducing Git Blame Support for GitHub Integration

At Rollbar, we care about reducing the time it takes developers to find and fix errors. That's why we're strengthening our GitHub integration to provide more context around errors and reduce mean time to resolution (MTTR). Last year, we launched Code Context to show additional lines of code within each frame of the stack trace, reducing the back and forth between GitHub and Rollbar.

Got Game? Secrets of Great Incident Management

When his phone wakes him at two in the morning, operations engineer Andy Pearson knows it's bad news. There's a major server problem, and hundreds of client websites are down. Automated monitoring checks detected the outage within seconds and paged the on-call engineer. This time, it's Pearson in the hot seat. Pearson quickly confirms the issue is real and escalates it to his boss, tech lead Lewis Carey.

Supercharging Workload Security in Your K8s Cluster

2019 was a big year for Kubernetes adoption, and 2020 is on track to exceed that pace. We have already seen a large number of organizations migrate their workloads to Kubernetes (k8s) in both public and private clouds as they embrace a hybrid cloud strategy. With so much at stake, what are you currently using for network security inside your k8s cluster?

Incident Response: How Great Companies Do It

An incident response plan is a predefined strategy that tells IT teams how to respond efficiently to critical IT events. As modern applications grow in scale and complexity, more people will be working on more interdependent systems. Consequently, the question is not if a system will fail, but when, and how best to respond.