
Introducing the Spike.sh Alert Reliability Engine

At Spike.sh, our mission is to help dev teams understand and resolve production issues faster. At the core of this is our Alert Reliability Engine, whose job is to make sure that a team member always gets an alert on their preferred channel. We currently support seven channels: phone call, SMS, mobile push notifications, email, Slack, Microsoft Teams, and Discord. We wanted to give you a peek into how we achieve high deliverability across these channels.

Transforming Employee Experience with Freshservice Virtual Agent: A Freshworks Story

Employee expectations of the workplace are rapidly evolving, and consumer-like experiences quickly become a benchmark to measure internal support teams. A research report from Harvard Business Review Analytics Services and Freshservice found that 82% of those surveyed say employee happiness is impacted by how well workplace technology performs.

How we use the k6 load-testing tool for developing Grafana

On the last day of GrafanaCONline in June, our CEO Raj Dutt announced that Grafana Labs had acquired k6, the company behind the open source load-testing tool. In fact, our relationship with k6 had started more than two years earlier. At the beginning of 2019, we were working on replacing Grafana’s “remember me” cookie solution with a short-lived token solution for the Grafana 6.0 release.

Mobile Vitals - Four Metrics Every Mobile Developer Should Care About

Slow apps frustrate users, which leads to bad reviews or to customers who swipe left to the competition. Unfortunately, finding and fixing performance issues can be difficult and time-consuming. Most developers use profilers within IDEs like Android Studio or Xcode to hunt for bottlenecks, and automated performance tests to catch performance regressions in their code during development. However, testing an application before it ships is not enough.

How Alert Notifications Make Incident Response More Effective

HR people have a saying: right person, right place, right time, meaning that the right resources can make all the difference when it counts. The same goes for incident management and response, where the wrong person, place, or time can very often contribute to a mounting catastrophe. As systems grow, the right person really can make the difference during an outage simply through their command or knowledge of the system.

Node.js Security and Observability using Lightrun & Snyk

As developers, we spend a lot of time in our IDEs writing new code, refactoring code, adding tests, fixing bugs and more. And in recent years, IDEs have become powerful tools, helping us developers with anything from interacting with HTTP requests to generally boosting our productivity. So you have to ask — what if we could also prevent security issues in our code before we ship it?

Looking ahead to general availability of Collapsed Reply Threads

We appreciate all the incredible feedback the Mattermost community has provided about Collapsed Reply Threads since the feature launched in beta in Mattermost Cloud and Self-Managed v5.37 and later. We are working as quickly as possible to resolve known issues and then promote this feature to general availability.

How MBTA modernized incident response to reduce alert fatigue and improve collaboration

Citizens utilize mobile and consumer-facing applications in everyday life, so it’s no surprise that they demand seamless access and high availability of government services online. Whether it’s making payments or applying for benefits, citizens and constituents alike expect these services to be available around the clock.