
Latest News

How to Build an SRE Team with a Growth Mindset

The biggest benefit of SRE isn’t always the processes or tools, but the cultural shift. Building a blameless culture can profoundly change how your organization functions. Your SRE team should be your champions for cultural development. To drive change, SREs need to embody a growth mindset. They need to believe that their own abilities and perspectives can always grow, and encourage this mindset across the organization.

How to get mobile push notifications from Spike.sh

When an issue happens in your software in production, the channel to send the alert on depends on multiple factors. If it's a critical issue requiring immediate attention, you should alert the team member via phone call. But not all issues require a phone call, and in fact it may become annoying if your phone keeps ringing for minor issues. This is where other channels like SMS, Slack and mobile push notifications come in.
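The routing idea above can be sketched in a few lines. This is only an illustration, not Spike.sh's API: the severity levels, the channel names, and the `channels_for` helper are all assumptions made for the example.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical severity levels; a real alerting tool defines its own."""
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


# Assumed mapping from severity to notification channels.
# Only critical issues ring the phone; quieter channels handle the rest.
CHANNELS_BY_SEVERITY = {
    Severity.CRITICAL: ["phone_call"],
    Severity.MAJOR: ["sms", "push"],
    Severity.MINOR: ["slack"],
}


def channels_for(severity: Severity) -> list[str]:
    """Return a copy of the channel list configured for this severity."""
    return list(CHANNELS_BY_SEVERITY[severity])


if __name__ == "__main__":
    for level in Severity:
        print(level.value, "->", channels_for(level))
```

Keeping the mapping in one table makes it easy to review who gets woken up for what, which is the point of choosing channels by severity.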

Alert Fatigue and Your Health

As an on-call engineer, you might deal with the day-in, day-out occurrence of alerts. These alerts may come from your alerting provider (PagerDuty, OpsGenie, etc.), Slack notifications telling you the site is down, or the ever-concerning text message "Hey, is the site down?". These alerts elicit reactions that range from "shit" to "again?" and, in many cases, both.

How We Built and Use Runbook Documentation at Blameless

Even if you don’t notice, you are executing runbooks every day, all the time. When you have an incident in your day-to-day operations, you follow a series of ordered and connected steps to solve it. For instance, if you lose your internet connection, you will follow a series of steps to resolve that issue. The steps may differ depending on your method, but you get the idea.

IT Trends You Don't Want to Miss

The COVID pandemic has redefined the workplace and accelerated the process of digitization for many. Organizations are migrating to systems that are flexible, distributed and resilient. Per Gartner, IT spending will reach $3.9 trillion worldwide in 2021. IT teams will be channeling investments into enterprise software as remote work becomes essential. Systems that support remote work will see a growth of 8.8 percent this year.

Why we went passwordless on our new product

Passwords are dying. The cost of creating and maintaining passwords is becoming untenable, which can be seen in the rise of users logging in with social products and developers outsourcing their pain to Auth0 and the like. We decided to sidestep password-based authentication and go passwordless on our new product. Read on to see how you can go passwordless too.

Using OnPage to Deliver Exceptional Customer Support

The OnPage Customer Support team consists of knowledgeable, friendly technicians who offer 24/7 assistance. Support recognizes the importance of client relationships and always aims to achieve maximum customer satisfaction. The OnPage incident management system is at the center of Support’s quality service delivery. OnPage triggers instant, critical mobile alerts to technicians whenever customer-initiated tickets are created.

SRE as Organizational Transformation: Lessons from Activist Organizers

In the software industry’s recent past, the biggest disruptive wave was Agile methodologies. While Site Reliability Engineering is still early in its adoption, those of us who experienced the disruptive transformation of Agile see the writing on the wall: SRE will impact everyone. Any kind of major transformation like this requires a change in culture, which is a catch-all term for changing people’s principles and behaviors.

Introducing Incident Timer

We’re excited to announce Incident Timer, a “days without an incident” timer for software teams to keep track of major engineering incidents. As the people behind Spike.sh, we keep discussing with our customers how to build a culture of reliability. We loved the idea of the safety/accident timers in factories that keep track of major accidents. It's a simple and elegant way to keep safety on everybody’s minds.
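The arithmetic behind such a timer is small enough to sketch. The function below is a hypothetical illustration, not Incident Timer's actual implementation; the name `days_without_incident` and its parameters are assumptions.

```python
from datetime import date


def days_without_incident(last_incident: date, today: date) -> int:
    """Count whole days elapsed since the last major incident.

    Hypothetical helper for illustration; rejects a last incident
    dated in the future rather than returning a negative count.
    """
    if last_incident > today:
        raise ValueError("last_incident cannot be after today")
    return (today - last_incident).days


if __name__ == "__main__":
    # Example dates chosen arbitrarily for the demonstration.
    print(days_without_incident(date(2021, 1, 1), date(2021, 1, 31)))
```

Resetting the timer is then just a matter of recording a new `last_incident` date when a major incident occurs.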

What is DevOps?

DevOps is a term for a cluster of concepts that has become a movement: “a cross-disciplinary practice dedicated to the study of building, evolving and operating rapidly-changing resilient systems at scale” (Jez Humble). The definition of DevOps is not agreed upon by everyone because of the complex processes attached to the term; however, the benefits to teams are universally agreed upon.