Latest News

Monitoring Your Platform From Multiple Locations

Jul 14, 2022 By Andrei Danilov In Rootly

Mature start-ups and scale-ups create wonderful and challenging environments for engineers. As the product they’re creating matures and the brand becomes successful, the user base generally starts growing and, for some companies, in places they might not expect. As that happens, new challenges arise for engineers. One of these challenges is straightforward to guess: making the product available throughout different regions of the world.

Amazon OpenSearch + Squadcast Integration: Routing Alerts Made Easy

Jul 12, 2022 By Vishal Padghan In Squadcast

Developers often find comfort in embracing open-source software for numerous reasons. One of the most important is the freedom to use that software anywhere and however they wish. Amazon OpenSearch is an open-source search and analytics suite derived from Elasticsearch. It lets you perform interactive log analytics and real-time application monitoring with ease.

7 ways tagging incidents can teach you about system health

Jul 12, 2022 By Emily Arnott In Blameless

One of the most powerful ways to prepare for future incidents is to study and learn from patterns in past incidents. Blameless Reliability Insights highlights these patterns for you, with out-of-the-box dashboards that automatically collect and present all types of statistical information about your incidents.

SRE Roles and Responsibilities Defined

Jul 6, 2022 By Myra Nizami In Blameless

SRE is a practice that creates a bridge between operations and development. We discuss the roles and responsibilities of a site reliability engineer.

Top Five Pitfalls of On-Call Scheduling

Jun 30, 2022 By Squadcast Community In Squadcast

On-call schedules ensure that there's someone available day and night to fix or escalate any issues that arise. Using an on-call schedule helps keep things running smoothly. On-call workers can be anyone from nurses and doctors required to respond to emergencies to IT and software engineering staff who need to fix service outages or significant bugs. Being on call can be challenging and stressful, but with the proper practices in place, on-call schedules can fit well into an employee's work-life balance while still meeting the organization's needs.

Why More Incidents Are Better

Jun 30, 2022 By Andre King In Rootly

Ask most SREs how many incidents they’d have to respond to in a perfect world, and their answer would probably be “zero.” After all, making software and infrastructure so reliable that incidents never occur is the dream that SREs are theoretically chasing. Reducing actual incidents by as much as possible is a noble goal. However, it’s important to recognize that incidents aren’t an SRE’s number one enemy.

Are you doing SRE wrong? 4 questions to ask

Jun 29, 2022 By Auri Poso In Aiven

SRE requires teamwork and planning. Be like Aiven: get it right.

Development Velocity (And How To Balance Reliability)

Jun 29, 2022 By Noor-ul-Anam Ruqayya In Blameless

Wondering about development velocity? We explain what development velocity is, how to measure it, and how to balance the need for fast development and reliable products.

Distributed Caching on Cloud

Jun 27, 2022 By Rajiv Srivastava In Squadcast

Distributed caching is an important aspect of cloud-based applications, whether they run in on-premises, public, or hybrid cloud environments. It facilitates incremental scaling, allowing the cache to grow along with the data. In this post, we explore distributed caching in the cloud and why it is useful for environments with high data volume and load.

Lightstep Notebooks helps speed troubleshooting for SREs and developers

Jun 27, 2022 By Ben Sigelman In ServiceNow

Digital business is an imperative for 21st-century companies. Increasingly, organizations are directing investments toward technologies that deliver outcomes fast and enable more resilient digital business models. In this landscape, incidents such as software bugs, power outages, or downed networks have major consequences that affect both revenue and customer loyalty.
