Alerting

Q&A with Alex Hidalgo on SLOs

Alex Hidalgo is a Site Reliability Engineer at Squarespace, and he’s currently writing a book called Implementing Service Level Objectives for O’Reilly Media. The first three chapters of the book are available now through O’Reilly’s early access program. I had a chance to read those chapters and ask Alex some questions about service level objectives and reliability. Thanks, Alex, for sharing your knowledge.

Why Your Online Business Needs SMS Messaging in 2020

2020 might be the year that we get mainstream adoption of folding cell phones and 5G connectivity, but some things don't change, and SMS usage appears to be one of them. This reliable technology is still widely used by businesses, professionals, and consumers on a daily basis, even as social media dominates our lives. SMS messaging is just convenient and quick.

Tech Talk: Agentless Monitoring and Custom Monitors

In addition to Agent-Based Custom Monitors, OpsRamp supports agentless infrastructure monitoring so that our customers can track the health and performance of network, storage, and virtual resources as well as platform as a service (PaaS) cloud resources. In this interactive session, we'll dive into why this is more important than ever as organizations adopt remote work strategies and prioritize digital transformation initiatives.

Modernizing and Consolidating Your Monitoring Without Losing It...

The current days of remote work and “IT Ops from home” may or may not be here to stay, but they definitely reinforce the need to consolidate and modernize our monitoring. The challenges that multiple siloed tools create for understanding the big picture are only exacerbated when we have just one screen to look at while monitoring our IT from the kitchen table.

Coronavirus: From the Office to Working From Home

Coronavirus (COVID-19) is greatly affecting organizations, employees and stakeholders. As the outbreak’s impact grows, more employees are moving to remote, work-from-home practices as a means of achieving “social distancing.” However, remote workdays bring inevitable challenges. Obstacles include, but aren’t limited to, employee isolation, diminished productivity and poor team communication or collaboration.

Best Practices for Pragmatic Incident Command

The goal of this piece is to provide some practical advice on how teams can coordinate and respond to complex, dynamic incidents. After all, incidents are unplanned investments that surface valuable learnings for improvement. For the purposes of this blog, we define incidents as situations where there is a need for coordination among multiple people working on the same problem. There will be incidents where this is not the case.

IT Operations in the Age of Coronavirus

Coronavirus has been a shock to the system for many IT organizations that are traditionally accustomed to working together in person. When you’re in an office, you can often use informal methods of communication—like swinging by someone’s desk, calling them on their office extension, or even imparting critical information when you run into them in the company cafeteria.