Latest Posts

Managing Application Uptime - Hosted Status Page vs DevOps Team

When your software goes down, there are two audiences that need to know about it. One: the people who are going to get frustrated and blame you for the inconvenience. Two: the people who can fix the problem. The first audience doesn’t need to know the details of the problem – they just need to know that you’re on top of fixing it, and how long they can expect to wait before full functionality is restored (insofar as you can make a realistic estimate about that).

SaaS Application Uptime - APM and DevOps

If you care about the uptime status of your website or SaaS application, there are two really great pieces of content shared last month that you should look into. One is an article on continuous testing from Parasoft Corporation, featured on DZone. The other is a recorded presentation on Application Performance Monitoring (APM) by Expected Behavior, from the Full Stack Toronto conference.

How to Set Up Site24x7's Real User Monitoring (RUM)

Have you ever wondered if your end-users are truly satisfied with your web applications? Would you like to get accurate insight into end-user experience for better business decisions that will impact your bottom line? With Site24x7 Real User Monitoring you can! Get ready to gain real-time visibility into end-user experience (for ALL users, browsers, devices and geographies) and behind-the-scenes performance for your web application.

Network Configuration Management Best Practices

One of the biggest responsibilities of system administrators and DevOps professionals is ensuring networks are always functioning properly. Network configuration management used to be a simple task: watch resource usage and make the appropriate tweaks when the occasional traffic spike occurred. Since then, the rise of agile principles within the DevOps field has required system administrators to adapt to rapid shifts in their work.

Deploying a Django App with No Downtime

When healthchecks.io started to receive more than 1 request per second, it became clear I could not just go on carelessly restarting web servers after code deploys. For a monitoring service, it would be bad form to miss even a few HTTP requests. And, going forward, if the server gets busier, the problem only becomes bigger.

Intro

I needed a tool to alert me when my cron jobs silently fail. There are already a number of services for this, but it seemed like a fun thing to build myself. So I present to you: healthchecks.io. I am using this myself and it has already been useful for me a couple of times. Say, a seemingly benign code change in one service causes my batch job to fail 12 hours later, in the middle of the night.
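
The idea behind this kind of cron monitoring can be sketched in a few lines: each job reports in when it finishes, and the monitor flags any job that has stayed silent longer than expected. The snippet below is only an illustration of that pattern, not how healthchecks.io is actually implemented; the function name `check_status`, the status labels, and the `period` and `grace` parameters are all assumptions made for this example.

```python
from datetime import datetime, timedelta

def check_status(last_ping, now, period, grace):
    """Classify a job by how long ago it last reported in.

    Hypothetical helper: 'period' is how often the job is expected
    to run, 'grace' is extra slack before it is considered failed.
    """
    if last_ping is None:
        return "new"      # the job has never reported in
    elapsed = now - last_ping
    if elapsed <= period:
        return "up"       # reported within the expected interval
    if elapsed <= period + grace:
        return "late"     # overdue, but still inside the grace window
    return "down"         # silent for too long: time to send an alert

# Illustrative data: three hourly jobs with a 10-minute grace window.
now = datetime(2024, 1, 1, 12, 0)
period = timedelta(hours=1)
grace = timedelta(minutes=10)

statuses = {
    "hourly-backup": check_status(now - timedelta(minutes=30), now, period, grace),
    "report-mailer": check_status(now - timedelta(minutes=65), now, period, grace),
    "batch-import": check_status(now - timedelta(hours=12), now, period, grace),
}

for job, status in statuses.items():
    print(job, status)
```

The point of the design is that the monitor never needs to know why a job failed. A crash, a hung process, and a crontab entry that was accidentally deleted all look the same from the outside: the expected report never arrives.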

Top 10 Reasons AlertOps is Better Than PagerDuty: #2

A service-level agreement (SLA) defines the level of service expected from a service provider. As such, an SLA plays a key role in an organization’s ability to fulfill customer requests. If an organization breaks an SLA, it risks significant revenue and brand reputation damage. Perhaps worst of all, this organization may lose customers to its rivals if it cannot comply with SLA mandates.

Top 10 Reasons AlertOps is Better Than PagerDuty: #3

AlertOps offers more flexibility than our competitors when it comes to alerting capabilities – and that can mean more power for our users. Does your incident management team have the ability to send the right messages, to the right team members, every time? If not, team members may be forced to deal with alert overload due to the sheer volume of notifications that they receive over time. In many instances, an entire incident management team receives notifications.