
Latest News

How Uptime.com can Help Improve Internal Documentation

An acquaintance of mine works for a company that still uses Windows XP to manage some internal applications. The company’s higher-ups refuse to adopt newer versions, citing costs and technical gaps, and that has opened something of a Pandora’s box whenever employees leave. With no strong internal reference documentation, each new departure leaves IT wondering two things. This rather amusing conundrum is apparently not an isolated incident.

How to Test Ruby Code That Depends on External APIs

Few things are more frustrating than slow, flaky test suites. You're ready to deploy, wait 20 minutes for CI to run, only to find that a test failure in code you've never touched is blocking you. You dig into the source and find the problem: an external API call. It works (slowly) most of the time. But sometimes the network glitches and it fails. What do you do? In this article, José Manuel shows us several techniques for removing external API dependencies from our tests.

Using Dynamic Thresholding to Monitor Your Cloud Platforms

Whether you are new to the cloud, mid-transition, or a veteran of cloud or hybrid systems, no one likes being bothered with useless alerts. The options are simple: if you ignore alerts the way you would a bad cold call, you risk missing a critical one and watching your system crash around you. No one likes to open their inbox to a few hundred alerts they have been ignoring.

Monitor and Optimize Your Rancher Environment with Datadog

Many organizations use Kubernetes to quickly ship new features and improve the reliability of their services. Rancher enables teams to reduce the operational overhead of managing their cloud-native workloads — but getting continuous visibility into these environments can be challenging. In this post, we’ll explore how you can quickly start monitoring orchestrated workloads with Rancher’s built-in support for Prometheus and Grafana.

Q&A with Daniel Seravalli, Lead Engineer at Holler: Nailing Observability at Scale

Holler is a messaging tech company that enriches conversations everywhere by creating and delivering useful, entertaining, expressive visual content to add texture and emotion to messaging environments. As the company has continued to grow, the engineering organization has scaled to meet the demand for its services. However, without a fully staffed Operations team, most of the engineers at Holler perform double duty across DevOps to keep the service performant for consumers.

Is your Grafana dashboard ready to spot chaos?

When it comes to systems reliability, you wouldn’t normally think that unleashing additional chaos would actually be helpful, would you? As more engineering teams moved toward microservice-based architectures for cloud applications over the course of this past decade, many of them didn’t change their testing strategies.

Transitioning from the ELK Stack to Logz.io in 5 Quick Steps

At Logz.io, we’ve built our Log Management solution on the ELK Stack because we know it’s what modern engineering teams prefer. It’s familiar, powerful, and integrates easily with other DevOps and cloud technologies. That’s also what makes migrating from ELK to Logz.io a seamless process: if you’re currently using ELK, you can ship the same data using exactly the same shipping mechanisms.

Monitoring and Securing Cloud-Based Databases Is the Developer's Responsibility

Modern application development requires more work to ensure the development path and the data it produces are fully in sync, secure, optimized, and error-free. This responsibility has increasingly fallen upon application developers. They’re being asked to double as database administrators to maintain fluidity in the process and support an agency’s rapid release cycle.

DNS Monitoring 101 - Troubleshoot Anycast DNS Issues

Today’s Tip of the Day is the last of three focused on Domain Name System (DNS) monitoring. Earlier in the series, we looked at how digital experience monitoring (DEM) can (i) help ensure users are served by the correct DNS server to reduce latency and (ii) help guard against DNS-related attacks. In today’s post, we talk about Anycast DNS, the advantages it provides, the challenges it presents when troubleshooting DNS issues, and how to overcome them with Catchpoint.

Service monitoring and availability made simple with Elastic Uptime and Heartbeat

In the world of IT, availability can mean a lot of things. Your website is available if it is up, responding in a timely manner, sending the correct headers, and serving a valid certificate. Your network is available if the correct hosts are online, responding to ICMP pings, and responding to TCP requests on specific ports. Your API endpoint is available if it returns the correct values when sent specific requests.
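
To make the website criteria above concrete, here is a minimal Python sketch of how such a check might be expressed. It is not how Elastic Uptime or Heartbeat works internally. The `ProbeResult` record, the `availability_failures` function, the 2-second limit, and the expected `content-type` value are all hypothetical choices made for illustration, and certificate validity is reduced to an expiry check only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ProbeResult:
    """Hypothetical record of one HTTP check; field names are illustrative."""
    status_code: int
    response_seconds: float
    headers: dict
    cert_expires_at: datetime


def availability_failures(result, now, max_seconds=2.0, required_headers=None):
    """Return the list of availability criteria that the probe result fails.

    An empty list means the site counts as available under these
    (assumed) thresholds.
    """
    if required_headers is None:
        # Assumed expectation: the page is served as HTML.
        required_headers = {"content-type": "text/html"}

    failures = []

    # "Up": treat any 2xx or 3xx status as a successful response.
    if not 200 <= result.status_code < 400:
        failures.append("not up")

    # "Responding in a timely manner": compare against the assumed limit.
    if result.response_seconds > max_seconds:
        failures.append("too slow")

    # "Sending the correct headers": header names are case-insensitive.
    lowered = {name.lower(): value for name, value in result.headers.items()}
    for name, expected in required_headers.items():
        if not lowered.get(name.lower(), "").startswith(expected):
            failures.append(f"bad header: {name}")

    # "Serving a valid certificate": only expiry is checked in this sketch.
    if result.cert_expires_at <= now:
        failures.append("certificate expired")

    return failures
```

A probe that gathers these fields from a real request could pass them to `availability_failures` and alert whenever the returned list is non-empty. Keeping the evaluation separate from the network call also makes the criteria easy to test on their own.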