Blog

Timestamps On Downtime Alerts

We've made a useful improvement to Downtime Monkey alerts. Each downtime alert now includes a timestamp showing when the website went down, and each uptime alert includes a timestamp showing when it came back up. This turned out to be more work than expected, largely because we thought we'd knock it out in under an hour :) Although it wasn't totally straightforward to develop, the end result is incredibly simple to use...

Ready to move on and pick up speed again

We are going through an incredibly difficult time of uncertainty, lockdowns, cutbacks, and even fear. Taking this time to optimize and rethink the way we do business is essential in ensuring we get back on track and return even stronger than before. Most of us have been working from home for months now and, in some cases, there is no end in sight. How are you and your operations holding up? Are you able to work, maintain, and control your infrastructure?

A New Chapter

Today is an exciting day for LogDNA! I have two wonderful announcements to make. First, we’ve officially announced that LogDNA has closed a $25 million series C round led by Emergence Capital. Second, and most importantly, I’m thrilled to share that Tucker Callaway, LogDNA’s current President and Chief Revenue Officer, is transitioning into a new role as the company’s Chief Executive Officer (CEO).

How to use check aggregates in Sensu Go

Aggregates, which allow you to monitor groups of checks or entities, were a much-beloved feature in Sensu Core (the predecessor to Sensu Go) — Ben Abrams describes them as “awesome” in his post on alert fatigue, noting that aggregates are like having “a bunch of nodes behind a load balancer where each node is healthchecked, and if a node drops out it may not be worth waking someone up in the middle of the night.”
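
The load-balancer analogy can be sketched in a few lines of Python. This is not Sensu Go's API or configuration format. The function name `aggregate_status`, the 75% threshold, and the node names are hypothetical, and exit status 0 is assumed to mean "healthy". The sketch only shows the idea: alert on the health of the group rather than on each node.

```python
from typing import Mapping


def aggregate_status(results: Mapping[str, int], min_ok_ratio: float = 0.75) -> str:
    """Return "ok" or "critical" for a group of check results.

    `results` maps a node name to a check exit status, where 0 is assumed
    to mean healthy. The group is "ok" as long as at least `min_ok_ratio`
    of the nodes are healthy, so a single failed node need not page anyone.
    """
    if not results:
        raise ValueError("no check results to aggregate")
    ok_count = sum(1 for status in results.values() if status == 0)
    ratio = ok_count / len(results)
    return "ok" if ratio >= min_ok_ratio else "critical"


# Hypothetical example: one of four web nodes has failed its health check.
nodes = {"web-1": 0, "web-2": 0, "web-3": 2, "web-4": 0}
print(aggregate_status(nodes))
```

With three of four nodes healthy the group still meets the assumed 75% threshold, so no alert is raised. Losing a second node would push it below the threshold.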

SpringOne Sessions and Workshops Now Released: Register Today!

SpringOne is going free and online in 2020. On September 2 and 3, hear from all your favorite speakers and companies along with a host of new experts, and access exciting new content—all as part of your virtual event experience: Catch up on the latest announcements from the Spring and VMware Tanzu product teams. Get inspired by hearing real-world success stories from other organizations. Surf across five tracks of breakout content, presented live over two days.

vSphere 7 with Kubernetes Network Service, Part 2: Tanzu Kubernetes Cluster

vSphere 7 with Kubernetes enables operations teams to deliver both infrastructure and application services as part of the core platform. The Network service provides automation of software-defined networking to both the Kubernetes clusters embedded in vSphere and Tanzu Kubernetes clusters deployed through the Tanzu Kubernetes Grid Service for vSphere.

OpManager now supports SMSEagle, Twilio, and Clickatell, so you can get SMS alerts anywhere!

IT admins need to know the status of their IT devices, servers, routers, switches, and firewalls. To meet this need, OpManager has a highly responsive and robust notification and alerting system that sends alerts via email, Slack, and even SMS. Murphy’s law says anything that can go wrong will go wrong, and if you’re in IT, you’re probably familiar with how easily things can go wrong.

Test on-premise applications with Datadog Synthetic private locations

Synthetic monitoring lets you improve the end-user experience by proactively verifying that users can complete important transactions and access key endpoints. But your applications serve many users, from customers to all the employees who run your business. This makes testing the performance of any internal-facing services within your private network just as critical as monitoring your external-facing applications.

New free Ping Tool. Ping from multiple locations all at once.

Ping is a network tool. The tool seeks out a given address over a network to check if one networked device can communicate with another device. The tool then reports on the quality of the connection based on data loss and response times. Uptrends’ new Ping Tool conducts ping tests from multiple worldwide locations at the same time. You can instantly spot localized downtime and latency issues from around the world using one simple tool.
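
As a rough illustration of how a ping result turns into a quality report, the Python sketch below summarizes one location's round-trip times into packet loss and average latency. It does not send any packets and is not how the Uptrends tool is implemented. The function name `summarize_ping`, the use of `None` for a lost packet, and the sample values are all assumptions made up for the example.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_ping(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize ping round-trip times (in ms) for one location.

    A value of None stands for a packet that received no reply.
    """
    if not rtts_ms:
        raise ValueError("no ping samples provided")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "sent": len(rtts_ms),
        "received": len(replies),
        "loss_pct": 100.0 * lost / len(rtts_ms),
        # Average latency is undefined when every packet was lost.
        "avg_ms": mean(replies) if replies else None,
    }


# Made-up sample: four packets sent, the second one lost.
print(summarize_ping([10.0, None, 30.0, 20.0]))
```

Running the same summary for each test location side by side is one way to spot a problem that affects only a single region.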

How to Classify Incidents

Incident classification is a standardized way of organizing incidents into established categories. Incidents can include outages caused by errors in code, hardware failures, resource deficits: anything that disrupts normal operations. Each new incident should be assigned a category based on the areas of the service affected and a rank based on its severity. Each of these classifications should have an established response procedure associated with it.
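
A minimal Python sketch of such a scheme follows. The severity levels, area names, and response procedures are hypothetical placeholders, not a recommended taxonomy. The point is only that every incident carries an affected area and a severity, and that each severity maps to a predefined response.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity ranking; lower numbers are more severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


@dataclass(frozen=True)
class Incident:
    title: str
    area: str          # part of the service affected, e.g. "checkout"
    severity: Severity


# Placeholder response procedures, one per severity level.
RESPONSES = {
    Severity.SEV1: "page the on-call engineer immediately",
    Severity.SEV2: "notify the owning team during business hours",
    Severity.SEV3: "file a ticket for the next planning cycle",
}


def response_for(incident: Incident) -> str:
    """Look up the established response for an incident's classification."""
    procedure = RESPONSES[incident.severity]
    return f"[{incident.area}/{incident.severity.name}] {procedure}"


incident = Incident("Payments API returns 500s", "checkout", Severity.SEV1)
print(response_for(incident))
```

Keeping the mapping in one table means responders never have to decide the procedure during the incident itself. They only have to classify it.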