Latest News

Topping top! New Real-Time Process Monitoring

What are the essential things to monitor in your infrastructure? Sure, CPU utilization, memory usage, and IO throughput. However, once you notice a significant load somewhere in your infrastructure you want to know what is causing it, and that typically boils down to needing to find the process that’s using too much CPU or memory or that’s doing disk or network IO like there’s no tomorrow.

Kubernetes issues and solutions

Hi all! I am part of the architecture team at Avito.ru, one of the world’s top classifieds sites. In this post I want to share our experience of implementing Kubernetes at scale. Kubernetes is a powerful orchestration tool that helps us manage dozens of microservices and supports robust, fast deployments. It’s really cool that we don’t have to manage resources manually, think about service discovery, and so on.

LaborDuty: Incident Response For Baby's Arrival

Real-time operations is a term PagerDuty uses to describe the process in which people can acknowledge, communicate, resolve, and learn from impactful events, all in real time. What could be more impactful and real-time than the miracle of childbirth? Whether it’s your first or fifth child, things don’t always go as planned, but the experience generally comes with a good story filled with hindsight.

Why Your Cloud Costs Might Be So High, and What to Do About It

Recent headlines surrounding big-name IPOs, such as those of Slack and Lyft, have highlighted the very real costs of operating in the cloud. Companies like these are on the hook to pay AWS and other public cloud vendors tens or hundreds of millions of dollars every year, just to run their services.

API Analysis with the ELK Stack

Pulling in data exposed via an API is not one of the most common use cases for ELK Stack users, but it is definitely one I’ve come across in the past. Developers wrapping their database services with a REST API, for example, might be interested in analyzing this data for business intelligence purposes. Whatever the reason, the ELK Stack offers some easy ways to integrate with such an API. One of these methods is the Logstash HTTP poller input plugin.

Machine Learning driven Closed Loop Automation

Businesses increasingly rely on digital transformation and data to succeed in the current environment. The agility with which a business can respond to real-life situations is proportional to the level of digitization it has implemented. For a business to be nimble and agile, it is imperative that all its processes be delivered as digital services that can be provisioned, monitored, and remediated by automation logic at the core of the business.

A safer path to Azure deployments: Site24x7 and Azure Deployment Manager

While deploying updates in large-scale production environments, it can be easy to overlook minor issues that may later turn out to be the cause of major infrastructure problems. To ensure these large deployments are safely rolled out to production, they can be staged in one subset of your environment and then another; for example, once an update is deployed to a subset, it is monitored to make sure everything is fine and then moved to the next subset.
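The staged pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `staged_rollout`, `deploy`, and `is_healthy` are hypothetical names, not part of the Site24x7 or Azure Deployment Manager APIs, and a real rollout would normally wait for a monitoring window before checking health.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    subsets: Iterable[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy to each subset in order, stopping at the first unhealthy one.

    Returns the subsets that were deployed and passed the health check.
    Both callbacks are hypothetical hooks supplied by the caller.
    """
    completed: List[str] = []
    for subset in subsets:
        deploy(subset)              # roll the update out to this subset only
        if not is_healthy(subset):  # check monitoring before moving on
            break                   # halt: later subsets are left untouched
        completed.append(subset)
    return completed
```

Stopping at the first unhealthy subset keeps a bad update from reaching the rest of the environment, which is the point of staging the deployment.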