The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Monitoring Machine Learning Models Built in Amazon SageMaker

Many data science discussions focus on model development. But as any data scientist will tell you, this is only a small, and often relatively quick, part of the data science pipeline. An important but often overlooked component of model stewardship is monitoring models once they’ve been released into the wild. Here we’ll aim to convince any unbelievers that monitoring deployed models is as important as any other task in the data science workflow.

OpsRamp Presents at Cloud Expo Santa Clara

Come and learn the latest best practices on artificial intelligence, cloud management, and the rise of the data-driven IT organization. Public cloud spending worldwide has now topped $200 billion, according to Forrester Research. Organizations are moving to the cloud at a breakneck pace, looking for agility, flexibility, reliability and cost control.

Intro to NGINX

If you've been following along with my posts, you have a sound introduction to Apache Web Server, how it functions, its place in history, and how Sumo Logic can help you sort through the numerous logs it produces. Apache access and error logs are integral to understanding the traffic patterns and issues your users face when accessing your web applications. Sumo Logic helps administrators parse through logs, isolate issues, and determine the root causes of errors.

Understanding Heroku Error Codes with Scout APM

Suppose you are hosting your application with Heroku and find yourself faced with an unexplained error in your live system. What would you do next? Perhaps you don’t have a dedicated DevOps team, so where would you start your investigation? With Scout APM, of course! We are going to show you how you can use Scout to find out exactly where the problem lies within your application code.

Grafana Tutorial: Simple Synthetic Monitoring for Applications

Often there’s a focus on how a service is running from the perspective of the organization. But what does service health monitoring look like from the perspective of a user? There are many metrics that indicate the overall health of a container, VM, or application, but on their own they do not indicate whether the system is functioning correctly. Often these metrics (CPU, disk, memory) are too narrow, and they can be poor indicators: high CPU may be desirable, and bursts of memory usage may be normal.
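
To make the user-perspective idea concrete, here is a minimal sketch of a synthetic check in Python. It is not taken from the Grafana tutorial itself. The names `synthetic_check`, `http_fetch`, and `CheckResult`, and the 2-second latency budget, are illustrative assumptions. The check asks the question a user would ask: did the page load, did it contain what I expected, and was it fast enough?

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CheckResult:
    ok: bool          # True only if every user-facing condition held
    status: int       # HTTP status code, or 0 if the request itself failed
    latency_s: float  # wall-clock time spent on the request
    reason: str       # short explanation of the outcome


def http_fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL with the standard library and return (status, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def synthetic_check(
    url: str,
    expected_text: str,
    max_latency_s: float = 2.0,  # hypothetical latency budget
    fetch: Callable[[str], Tuple[int, str]] = http_fetch,
) -> CheckResult:
    """Probe a URL the way a user would and judge the response."""
    start = time.monotonic()
    try:
        status, body = fetch(url)
    except Exception as exc:  # DNS failure, timeout, HTTP error status, etc.
        return CheckResult(False, 0, time.monotonic() - start,
                           f"request failed: {exc}")
    latency = time.monotonic() - start

    if status != 200:
        return CheckResult(False, status, latency, f"unexpected status {status}")
    if expected_text not in body:
        return CheckResult(False, status, latency, "expected text missing")
    if latency > max_latency_s:
        return CheckResult(False, status, latency, "too slow")
    return CheckResult(True, status, latency, "ok")
```

The `fetch` parameter is injectable so the check can be exercised without a network. In a real setup the result would typically be exported as a metric on a schedule and graphed or alerted on.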

Paul Dix [InfluxData] | InfluxDB 2.0 and Flux - The Road Ahead | InfluxDays London 2019

Paul will continue to chart the road ahead by outlining the next phase of development for InfluxDB 2.0 and for Flux, InfluxData’s new data scripting and query language. He will discuss Flux’s role in multi-data source environments and explain how InfluxDB can be deployed in on-premises, multi-cloud, and hybrid environments.

Julius Volz [Prometheus] | Creating the PromQL Transpiler for Flux | InfluxDays London 2019

Flux is not only a new data scripting and query language — it is also a powerful data processing engine. This talk by Julius Volz will focus on how he worked with the InfluxData team to build PromQL support for the Flux engine. Hear about lessons learned from building the transpiler and recommendations on why and how to use PromQL and Flux. This talk will include a demo and will share the current project progress.

Uptime.com Check Types | How to Build the Ultimate Uptime Monitoring System

How much infrastructure for a domain or application can fail before the customer starts to notice? What about before your productivity is affected? The answers to these questions will help you fully utilize uptime monitoring. Here are just a few examples of services that can be monitored for better peace of mind.

Sentry for Good

Errors are expensive; they steal resources allocated for other things and can hurt revenue and user sentiment. And, for teams composed of volunteers working in their spare time, errors can take weeks to triage and resolve. So, despite what Google might tell you, Sentry for Good is not merely a solution to your pet’s pesky pheromone problems (although it is clearly also that, if PetSmart’s Google results are any indication).