

Monitoring Machine Learning Models Built in Amazon SageMaker

Many data science discussions focus on model development. But as any data scientist will tell you, this is only a small—and often relatively quick—part of the data science pipeline. An important, but often overlooked, component of model stewardship is monitoring models once they’ve been released to the wild. Here we’ll aim to convince any unbelievers that monitoring deployed models is as important as any other task in the data science workflow.

OpsRamp Presents at Cloud Expo Santa Clara

Come and learn the latest best practices on artificial intelligence, cloud management, and the rise of the data-driven IT organization. Global public cloud spending has now topped $200 billion, according to Forrester Research. Organizations are moving to the cloud at a breakneck pace, looking for agility, flexibility, reliability and cost control.

Intro to NGINX

If you've been following along with my posts, you have a sound introduction to Apache Web Server, how it functions, its place in history, and how Sumo Logic can help you sort through the numerous logs it produces. Apache access and error logs are integral to understanding the traffic patterns and issues your users face when accessing your web applications. Sumo Logic helps administrators parse through logs, isolate issues, and determine the root causes of errors.

10 Reasons You Should Run Your Serverless Applications & FaaS on Kubernetes

Over the last year, along with Kubernetes, Serverless computing platforms have acquired tremendous mindshare among the development community. As Serverless implementations begin to proliferate, I want to make the case that there are tremendous synergies to be gained by bringing both these paradigms together. Some of these benefits have been covered in previous posts. The majority of enterprises are embarking on their DevOps journey. Scaling such processes across a large enterprise is complicated.

Serverless app to speed up all your Lambda functions

A while back, I wrote about how you can shave latency off every AWS SDK operation by enabling HTTP keep-alive. It had the desired effect and I saw lots of people apply this technique in their projects. But it also resulted in the same 10 lines of code being copied and pasted everywhere! I began thinking about ways to distribute an optimized version of AWS SDK so everyone can benefit.

June 2019 Release Overview: Work In Real Time, All The Time, Wherever You Are

This month, we are excited to announce a new set of product capabilities and enhancements designed to ensure that teams can work in real time, all the time, wherever they are. Whether they’re on-the-go with their mobile devices or at their desks on a typical work day, we will continue to innovate without sacrificing ease-of-use and adoption.

How to Decode Your AWS Bill (and What's within DevOps' Control)

The typical AWS bill, otherwise known as the AWS Cost and Usage Report, includes line items that are useful to both finance and DevOps. However, many of the metrics that are within engineers’ and cloud architects’ control aren’t so simple to discover. To make cost a first-class operational metric for DevOps, teams need visibility into the data that’s relevant to engineering activity.

Investigating Timeouts with Tracing

Tracing is one of the key tools that Honeycomb offers to make sense of data. Over the last few weeks, we’ve made a number of improvements to our tracing interface — and, put together, those changes let you think about traces in a whole new way! Tracing makes it easier to understand control flow within a distributed system. We render traces with waterfall diagrams, which capture the execution history of individual requests.

OnPage and ConnectWise: Incident Alert Management Workflows

Let’s set the scene: You’re an on-call engineer, working for a dedicated support team. Your priorities are twofold: (1) speedy incident resolution and (2) satisfying clients and stakeholders. With these demands in mind, you adopt OnPage’s integration with ConnectWise. The integration streamlines the ticketing-to-alerting process, ensuring that your team achieves client service excellence.

17 Tech Support Tickets You'll Be Happy You Didn't Receive

If tech support had a motto, it’d be reminiscent of Rule #4 of the Auvik Way: Even when it’s not your fault, it’s your problem. But sometimes, there are problems so bad you wouldn’t want to deal with them. We’ve rounded up 17 examples from the r/techsupportgore subreddit that are sure to send a palm to your face and a shiver down your spine:

Plugging in your USB receiver with a hammer for that flush-mounted look (from r/techsupportgore). Good luck getting that one out.