
Latest Blogs

Five Things Your APM Platform Should Do for Your Container Application Deployments

One of the chief complexities in running large-scale containerized applications is the need for continuous system and application monitoring. Containers are very different from traditional VMs and the three-tier applications that run on them. Monitoring needs to ensure that the SLAs promised to the business are being met. It also needs to forecast usage trends while identifying problem areas such as bugs, capacity challenges, slowing performance, and any potential downtime.

Dynamic Sampling by Example

Last week, Rachel published a guide describing the advantages of dynamic sampling. In it, we discussed varying sample rates to achieve a target collection rate overall, and having different sample rates for distinct kinds of keys. We also teased the idea of combining the two techniques to preserve the most important events and traces for debugging without drowning them out in a sea of noise.

Why Your Lambda Functions May Be Doomed To Fail

AWS Lambda has a cool feature that can be both a blessing and a nightmare for a serverless application, depending on whether it’s properly handled by our code: the retry behavior. A retry occurs when an invocation of a Lambda function results in an error and the AWS Lambda platform automatically invokes the function again, with the same event payload. Before we get deeper, make sure you are familiar with the AWS documentation on the subject.

Alert escalation - How it works in SIGNL4

Part of any manager’s role is to make sure their team is taking accountability. Managers are not the front-line resolvers who handle issues; that is what they have a team for. However, managers do need to be aware of incidents that are occurring and to make sure their team is taking ownership and resolving those issues. SIGNL4 takes the managerial work out of being a manager by providing alert ownership transparency.

Single Pane or Single Pain of Glass?

A lot has been written about the ever-elusive “Single Pane of Glass” (or SPOG). From calling it a myth like Bigfoot or the Loch Ness Monster, to reporting that “a centralized, service-centric view into IT environments has become a must-have capability for IT Operations” (2018 Digital Enterprise Journal Study), both opponents and proponents admit that a centralized view into IT Ops is a real need but, at the same time, a major operational challenge to implement.

Takeaways From ServiceNow's Knowledge 2019

We had a great time in Las Vegas attending ServiceNow’s Knowledge 2019 conference. We enjoyed everything the city has to offer, while also exploring the latest on IT workflow transformations. Though there are several valuable experiences to report on, I’ll cover just a few takeaways from Knowledge ’19 and how they resonated with the OnPage team.

SLO, SLA, SLI Oh My! Creating them can be easy

Imagine you are driving a car on a freeway. Your speedometer is telling you you’re going 62 mph. But you “gotta go fast”. Faster than the 65 mph speed limit. So you go for it: first 68 mph, then 75 mph, then 80 mph. Then you pass a police officer hiding in a speed trap. To your dismay, they pull you over and give you a ticket. All is not lost: there is a silver lining here.

AKS Cluster Performance: How to Better Operate Kubernetes in Azure

AKS is the managed service from Azure for Kubernetes. When you create an AKS cluster, Azure creates and operates the Kubernetes control plane for you at no cost. The only thing you do as a user is say how many worker nodes you’d like, plus other configurations we’ll see in this post. So, with that in mind, how can you improve the AKS cluster performance of a service in which Azure manages almost everything?