
How I made AWS Lambda work for my SaaS

A big part of Checkly runs on AWS Lambda, but I have never really discussed it in depth on this blog. So here we go. Note: I'm using "Lambda" here as a stand-in for "serverless" in general. Many of the things discussed here also apply to Google Cloud Functions, Azure Functions and possibly Zeit, although I've never used it. First, something on how we use Lambda: last week we went over 35 million check runs.

Building Your Observability Practice with Tools that Co-exist

A lot of product marketing is about telling people to throw away what they have in favor of something entirely new. Sometimes that is the right answer, because what you have has completely outlived its usefulness and you need to put something better in its place. A lot of the time, though, what's realistic is to make incremental improvements. If you've been tasked with starting or growing your observability practice, it may seem a long journey from here to there.

One Size Does Not Fit All: Tailoring Incident Response Messages to Different Stakeholders

In a simpler world, incident response notifications would be a one-size-fits-all type of item. You could deliver the same notification to everyone with equally successful results. But in the real world, incident response messages must be nuanced. Unlike baseball hats or wristwatches, the messages you send to different stakeholders when an incident occurs need to be tailored to each category of recipient.

Grafana Labs Teams Use Jaeger to Improve Query Performance Up to 10x

Grafana Labs works every day to break down traditional data boundaries with metric-visualization tools accessible across entire organizations. It began as a pure open-source project and has since expanded into supported subscription services. The Grafana open-source project is a platform for monitoring and analyzing time series data, and the subscription offerings include the supported Grafana Enterprise version. Grafana Labs' engineers service more than 150,000 active installations.

Reduce Alert Volume and Gain Timely Event Insights with First-Response Policies

Auto-alert suppression management in OpsRamp delivers first-response actions to reduce redundant and noisy alerts. Learning-based first-response policies ensure that IT teams no longer have to create static rules for a target set of resources by configuring alarm thresholds, defining filter criteria, and specifying time intervals.

This New Wi-Fi Security Framework Brings Opportunity for MSPs

Thomas Edison once said, “Opportunity is missed by most people because it is dressed in overalls and looks like work.” That statement rings true in the Wi-Fi market, where there is an opportunity that MSPs are missing. Today Wi-Fi is an incredibly well-known and mature solution category, in which most vendors offer products with a very similar set of capabilities.

6 steps to secure your workflows in AWS

On AWS, your workloads will be as secure as you make them. Under the Shared Responsibility Model, AWS is responsible for the security of the cloud, while you are responsible for securing what you run in the cloud. This means that as a DevSecOps professional, you need to be proactive about securing your workloads in the Amazon cloud. Achieving the optimal level of security in a multi-cloud environment requires centralized, automated solutions.

5 Essential Retrace Custom Dashboard Widgets For DevOps Managers

Imagine a man, a metaphorical man, slumped over, sitting silently across from you. Do you see him? Hastily smashing his fingers against the keyboard with a feverish sweat running down his neck. He, like many, only opens his APM solution after those universally feared “oh shit!” moments. Like a firefighter with a magnifying glass, he dives into his logs looking for a needle in a haystack. But you… Well, you know better than that. You wouldn’t just use your APM on bad days.

Identifying bottlenecks and optimizing performance in a Python codebase

July 08, 2019

In this post, we will walk through various techniques that can be used to identify the performance bottlenecks in your Python codebase and optimize them. The term "optimization" can apply to a broad range of metrics, but the two of most general interest are CPU performance (execution time) and memory footprint. For this post, you can think of optimized code as code that runs faster, uses less memory, or both. There are no hard and fast rules.
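The excerpt names two metrics, execution time and memory footprint. As a minimal sketch (not necessarily the techniques the post itself covers), the standard library's `time.perf_counter` and `tracemalloc` can capture both for a single function call. The helper `measure` and the two example functions are hypothetical names chosen for illustration.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Hypothetical helper: return (result, elapsed_seconds, peak_bytes) for one call."""
    tracemalloc.start()  # begin tracing Python memory allocations
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
    return result, elapsed, peak


def squares_list(n):
    # Builds the full list of squares in memory before summing.
    return sum([i * i for i in range(n)])


def squares_gen(n):
    # Generator expression: produces one square at a time, so far less peak memory.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for fn in (squares_list, squares_gen):
        _, secs, peak = measure(fn, 100_000)
        print(f"{fn.__name__}: {secs:.4f} s, peak {peak / 1024:.1f} KiB")
```

Tracing allocations slows the code down, so timings taken inside `measure` are inflated; for a fair speed comparison, time the functions separately (for example with `timeit`) with tracing turned off.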

Five reasons to choose Log360, part 3: Comprehensive network auditing

In the previous post, we discussed the various environments that Log360 helps you audit and secure. Having established how easy Log360 is to use and how broad its auditing scope is, we'll now examine some of the critical areas it can help you monitor. With over 1,000 predefined reports and alerts for several crucial types of network activity, Log360 provides comprehensive network auditing.