
Latest News

How to Do Effective Infrastructure Monitoring for Linux with Grafana

Grafana Labs runs 8+ clusters in GKE with 270 nodes of various sizes, and all of its hosted metrics and hosted logs Grafana Cloud offerings run on 16-core, 64 GB machines. At the recent All Systems Go! conference in Berlin, David Kaltschmidt, Director of User Experience, gave a talk about how Grafana Labs monitors these clusters and servers and shared some best practices.

Android malware: How do enterprises tackle this ever-growing menace?

Let us first agree on a couple of things before we start. One, Android is the most affordable platform for enterprises with a mobile-first or mobile-only workforce, and it has the gentlest learning curve of any mobile OS. Two, because of its open-source nature, Android is an easy target for malicious actors to prey on, with the Google Play Store serving as a breeding ground for many attacks.

Monitor Lambda cold start durations with CloudWatch

When you look at an X-Ray trace for a Lambda cold start, you will see an Initialization subsegment. This subsegment represents “the function’s initialization code that is run before the handler”. This is where the runtime resolves dependencies and initializes global variables. This code runs only once per execution environment, so it does not have to run again on every invocation. The more dependencies you have, the longer this initialization step takes.
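To make the "runs only once" point concrete, here is a minimal Python sketch, assuming the standard Lambda Python handler signature `handler(event, context)`. The helper `_load_dependencies`, the `DEPS` dictionary and the `"hypothetical-client"` value are illustrative stand-ins, not part of any AWS API. Module-level code plays the role of the Initialization subsegment: it runs when the module is first imported (a cold start), while the handler body runs on every invocation and reuses what was built at import time.

```python
import time

_init_calls = 0  # counts how often the expensive setup actually runs


def _load_dependencies():
    """Hypothetical stand-in for slow setup (SDK clients, config, large files)."""
    global _init_calls
    _init_calls += 1
    time.sleep(0.01)  # simulate slow initialization work
    return {"client": "hypothetical-client"}


# Module scope: executed once per import, i.e. once per execution
# environment. This is the work that shows up as initialization time.
_t0 = time.perf_counter()
DEPS = _load_dependencies()
INIT_SECONDS = time.perf_counter() - _t0


def handler(event, context):
    # Executed on every invocation; reuses DEPS instead of rebuilding it.
    return {"client": DEPS["client"], "init_calls": _init_calls}
```

Calling `handler({}, None)` several times in the same process always reports `init_calls` as 1, which mirrors a warm Lambda environment reusing its globals. Every extra dependency loaded at module scope adds to `INIT_SECONDS`, and therefore to the cold-start cost.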

Hybrid Cloud Performance: Five Key Things That You Need to Keep in Mind

Digital Enterprise Journal’s (DEJ) research shows that organizations view cloud strategies as a key way to use technology to create business value. Organizations see cloud deployments as important in each of the following areas of digital transformation: agility (71%), innovation (65%), and cost optimization (61%). Most cloud deployments follow a hybrid model, in which parts of IT services are managed on-premises and parts in private and public cloud(s).

A single-person on-call "rotation" is a critical vulnerability

One of the most common complaints we hear from operations and site reliability engineers concerns the quality-of-life impact and resulting stress of their on-call responsibilities. Most of us already know that a proper on-call rotation is critical to our engineering organization’s health, both for immediate incident response and for long-term sustainable growth.

How to Monitor Amazon Redshift

In the first post of our three-part Amazon Redshift series, we covered what Redshift is and how it works. For the second installment, we’ll discuss how Amazon Redshift queries are analyzed and monitored. Before we go deep into gauging query performance on Redshift, let’s take a quick refresher on what Amazon Redshift is and what it does.

Log4net for .NET Logging: The Only Tutorial and 14 Tips You Need to Know

If you’ve been writing code for any reasonable amount of time, you have almost certainly handled logging in some way, since it’s one of the most essential parts of modern, “real life” app development. If you’re a .NET developer, you’ve probably used some of the many well-known logging frameworks available on this platform. Today’s post covers one of these frameworks: log4net.

What Your IT Chatbot Can Look Like Running on Full Power

Gartner once wrote that by 2020, “the average person will have more conversations with bots (chatbots) than with their spouse”. That prediction might seem a bit far-fetched, and perhaps more a sign of relationship disaster, but there is some truth to it: for many people, chatbots are already an integral part of modern-day life. From turning on the lights to picking our favorite songs and ordering food, we use chatbots for almost everything and expect them to work all the time.