Monitoring Microservices: IT's Newest Hot Mess

In this THWACKcamp session, you’ll learn how microservices are different from other applications, when performance bottlenecks most often occur, how they tend to break, and where you can add monitoring to stay ahead of trouble. You’ll also see how to extend existing infrastructure dashboards to include microservice workloads, cut troubleshooting time, and include new business metrics that measure the business goals driving microservices in the first place.

Six Ways to Improve Your Security Posture Using Critical Security Controls

Security policies within organizations are under heavy scrutiny today. Trying to stay up to date with these policies can create stress for users and for the IT staff managing the infrastructure. Just as network standardization is a must, so is security standardization.

"Observability": Just a Fancy Word for "Monitoring"? A Journey From What to Why

Too often, monitoring is a never-ending arms race. We keep adding more monitoring in response to new problems, but the cycle never seems to end. Humans (the business) drive new changes, which cause new problems, which in turn call for more monitoring. That's where real, useful observability may finally help identify root cause and break the cycle of reactive monitoring for novel issues.

Monitoring Like a Network Engineer When You're a SysAdmin

Last year, we showed network engineers how to monitor like sysadmins. This year, we're flipping the script and showing systems administrators that there's nothing to fear from those network devices, and that monitoring them won't steal precious time from ensuring business services are up and users are happy.

Ruby Agent 2.4.21 is out with a bug fix, a new configuration option, and a debug option

As reported in Issue #228, if scout_apm is disabled on a node via the configuration monitor = false, we don't intend to install any instruments, but a few snuck in anyway. Since the rest of the agent wasn't running, they slowly but steadily built up recorded info without ever purging it, causing a slow memory leak that became clear over the course of a week or two. We've stopped the offending instruments from installing themselves when Scout is disabled.
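
The general shape of that kind of fix can be illustrated with a minimal sketch. This is not the agent's actual code: the Instrument class, its methods, and the hash-based config are all hypothetical names invented for illustration, and only the monitor setting comes from the release note. The idea is simply that an instrument checks the setting before installing itself, so nothing accumulates while the part of the agent that would purge the data is off.

```ruby
# Hypothetical sketch; none of these names come from the scout_apm codebase.
class Instrument
  attr_reader :recorded

  def initialize
    @recorded = []      # in-memory buffer that something else must purge
    @installed = false
  end

  def installed?
    @installed
  end

  # Install only when monitoring is enabled. With monitor set to false,
  # the instrument stays inert and its buffer never grows.
  def install(config)
    return false unless config.fetch(:monitor, true)
    @installed = true
  end

  # Record an event only if the instrument was actually installed.
  def record(event)
    @recorded << event if installed?
  end
end

instrument = Instrument.new
instrument.install({ monitor: false })
instrument.record("GET /users")
puts instrument.recorded.size
```

With the guard in place, a disabled node records nothing, so there is no buffer left waiting for a purge that will never run.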

Overcoming The Black Box Problem With Machine Learning in IT Operations

Chronically understaffed and constantly stressed-out IT Ops and NOC teams are overwhelmed by today’s IT noise. Artificial Intelligence (AI) and Machine Learning (ML) can help these teams because ML (and AI) are exceptionally good at processing enormous volumes of very complex data in real-time, or near real-time, and surfacing actionable insights. But ML successes in IT Ops are still hit-or-miss.