
Troubleshoot faster with process-level app and network data

When responding to an incident, you need to quickly find the scope of the issue so you know which teams to notify and which parts of your system to investigate next, before your end users are affected. But because multiple processes share resources on each of your hosts and interact in unexpected ways, it can be difficult to know exactly what is causing an issue, especially if those processes are running off-the-shelf software.

Creating Envoy WebAssembly Extensions

In the CNCF ecosystem, Envoy, an open source service proxy developed by Lyft, is a very common choice for service mesh networking. In a previous post we discussed how both Consul and Istio leverage Envoy. Were you aware that you can extend Envoy’s capabilities with WebAssembly? So what is WebAssembly? WebAssembly, or Wasm as it is often abbreviated, is not so much a programming language as a specification for a binary instruction format that can be run in sandboxed virtual machines.

How to encourage DBAs to embrace DevOps, rather than fear change

How do we help database administrators (DBAs) embrace DevOps so that they become productive members of a DevOps team that delivers value to customers quickly and continuously? That’s an important question to ask right now because there’s a common view among DBAs that DevOps isn’t for them. They’re responsible for documentation, maintenance, and deployments; they have internal customers; and they serve internal requests.

5 things you can do to improve your customer support (part 2)

Picking up from my previous post, I’m going to continue the list of five things you can do to improve your technical service delivery to your customers (if you didn’t read the last post, you can catch up on what you missed here (link)). In the following three points, I focus on the role automation can play.

Introduction to Custom Metrics in Python with the Logz.io RemoteWrite SDK

We just announced a new RemoteWrite SDK to support custom metrics from applications written in several different languages. This tutorial gives a quick rundown of how to use the Python SDK. Using these integrations, Prometheus users can send metrics directly to Logz.io using the RemoteWrite protocol without sending them to Prometheus first. Each SDK targets a different language, and all of them can work with frameworks like Thanos, Cortex, and of course M3DB.

GitLab GUI

Has GitKraken made my dev life easy? It’s been 6 months since I started at Pipefy as a Young Gun Tech. During these months, I have learned a lot and used various tools to streamline my work. In this post, I will talk about how I use the GitKraken Git GUI with GitLab, running on Ubuntu, because the two tools integrate so well that you can speed up your workflow just like I did.

Contextual Information: The Missing Piece in The AIOps Puzzle and How to Fix It

AIOps as a function is steadily gaining popularity, even climbing the Gartner Hype Cycle. Today’s observability tools go beyond merely monitoring to perform proactive remediation of events and incidents. However, what many of them lack is context. For instance, consider a regular AIOps solution that identifies an anomaly in system behavior. It will raise an alarm and a remediation workflow will do its job.

Citrix Tips for Troubleshooting

I recently saw a user asking on EUC Slack “is there a Domain controller response time in ?”. Unfortunately for him, his choice of monitoring product doesn’t include such metrics. However, it did make me wonder if Citrix admins are aware of the importance of getting metrics about Domain Controllers, simply because many EUC monitoring tools fail to monitor them.

Monitoring and Alerting 101: Monitoring Best Practices

An effective monitoring system is paramount to smooth business operations. As the need for a fast, responsive software experience gains momentum, monitoring becomes an indispensable driving force. Monitoring systems enable IT teams to proactively observe the health and responsiveness of critical environments and applications. Without monitoring, organizations must depend on customers or internal departments to report system issues.