Blog

What is Subkeying?

Subkeying is a way to group a set of crashes at some level other than the top level of the call stack. At BugSplat, crashes are grouped by a stack key, and groups of crashes can be found on the Summary page. By default, BugSplat groups crashes using the topmost level of a call stack. A subkey is created when crashes are grouped at a level other than the top level.
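To make the idea concrete, here is a minimal Python sketch of grouping crashes by the frame at a chosen depth of the call stack. It is not BugSplat's implementation: the list-of-frame-names crash representation, the hypothetical `group_crashes` helper, the sample frame names, and the fallback for stacks shallower than the requested level are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Assumption: a crash is represented by its call stack, top frame first.
Crash = Sequence[str]


def group_crashes(crashes: List[Crash], level: int = 0) -> Dict[str, List[int]]:
    """Group crash indices by the stack frame at `level` (0 = top of stack).

    level=0 mimics default top-of-stack grouping; level > 0 acts as a subkey.
    Assumption: a stack shallower than `level` falls back to its deepest frame.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, stack in enumerate(crashes):
        if not stack:
            # Hypothetical bucket for crashes with no symbolicated frames.
            groups["<empty stack>"].append(i)
            continue
        frame = stack[min(level, len(stack) - 1)]
        groups[frame].append(i)
    return dict(groups)


# Hypothetical sample: two crashes share a top frame but have different callers.
sample = [
    ["memcpy", "Buffer::copy", "Renderer::draw"],
    ["memcpy", "Parser::read", "Loader::load"],
    ["strlen", "Parser::read", "Loader::load"],
]

print(group_crashes(sample, level=0))  # {'memcpy': [0, 1], 'strlen': [2]}
print(group_crashes(sample, level=1))  # {'Buffer::copy': [0], 'Parser::read': [1, 2]}
```

Grouping at the top level lumps both `memcpy` crashes together even though they come from unrelated callers; moving the key one level down separates them and instead groups the two crashes that pass through `Parser::read`.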

Getting Microsoft Azure Data into Splunk

If you're reading this, you're probably wondering how to get data from various Microsoft Azure services into Splunk. With the growing list of Azure services and data access methods, it can be a little cloudy (pun intended) what data is available and how to get all of it into Splunk. In this blog post, I'm going to go over how Microsoft makes Azure data available, how to access that data, and the out-of-the-box Splunk Add-Ons that can consume it. So let's dive right in.

Splunk Attack Range Now With Caldera and Kali Linux

The Splunk Security Research Team has been working on new improvements and additions to the Splunk Attack Range, a tool that allows security researchers and analysts to quickly deploy environments locally and in the cloud to replicate attacks based on attack simulation engines. This deployment attempts to replicate environments at scale, including Windows workstations and servers, a domain controller, Kali Linux, a Splunk server, and a Splunk Phantom server.

Introducing the Datadog Operator for Kubernetes and OpenShift

As more environments run on Kubernetes, including our own, Datadog has been making it easier to get visibility into clusters of any scale. To minimize load on the Kubernetes API server, the Datadog Agent runs in two different modes. The node-based Agent queries local containers or external endpoints for data, while the Cluster Agent fetches cluster-level metadata from the API server.

How to identify and resolve front-end performance bottlenecks

We all want lightning-fast websites and applications, but how do we prioritize our efforts in order to have the biggest impact on performance? We interviewed our own front-end team so we could share some best practices we use every day to improve and maintain the performance of Raygun.

Redis monitoring 101: Metrics to watch

Redis, which stands for Remote Dictionary Server, is an open source, in-memory data structure store that’s used as a database, cache, and message broker. It stores data entirely in memory in the form of key-value pairs, which gives it an edge over disk-based databases because it never has to read data from disk. It also makes Redis one of the fastest NoSQL databases: data is accessed in microseconds because there are no disk seek delays.

Grafana 7.0 preview: New image renderer plugin to replace PhantomJS

Many Grafana users export images of their dashboard panels. This feature powers the ability to receive alerts with a rendered image of the panel attached, which is valuable for quickly spotting if something is about to go sideways in production. Since Grafana v2.0, when support for server-side rendering of dashboard panels as images was introduced, PhantomJS has served as the built-in image renderer that enables this feature.

How to deploy an app to AWS: App security

AWS security is an ongoing battle that you must address during every release, every change, and every CVE. When you’re first launching your production application, it’s impossible to check all the boxes; you simply don’t have the time. Until your application gets more adoption, you only have the time to do the bare essentials of security.

Sysdig's Prometheus monitoring behind the scenes

A few weeks ago, we announced that Sysdig is offering fully compatible Prometheus monitoring at scale for our customers, as well as a new website called PromCat.io hosting a curated repository of Prometheus exporters, dashboards and alerts. This got me thinking about how we were actually able to implement the changes necessary to offer this in our platform.

Best Practices in Incident Management

In an always-on world, companies look to systems and processes to keep their services up and running at all times. The most important part of maintaining this uptime is having an Incident Management process in place to restore your services in the event of an interruption or unplanned downtime. Incident Management processes are typically used by SRE, DevOps, NOC and other IT teams to respond to incidents that affect services and work on restoring their uptime.