Latest Posts

When It Comes to Security of the Platform, We Mean Business. Here's How.

At Splunk, we understand that a secure platform is a trustworthy one. We strive to implement a protected foundation for our customers to turn data into action, and part of that effort is giving you more frequent insight into the security enhancements that we’ve made to the platform. In this blog series, we’ll share the latest enhancements to Splunk Enterprise, review our security features in depth, and explain why these updates are important for you and your organization.

Monitor AWS Step Functions with Datadog

AWS Step Functions is a service that abstracts distributed applications into state machines, with each state representing a component of an application. Not only does this automatically generate an architectural diagram of your application’s workflow, it also makes it straightforward to reorder your states as well as implement parallel execution, retries, and other tasks.
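To make the state-machine idea concrete, here is a minimal sketch of a workflow definition in Amazon States Language, built as a Python dictionary. The state names, the Lambda ARNs, and the `build_state_machine` helper are all hypothetical and chosen only for illustration; the field names (`StartAt`, `States`, `Retry`, `Branches`, and so on) come from the States Language itself.

```python
import json


def build_state_machine():
    """Return a hypothetical workflow: one task with retries, then two parallel tasks."""
    return {
        "Comment": "Illustrative order workflow (all names are placeholders)",
        "StartAt": "ValidateOrder",
        "States": {
            "ValidateOrder": {
                "Type": "Task",
                # Placeholder ARN, not a real function.
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
                # Retry failed attempts up to 3 times with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "Fulfill",
            },
            "Fulfill": {
                # Both branches run concurrently; the state ends when all finish.
                "Type": "Parallel",
                "Branches": [
                    {
                        "StartAt": "ChargeCard",
                        "States": {
                            "ChargeCard": {
                                "Type": "Task",
                                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
                                "End": True,
                            }
                        },
                    },
                    {
                        "StartAt": "ReserveStock",
                        "States": {
                            "ReserveStock": {
                                "Type": "Task",
                                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:reserve-stock",
                                "End": True,
                            }
                        },
                    },
                ],
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON document that would be supplied as the state machine definition.
    print(json.dumps(build_state_machine(), indent=2))
```

Reordering the workflow in this sketch amounts to changing `StartAt` and the `Next` pointers, which is the kind of flexibility the paragraph above describes.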

Jaeger Essentials: Jaeger Persistent Storage With Elasticsearch, Cassandra & Kafka

Running systems in production involves requirements for high availability, resilience and recovery from failure. When running cloud native applications this becomes even more critical: the base assumption in such environments is that compute nodes will suffer outages, Kubernetes nodes will go down and microservice instances are likely to fail, yet the service is expected to remain up and running.

Finding and Fixing Django N+1 Problems

The Django Python framework allows people to build websites extremely fast. One of its best features is the object-relational mapper (ORM), which allows you to make queries to the database without having to write any SQL. Django lets you write your queries in Python and then tries to turn those statements into efficient SQL. Most of the time the ORM creates the SQL flawlessly, but sometimes the results are less than ideal.
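The N+1 pattern named in the headline can be sketched without Django at all. The example below uses Python's built-in `sqlite3` module and a hypothetical `author`/`book` schema (the table names, sample rows, and function names are assumptions made for illustration). It counts how many queries each approach issues: one query per row on top of the initial query, versus a single join.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical author/book schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            title TEXT,
            author_id INTEGER REFERENCES author(id)
        );
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bo');
        INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
        """
    )
    return conn


def titles_n_plus_one(conn):
    """Fetch books, then look up each author separately: 1 + N queries."""
    queries = 0
    books = conn.execute(
        "SELECT title, author_id FROM book ORDER BY id"
    ).fetchall()
    queries += 1
    result = []
    for title, author_id in books:
        # One extra round trip per book: this is the "N" in N+1.
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        result.append((title, name))
    return result, queries


def titles_joined(conn):
    """Fetch the same data with a single JOIN: 1 query."""
    rows = conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = build_db()
    print(titles_n_plus_one(conn))
    print(titles_joined(conn))
```

In Django itself, the analogous fix for a foreign-key lookup like this is usually `select_related()` on the queryset, which asks the ORM to emit the join instead of one query per row.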

Is Kubernetes Delivering on its Promise?

A headline in a recent Register article jumped off my screen with the claim: “No, Kubernetes doesn’t make applications portable, say analysts. Good luck avoiding lock-in, too.” Well, that certainly got my attention…for a couple of reasons. First, the emphasis on an absolute claim was quite literally shouting at me. In my experience, absolutes are rare occurrences in software engineering. Second, it was nearly impossible to imagine what evidence this conclusion was based on.

Analyze your logs quickly with suggested queries beta in Cloud Logging

Cloud Logging is a popular tool to help developers, operators, and other users identify and find the root cause of issues in their infrastructure. With features like the Logs Explorer, you can quickly and efficiently retrieve, view, and analyze logs. To help you get the most out of your logs, we’re excited to introduce suggested queries in Cloud Logging, which highlight important logs so you can start analyzing and troubleshooting issues quickly.

Automation and changing needs, featuring Forrester

In an ever-changing world, the future of work is changing as well, and it has accelerated some areas of automation that we were already moving toward. I sat down with our guest speaker, Leslie Joseph, Principal Analyst Serving Application Development and Delivery at Forrester Research, for a webinar to discuss these changes and get a better understanding of how automation plays an important role in supporting companies through crises and preparing them for an uncertain future.

Understanding your application's critical path

Don’t wait for an incident to focus on reliability. Learn concrete steps for preventing incidents in the first place in our two-part series, Planning and Architecting for Reliability. It’s 3 a.m. You’re lying comfortably in bed when suddenly your phone starts screeching. It’s an automated high-severity alert telling you that your company’s web application is down. Exhausted, you open the website on your phone and do some basic tests.

Visualizing NOC Operations with GroundWork NOC Boards

A monitoring system is a shared tool. It’s useful for teams to operate from the same source of information, since subjective opinions can lead troubleshooting astray, especially with systems and network issues. You need a single source of truth. A monitoring dashboard with drill-down capability is a basic tool for any Network Operations Center (NOC) staff. Often displayed on kiosks or wall-mounted in the NOC, dashboards let you know at a glance whether anything needs attention.