
Blog

Balancing Centralization and Autonomy: The Key to Automation at Scale

The recent global outage reminds us that identifying issues and their impact radius is just the first part of a lengthy remediation process. Incidents are inevitable; how we prepare for and learn from them is what sets teams up to respond more effectively next time. As we saw from the remediation steps taken by enterprises around the world, implementing a known fix across a large number of environments, potentially managed by several distributed teams, can be a gargantuan challenge.

Alerting with Twilio: Connect Your Monitoring with the Top-1 Communications Platform

You might be surprised. Why would ilert, a platform dedicated to alerting and incident management, publish anything about connecting monitoring solutions directly to Twilio, bypassing an incident management tool? Are they taking the bread out of their own mouths, you might think. Having worked on DevOps incident management since 2009, we believe every solution fits specific needs.

The risks - and rewards - of using production data for testing

Data, and the way enterprises use it in areas like development and testing, has not traditionally been a focus for business leaders, but that is now changing. Data is more varied and complicated than ever before, with enterprises using two or more different database platforms, and 40% using four or more. It is also spread more widely, with enterprises hosting their databases across a combination of cloud and on-premises infrastructure.

Streamlining Ecommerce Operations: A Case Study on Resolving Stuck Orders with Automation

In the fast-paced world of ecommerce, efficiency is everything. The Global Operations Center (GOC) TOO team constantly battles with ecommerce store orders that get stuck, causing delays and resulting in customer dissatisfaction. Here’s a look at how one retail organization transformed their order troubleshooting process using automation.

Beyond Regulations: How Government Agencies Can Streamline and Automate IT Compliance

From the NIST Cybersecurity Framework to GDPR and more, public sector agencies must comply with a myriad of IT regulatory requirements. These regulations ensure proper financial management and stewardship, security, governance, operational efficiency and effectiveness, incident management – and ultimately, assure public trust and accountability.

Unlock Value with InfluxDB 3.0 and Expert Support Teams

InfluxDB is all about your data: we bridge the gap between an empty database bucket and business value and provide experts to help you derive value from your data. InfluxDB expert support teams come with contracted InfluxDB 3.0 serverless products (Serverless, Cloud Dedicated) and our Clustered on-prem product. Though no customer is left to figure everything out on their own, your product selection will determine the level of custom support you receive.

Understanding Scale Up vs. Scale Out - And Why You Need to Understand Scale Up vs. Scale Out to Be a Nutanix or HCI Guru

When your IT systems are nearing capacity, you need to decide how to expand provisioning, and many of those decisions will revolve around whether to scale up or scale out. For many, the decision is intrinsically linked to their choice of platform and whether they are pursuing cloud-based, hybrid, or on-premises-led strategies.

SIGKILL vs SIGTERM: A Developer's Guide to Process Termination

As a developer working with Linux systems, containers, or Kubernetes, it's crucial to understand process termination signals, particularly SIGKILL and SIGTERM. This comprehensive guide will explore these signals, their differences, and their implications in various environments. We'll delve into best practices, common scenarios, and advanced considerations to help you manage process termination effectively in your applications.

Be the first to know with StatusGator's Early Warning Signals

We are excited to share that our Early Warning Signals feature, previously in beta, is now fully available to all StatusGator users on all plans. This long-awaited feature ensures you never miss a beat and keeps you informed of outages before a provider publicly acknowledges them on their status page. Since its beta launch, this feature has successfully detected multiple service outages before they were officially acknowledged by each provider.

Announcing Lumigo's New Multiple Dashboards Functionality

In today’s complex cloud-native environments, observability is key to maintaining performance, reliability, and scalability. However, different teams often need to focus on different aspects of the system. Developers might be more interested in error rates and response times, while operations teams must monitor system health and resource utilization. Lumigo now supports multiple dashboards, so you can provide each team with the information they need precisely how they need it.