
Resiliency is different on AWS: Here's how to manage it

There’s a common misconception about running workloads in the cloud: the cloud provider is responsible for reliability. After all, they’re hosting the infrastructure, services, and APIs. That leaves little else for their customers to manage, other than the workloads themselves…right?

Simplified routing in Grafana Alerting: Easy, secure, and powerful

With great power comes great… complexity? When we introduced Grafana Alerting a few years ago, it included a powerful routing feature that teams could use to send alerts to various contact points. Unfortunately, this functionality also came with a fair bit of complexity and an unfamiliar UX. This prevented many users from adopting it, but we’re still big believers in how it can help users.

New: Real-Time Remediation with Nexthink Flow's Event Trigger

Some issues can’t wait. When it comes to compliance or employee experience issues, time matters. Now, with Nexthink Flow’s real-time event trigger, you can instantly trigger an automated workflow based on an event such as an alert, employee login, or application crash. When setting up a new workflow, you can select “Events” in the “Trigger” section and use an NQL query to identify the event to track.

How to Gain Visibility into Internet Performance

Continued cloud adoption is leading to an increasing reliance on internet services, and on a complex mix of external service providers and technologies to deliver those services. For network operations teams, these moves significantly reduce visibility into the performance of the underlying infrastructure that business services depend upon. In spite of this diminishing visibility and control, these teams remain responsible for network performance.

Six Tips to Reduce Noise in IT Operations

“We are drowning in noise all day long! Please help us!” - Every IT operations team

Rich monitoring data is more important than ever for IT operations to manage the range of technology platforms and interconnected systems the business runs on. One natural result is that more signals, and more noise, vie for operator attention.

Choosing the Right OpenTelemetry Backend: Key Considerations

With applications becoming increasingly distributed and complex, gaining insights into their behavior and performance is essential for maintaining reliability and delivering exceptional user experiences. OpenTelemetry has emerged as a powerful framework for instrumenting applications to collect, process, and export telemetry data.

How to overcome common challenges in machine learning deployments

🚨 To read the full findings from this research, visit The Machine Learning State of Play 2024 white paper. Are the challenges of deploying machine learning (ML) overshadowing its true potential in the modern workplace? For our recent white paper, we spoke to 500+ developers who have experience working with ML systems to understand the pain points they face when using ML solutions.

Introduction to Endpoint Management: Definition, Benefits, and Tools

Endpoint management is so fundamental to IT that it is considered canon in the industry, especially now that remote work is the new normal. Setting up a robust endpoint management system is paramount for any organization that relies on digital devices. These devices connect to the corporate network and can access its resources, so the goal is to ensure that they are secure, compliant with company policies, and operating efficiently.