
Launching RMM Central: A unified IT solution for managed service providers

We’re pleased to introduce ManageEngine RMM Central, a unified remote monitoring and management solution. Maintaining the IT infrastructure and systems of client networks is a herculean task for IT service providers. Network management typically takes multiple tools, each covering a different capability, be it maintaining or managing workstations, laptops, servers, or other network devices.

Monitor cloud endpoint health with Datadog's cloud service autodetection

Your modern cloud-hosted applications rely on a number of key components—such as databases and load balancers—that are managed by the cloud provider. While these cloud resources can reduce the overhead of maintaining your own infrastructure, capturing and contextualizing monitoring data from services you don’t own can be difficult.

What's Changed in VMware vSphere 7 Update 2: All You Need to Know

VMware has recently released vSphere 7 Update 2, and there is a lot of new stuff to look out for. vSphere, VMware’s server virtualization product, has been an industry favorite for a long time. vSphere 7 came out in April 2020, and this is the second update to it so far, hence the name. When you look at the changes they’ve rolled out, you’ll see that they are focusing on a few key areas. As a result, VMware infrastructure is getting pretty solid and modern.

GKE operations magic: From an alert to resolution in 5 steps

As applications move from monolithic architectures to microservices-based architectures, DevOps and Site Reliability Engineering (SRE) teams face new operational challenges. Microservices are updated constantly with new features, and resource managers and schedulers (like Kubernetes and GKE) can add or remove containers in response to changing workloads. The old way of creating alerts based on learned behaviors of your monolithic applications will not work with microservices applications.

9 Best Cloud Logging Services for Log Management, Analysis, Monitoring & More [2021 Comparison]

Log management stopped being a very simple operation quite some time ago. Long gone are the “good old days” when you could log into the machine, check the logs, and grep for the interesting parts. Right now things are better. With the observability tools that are now a part of our everyday lives, we can easily troubleshoot without the need to connect to servers at all. With the right tools, we can even predict potential issues and be alerted at the same time an incident happens.

Benchmarking Grafana Enterprise Metrics for horizontally scaling Prometheus up to 500 million active series

Since we launched Grafana Enterprise Metrics (GEM), our self-hosted Prometheus service, last year, we’ve seen customers run it at great scale. We have clusters with more than 100 million metrics, and GEM’s new scalable compactor can handle an estimated 650 million active series. Still, we wanted to run performance tests that would more definitively show GEM’s horizontal scalability and allow us to get more accurate TCO estimates.

How our Field Teams' Productivity Skyrocketed with our New AIOps Studio

Lately, I have seen fewer callouts from our field teams to our solution engineering team, and I was wondering what the reason could be. Sometimes, our field engineers approach our solution engineering team with advanced requests: analyzing data, running what-if scenarios, assessing the quality of data, and finding what new value can be gleaned by combining related datasets.

Announcing Services Discovery for tracking and improving service reliability

Gremlin helps teams proactively improve the reliability of their systems by running chaos experiments on infrastructure including hosts, containers, and Kubernetes clusters. But as microservice-based architectures and automated cloud platforms become the norm, engineers are shifting their focus from managing infrastructure to managing services. In order to keep these services as resilient as possible, they need tools that can help them find failure modes, reduce incidents, and improve availability.

How Can Companies Integrate Ethical AI? | Splunk's Ram Sriharsha & Dr. Rumman Chowdhury

Organizations use AI to be more competitive, deliver better business outcomes, and avoid falling behind. However, business leaders should know they pose a serious risk to their organizations if they do not comply with ethical standards. Leadership must enable teams to practice ethical business strategies, up-level talent strategy, and enable organizational resilience. Dr. Rumman Chowdhury and Ram Sriharsha, Head of Machine Learning at Splunk, discuss the challenges companies will face if they do not comply with ethical standards and how to solve for fairness and privacy.