Latest Posts

Your Guide to Observability Engineering in 2024

It may sound complicated and daunting, but so much of observability is about discovering the unknown unknowns in your critical systems. The capabilities of observability engineering can help you make those discoveries. Most organizations have some form of monitoring, alerting and troubleshooting, which can be adequate to a point but fall short when trying to determine the root cause of unexpected outages.

Ubuntu Security Notices now available in OSV format

Canonical is now issuing Ubuntu Security Notices (USNs) in the open source OSV format. Using the information provided, developers can identify known third-party, open source dependency vulnerabilities that pose a genuine risk to their application and its environment. This collaboration between Canonical and OSV aims to simplify vulnerability management and further enhance security for Ubuntu users.

The Importance of Observability for Healthcare Providers

The systems and data that healthcare providers use and process are fundamental to their successful operation. These organizations must therefore invest in appropriate and powerful observability solutions that enable them to monitor their systems and valuable data effectively. These tools and solutions allow healthcare providers to securely manage, deliver, and ensure uptime for their entire IT infrastructure.

Mastering Centralized Logging with OpenSearch

OpenSearch is well suited to centralized logging: it offers powerful querying and analysis capabilities, and it's highly scalable and flexible. In this article, we explain why you should use OpenSearch for centralized logging and then show how to configure it.

Crisis Management for Oil and Gas Companies

Oil and gas companies operate in a high-stakes environment where the potential for catastrophic incidents, such as oil spills, explosions, and natural disasters, always exists. These risks call for robust crisis management to ensure the safety of personnel and minimize potential damage to operations and organizational reputation.

Reducing MTTR and the Hidden Costs of Downtime Through AI & Automation

Of all the KPIs that gauge the health and operational fitness of an enterprise, Mean Time to Repair (MTTR) from an outage or downtime is one of the most crucial. Yet while MTTR is a universally recognized metric, many organizations still fail to consider the total cost of MTTR when deciding where and how to invest in their IT environments.

How NaaS is revolutionising network management

Network-as-a-Service (NaaS) is revolutionising network management by helping businesses take greater control of the provisioning, payment and management of network services. In our on-demand webinar, “Unlocking the power of network automation”, Lisa Wright took a deeper look at how network automation, SDN (Software Defined Networking) and NaaS are shaping the future of networking and delivering value for enterprises.

INTEGRATE 2024 Day 2 Highlights

Dan Toomey, Senior Integration Architect at Deloitte Australia, kicked off the session by highlighting the essential role of business rules in software development. He emphasized the significance of managing evolving and complex business rules, advocating for the use of effective tools like Business Rules Management Systems (BRMS) to safeguard code and services.

Adding config to AWS ECS tasks

When deploying Docker containers to AWS ECS, you can encounter a situation where you want to run an image that requires some configuration. For example, let's say you wanted to run Vector as a sidecar to your main application so you can ship your application's metrics to a service like Honeybadger Insights. To run Vector, you only need to provide one configuration file (/etc/vector/vector.yaml) to the image available on Docker Hub.
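One common way to get a single file into a sidecar is to have a short-lived helper container write it to a shared volume before the sidecar starts. The sketch below builds such a task definition fragment as a Python dictionary. It is not necessarily the approach the post takes. The family, container, and volume names, the `busybox` helper, and the `timberio/vector:latest-alpine` image tag are assumptions for illustration, and the fragment omits fields that a real task definition needs (CPU, memory, roles, the main application container).

```python
import json


def build_task_definition(vector_config: str) -> dict:
    """Return a task definition fragment that hands vector_config to a Vector sidecar.

    All names and image tags here are hypothetical placeholders.
    """
    shared_volume = "vector-config"  # hypothetical volume shared by both containers
    mount = {"sourceVolume": shared_volume, "containerPath": "/etc/vector"}
    return {
        "family": "app-with-vector",  # hypothetical task family name
        "volumes": [{"name": shared_volume}],
        "containerDefinitions": [
            {
                # Helper container: writes the config file, then exits.
                "name": "vector-config-writer",
                "image": "busybox",
                "essential": False,
                "environment": [{"name": "VECTOR_CONFIG", "value": vector_config}],
                "command": [
                    "sh",
                    "-c",
                    'printf "%s" "$VECTOR_CONFIG" > /etc/vector/vector.yaml',
                ],
                "mountPoints": [mount],
            },
            {
                # Sidecar: starts only after the helper has exited successfully.
                "name": "vector",
                "image": "timberio/vector:latest-alpine",  # assumed image tag
                "essential": True,
                "dependsOn": [
                    {"containerName": "vector-config-writer", "condition": "SUCCESS"}
                ],
                "mountPoints": [mount],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder config text; a real vector.yaml would define sources and sinks.
    print(json.dumps(build_task_definition("# vector.yaml contents go here\n"), indent=2))
```

The resulting JSON could be merged into a full task definition and registered with your usual tooling. Passing the config through an environment variable keeps the example short, but it also exposes the file's contents in the task definition, so secrets should be handled separately.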

AI-Assisted Incident Management Communication

AI has revolutionized various aspects of incident response, from preparation to resolution. Across the incident response lifecycle, AI is being leveraged to streamline processes, reduce noise, and improve overall efficiency. One critical area where AI is making a significant impact is in incident communication. Effective and efficient communication is crucial during incidents, as it ensures that stakeholders are informed and aligned with the incident status and resolution efforts.