
Why Every Engineering Team Should Embrace AWS Graviton4

Two years ago, we shared our experiences with adopting AWS Graviton3 and our enthusiasm for the future of AWS Graviton and Arm. Once again, we're privileged to share our experiences as a launch customer of the Amazon EC2 R8g instances powered by AWS Graviton4, the newest generation of AWS Graviton processors. This post elaborates on our Graviton4 preview results, including detailed performance data. We've since scaled up our Graviton4 tests with no visible impact on our customers.

Intelligent Health Checks: one-click observability for reliability tests

Reliability testing and observability are similar in one important way: engineering teams know they should be doing both, but they’re not sure how to start, or they don’t have the right resources, or they need to focus on competing priorities like feature development and incident response. In an ideal world, reliability and observability would be automated processes that configure, monitor, and run themselves.

Back to the basics with hybrid infrastructure monitoring

Managing IT environments can be challenging, especially with the growing complexity of hybrid infrastructures. These interconnected technologies, including servers, routers, storage arrays, and software-defined elements running in both data centers and cloud environments, require robust infrastructure monitoring.

Distributed Systems Monitoring: the Four Golden Signals

We recently published the IT Topic “IT System Monitoring: advanced solutions for total visibility and security”, in which we show how advanced IT system monitoring solutions optimize performance, improve security, and reduce alert noise with AI and machine learning. We also mentioned that there are four golden signals that IT system monitoring should focus on.

5 Ways to Make Kubernetes Auditing an Effective Habit

Kubernetes has several components that produce logs and events containing information on everything that has happened in a Kubernetes cluster. Keeping track of all this data becomes extremely challenging when you run Kubernetes at a very large scale. With so many components generating logs, organizations need a centralized place to see it all. But this is only half your problem. You also need to correlate logs coming from different components to draw the right conclusions and take effective actions.

Digital transformation and cost savings: How AI benefits Australian SMEs to enhance digital experience

Small and medium-sized enterprises (SMEs) play a crucial role in Australia's economy. Despite this, they face significant challenges in the current economic climate, including rising costs, higher interest rates, and the need to stay competitive in a rapidly evolving digital market. For these businesses, cutting expenses is the top priority, closely followed by enhancing the digital customer experience.

The importance of end user experience monitoring

In 2024, customer experience will be the biggest driver of success. While the business world glances at the financial horizon with worried eyes, finding ways to retain users, capture new leads, and create meaningful, long-lasting brands is more critical than ever. According to Forrester, the ROI of customer experience is 9,900%. For most businesses, the value of user experience is apparent—lower costs, improved loyalty, higher satisfaction, and a higher overall LTV.

Incident Response Automation: How It Works & Best Practices

It's 2 a.m. and your engineering team is sound asleep when suddenly a barrage of alerts starts flooding in. A critical service is down and customers are complaining. Your developers scramble to sift through the noise, identify the root cause, and fix the issue, all while racing against the clock to meet tight SLOs.

Round Robin escalation policies: do's and don'ts

The concept of Round Robin comes from sports. The name has nothing to do with anyone called Robin; it derives from the French word ruban (ribbon). In a Round Robin tournament, all participants face each other by taking turns. When applied to on-call schedules, a Round Robin escalation policy means that responders assigned to a level take turns responding to alerts. When is this strategy useful, and when isn’t it?
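
To make the turn-taking concrete, here is a minimal Python sketch of how alerts might be handed out to the responders on a single escalation level. The function name `round_robin_assignments`, the responder names, and the alert IDs are all hypothetical and chosen for illustration; real on-call tools will have their own rules for acknowledgement timeouts and escalation to the next level, which this sketch does not model.

```python
from itertools import cycle
from typing import List, Tuple


def round_robin_assignments(responders: List[str], alerts: List[str]) -> List[Tuple[str, str]]:
    """Pair each alert with a responder, rotating through the level in order.

    Hypothetical helper: it only illustrates the turn-taking idea and
    ignores acknowledgement timeouts and escalation to other levels.
    """
    if not responders:
        raise ValueError("an escalation level needs at least one responder")
    rotation = cycle(responders)  # endlessly repeats the responders in order
    return [(alert, next(rotation)) for alert in alerts]


if __name__ == "__main__":
    level_one = ["ana", "ben", "chen"]  # illustrative responder names
    incoming = ["alert-1", "alert-2", "alert-3", "alert-4"]
    for alert, responder in round_robin_assignments(level_one, incoming):
        print(f"{alert} -> {responder}")
```

With three responders and four alerts, the fourth alert wraps around to the first responder, so the load spreads evenly across the level over time.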