
EKS vs Cycle: Comparing Worker Nodes

Over the last few weeks I've been talking about the key differences between Amazon EKS and Cycle. If you missed the earlier posts, you may want to catch up on them before diving into this one. This post rounds out the series by looking at how worker nodes are added to a cluster and the major differences between EKS and Cycle there.

How to load-balance across multiple availability zones for improved redundancy

Load balancers are some of the most important load-bearing (pun intended) components in cloud environments. They perform multiple critical tasks: network switching, packet inspection, and of course, routing. Most cloud-based load balancers focus on load balancing within a single zone, but what if you have resources spread across multiple zones?

Python Flask instrumentation using OpenTelemetry | SigNoz

In this video, you will learn how to instrument your Python Flask application using OpenTelemetry and monitor your trace data in SigNoz. More about SigNoz: SigNoz is an open-source alternative to DataDog, New Relic, etc., backed by Y Combinator. It helps developers monitor applications and troubleshoot problems in their deployed applications, using distributed tracing to gain visibility into the software stack.

Complete Guide to Azure VM: Pricing Models, Types & More

Trying to find the best virtual machine on the market that gives you the flexibility of easy scalability and the promise of a secure network – and doesn’t cost an arm and a leg (and maybe another arm)? Azure VM is likely the best solution for you… assuming you can project costs correctly. However, Azure doesn’t make it easy with its different offerings and pricing models.

How LogicMonitor and Amazon Bedrock Accelerate Generative AI Initiatives

Enterprise generative artificial intelligence (GenAI) projects are gaining traction as organizations seek ways to stay competitive and deliver benefits for their customers. According to McKinsey, scaling these initiatives is challenging due to the required workflow changes. With AI adoption on the rise across industries, the need for robust monitoring and observability solutions has never been greater.

Beyond RPA: Your Checklist to Unlocking the Full Potential of Automation

Robotic process automation (RPA) initiatives often falter because enterprise application leaders choose use cases that involve intricate business logic and extended workflows, and that require significant orchestration or additional technologies to deliver business value. In essence, RPA is not well-suited for complex workflows, lengthy processes, or work requiring extensive coordination.

Introducing Toto: A state-of-the-art time series foundation model by Datadog

Foundation models, or large AI models, are the driving force behind the advancement of generative AI applications that cover an ever-growing list of use cases including chatbots, code completion, autonomous agents, image generation and more. However, when it comes to understanding observability metrics, current large language models (LLMs) are not optimal.

How to run fault injection tests on AWS managed services

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Fully managed SaaS services offer incredible scalability and accessibility, but at a cost: they’re also single points of failure. If your application depends on a SaaS service and the service fails, guess who your customers will blame? We need to design applications to anticipate and work around managed service failures, but how do we do that without having to wait for the service to fail?

Spend a little time on software reliability now instead of a lot of time later

You're going to spend time fixing reliability, but it's your choice whether that happens during an outage or ahead of time, on your schedule and at lower cost. Which will you choose? "We all know when things go wrong, it cost us a million dollars and it was really bad. Let's have that never happen again. But when we say, I need every engineering team to spend one hour, one day a week on reliability, does everyone lose their mind, or is that a reasonable request? Can we amortize out the cost of that?"

Best Practices for Seamless Hybrid Meetings

Hybrid work is very much here to stay. But in spite of that, many companies still haven’t come around to the idea, meaning the tools and systems offered aren’t always used to best effect. This leaves these businesses unprepared to embrace the hybrid world of work we now all find ourselves in. Take hybrid meetings for example – which combine in-person and remote participation, using digital tools to facilitate communication and collaboration across various locations.