
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Site Reliability Engineering: Definition, Principles & How It Differs From DevOps

Site crashes and outages can cost hundreds of thousands in lost revenue and inconvenience users. Site Reliability Engineering helps build highly reliable and scalable systems, which is particularly important for companies that depend on their software to support customers performing critical operations. Hiring a Site Reliability Engineer is one of the most effective ways to keep a software system up and running.

Top Trends of the AWS Summit London

This week, I played at home as the rest of the DX team and I were at the AWS Summit London. 🇬🇧 The event was buzzing with excitement and overflowing with great talks and impressive exhibition stands. It didn't disappoint in terms of attendance either! Despite covering a wide range of subjects, I couldn't help but notice three super-hot tech trends that were everywhere throughout the day. So, without further ado, let's dive right in and explore these topics!

Container Optimization Techniques that Cloud Providers Hate!

Watch this 20-minute session for an in-depth look at the factors to consider when optimizing containers. It covers effective ways to increase container density so that containers don’t get throttled or killed, without letting a generous safety margin run up a huge cloud bill. We cover three factors that are critical when optimizing.

Pipelines Full of Context: A GitLab CI/CD Journey

Do you know what version of your software is running in production? How often is that software deployed, and was it deployed right before last week’s P0 incident? What sort of dependencies are being deployed along with that software, and are any of them potential security risks? These are all common observability questions that may be difficult to answer.

How to Monitor a Heroku App with Graphite, Grafana and StatsD

This article explores how to monitor Heroku apps efficiently using MetricFire's Hosted Graphite plugin and Grafana dashboards. By combining these tools, developers can gain valuable insights into their app's performance and resource utilization. This guide provides step-by-step instructions on setting up MetricFire, integrating StatsD, and creating comprehensive Grafana dashboards for effective monitoring and debugging.

Technical deep-dive into a real-time kernel

Canonical announced the general availability of Ubuntu’s real-time kernel earlier this year. Since then, our community has raised several questions about how the kernel works and how to tune it. We aim to answer them in this post and an upcoming follow-up. Depending on your background knowledge, you may wish to start with the basics of preemption and real-time systems. In that case, this introductory webinar or our blog series on what real-time Linux is will be a good place to begin.

Throw custom exceptions in Logic Apps: Using an API Management (Part V)

Welcome to the fifth and last part of this series of blog posts on how to throw custom exceptions inside Logic Apps. The last approach we want to address in this series is another out-of-the-box idea: using an API exposed in API Management to throw back the exception. This approach is similar to the previous one.

How To Allocate Cloud Costs After A Company Merger/Acquisition

If you’re going through a merger or an acquisition, you’ll soon have to assemble all the cloud account data from two or more organizations and blend it together to fit your new, larger organization. You may be bringing the companies together to form a new entity or simply absorbing one company into the other; the details don’t matter, because the solution is the same either way.