
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

How to speed up incidents with a lot of cooks in the kitchen

In one of our recent webinars we discussed a substantial challenge IT Ops teams face in today’s complex IT environments: defining and clearly communicating incident and operational roles and processes, in an effort to create a well-coordinated incident management lifecycle. This lifecycle is essential for restoring service as quickly as possible when disruptions occur. The following are the highlights of that discussion, also recently published in an APMdigest article.

ITOps In 2 Minutes | What is Product-Led Growth? | Michael Fisher

The OpsRamp IT operations management (ITOM) platform allows you to see everything in your hybrid IT environment, take the right action faster with integrated event and incident management, and automate with confidence using AIOps. Learn more about our service-centric AIOps platform. With OpsRamp, you can detect and resolve incidents faster, understand resource dependencies and avoid costly performance issues that result in lost revenue and productivity.

Deploying applications to Kubernetes from your CI pipeline with Shipa

Kubernetes can bring a wide collection of advantages to a development organization. Properly using Kubernetes can significantly improve productivity, empower you to better utilize your cloud spend, and improve application stability and reliability. On the flip side, if you are not properly leveraging Kubernetes, your would-be benefits become drawbacks. As a developer, this can become incredibly frustrating when your focus is on delivering quality code fast.

Continuous integration for a Bazel Android project

Bazel (pronounced like the tasty herb: “bay-zell”) is a universal build tool developed by Google. Some notable companies like Twitter and projects like the Android Open Source Project have migrated to Bazel. In this tutorial, you will learn how to build a Bazel Android project and set it up for continuous integration with CircleCI. We will wrap up by automatically running tests and producing a binary APK file. In addition to the written guide, there is a working sample project.

How to get mobile push notifications from Spike.sh

When an issue happens in your software in production, the channel to send the alert on depends on multiple factors. If it's a critical issue requiring immediate attention, you should alert the team member via phone call. But not all issues require a phone call, and in fact it may become annoying if your phone keeps ringing for minor issues. This is where other channels like SMS, Slack and mobile push notifications come in.
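
The routing idea in this teaser can be sketched in a few lines of Python. This is only an illustration, not Spike.sh's API: the severity labels, the channel names and the `pick_channels` helper are all hypothetical, and a real setup would weigh more factors than severity alone.

```python
from typing import List

# Hypothetical channel names; a real alerting tool defines its own.
LOUD_CHANNELS = ["phone_call"]
QUIET_CHANNELS = ["sms", "slack", "push"]


def pick_channels(severity: str) -> List[str]:
    """Return the channels to alert on for a given (assumed) severity label."""
    if severity == "critical":
        # Critical issues need immediate attention, so ring the phone.
        return list(LOUD_CHANNELS)
    # Everything else goes to channels that do not interrupt as much.
    return list(QUIET_CHANNELS)


critical_channels = pick_channels("critical")
minor_channels = pick_channels("minor")
```

Keeping the mapping in one small function makes it easy to adjust which issues are allowed to ring a phone.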

What Is Root Cause Analysis (RCA) and Why Do You Need It?

Imagine you have a hole in your car's tire. To fix it quickly and get on your way, you apply a patch. Then it happens again. You apply another patch. Before you know it, you're driving on the highway and you blow a tire. The risk was always there. You were simply hiding it because you didn't solve the problem. We see this often when it comes to IT issues. Teams take a band-aid approach to fixing problems without addressing the underlying causes.

More Changes Mean More Challenges for Troubleshooting

The widespread adoption of Agile methodologies in recent years has allowed organizations to significantly increase their ability to push out more high-quality software. Previous development practices revolved heavily around centralized applications and infrequent updates that were shipped maybe once a quarter or even once a year.

What's New with JFrog Artifactory and Xray

Get the latest on self-hosted Docker rate limits, cutting through violation noise and new package type support. Without doubt, 2020 has been one of the most challenging years for everyone in recent history, but especially for those in the world of DevOps. JFrog has strived to continue developing and innovating at the same pace, to give our customers an even better end-to-end DevOps experience, and help customers maintain their drum-beat of on-time releases.

Why Your Mean Time to Repair (MTTR) Is Higher Than It Should Be

Mean time to repair (MTTR) is an essential metric that represents the average time it takes to repair and restore a component or system to functionality. It is a primary measurement of the maintainability of an organization’s systems, equipment, applications and infrastructure, as well as its efficiency in fixing that equipment when an IT incident occurs. Key challenges with MTTR arise from just trying to figure out that there is actually a problem.
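
As a rough illustration of the metric, MTTR can be computed as the total repair time divided by the number of incidents. The sketch below assumes each incident is recorded as a (detected, restored) pair of timestamps; the sample data and the `mean_time_to_repair` helper are made up for the example.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average of (restored - detected) over a list of incident timestamp pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (restored - detected for detected, restored in incidents),
        timedelta(),  # start value so sum() adds timedeltas, not ints
    )
    return total / len(incidents)


# Hypothetical incidents: 30, 90 and 60 minutes to restore service.
incidents = [
    (datetime(2021, 1, 4, 9, 0), datetime(2021, 1, 4, 9, 30)),
    (datetime(2021, 1, 9, 14, 0), datetime(2021, 1, 9, 15, 30)),
    (datetime(2021, 1, 20, 22, 0), datetime(2021, 1, 20, 23, 0)),
]

mttr = mean_time_to_repair(incidents)
print(mttr)  # 1:00:00
```

Note that the clock here starts at detection. Time spent before anyone realizes there is a problem is not captured unless the recorded start is the actual onset of the issue.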