
RESOLVE '22: Expert predictions for AIOps 2022-2025

BigPanda’s RESOLVE ’22 conference hosted a number of luminaries from the AIOps and IT Ops world, so naturally we asked for their thoughts on the future of the market and where they see AIOps going in the next few years. Our guests for the session, titled "Expert predictions for AIOps 2022-2025", came from the press, the investor community, the analyst community and the vendor world.

Open-source storage for beginners with Ceph

Modern organisations have become reliant on their IT capabilities, and at the heart of that infrastructure is a growing need to store data, be it transactional databases, file shares, or burgeoning data lakes for business analytics. Traditionally, storage needs have been catered to by big iron hardware vendors, but over the last decade, more and more organisations have turned to open-source solutions such as Ceph running on commodity hardware.

How to Perform Geolocation Testing to Ensure Your Website Works Globally

So, you have launched a website intending to reach a worldwide audience? If you're running a business, this could be the first step to growing your brand. But is your website really ready to go global? After all, just because your website works for a user in the United States doesn't mean it will be accessible to a user in Japan. For one, not everyone speaks the same language. Does your website offer translation for users visiting from different global locations?

Configuring an OpenTelemetry Collector to connect to BindPlane OP

BindPlane OP is the first open source, vendor-agnostic agent and pipeline management tool. It makes it easy to deploy, configure, and manage agents on thousands of sources, and to ship metrics, logs, and traces to any destination. This blog shows you how to configure an existing OpenTelemetry Collector from any source to connect to BindPlane OP without needing to remove or reinstall the collector.

What is Kubernetes CrashLoopBackOff? And how to fix it

CrashLoopBackOff is a Kubernetes state representing a restart loop that is happening in a Pod: a container in the Pod is started, but crashes and is then restarted, over and over again. Kubernetes waits an increasing back-off time between restarts to give you a chance to fix the error. As such, CrashLoopBackOff is not an error in itself, but indicates that an error is preventing a Pod from starting properly.
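
The increasing back-off can be illustrated with a small sketch. It assumes a delay that starts at 10 seconds, doubles after each crash, and is capped at 5 minutes; those values follow the commonly documented kubelet defaults but are treated as assumptions here, and the function name `backoff_delays` is hypothetical.

```python
from itertools import islice


def backoff_delays(base_seconds=10, cap_seconds=300):
    """Yield the wait before each successive restart: doubling, then capped.

    base_seconds and cap_seconds are assumed defaults, not read from a cluster.
    """
    delay = base_seconds
    while True:
        yield min(delay, cap_seconds)
        delay = min(delay * 2, cap_seconds)


# Waits (in seconds) before the first eight restarts.
first_eight = list(islice(backoff_delays(), 8))
print(first_eight)
```

Under these assumptions the waits grow quickly and then level off at the cap, which is why a Pod stuck in CrashLoopBackOff can appear idle for minutes between restart attempts.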

An Introduction to PromQL: How to Write Simple Queries

PromQL is a flexible language designed to make it easy for users to perform ad-hoc queries against their data. By default, Prometheus indexes all of the fields in each metric except for source and target. Prometheus is an open-source tool that lets you monitor Kubernetes clusters and applications. It collects data from monitoring targets by scraping their HTTP metrics endpoints.
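
As a rough illustration of an ad-hoc query, the sketch below builds a request URL for the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`). The server address `http://localhost:9090`, the metric `http_requests_total`, the `job="api"` label, and the helper name `instant_query_url` are all assumptions chosen for the example, not taken from the article.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Return the URL for an instant PromQL query against a Prometheus server."""
    # urlencode percent-encodes braces, quotes and brackets in the expression.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical query: per-second request rate over the last 5 minutes.
query = 'rate(http_requests_total{job="api"}[5m])'
url = instant_query_url("http://localhost:9090", query)
print(url)
```

The resulting URL can be fetched with any HTTP client; the same expression can also be pasted directly into the Prometheus expression browser.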

Using StatusPage at Squadcast | SRE Best Practices

Let your customers know how your services are doing, without them having to ask you about it. One of the core principles of SRE is transparency, and status pages help you communicate the status of your services to your customers at all times, rather than you learning about problems through support tickets logged by your customers.

New in Grafana Alerting: File provisioning

We are happy to announce that file provisioning for Grafana Alerting has arrived in Grafana 9.1. This feature enables you to configure your whole alerting stack using files on disk, as you may already do with data sources or dashboards. The Terraform Grafana provider has also been updated to allow the provisioning of Grafana Alerting resources.

What are Canary Deployments and Why are they Important?

Every modification to software comes with the potential for production problems. Application failures often have serious consequences, which can result in a loss of revenue and a poor customer experience. At the same time, organizations constantly try to improve their services for a better customer experience. How can you minimize the chance of error and update your application with confidence?