
Monitoring compute infrastructure with the Cloud Ops Agent

How can you improve observability for workloads that use compute infrastructure directly and run on Google Compute Engine instances? In this episode of Engineering for Reliability, we show how you can use the Cloud Operations Agent to do just that. Watch to learn about the Cloud Operations Agent, how to install it manually and automatically, and how to use the data it collects to improve the reliability of your services and keep your users happy!

Face Detection using Robotic Data Automation (RDA) in 4 mins

Perform real-time face detection on a webcam feed or recorded video using Robotic Data Automation (RDA) in 4 minutes. The method is quick enough that we can start getting real-time reports on the faces of every person in the video without much performance overhead. The best part is that it's a low-code tool. Intriguing enough?

Tips for designing distributed systems

With companies expecting software products to handle constantly increasing volumes of requests and network bandwidth use, apps must be primed for scale. If you need resilient, resource-conserving systems with rapid delivery, it is time to design a distributed system. To successfully architect a heterogeneous, secure, fault-tolerant, and efficient distributed system, you need conscientiousness and some level of experience.

Monitoring PostgreSQL With pgmetrics and pgDash

I am currently trialing pgmetrics and pgDash for monitoring PostgreSQL databases. Here are my notes on them. pgmetrics is a command-line tool that you point at a PostgreSQL cluster, and it outputs statistics and diagnostics in text or JSON format. It is a standalone binary written in Go, and it is open source. Here is a sample pgmetrics report. RapidLoop, the company that develops pgmetrics, also runs pgDash, a web service that collects reports generated by pgmetrics and displays them in a web UI.

Unexpected Parallels Between Yoga and Observability

Yoga is to ideal human health what observability is to an application’s ideal functioning. It is well established that observability is a critical factor for the successful implementation and maintenance of cloud-native, serverless, cloud-agnostic, and microservices-based applications. Well-established observability helps DevOps and development teams cross the boundaries of complex systems and get complete visibility into their functioning.

Getting started with Memory attacks

Memory (or RAM, short for random-access memory) is a critical computing resource that stores temporary data on a system. Memory is a finite resource, and the amount of memory available determines the number and complexity of processes that can run on the system. Running out of RAM can cause significant problems such as system-wide lockups, terminated processes, and increased disk activity. Understanding how and when these issues can happen is vital to creating stable and resilient systems.
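To make the idea of running low on memory concrete, here is a minimal Python sketch that estimates how much memory is in use from text in the format of Linux's /proc/meminfo. The function names, the sample values, and the 0.9 threshold are illustrative assumptions, not anything taken from the article.

```python
# Hypothetical sample in the format of Linux's /proc/meminfo (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    4096000 kB
SwapTotal:       2097152 kB
SwapFree:        2097152 kB
"""


def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integers (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not key/value pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_fraction(info):
    """Fraction of total memory that is not available to new processes."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def is_under_pressure(info, threshold=0.9):
    """True when the used fraction reaches the (assumed) threshold."""
    return used_fraction(info) >= threshold


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print(f"used fraction: {used_fraction(info):.2f}")
    print(f"under pressure: {is_under_pressure(info)}")
```

On a Linux host, the same functions can be fed the contents of /proc/meminfo instead of the sample string. Watching this fraction while memory is consumed is one simple way to see when a system approaches the failure modes described above.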

How to integrate security checks into your deployment workflow

As software applications grow in scale and complexity, the surface area for security vulnerabilities and exploits grows with them. Modern development practices include large amounts of code reuse. First, in the form of language-specific standard libraries such as the C++ STL, the Go standard library, and Microsoft .NET. Second, in the form of open-source libraries found on places like GitHub. Much of this code is built using other libraries, introducing a web of dependencies into modern software.

DevSecOps: Collaborate Confidently with Open Source Tools

There is a Cambrian explosion currently underway in the collaboration tools space. The exponential rise in remote working, driven by naturally evolving workplaces and accelerated by the recent pandemic, has created an opportunity for many different collaboration tools to take center stage. As our collaboration tools improve, work that would have been nearly impossible to do remotely is becoming more and more common.