Latest News

Kubernetes Cost Management: Analyze Your Kubernetes Cost | CloudZero

The benefits of Kubernetes for innovation are clear: it can allow small teams to deliver more value, more rapidly. However, cost discussions around Kubernetes, and Kubernetes cost management in general, can be difficult. Disposable, replaceable compute resources are constantly coming and going across many types of infrastructure. Yet at the end of the month, you just get a billing line item for EKS cost and a bunch of EC2 instances.

Achieving the Observability Imperative Requires AI

The shift to Observability: Over the last six months, unified monitoring, log management, and event management vendors have reoriented their technology portfolios (often without any change to the underlying functionality) towards Observability. In so doing, they have generated a fair amount of confusion in the market.

The Future of Kubernetes on DevOps Radio

In this episode of DevOps Radio, Shipa’s CEO and Founder Bruno Andrade joins host Brian Dawson to discuss his thoughts on the future of Kubernetes. DevOps Radio is a CloudBees-sponsored podcast series. Hosting experts from around the industry, the show dives into what it takes to successfully develop, deliver and deploy software in today’s ever-changing business environment. From DevOps to Docker, each episode features real-world insights and a few stories, tips, industry scoop and more.

How to build your own incident management process

IT incident management is a fundamental operational process designed to ensure rapid service restoration. This process is typically assigned to the help desk but is also very much entrenched in the day-to-day of DevOps. When incident management goes right, service is restored quickly and the impact on productivity, continuity, and customer satisfaction is minimal.

Delivering Agile Kubernetes Ingress Services for VMware Tanzu

VMware Tanzu eases the adoption of Kubernetes and supports modern applications with an automated application platform for container-based workloads. Since the application delivery components are among the most critical pieces of infrastructure needed to deliver enterprise-grade Kubernetes clusters, an ingress controller and services such as load balancing are typically deployed to enable external users to access the application.

7 Tips On Building And Maintaining An SRE Team In Your Company

In today's "always on" world, Reliability is a primary business KPI. Plant the culture of Reliability by implementing these 7 simple tips to build a solid SRE team in your organization. Many of today’s hottest jobs didn’t exist at the turn of the millennium. Social media managers, data scientists, and growth hackers were unheard of back then. Another new role in high demand is that of the Site Reliability Engineer, or SRE.

Take the first step toward SRE with Cloud Operations Sandbox

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling—logging, monitoring, tracing, profiling and debugging—which can help you troubleshoot production issues faster, increase release velocity and improve service reliability.

What Are AWS Savings Plans? How They Can Lower Your AWS Bill

It has been nearly a year since Amazon Web Services (AWS) first rolled out its new savings plans. AWS Savings Plans were created to help manage cloud costs; however, they are best used as part of a larger strategy focused on cloud cost intelligence.

Taming Operational Load with VMware CRE

Every engineering team must manage some level of operational load. But too much of it can get in the way of doing the important and engaging work that will make your organization—and your team—thrive. VMware Customer Reliability Engineering (CRE) is no different. We are a team of site reliability engineers and program managers who work together with Tanzu customers and partner teams to learn and apply reliability engineering practices using our Tanzu portfolio of services.

Taming the compliance beast: achieve efficiency & reliability at scale

Regulatory compliance is time-consuming and expensive. A recent survey of IT security professionals found that, on average, organizations must comply with 13 different regulations and spend an average of $3.5M annually on compliance activities, with audit-related activities consuming 232 person-hours per year. With a team of five people, that adds up to 1.5 months a year devoted to audit-related activity. That’s a lot of hours that could have been spent on initiatives driving customer value.