Latest News

Introducing the World's First CI/CD Live Debugging Tool

Today we’re announcing the first-ever solution for live debugging a CI/CD pipeline. You can now place stop/resume breakpoints and inspect live pipelines in the same way that developers debug their applications. It’s the easiest, fastest way to troubleshoot and fix bugs in complex pipelines. Live debugging is very well known to software developers and is one of the most efficient ways to find and fix bugs in application code.

#KUBE100: The Story So Far

On the unassuming afternoon of Wednesday, 18th September 2019, we launched a world’s first: a k3s-powered, managed Kubernetes service. We were in for a ride... Since we had been taking applications for a few weeks already, there was no shortage of applicants when beta launch day came. It was very encouraging to see that there was already plenty of buzz online.

How to Avoid Unexpected AWS Data Transfer Costs

If your business depends on AWS cloud services, you’re probably familiar with the experience of having unexpected, hidden charges crop up on your monthly invoice. At times, the AWS cost structure can be quite opaque, making it difficult to accurately gauge in advance the costs of hosting a given application. One of the most common sources of unanticipated AWS charges is the service’s data transfer fees.

Two Stackery Superstars Named AWS Serverless Heroes

It’s been a great couple of months at Stackery. Since coming on as CEO earlier this year, I’ve been impressed with how much our team gets done and with their contributions to making the serverless development experience easier and more reliable. I want to take a moment to celebrate two of our amazing Stackerinos who recently received the AWS Serverless Hero distinction.

Building a Successful Uptime Management Strategy

Managing a cloud system properly entails numerous tasks: performance monitoring, response times, latency, uptime management, security, compliance, and disaster recovery. Together, these tasks form a comprehensive strategy that accounts for all potential scenarios. Especially in today’s demanding business landscape, running cloud operations (NOC) 24/7 has become mandatory, along with maintaining high availability of network services.

Datadog's AWS re:Invent 2019 guide

AWS re:Invent is an annual gathering of tens of thousands of AWS staff, partners, and users for a full week of keynote sessions, feature announcements, customer case studies, hands-on workshops, and more. As in years past, we will be there with dozens of engineers, ready to answer your monitoring questions and show you the newest additions to Datadog.

Site Reliability Engineering: Why You Should Adopt SRE

Site reliability engineering is a term coined by Google engineer Benjamin Treynor in 2003, when he was tasked with making sure that Google services were reliable, secure and functional. He and his team eventually wrote the book on SRE, which is available online for free for anyone interested in researching and implementing SRE best practices.

Introducing dark mode for Datadog

Datadog provides full visibility into your environment through a wide variety of features, ranging from host and container maps of your dynamic infrastructure to customizable dashboards that provide a unified view of every layer of your stack. And now we’re pleased to announce that you can enjoy these visualization features and the rest of the Datadog platform in dark mode.

Rancher Labs Industry Survey Shows Rapid Adoption of Containers and Kubernetes, But Challenges Remain

To get an accurate picture of the current state of Kubernetes deployments, Rancher Labs recently conducted an industry survey that included 1,106 respondents from large and small enterprises across more than 25 industries, including technology, financial services, telecom, education, government and healthcare. The respondents were almost evenly split between EMEA and North America and included both Rancher and non-Rancher users.

Meet Root Cause Changes from BigPanda - IT Ops, NOC and DevOps Teams' Best Friend for Supporting Fast-Moving IT Stacks

TL;DR: Fast-moving IT stacks see frequent, long and painful outages. Thousands of changes – planned, unplanned and shadow changes – are one of the main reasons behind this. Until now, IT Ops, NOC & DevOps teams didn’t have an easy way to get a real-time answer to the “What Changed?” question – the answer that can help reduce the duration of outages and incidents in these fast-moving IT stacks. Now, with BigPanda Root Cause Changes, they do.