Case Study

How to monitor Kubernetes with Grafana and Prometheus: Inside Powder's observability stack

David Calvert is a site reliability engineer working remotely from the south of France. He’s currently focused on observability, reliability, and security aspects of cloud infrastructure. You can find him as dotdc on GitHub and @0xDC_ on Twitter. Over the past three years, I’ve built and operated Kubernetes clusters for two different companies — the first one on-premises, and the second on a public cloud platform for my current job at Powder.

4 billion logs, 120 TB of data: How Just Eat Takeaway.com uses Grafana Cloud to scale

In 2017, Just Eat Takeaway.com (JET) was transitioning from a scrappy startup to a surging scaleup. With a global customer base and workforce, the food delivery marketplace’s front line teams needed to scale the real-time monitoring of the platform. Their initial efforts looked like “NASA’s mission control with Grafana dashboards,” said Senior Technology Manager Alex Murray.

How Lumigo helps StartingFinance run 100% serverless with 100% confidence

StartingFinance supports a community of 70,000 with a platform that provides time-critical financial and investment information. Running 100% serverless, StartingFinance relies on Lumigo to keep its apps performing well. Lumigo has helped the team reduce error rates and downtime and improve time to resolution.

Gartner IOCS Blog - Lucid Motors Case Study

Assaf Resnick, CEO and co-founder of BigPanda, sat down with Sanjay Chandra, vice president of information technology at luxury electric automaker Lucid Motors, at Gartner IT IOCS 2022. They discussed Lucid’s unique ITOps journey and how BigPanda helps minimize downtime of critical applications and services. Sanjay is a visionary ITOps leader, responsible for IT, enterprise systems, global infrastructure, operations and security at Lucid Motors.

How JPMorgan Chase uses Grafana and AI to monitor SLOs, SLIs, and more

For the team at JPMorgan Chase, the daily stakes of having a stable system are high. “We are in the business of making sure that trades are executed, and systems are stable and up and running for a positive client experience,” said Askari Imam, VP, Asset Wealth Management (Product and Integration Delivery).

How Tint Streamlines Infrastructure Automation and Meets Compliance Requirements

I recently had the opportunity to speak with Kevin Maschtaler, the Platform and Reliability lead at Tint, about their experience with Qovery. Tint began using Qovery in early 2022 to automate their infrastructure and support their team and customer growth. In this article, we will explore Tint's journey with Qovery, including how we continue to assist them with compliance in the testing and release process and how we help them save on cloud costs through our partnerships.

A Guide to Stack Overflow's Path to the Cloud

Like many companies, Stack Overflow is trying to get out of the business of running our architecture in our own data centers; instead, we want to offload some of the more mundane parts of system administration to a cloud service offering like Azure. I’m going to cut to the chase for the purpose of this article and concede we’ve already decided on Azure for the majority of our infrastructure and, most importantly to me, our databases.

Dos and Don'ts of Observability: Lessons Learned from RedMonk

On November 16, 2022, I sat down with analyst KellyAnn Fitzpatrick from RedMonk to discuss my favorite topic: observability. This time, we looked at observability in terms of what to do and what to avoid as you start and continue your observability journey. A machine-generated transcript is available at the end of the post.