
Latest Posts

Helping Your Remote NOC Teams Work Better Together

In light of COVID-19-related office closures, one thing we've seen and heard repeatedly is the "abandoned NOC." The people responsible for quickly finding, escalating, and resolving problems in your infrastructure and applications now have to work very differently. Two-minute hallway conversations have been replaced by time-consuming emails, Slack messages, and virtual calls.

Cybersecurity challenges of the work-from-home model

The World Health Organization recently declared the COVID-19 outbreak a global pandemic. Health and safety measures followed, and normal life came to a halt in many countries. As a result, many organizations around the world adopted telecommuting to help prevent the spread of COVID-19. While people adjust to these sudden changes in the way they work, cybercriminals are using the opportunity to exploit the new vulnerabilities that the work-from-home environment presents.

Performance Best Practices: Running and Monitoring Express.js in Production

What is the most important feature an Express.js application can have? Maybe using sockets for real-time chats or GraphQL instead of REST APIs? Come on, tell me. What’s the most amazing, sexy, and hyped feature you have in your Express.js application? Want to guess what mine is? Optimal performance with minimal downtime. If your users can’t use your application, what’s the point of fancy features?

Pro tips for making the most of your Datadog metrics in Grafana with the enterprise plugin

Hello again! We are Eldin and Christine (or, as our lovely editor has dubbed us, Regis and Kelly), jumping back in for another post. This week, to highlight the "big tent" and community theme, we are going to write about how our Datadog plugin lets you "see it all in one place." Datadog is a popular monitoring and analytics platform that lets you install an agent easily and start collecting metrics right away.

Optimizing your alerts to reduce alert noise

Reducing alert fatigue starts with your monitoring platform: set the right thresholds for triggering alerts, and understand which of those alerts are essential enough to send to your on-call platform. This post outlines some best practices that help you reduce alert noise and improve your on-call experience. The word "noise" implies something unpleasant and unwanted. Combine it with on-call, and it adds annoyance to an already overwhelming process.

Running Google Cloud Containers with Rancher

Rancher is an enterprise computing platform for running Kubernetes on-premises, in the cloud, and at the edge. It's an excellent platform for teams getting started with containers and for those struggling to scale up their Kubernetes operations in production. However, in a world increasingly dominated by public infrastructure providers like Google Cloud, it's reasonable to ask how Rancher adds value to services like Google Kubernetes Engine (GKE).

Logz.io Infrastructure Monitoring: Grafana and Kibana are Better Together

In the midst of a complex and challenging global environment, I'm proud and excited to announce the General Availability of Logz.io Infrastructure Monitoring, our new metrics monitoring and analytics solution based on Grafana. We're also launching Early Availability of our new Distributed Tracing offering, powered by Jaeger. The release represents a huge next step in our mission to provide the best open source observability tools as a fully managed, cost-effective cloud service.

Find and fix issues faster with our new Logs Viewer

Monitoring your cloud infrastructure is an essential part of making sure your operations run smoothly. Since we announced the new Cloud Logging interface in February, users have told us that it makes it faster and easier to meet their logging needs, including troubleshooting issues, verifying deployments, and ensuring compliance. One of those users, Arne Claus, a site reliability engineer at trivago, has already taken advantage of the new interface.

Leveraging EC2 tagging for continuous optimization of containerized workloads

Ocean by Spot delivers a serverless container experience by managing the underlying cloud infrastructure. It automates scaling up and down, as well as the management of spot instances, reserved capacity, and on-demand instances (as needed) within a cluster. Ocean accomplishes this with a fundamental construct called the Launch Specification.