Blog

Chaos engineering + monitoring, part 1: Sensu + Gremlin

One of my earliest jobs was as an admin for an MSP. We'd routinely generate alerts that weren't actionable, lacked context, and, for most of our customers, were considered noise. From a monitoring perspective, it was bad. Customers didn't trust the alerts they received and often resorted to having an additional monitoring product installed on their systems. It's safe to say that our auto-generated tickets and emails were largely ignored.

How to inspire exceptional contributions to your open-source project

Netdata must be doing something right when it comes to inspiring contributions. Our open-source, distributed monitoring agent is hosted on GitHub and has seen contributions from hundreds of people. We’ve even hired a handful of our contributors to work full-time on making the Netdata ecosystem even more powerful. The community is passionate about what we’re building, and they’re actively interested in making it work better for their particular needs.

Apache Tomcat Monitoring with ELK and Logz.io

Apache Tomcat is the most popular application server for serving Java applications. Widely used, mature, and well documented, Tomcat can probably be considered the de facto industry standard. Some sources put Tomcat’s market share at over 60%! Tomcat is particularly popular for serving smaller applications since it doesn’t require the full Java EE platform. It consumes a relatively small amount of resources and provides users with simpler admin features.

2019 PHP Monitoring Options

There is no denying the popularity of PHP. It has been a constant force in the web development world since its release way back in 1995. And now in 2019, thanks to Laravel, it is still going as strong as ever! Here at Scout, we have recently been working hard on providing a PHP performance monitoring agent to sit alongside our existing Ruby, Python, and Elixir agents.

How To Track Timeouts In Honeybadger

The other day a long-time customer wrote in with a problem. They use Honeybadger to monitor their Ruby apps for exceptions but were having trouble catching timeouts. If their app took too long to respond, their application server, Puma, would abort the request. The only insight their team had into this problem was through Puma's logs. Most people consider timeouts to be a kind of error, so it'd be nice to have them reported by Honeybadger like any other errors.

Pro Tips: How to Decrease MTTR and Increase Uptime with Grafana and VictorOps

We can sift through oceans of data. Alert on predetermined parameters. Deliver multiple commits a day. But as organizations leverage these layered, complex monitoring systems, “we also have to start practicing observability to enrich the actions that we take to solve problems as they occur and drive continual improvement,” said VictorOps Product Marketing Manager Melanie Postma. VictorOps is one tool that can help accomplish that.

Log Aggregation 101: A Complete Guide, from How It Works to the Tools You Must Know about

Every developer’s worst nightmare is having to dig through a huge log file trying to pinpoint problems. The troubleshooting most likely won’t stop there. They’ll have to follow the trail to multiple other log files, possibly on other servers. The log files may even be in different formats. This may go on until they lose track completely. Log aggregation is what you need to stop this seemingly never-ending cycle.

Five reasons to choose Log360, part 2: Multi-environment support

In the previous post of this series, we looked at how easy it is to get Log360 up and running due to its various deployment features and easy-to-use UI. Today, we’ll dive into the solution’s wide range of support for event sources across multiple environments. Servers and workstations: with Log360, you can easily go deep into the events occurring on all Windows, Unix/Linux, and IBM servers and workstations in your network.

Pro Tips: How Amgen Manages On Calls (and Burnout) with Grafana

There is a lot of talk about graphing all the things, but have you ever considered graphing all the people – in particular their on calls – as well? “Not letting people burnout on call is something that is being talked about in the industry,” said Jordan J. Hamel, Design Engineer at the biotech company Amgen.