
Latest News

Instant scale up for even the most dynamic ECS clusters

One of the key features of Ocean by Spot is "headroom": a dynamic buffer of spare capacity kept available for immediate scale-up. Ocean continuously predicts which workloads are most likely to need to scale up and adjusts headroom to match that prediction. New tasks can then be scheduled immediately, without waiting for infrastructure to be provisioned. This shortens the time to execution for these workloads and dramatically speeds up the scale-up process.
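To make the headroom idea concrete, here is a minimal sketch of how a spare-capacity buffer could be sized from a demand prediction. It is not Ocean's algorithm. The `TaskShape` class, its `expected_new_tasks` prediction field, and the `headroom_capacity` and `nodes_needed` helpers are hypothetical. The node count is only a lower bound because it ignores how tasks actually pack onto instances.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskShape:
    """Hypothetical description of one workload's task size and predicted demand."""
    cpu_units: int           # CPU units per task (ECS counts 1,024 units per vCPU)
    memory_mib: int          # memory per task, in MiB
    expected_new_tasks: int  # assumed output of some scale-up prediction


def headroom_capacity(shapes):
    """Return (cpu_units, memory_mib) to keep free so predicted tasks can start at once."""
    cpu = sum(s.cpu_units * s.expected_new_tasks for s in shapes)
    mem = sum(s.memory_mib * s.expected_new_tasks for s in shapes)
    return cpu, mem


def nodes_needed(cpu, mem, node_cpu, node_mem):
    """Spare nodes required, set by whichever resource runs out first.

    This is a lower bound: it ignores bin-packing fragmentation across nodes.
    """
    return max(math.ceil(cpu / node_cpu), math.ceil(mem / node_mem))


if __name__ == "__main__":
    shapes = [TaskShape(512, 1024, 4), TaskShape(256, 512, 2)]
    cpu, mem = headroom_capacity(shapes)
    print(cpu, mem)                            # 2560 5120
    print(nodes_needed(cpu, mem, 2048, 4096))  # 2
```

Sizing the buffer by the larger of the CPU and memory ratios keeps one scarce resource from silently blocking scheduling. A real system would also need to account for how tasks fit onto individual instances.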

COVID-19's Impact On Infrastructure Security

It's no secret that COVID-19 is hurting businesses of all sizes in a number of ways, some more obvious than others. Unless you work in IT, you're probably not thinking about how COVID-19 can affect your organization's infrastructure security. The truth is that as businesses make the tough decision to lay off employees in order to stay afloat, basic security hygiene can easily be overlooked.

Optimizing Your Alerting Escalation Policy

Reacting to alerts can be a pain; however, there are ways to be proactive and reduce frustration with IT alerting. A well-developed alerting strategy saves IT Operations and Development teams time and money, and it eliminates notifications from low-priority alerts. Keep reading for more information on routing and escalation chains, fielding alerts, and communicating an alerting strategy to management.

Monitoring AWS Lambda with Prometheus and Sysdig

In this post, we show how to easily monitor AWS Lambda with Sysdig Monitor. By leveraging Sysdig's existing Prometheus ingestion, you can monitor serverless services with a single-pane-of-glass approach, giving you the confidence to run these services in production.

Prometheus Metric Federation with Thanos

Prometheus is a CNCF graduated project for monitoring and alerting, and it is one of the most widely used monitoring and alerting tools in the Kubernetes ecosystem. Rancher users can get started with Prometheus quickly by using the built-in monitoring stack. Prometheus stores its metrics in a time series database on local disk, so its storage is limited by the size of that disk, which in turn limits how much metric data it can retain.

Why Netdata picked VerneMQ

In 2019, the Netdata team already knew that a Netdata Cloud solution in the form of an online platform would greatly complement Netdata’s distributed monitoring by making it much easier to organize large infrastructures and by enabling new ways for teams to collaborate. The old node registry available at the time wasn’t enough for Netdata’s users. Building an online platform, even one that does not directly process users’ metrics, is challenging.

Monitoring + Automation: An Elusive Goal

Today's monitoring investments align with automation more often than with any other technology. Automation is one of the principal objectives of DevOps because it reduces toil, i.e., repetitive manual work. This helps keep engineers happy and engaged and allows teams to scale better as they build and operate applications. Automation typically spans both infrastructure and application technologies. The challenge is that many organizations simply have too many automation tools.

Building Automated Monitoring with Icinga and iLert

How many servers can one system administrator manage? This question is hard to answer because it depends heavily on the tasks involved. It is clear, however, that the number of servers one engineer can manage has increased tremendously over time and is still growing. Public and private clouds, combined with automation tools, enable us to automate many daily tasks. In a modern IT infrastructure, almost everything can, and should, be automated.

How to Secure your WFH Environment

I am going through a digital transformation of my own, working from home (WFH) during a COVID-19 quarantine. Many of you are in the same situation, and distractions abound when you share a workspace with housemates, children, and pets. On top of that, we have to contend with an increased cybersecurity risk, given recent attacks on work-related software such as Slack and Zoom.

Mattermost's QA journey with Rainforest and what we've learned so far

Here at Mattermost, our team of developers and quality assurance analysts is proud of what we build and works hard to ship a quality product on the 16th of each month. However, maintaining our high bar for quality month over month isn't without its challenges!