Kubernetes allows DevOps teams to deploy containerized applications faster and makes managing containers at scale significantly easier. Obtaining visibility into these containerized applications is key to maximizing application/service performance and proactively preventing downtime.
Distributed denial of service (DDoS) is one of the most insidious types of digital attacks that can cause cloud outages. It is typically directed at a specific target and, if carried out competently, can bring an application or even all network traffic to its knees.
Site reliability engineering (SRE) is Google’s approach to service management, in which software engineers run production systems using a software engineering approach. It’s clear that Google is unique, and they usually need to tackle software bugs and errors in different and non-conventional ways. But having software engineers do a job that is traditionally done by professionals with a systems administration background sounds impractical.
In part 1 of this series, we outlined what data retention is and why it is needed to meet the growing requirements of various regulatory standards. As detailed there, organizations can follow some clear guidelines to adopt what we called a “data retention approach for compliance”. In this follow-up post, we outline some specific technological and procedural challenges you might face, as well as some practical guidelines and strategies to overcome them.
When operating always-on services, engineers need to quickly respond to alerts and prevent issues from becoming outages. Fortunately, many alerts can be resolved through easy changes to systems or network infrastructure. However, these tasks still require manual intervention and cause interruptions for on-call responders.
We’re very excited to announce that Ansible roles to deploy StackStorm have been promoted to major version 1.0.0!