
Latest News

Disaster recovery in AWS, GCP and Azure - thoughts on capacity planning and risks

One of the most popular cloud disaster recovery models in the industry today is the “pilot light” model, where critical applications and data are already in place in an alternate region so that they can be brought up quickly if needed. A simple question to ask before adopting this model: has any thought been given to whether the AWS/GCP/Azure APIs will still work during a disaster, and whether the requisite capacity will be available in the alternate region?

Prometheus for multi-cluster setups

This tip is for those who are using Prometheus federation to monitor multiple clusters. How should Alertmanager be configured for multiple clusters? Say there is an issue on cluster A: only an alert for cluster A should be sent. In such cases, every alert should be routed to the proper team based on its labels (if there is a problem with application A on cluster B, the team responsible for it should be notified). Here, two alerts can be triggered by the same rule, one per cluster.
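The label-based routing idea can be sketched in a few lines of Python. This is only an illustration of the matching logic, not Alertmanager's own configuration or API: the label names (`cluster`, `app`), the receiver names, and the `route_alert` helper are all hypothetical, and the first-match-wins ordering is an assumption made for the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical routing table: (label matchers, receiver). Order matters,
# the first route whose matchers all equal the alert's labels wins.
ROUTES: List[Tuple[Dict[str, str], str]] = [
    ({"cluster": "cluster-b", "app": "app-a"}, "team-app-a"),
    ({"cluster": "cluster-a"}, "team-cluster-a"),
]

# Hypothetical fallback receiver for alerts that match no route.
DEFAULT_RECEIVER = "ops-oncall"


def route_alert(labels: Dict[str, str]) -> str:
    """Return the receiver for an alert based only on its labels."""
    for matchers, receiver in ROUTES:
        if all(labels.get(name) == value for name, value in matchers.items()):
            return receiver
    return DEFAULT_RECEIVER


# The same rule fires on two clusters; the cluster label tells them apart.
alert_a = {"alertname": "HighLatency", "cluster": "cluster-a", "app": "app-a"}
alert_b = {"alertname": "HighLatency", "cluster": "cluster-b", "app": "app-a"}
print(route_alert(alert_a))
print(route_alert(alert_b))
```

Because both alerts come from the same rule, the only thing separating them is the cluster label, which is why each federated Prometheus needs to attach a distinguishing label before alerts reach the router.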

Continuous Integration & Delivery @ Moogsoft: GitLab and Jenkins Integration

One of the SRE team’s goals at Moogsoft is to make sure our feature teams have an easy path from local code changes to production. Changes rolling out to production mean new features, bug fixes, optimizations, and more, which translates into value added for our customers. In short, at Moogsoft we are all about making sure our product is continually evolving, and one way the SRE group helps is by building shared Jenkins functionality our engineers understand and can use quickly.

Curtail security exploits in applications and fortify your remote endpoints

Working from home has taken off quickly, and businesses have turned to strategies and tools that keep productivity from plummeting. There are two major forks in the road when it comes to provisioning remote endpoints: users can use their own devices, or the company can hand over corporate-owned devices.

Monitor Carbon Black Defense logs with Datadog

Creating security policies for the devices connected to your network is critical to ensuring that company data is safe. This is especially true as companies adopt a bring-your-own-device model and allow more personal phones, tablets, and laptops to connect to internal services. These devices, or endpoints, introduce unique vulnerabilities that can expose sensitive data if they are not monitored.

SRE Leaders Panel: Work as Done vs Work as Imagined

Blameless recently had the privilege of hosting some fantastic leaders in the SRE and resilience community for a panel discussion. Our panelists discussed the effects of imposter syndrome, especially during high-tempo situations, how to use it to our advantage and overcome doubt, and how culture directly affects the availability of our systems. The transcript below has been lightly edited, and if you’re interested in watching the full panel, you can do so here.

Deploy a Rancher Cluster with GitLab CI and Terraform

In today’s ever-changing world of DevOps, it is essential to follow best practices. That goes for security, access control, resource limits, etc. One of the most important things in the world of DevOps is continuous integration and continuous delivery, or CI/CD. Continuous integration is a crucial part of an efficient deployment. We are all guilty of repeating manual steps over and over again – especially when it comes to node configuration.

10 Best Practices For Guaranteed ITOM Success

ITOM, or IT Operations Management, is an umbrella term that covers all activities involved in the design, setup, configuration, deployment, and maintenance of the infrastructure that supports business services in an organization. Simply put, ITOM is how the IT landscape is managed in your company. From network security, configuration, and monitoring to devices, applications, and personnel, ITOM is what keeps your IT going. Generally, ITOM leverages several tools to manage these activities individually.

7 Configurations to Enhance the Performance of Your Java Web Applications

There has been a lingering perception that Java applications are slower than applications written in other languages, and therefore that if performance is important for your application, you should not consider Java as the programming language to use. This perception was true about 20 years ago, when Java was first being used to develop applications. In the early Java implementations, it took a long time for the Java Virtual Machine (JVM) to start.

Kublr 1.18 Supports in-Place Platform Upgrades and External Clusters

We are excited to announce in-place Kublr Platform upgrades and a technical preview of external cluster support. That’s yet another step in making enterprise-grade Kubernetes adoption a breeze. While Kublr already supported automated rolling cluster updates and upgrades with zero downtime, as of our last release (1.17) updating the platform itself was still a semi-manual project supported by the Kublr team. Now, all it takes is the click of a button.