VMware has recently released vSphere 7 Update 2, and there is a lot of new stuff to look out for. vSphere, VMware’s server virtualization product, has been an industry favorite for a long time. The vSphere 7 came out in April 2020, and this is so far the second update to it, hence the name. When you look at the changes they’ve rolled out, you’ll know that they are really focusing on some key areas. As a result, VMware infrastructure is getting pretty solid and modern.
As applications move from monolithic architectures to microservices-based architectures, DevOps and Site Reliability Engineering (SRE) teams face new operational challenges. Microservices are updated constantly with new features and resource managers/schedulers (like Kubernetes and GKE) can add/remove containers in response to changing workloads. The old way of creating alerts based on learned behaviors of your monolithic applications will not work with microservices applications.
Log management stopped being a very simple operation quite some time ago. Long gone are the “good old days” when you could log into the machine, check the logs, and grep for the interesting parts. Right now things are better. With the observability tools that are now a part of our everyday lives, we can easily troubleshoot without the need to connect to servers at all. With the right tools, we can even predict potential issues and be alerted at the same time an incident happens.
Since we launched Grafana Enterprise Metrics (GEM), our self-hosted Prometheus service, last year, we’ve seen customers run it at great scale. We have clusters with more than 100 million metrics, and GEM’s new scalable compactor can handle an estimated 650 million active series. Still, we wanted to run performance tests that would more definitively show GEM’s horizontal scalability and allow us to get more accurate TCO estimates.
Gremlin helps teams proactively improve the reliability of their systems by running chaos experiments on infrastructure including hosts, containers, and Kubernetes clusters. But as microservice-based architectures and automated cloud platforms become the norm, engineers are shifting their focus from managing infrastructure to managing services. In order to keep these services as resilient as possible, they need tools that can help them find failure modes, reduce incidents, and improve availability.
No one likes giving their weekends up to fix release issues. Developers and operations teams are traditionally hesitant to make changes or deploy applications on a Friday, in case something goes wrong and they have to spend their weekend making emergency fixes. Or worse, trying to roll back changes that were made. However, with a strong set of practices and a reliable deployment pipeline, there should be no reason why a deployment cannot happen anytime — even on a Friday afternoon.
GitOps is growing in popularity. You’ve probably seen it mentioned on Reddit or dev.to. But what the heck is GitOps? Broadly speaking, GitOps takes the principles of Git and CI-powered workflows favored by software developers — commonly used to automate the process of building, testing and deploying software — and applies them to other business processes.