Testing a PyTorch machine learning model with pytest and CircleCI

PyTorch is an open-source machine learning (ML) framework that accelerates the path from research prototyping to production deployment. You can work with PyTorch using regular Python without delving into the underlying native C++ code. It contains a full toolkit for building production-worthy ML applications, including layers for deep neural networks, activation functions and optimizers. It also has associated libraries for computer vision and natural language processing.

What is a Kubernetes cluster mesh and what are the benefits?

Kubernetes is an excellent solution for building a flexible and scalable infrastructure to run dynamic workloads. However, as our cluster expands, we will inevitably need to scale and manage multiple clusters at once. Running several clusters adds a lot of complexity to day-to-day workload maintenance and makes it harder to keep all our policies and services up to date across every environment.

Grafana Alerting: How to monitor alerts for better alert management

With the release of Grafana 10.2, we made a number of enhancements to Grafana Alerting. These updates included the rollout of Insights, a new section of the Grafana Alerting home page. Available now to all Grafana Cloud users, Insights offers valuable information, such as statistics on alert rules and notifications, to help you monitor alerting data and quickly analyze alert performance.

Top Data Center Management Trends to Watch in 2024

In the blink of an eye, 2023 has come to an end, and the data center industry saw significant movement toward sustainability, AI, and operational efficiency. Data center management is constantly evolving, and it’s important to stay on top of the latest trends to guide you to success in the new year. With 2024 just days away, here are the top 10 emerging data center management trends to watch.

The Advent of Monitoring Day 1: What Are Synthetics and Why They Are Needed

This is the first part of our 12-day Advent of Monitoring series, in which Checkly's engineers share practical monitoring tips from their own experience. Hey there! Here is my take on what synthetic monitoring means and why it’s awesome! I think it’s a very complicated word for a very straightforward concept. In fact, I am convinced that once you've used it, you will never want to live without it.

Performance optimization techniques in time series databases: sync.Pool for CPU-bound operations

Internally, VictoriaMetrics makes heavy use of sync.Pool, a data structure built into Go’s standard library. sync.Pool is intended to store temporary, fungible objects for reuse to relieve pressure on the garbage collector. If you are familiar with free lists, you can think of sync.Pool as a data structure that allows you to implement them in a thread-safe way.
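As a rough illustration of the reuse pattern described above (a minimal sketch, not VictoriaMetrics' own code), the Go program below keeps `*bytes.Buffer` values in a `sync.Pool` so that concurrent callers borrow and return scratch buffers instead of allocating new ones. The `render` function and the metric-line format are hypothetical, chosen only to show the Get/Reset/Put cycle.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values. New is called only when
// the pool has no idle buffer available to return.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render is a hypothetical helper that formats a metric line using a pooled
// buffer instead of allocating a fresh one on every call.
func render(name string, value float64) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold data from its previous user
	defer bufPool.Put(buf) // hand the buffer back once the result has been copied out

	fmt.Fprintf(buf, "%s %g", name, value)
	return buf.String() // String copies the bytes, so the result stays valid after Put
}

func main() {
	// Several goroutines share the pool; sync.Pool is safe for concurrent use.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Println(render("requests_total", float64(i)))
		}(i)
	}
	wg.Wait()
}
```

The `Reset` call matters: the pool makes no guarantee about which object you get back or what state it is in. Also keep in mind that the runtime may discard idle pooled objects during garbage collection, so `sync.Pool` suits short-lived, interchangeable scratch objects rather than anything that must persist.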

IT Automation Powers SRE Practices as System Complexity, Consumer Demands Grow

Site reliability engineers (SREs) use automation and orchestration capabilities to scale security and performance, keeping sites reliable and efficient. Site reliability engineering can be applied to a wide range of use cases and industries where software systems and services are critical to business operations.

Monitor your chaos engineering experiments with Steadybit's offering in the Datadog Marketplace

Steadybit is a software reliability platform that uses chaos engineering and fault injection to help organizations improve the stability and performance of their applications. By letting you simulate turbulent scenarios in a controlled environment, Steadybit helps you identify and mitigate potential system issues, reducing downtime and improving resilience.