
Alerts Are Fundamentally Messy

Good alerting hygiene consists of a few components: chasing down alert conditions, reflecting on incidents, and thinking about what makes a signal good or bad. The hope is that we can get our alerts to the stage where they page us when they should and stay quiet when they shouldn’t. However, alerting in a socio-technical system has to account not only for the mess around the signal, but also for how people and automation interpret and act on alerts over the longer term.

NGINX Access and Error Logs

NGINX, a widely used web server and reverse proxy, maintains two logs that are central to monitoring and troubleshooting: the access log and the error log. The access log records every request made to the server, capturing details such as the requested URL, the client's IP address, the response status code, and the user agent.
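As a rough illustration of the fields an access log entry carries, here is a minimal Python sketch that parses one line. It assumes the server uses NGINX's default "combined" log format; a custom log_format would need a different pattern. The function name and the sample line are made up for this example.

```python
import re

# Pattern for NGINX's default "combined" log format (an assumption; a custom
# log_format directive would need a different pattern).
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be filtered and aggregated.
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


# Made-up sample line, for illustration only.
sample = (
    '203.0.113.7 - - [10/Jan/2024:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.4.0"'
)
entry = parse_access_line(sample)
print(entry["remote_addr"], entry["request"], entry["status"], entry["http_user_agent"])
```

Returning None for lines that do not match keeps the parser tolerant of entries written in another format, which a caller can count or skip.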

Finding relationships in your data with embeddings

With the world still working out the limits of LLMs and ever more powerful models being released each month, it’s a little hard to know where to begin. Whether it’s summarising and generating text, building a useful chat assistant, or comparing the relatedness of strings with embeddings, almost all of this can now be done via a few simple API calls. It has never been easier to incorporate these new technologies into your own product.
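To make "comparing the relatedness of strings with embeddings" concrete, here is a small Python sketch using cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors below are hand-written stand-ins, not output from any real model; in practice each vector would come from an embeddings API call and have hundreds or thousands of dimensions. The helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, written by hand for illustration only.
embeddings = {
    "disk full": [0.9, 0.1, 0.0],
    "out of storage": [0.8, 0.2, 0.1],
    "login failed": [0.0, 0.2, 0.9],
}


def most_related(query, table):
    """Return the other key whose vector is most similar to the query's vector."""
    candidates = (key for key in table if key != query)
    return max(candidates, key=lambda key: cosine_similarity(table[query], table[key]))


print(most_related("disk full", embeddings))
```

With these toy vectors, "disk full" scores far closer to "out of storage" than to "login failed", which is the kind of relationship embeddings are meant to surface.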

Coming Soon: Cloudsmith Migration Toolkit

One of our core motivations in building Cloudsmith is to make software developers' lives easier. We want Cloudsmith to be one of those great products that feels intuitive and automates everything. As we’re picking up more and larger customers, we’re seeing an increased need for migration tools. We want to make it as easy as possible for teams who are stuck using JFrog Artifactory, or Sonatype Nexus, or other legacy tools to move over to the joy of SaaS artifact management using Cloudsmith.

Partitioning Data for Query Performance in InfluxDB 3.0

Query performance is critical in any database. Data partitioning is a mechanism that helps prune unnecessary data, allowing queries to run faster. However, there are always trade-offs between large and small numbers of partitions. For instance, fine-grained partitioning on high cardinality columns can reduce performance. This post describes different partitioning schemes supported by InfluxDB 3.0 and explains their trade-offs.
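The pruning idea can be sketched in a few lines of Python. This is a generic illustration of partitioning by day, not InfluxDB 3.0's actual implementation or API; the row layout and function names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def partition_key(ts):
    """Assign a row to a partition by calendar day (one possible scheme)."""
    return ts.strftime("%Y-%m-%d")


def build_partitions(rows):
    """Group rows into per-day partitions."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row["time"])].append(row)
    return partitions


def query(partitions, start, end):
    """Return rows with start <= time < end and the number of partitions scanned."""
    scanned = 0
    result = []
    for key, rows in partitions.items():
        day_start = datetime.strptime(key, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        day_end = day_start + timedelta(days=1)
        if day_end <= start or day_start >= end:
            continue  # pruned: this partition cannot contain matching rows
        scanned += 1
        result.extend(r for r in rows if start <= r["time"] < end)
    return result, scanned


def utc(day, hour):
    return datetime(2024, 1, day, hour, tzinfo=timezone.utc)


# Made-up sample data spread over three days.
rows = [
    {"time": utc(1, 10), "value": 1},
    {"time": utc(1, 22), "value": 2},
    {"time": utc(2, 9), "value": 3},
    {"time": utc(3, 12), "value": 4},
]
partitions = build_partitions(rows)
matches, scanned = query(partitions, utc(2, 0), utc(3, 0))
print(len(partitions), scanned, [r["value"] for r in matches])
```

Here a one-day query touches one of the three partitions. The trade-off shows up in the loop itself: the finer the partitioning, the more keys the query has to examine before it reads any data, which is one reason very fine-grained schemes on high cardinality columns can hurt rather than help.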

5 Cloud Outages Tracker Tools To Monitor Vendors in 2024

Whether you’re a business owner, a tech enthusiast, or simply a user who relies on cloud services for daily tasks, a cloud outage tracker can be a useful tool. It informs you of downtime, degraded performance, and maintenance affecting the services that modern businesses rely on. Here’s a list of cloud outage tracker tools that can help you prepare for and mitigate the effects of inevitable disruptions in the cloud.

Real Production Readiness with Internal Developer Portals

In cultures of continuous improvement, the criteria by which teams define a release's fitness for production are flexible by definition. Engineering organizations strive to balance risk and velocity, aiming for high-quality releases on a cadence that doesn’t impede overall business throughput.

Understand & Optimize Your Telemetry Data (Subtitled)

The explosion of telemetry data massively increases your data bill. Teams cannot control data they do not understand, and they often lack the capabilities to act on it once they do. Mezmo makes it easier to understand and optimize your data. It helps reduce unnecessary noise and cost and improve the quality of your data, so that your developers and engineers can consistently deliver on their service level objectives.