Latest News

MSP Networking: Definitions, Insights and Strategies

Staying on top of client networks is no walk in the park for MSPs these days. With the growing reliance on cloud apps, distributed workforces, and constant technology changes, network environments have become far more complex. Without rock-solid network management practices, disruptions and plummeting productivity become inevitable for clients. This makes it more crucial than ever for MSPs to level up their networking game.

What is an MSP? A Beginner's Guide to Outsourced IT

Picture a calm Tuesday morning at the office. You log in to your desktop, ready to tackle the week, when suddenly the screen goes blank. The IT systems are down, and your heart skips a beat. Now picture an alternate scenario where, before you even realize there’s a problem, it’s already being solved. This isn’t just wishful thinking; it’s the reality for businesses that partner with a Managed Service Provider (MSP).

How to deploy a Slack bot to allow anyone in your team to quickly raise major incidents on Zenduty

One of the biggest challenges for some of our customers was allowing non-engineering teams, such as Support, Sales, or Customer Success, to raise incidents for specific Dev/Infra/Security/Ops teams on Zenduty in a structured and efficient manner as soon as a customer reports an issue. In many organizations, we observed that non-technical team members often needed to switch between platforms, fill out complex forms, or reach out to multiple stakeholders manually to ensure that an issue was escalated.

Key learnings from the State of Cloud Costs study

We recently released our initial State of Cloud Costs report, which identified factors shaping the costs of hundreds of organizations that use Datadog Cloud Cost Management to monitor their AWS spend. The report reveals several widely applicable themes, including the ways in which resource utilization, adoption of emerging technologies, and participation in commitment-based discount programs all shape cloud environments and costs.

Attach Screenshots to Your Playwright Test Reports

Today I want to show you how you can attach your screenshots directly to Playwright's test reports. Imagine you have a simple Playwright test that navigates to Checkly. You take a screenshot and store it in screenshots/home.png. Then, you click a link in the main navigation, expect a specific heading to be visible, and take another screenshot. When you run this test using npx playwright test, the test passes, and you find the screenshots in the /screenshots directory.

Deliver Peak Microsoft Teams Performance at Scale

Scale is a perennial challenge for most IT teams. While organizations expect the same performance and experience whether 500 users are accessing essential applications or 50,000, IT headcount rarely increases in proportion with organizational growth. This often leaves IT departments overtaxed and pressed to triage the most urgent concerns. But even that requires good data to inform decisions — which can be in short supply.

Release Roundup August 2024

Over the past year, the Gremlin team has focused on giving you more tools to adapt Gremlin to your organization’s reliability needs. We started with customizable reliability tests, and now, we’ve released customizable role-based access controls (RBAC). We’ve also made it easier to target specific availability zones when running Failure Flags experiments, and to run experiments behind a proxy. Keep reading to learn more!

Microsoft Recall AI: Feature Release Update and Cybersecurity Concerns

Microsoft Recall AI is generating buzz, but not all of it is positive. When the feature was first introduced, the tech community was filled with excitement. However, concerns about privacy and data security quickly surfaced. While artificial intelligence (AI) promises convenience, it often comes at a cost—and in this case, that cost could be your privacy. As we look toward a future where AI powers more of our digital experiences, opinions are divided.

CVE-2024-21410: Ensuring Secure Firmware Updates in Industrial Devices

Security vulnerabilities are a serious issue for any organization. Even a single unpatched flaw can lead to disastrous consequences, including data breaches and loss of system integrity. CVE-2024-21410 is one such vulnerability that presents a significant risk. Found in a popular application used by many organizations, this flaw can leave systems exposed to attacks if not addressed promptly.

Better root cause analysis: Mastering alert insights with the new central history timeline

A year ago we rebuilt our alert rule state history, using Grafana Loki for storage and updating the UI to display a timeline of all state changes of an alert rule. As a result, users can now conduct better root cause analysis by going down to the level of an alert rule and seeing when certain alert instances started or stopped firing. But we aren’t stopping there. To ensure system stability and avert outages, you also need one place to see the state history for all the alerts in your system.