Operations | Monitoring | ITSM | DevOps | Cloud

Microsoft's 3 major incidents in 10 days, where did they go wrong?

Just in case you haven’t heard, last week Microsoft experienced a huge outage that prevented users from accessing Office 365, its cloud-based subscription service, which serves 200 million monthly active users. This latest outage was the third in ten days, and it brought the company a deluge of customer complaints about a 'something went wrong' message that popped up when users tried to access their accounts.

Event Log Management for Security and Compliance

Security log management is the process of collecting, storing, and correlating the network data that details all activity in your systems and networks. Every action in an organization’s network generates event data, including records produced by operating systems, applications, devices, and users. The Center for Internet Security (CIS) identifies log management as a basic control for detecting malicious actors and software hiding in networks and on machines.

Dangers of Console-Driven Development

AWS offers the ability to log in to a web UI dashboard. In this dashboard, you can add, edit, and deploy various cloud resources. When I was first getting started with AWS, this is where I began for two reasons: My very first full-time job as a Software Engineer was on a small enough team that all of our infrastructure was set up using the AWS console in a single AWS account.

IT just got smarter

Every company in the world needs to reduce risk and uncertainty in IT operations management (ITOM). The best way to do that is by combining AI and digital workflows. It’s all about applying machine learning to operational data so that you can generate insights about potential system issues, and then launch automated workflows that resolve problems fast, ideally before they impact customers. Today’s partnership announcement between IBM and ServiceNow is great news for enterprise customers.

ServiceNow adds Health and Safety Testing app to Safe Workplace suite

At the start of the COVID pandemic, the business world shifted to remote work almost overnight, triggering a new wave of digital transformation. According to a recent study by ServiceNow in partnership with Wakefield, this was a welcome change – 87% of employees believe their company created new and better ways of working.

Integrating TA-Nix with Splunk App for Infrastructure

Previous articles in our series have introduced the Splunk App for Infrastructure (SAI) and provided getting-started guidance for Linux and Windows using native metric-collection tools such as collectd and perfmon. But did you know you can also use your existing Splunk Universal Forwarders (UFs), together with the Splunk Add-on for Unix and Linux (TA-Nix), to send both metrics and logs without the need for additional agents?

ManageEngine Positioned in the 2020 Gartner Magic Quadrant for IT Service Management Tools

ServiceDesk Plus, ManageEngine’s flagship IT service management (ITSM) solution, has been named a Niche Player in this year’s Gartner Magic Quadrant for IT Service Management Tools. The Magic Quadrant offers insights for organizations shopping for an ITSM tool for their business needs. The research names some vendors based on strict inclusion criteria that include vision, capability, size of operations, number of large customers, and more.

Jaeger Essentials: Distributed Tracing from Dapper to Jaeger

If you are dealing with microservices, serverless architecture, or any other type of distributed architecture, you have probably heard the term “Distributed Tracing.” You may have been wondering what it’s all about and where you should start. In this post, I’ll tell you about the journey we went through at Duda, from the day we heard about distributed tracing and started to explore whether it would be useful in our company, to the exploration on what is distributed tracing a