Latest News

What is site reliability engineering (SRE), and how is it different from DevOps?

Site reliability engineering (SRE) is Google’s approach to service management, in which software engineers run production systems using software engineering methods. Google is clearly unique, and it usually has to tackle software bugs and errors in unconventional ways. But having software engineers do a job traditionally done by professionals with a systems administration background sounds impractical.

How to Stay Ahead of Data Retention Requirements - Part 2

In part 1 of this series, we outlined what data retention is and why it is needed to meet the growing requirements of various regulatory standards. As detailed there, organizations have some clear guidelines for taking what we called a “data retention approach for compliance”. In this follow-up post, we outline some specific technological and procedural challenges you might face, as well as practical guidelines and strategies for overcoming them.

Opsgenie Actions: Sign Up For Early Access

When operating always-on services, engineers need to respond quickly to alerts and prevent issues from becoming outages. Fortunately, many alerts can be resolved through simple changes to systems or network infrastructure. However, these tasks still require manual intervention and interrupt on-call responders.

DNS Hijacking: What You Need to Know

Crashed websites and slow-loading pages can be devastating for any site owner. But there’s another type of threat that often goes undetected. A report published by FireEye on Thursday details a particular type of DNS hijacking that allows attackers to steal information easily. These attacks have been going on for approximately two years and use three different methods that compromise websites without alarming users.

The ominous opacity of the AWS bill - a cautionary tale

We were only in the first week of the month-long billing period for our client’s AWS account, yet it already showed that they had exceeded the free-tier limit for SQS and had nearly exceeded it for CloudWatch too (approximately 85 per cent used). This was puzzling, because we hadn't run any data downloads for the client at all. In fact, all services had been down since before Christmas, when we shut them down to work on new CloudFormation scripts for the servers.

3 Common Application Performance Bottlenecks - And How to Avoid Them

Between increasing website and application complexity, global internet user growth, and the rapid rise of mobile, optimizing application performance today is an entirely different animal than it was even a decade ago. Here’s how to sidestep slowdowns and eliminate the most common application performance bottlenecks.

Top 8 Metrics for SharePoint Performance Monitoring

Microsoft SharePoint is one of the most business-critical services in enterprise IT. Organizations widely use SharePoint to create websites where information is centrally stored, shared, organized, and accessed by users from anywhere, at any time, on any device. Major use cases of SharePoint include knowledge and content management, intranets, file hosting, and collaboration.

Light monitoring with Elastic Metricbeat

Since Pandora FMS version 7.0 NG 712, thanks to the integration with Elastic, we have drastically improved how records (logs) are stored and visualized, which improves both scalability and the speed at which information is presented. But wait, there is still more to talk about before I go on to describe Light Monitoring…