Cloud

The latest news and information on cloud monitoring, security, and related technologies.

Using Splunk to Detect Abuse of AWS Permanent and Temporary Credentials

Amazon Web Services lets its users create temporary credentials through the AWS Security Token Service (AWS STS). These temporary credentials work in much the same way as permanent credentials created with the AWS IAM service. There are, however, two differences.

21 new ways we're improving observability with Cloud Ops

We’ve heard from customers about how important it is to be able to reliably operate your applications and infrastructure running on Google Cloud. In particular, observability is critical to reliable operations. To help you quickly gain insight into your Google Cloud environment, we’ve added 21 new features to Cloud Operations, the observability suite we launched earlier this year, which gives you access to all our operations capabilities directly from the Google Cloud Console.

How to Provision Cloud Infrastructure

One of the best things about cloud computing is how it converts technical efficiencies into cost savings. Some of those efficiencies are just part of the toolkit, like pay-per-use Lambda jobs. Good DevOps brings a lot of savings to the cloud as well. It can smooth out high-friction state management challenges. Sprucing up how you provision cloud services, for example, speeds up deployments. That’s where treating infrastructure the same way as the rest of your codebase comes in.

Debugging AWS Lambda Timeouts

Some time ago, an ex-colleague of mine at DAZN received an alert through PagerDuty. There was a spike in error rate for one of the Lambda functions his team looks after. He jumped onto the AWS console right away and confirmed that there was indeed a problem. The next logical step was to check the logs to see what the problem was. But he found nothing. And so began an hour-long ghost hunt to find clues as to what was failing and why there were no error messages.

My first Kubernetes cluster: Amazon EKS review + tutorial

During my career, I’ve taken part in many on-call rotations and post-mortems. The longest on-call rotation I’ve ever had, with no breaks, vacations, or holidays, lasted for a whopping 2.5 years at Lucid Software. I’m jaded. I strongly prefer stability to tinkering with shiny new toys. Very few software engineers start this way, but enough of them make the transition after having been bitten enough times by a bad release.

Are Cloud Computing Engineers the Missing Link on Your Federal IT Team?

Cloud computing can be more complex than anticipated, particularly as agencies continue to move applications and operations into a cloud environment. Does your federal IT team have the in-house skills to ensure cloud computing is helping your agency rather than draining its money and resources?

Dashbird turns 3: reflecting on the journey, challenges and milestones of the past year

Another year of empowering DevOps teams has passed and what a year it’s been! I’d like to take a moment to reflect on the journey, the milestones and challenges this past year encompassed. The last year has been our most transformational to date. We’ve had a huge amount of ups and downs and I’m incredibly proud to say that we got through it and our organization is more resilient, more aligned in our vision and closer as a result.

Introducing, Dashbird Atlas

We’re pleased and honored to be part of the Serverless revolution - continuously innovating to make processes and day-to-day tasks for serverless users more efficient, seamless and enjoyable. So let’s get right into the new and exciting stuff now! Earlier this year, Dashbird launched the very well-received Insights Engine designed to encourage a proactive approach when building and operating serverless applications.

Build Docker Containers For Python Apps Like A Pro

Python apps go great with containers. Docker, Kubernetes, Cloud Foundry, public cloud, private cloud: they're all awesome places to run your containers. But getting your apps into containers is a tricky business, particularly if you have tens or hundreds of apps to manage and maintain. Your containers have to be secure, reproducible, and easy to rebuild when vulnerabilities strike or upgrades are required.

Attaching incident playbooks to Azure Monitor alerts for rapid remediation

Incident response playbooks are sets of actions that your incident responders need to execute, depending on the nature of the outage. Having well-defined incident response playbooks can be critical, especially during high-customer-impact events that you would typically classify as Sev-0 incidents.