Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

How we do realtime response with incident.io, Sentry & PagerDuty

Like most tech companies, we use an on-call rota and various alerting tools. We do this to respond to incidents before they’re reported. Proactively identifying issues and communicating to customers helps us provide great experiences and fosters trust. Internally, we’ve been using these alerting tools in tandem with our auto-create incidents feature. We’ve found that it’s made responding to the pager much smoother - it’s one less thing to do when you get paged at 2am.

IoT Project Lifecycle: Key considerations for OTA updates at scale [Part IV]

From entertainment to security, automation is now pervasive. Intelligent devices are transforming our homes while enriching our lives, making them more efficient, productive and environmentally friendly. Most embedded devices run Linux, and their number is poised to keep growing.

Rising IT costs: What to watch out for

It seems like every conversation is about inflation lately. Everything is getting more expensive and the news cycle suggests there is little chance of that abating. Inflation and supply chain challenges are having a knock-on effect on cloud adoption and network usage. We’ve already seen some of the big providers increase their prices - so what’s to be done? Can technology also offer solutions for stemming the rise of IT costs?

The Power Of Combining Kubernetes And Non-Kubernetes Cloud Spend

Whether you’re new to Kubernetes or a bona fide wizard, it may seem like getting any meaningful cost data out of it is a miracle. This is because many organizations that migrate to Kubernetes unwittingly step into the Black Box of Kubernetes Spend. In pre-Kubernetes life, teams could allocate costs by tagging resources.

How to monitor host reachability

Most sysadmins and developers have at some point used a few of the popular Linux networking commands, or their Windows equivalents, to answer the common questions of host reachability: whether a host or service is reachable and how fast it responds. One of the simplest and most common checks is to ping a host to verify that it’s reachable from where you issue the command, and to see the round-trip time it takes for the host to receive your request and reply.
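
As a rough illustration of this kind of check, the sketch below times a TCP connection to a host and port rather than sending an ICMP ping, since ICMP usually needs elevated privileges. It is a minimal example under stated assumptions, not the method used in the article: the function name `tcp_reachable`, the default timeout, and the choice of TCP in place of ICMP are all illustrative.

```python
import socket
import time
from typing import Optional


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Try to open a TCP connection to (host, port).

    Returns the connect time in milliseconds if the connection succeeds,
    or None if the host or service could not be reached within `timeout`.
    The name and defaults here are hypothetical, chosen for illustration.
    """
    start = time.perf_counter()
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

Calling `tcp_reachable("example.com", 443)` would return a connect time in milliseconds when the service answers, or `None` when it does not. Unlike ping, this checks a specific service port, so a host that drops ICMP but serves HTTPS would still count as reachable.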

How Many SREs Does Your Company Need? Here's How to Decide

So you’ve decided to take advantage of Site Reliability Engineering by hiring SREs for your company. Now, you have a second decision to make: Exactly how many SREs to hire. Do you need just one or two SREs? Or should you build a sprawling SRE team, with a dozen or more SREs on hand to support your organization’s reliability needs? The answers to these questions will, of course, vary; every business’s needs are different.

Automate Troubleshooting of Applications Running on Kubernetes

StackState is an out-of-the-box solution to observe your entire Kubernetes stack, identify problems, automatically highlight the changes that cause them and provide the full context you need for efficient and effective troubleshooting. Our clear and affordable pricing makes it easy to get started today.

Announcing issue-initiated Change Lead Time

Sleuth is pleased to announce a new option to start your Change Lead Time clock based on state transitions in your issue tracker! In our ongoing effort to meet customers where they are, we heard from many of you that you’d like Sleuth to account for and provide visibility into your pre-commit coding time. We’re pleased to offer this new option to tell Sleuth which specific state transitions in your issue tracker should start your Change Lead Time clock!