Latest Posts

On-call doesn't have to be stressful

Nov 29, 2019 By Amrit Balraj In Zenduty

Being on-call is a critical duty that many operations and engineering teams must undertake to keep their services reliable and available. However, there are several pitfalls in the organization of on-call rotations and responsibilities that, if not avoided, can lead to serious consequences for the services and the teams.

The importance of GameDays

Nov 18, 2019 By Amrit Balraj In Zenduty

The term GameDay was coined by Amazon’s “Master of Disaster” Jesse Robbins, who created GameDays to increase reliability by purposefully creating major failures on pre-planned dates. GameDays help put the values of chaos engineering into practice. Chaos engineering is the disciplined practice of injecting failure into healthy systems. With modern IT services becoming increasingly sophisticated, continuously changing systems, outages are inevitable.

Site Reliability Engineering-Why you should adopt SRE

Nov 11, 2019 By Amrit Balraj In Zenduty

Site reliability engineering is a term coined by Google engineer Benjamin Treynor in 2003, when he was tasked with making sure that Google services were reliable, secure and functional. He and his team eventually wrote the book on SRE, which is available online for free for anyone interested in researching and implementing SRE best practices.

Relationships between Operations and Development Teams

Oct 16, 2019 By Amrit Balraj In Zenduty

Modern businesses are evolving rapidly with the advent of cloud, CI/CD and microservices. However, there still exists an extensive and obvious divide between principal business stakeholders and development teams. Development teams are often unaware of the challenges faced by operations teams and vice versa. This is where the need to adopt DevOps principles comes into the picture. DevOps came into existence as the natural successor to Agile practices in software development.

ChatOps-The future of collaboration

Oct 7, 2019 By Amrit Balraj In Zenduty

ChatOps is the use of chatbots to unify communication and collaboration. Through ChatOps, every member of a team is aware of what the other members are working on. It is the logical next step in the evolution of team communication after email and IM. Today's projects are developed at a global scale with millions of people as potential users, which means that teams are larger and often work in shifts or even remotely.

Post Mortems- Bringing clarity to incident reviews

Oct 3, 2019 By Amrit Balraj In Zenduty

An incident post mortem is known by many names: incident review, root cause analysis (RCA), learning review. But what does it entail? A post mortem is a post-incident activity that helps organizations understand how the incident happened and learn from it. Service incidents are an unavoidable hurdle for any company. When they do happen, the teams involved will be wholly focussed on restoring service as quickly as possible.

The importance of Incident Roles

Sep 30, 2019 By Amrit Balraj In Zenduty

Modern technology organizations are required to be adaptive in their approach to incident management. A single project will have multiple teams working as different branches on integrated systems. Even if all the members have unified communication channels, when an interruption occurs in the service, there’s bound to be chaos. The frontline response team will have to be on their toes to get to the root issues at the first signs of trouble.

Fostering blamelessness at the workplace

Sep 20, 2019 By Amrit Balraj In Zenduty

An integral lesson every business (of any size) learns is that failure is inevitable at some point in the production cycle. There might be times when things go haywire at critical junctures, sending teams scrambling to rectify the root issue and reinstate service. The underlying causes are often many and varied, especially in large-scale systems with complex architecture and interdependence.

