Resilience hinges on conversations as much as tooling

Too many businesses still treat resilience as a software procurement and IT operations issue. In reality, resilience lives in the mutual relationship between technology, business leadership, and culture.

It goes deep - resilience is baked into the organization in a multitude of ways: some tech-enabled, some policy-driven, and some sustained by culture and employee goodwill.

Organizations that thrive in disruption - and clearly, disruption is now the norm - are those where CIOs, CTOs, and their business-leader peers maintain continuous, strategic conversations. It is these reality-based conversations that turn tools into adaptable systems, technology investments into real-world resilience, and a workforce into a team of conscientious colleagues. Companies that neglect this fact-driven dialogue risk fragility, no matter how advanced their tech stack appears on paper.

Tech is the first step on the road

IT operations is a mature, metrics-led discipline. Despite the complexity and interconnectedness of modern IT infrastructure, most organizations can boast very good uptime, reliability, and quality of service. Investment in technology that supports revenue generation has never been a hard sell. Even when some tech is a 'grudge' purchase - traditionally, security - CIOs are able to make a reasoned case for investment, and a risk-reward approach often prevails. Business leaders also see the headlines when even major, well-resourced organizations are taken offline - it still happens regularly.

There is a dirty secret, though. While many organizations are highly available, they are not always highly recoverable should an incident like a cyber attack or malware infestation take root. Even incredibly well-resourced and secure companies may be breached; some tech vendors say "it's not if, but when" it happens. The day-one response to an incident will ideally have started years before. As the old saying goes, "the best time to plant a tree was 20 years ago."

Putting aside what a well-trained, clearly governed, confident response to an emergency looks like, there are technical limits to what can be done, and these must be explicitly workshopped and agreed so that incident response is smooth and unprejudiced. Ideally, a company and a team must focus on a clean recovery, not on the politics of who is at fault. For example, a web store goes down and orders have not gone through to the warehouse. What was cached, and can it be resubmitted? How much data may have been lost? Data may be recoverable from a backup server - great. But how long will the restore actually take? The business may demand hours, but restoring high volumes of data can take weeks if investment in recovery was a low priority.
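It helps to do that arithmetic together, before the incident. Here is a minimal back-of-envelope sketch in Python; the data volume and throughput figures are illustrative assumptions, not benchmarks for any product or network.

    # Back-of-envelope restore-time estimate. The data volume and
    # throughput figures below are illustrative assumptions only.
    def restore_hours(data_tb, throughput_mb_s):
        # Terabytes -> megabytes (decimal units), divided by sustained
        # throughput in MB/s, then seconds -> hours.
        return (data_tb * 1_000_000) / throughput_mb_s / 3600

    # 50 TB restored over a sustained 100 MB/s pipe:
    # 50,000,000 MB / 100 MB/s = 500,000 s, roughly 139 hours - nearly
    # six days, not the "few hours" the business may be expecting.
    print(round(restore_hours(50, 100)))  # -> 139

Even rough numbers like these change the conversation: they surface the gap between what the business assumes and what the infrastructure can deliver, while there is still time to invest.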

So much of the preparation, the incident response, and the later strengthening hinges on open, dispassionate conversations about risk tolerance, investment, and expectation management, with understanding on all sides of the legal, regulatory, technical, and policy dimensions - and of public perception too.

Policy shows the way

The right policies, understood and practised, guide people in using their tools correctly, economically, and compliantly. Properly formulated, followed, and even promoted externally where appropriate, they are the lifeline that connects planning in its ideal state to follow-through.

Like any map of the territory, policies must be clear, understandable, and standardised. There are a few best practices to test policies against to optimise them for success.

First, define the policy stack that actually matters in an incident. The basics: an incident response plan; disaster recovery and business continuity; backup and retention; access and identity; change management; third-party risk; and legal and communications.

Going a level deeper, translate policy into decisions that leaders can take quickly: what gets restored first, acceptable data loss (RPO) and downtime (RTO), who can authorise exceptions, and what 'good/clean enough to reopen' looks like. This level of detail removes an enormous amount of stress should the worst happen.
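Written down, those decisions can be as simple as a reviewable table. Below is a minimal sketch of recovery targets expressed as data; the service names, priorities, targets, and approvers are illustrative assumptions only, to be replaced by whatever your own business agrees.

    # Recovery targets written down as data, so they can be versioned,
    # reviewed, and rehearsed. All names and targets are illustrative.
    RECOVERY_TARGETS = {
        # service: (restore priority, RPO, RTO, who authorises exceptions)
        "payments":       (1, "15 minutes", "1 hour",   "CFO"),
        "web_store":      (2, "1 hour",     "4 hours",  "COO"),
        "warehouse_sync": (3, "4 hours",    "24 hours", "Operations"),
        "analytics":      (4, "24 hours",   "72 hours", "CIO"),
    }

    def restore_order():
        # Services in the order they should be brought back online.
        return sorted(RECOVERY_TARGETS, key=lambda s: RECOVERY_TARGETS[s][0])

    print(restore_order())
    # -> ['payments', 'web_store', 'warehouse_sync', 'analytics']

The value is not the format but the act of agreeing it in calm conditions, so that nobody is debating restore priorities at 3am.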

Ownership is critical. Make explicit who is responsible, accountable, consulted, and informed (RACI) across IT, security, legal, communications, finance, operations, and product. Define clear escalation thresholds and a single incident commander role. This really is one of the most important policies to set, practise, and ensure that those involved truly 'get'. A crisis is no place for personality politics or bashfulness to get in the way of expertise and responsibility.
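As with recovery targets, a RACI assignment is most useful when it is written down and reviewable. The sketch below encodes one as data; every task and role named is a placeholder assumption to be agreed in advance, not a recommendation for any particular organization.

    # A RACI assignment encoded as data. Tasks and roles below are
    # placeholder assumptions, not recommendations.
    RACI = {
        "declare_incident":   {"R": "Incident Commander", "A": "CIO",
                               "C": ["Security"], "I": ["Executive team"]},
        "authorise_restore":  {"R": "IT Operations", "A": "Incident Commander",
                               "C": ["Security", "Legal"], "I": ["Finance"]},
        "external_statement": {"R": "Communications", "A": "Legal",
                               "C": ["CIO"], "I": ["All staff"]},
    }

    def accountable_for(task):
        # Exactly one accountable owner per task - no room for ambiguity.
        return RACI[task]["A"]

    print(accountable_for("authorise_restore"))  # -> Incident Commander

The single-owner constraint is the point: if two people could each believe the other is accountable, the matrix has failed before the incident starts.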

I suspect almost any organization will admit that it is not rehearsing and practising enough. Investing in tabletop exercises and recovery drills - ideally tied to real systems - really is one of those 'priceless' investments. And of course, measurement enables informed responses, both organizational and emotional, because everyone knows what normal looks like. So track KPIs like time-to-detect, time-to-decide, time-to-restore, and the decision latency between teams, so that people keep a sense of proportion.
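Those KPIs are simple differences between timestamps your tooling likely already records. A minimal sketch follows; the event names and times are assumptions for illustration, so map them onto whatever your monitoring actually captures.

    # Incident KPIs computed from timestamps. Event names and times are
    # illustrative assumptions only.
    from datetime import datetime

    FMT = "%Y-%m-%d %H:%M"

    def minutes_between(start, end):
        # Elapsed minutes between two "YYYY-MM-DD HH:MM" timestamps.
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        return delta.total_seconds() / 60

    incident = {
        "impact_started":   "2024-03-01 02:10",
        "detected":         "2024-03-01 02:40",
        "decision_made":    "2024-03-01 03:25",
        "service_restored": "2024-03-01 09:05",
    }

    print(minutes_between(incident["impact_started"], incident["detected"]))     # time-to-detect:  30.0
    print(minutes_between(incident["detected"], incident["decision_made"]))      # time-to-decide:  45.0
    print(minutes_between(incident["decision_made"], incident["service_restored"]))  # time-to-restore: 340.0

Tracked over successive drills, even these crude numbers show whether the organization is actually getting faster, and where the latency sits between teams.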

And while there are many other practices on the policy side, I'll end with the human element again. Treat your comms policy as a core part of resilience. Regular updates, supportive customer-messaging guardrails, and very clear coordination with legal will stop the necessary speed of response from creating new risks.

People are the origin, vehicle, and destination

The phrase 'soft skills' is misleading. 'Foundational skills' is better.

We must always start with human reality. Resilience fails when teams freeze, argue, or hide bad news. We want people to share facts early and act with clarity. Any organization works best with psychological safety and a norm of open escalation. Reward early flagging of anomalies, remove blame in reviews, and focus on learning and improvement.

Role readiness is as important as tools: people should rehearse their parts as much as they test their tech. We are also economic animals, so align incentives with recoverability. If uptime is celebrated but recovery testing is not, recoverability will stay underfunded. Resilience is a valid part of performance expectations.

And, whatever happens, protect the team - there is only one of them. Ensure there is active fatigue management and a defined stop point in decision loops; tired teams can turn a challenge into a far more expensive mistake. Keep calm, keep talking, keep solving. Resilience lies in that cooperation between people and their tools.