The Follow-the-Sun Field Log: Running an SRE Rotation Across Lisbon, Singapore and Austin in One Quarter

At 03:17 on a Tuesday in Lisbon, a watch buzzes against a hotel pillow. Two seconds later a phone screen lights the ceiling: P1, payments-writer-secondary, error rate seventy-eight percent. The on-call lead is twelve thousand kilometres from her desk. The team's five-minute escalation service-level objective is already running. The next ninety seconds will decide whether this is a clean save or a long retro.

We run a Series-B SaaS payments platform and we rebuilt our on-call rotation around exactly that ninety seconds. Over one quarter we moved the primary rotation through Lisbon, Singapore and Austin: three offsites, three customer-visit loops, one product launch. Three transcontinental moves, zero escalation breaches. The story below is the field log of what we changed in the runbook and in the field kit.

Key Takeaways

  • A follow-the-sun rotation does not fail because of bad engineers. It fails at the seam between a paged engineer's primary network and the venue she happens to be sitting in.
  • Time-to-effective-terminal is the metric that matters when the rotation is on the road. Acknowledgment time hides the friction; effective-terminal exposes it.
  • A read-only triage console accessible over plain HTTPS, with hardware-token authentication and no corporate VPN dependency, is the highest-leverage piece of infrastructure for a senior engineer working from a hotel lobby.
  • Local-carrier coverage in Singapore is uniform across the central business district and uneven at the venue rooftop and inside Marina Bay's taller buildings; the redundant data line is what carries the call through that seam.
  • We did not buy a new observability vendor after the quarter. We rewrote the on-call schema to carry a network posture field and ran a quarterly travel-incident game day.

What Is a Follow-the-Sun On-Call Rotation, Really?

A follow-the-sun on-call rotation is the operational pattern in which engineering responsibility for paged incidents moves around the clock between three or more geographically separated shifts, so that the most-qualified engineer on duty is always inside a working window rather than asleep. Most teams ship a version of it. The hard part is not the rotation calendar. The hard part is what happens when the engineer carrying the pager is in transit, in a venue with hostile Wi-Fi, or in a city where her home carrier has thin coverage.

The premise that the responding engineer is at her desk has not been defensible since 2019, and it certainly is not defensible at a Series-B with a quarterly travel cadence. Your on-call rotation is a joint venture with every airport, hotel and venue network in your engineers' calendars. Anyone running a paged rotation in 2026 who does not write that sentence on a whiteboard at the next post-mortem is leaving free reliability on the table.

Why Does Travel Break the Rotation?

Three reasons, in descending order of frequency.

First, venue Wi-Fi optimises for HTTP traffic, and the captive portal blocks the protocols most corporate VPNs use. The captive portal also expects an HTTP redirect to complete before any other traffic is permitted, which a VPN client that launches at startup will fight.

Second, the on-call playbook assumes DNS resolution times that hold on a desk network and fail on a saturated hotel circuit. In our Lisbon offsite we watched the staging bastion handshake stretch from 90 seconds to 11 minutes, not because of bandwidth, but because of DNS lookups against a US East endpoint over an overloaded venue route.

Third, the paging schema does not know where the engineer actually is. PagerDuty fires the same way to a primary at her desk in Berlin as it does to the same primary at a conference in Singapore.

How We Rebuilt the Rotation Around the Itinerary

We did not start with a tool. We started with a runbook.

The first change was to add a network posture field to the on-call schema. Every engineer in a primary or secondary rotation declares one of five values before each shift: home_primary, home_backup, office, travel_verified, travel_unknown.
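As a minimal sketch, the posture field could be modelled as an enumeration that is validated when the engineer declares it. The five values come from the runbook change described above; the `ShiftDeclaration` record, its field names and the `declare` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class NetworkPosture(str, Enum):
    """The five posture values an engineer may declare before a shift."""
    HOME_PRIMARY = "home_primary"
    HOME_BACKUP = "home_backup"
    OFFICE = "office"
    TRAVEL_VERIFIED = "travel_verified"
    TRAVEL_UNKNOWN = "travel_unknown"


@dataclass(frozen=True)
class ShiftDeclaration:
    # Hypothetical record shape; the real on-call schema is not shown in this log.
    engineer: str
    role: str  # "primary" or "secondary"
    posture: NetworkPosture


def declare(engineer: str, role: str, posture: str) -> ShiftDeclaration:
    """Validate the raw role and posture strings and build the declaration."""
    if role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {role!r}")
    # NetworkPosture(...) raises ValueError for anything outside the five values.
    return ShiftDeclaration(engineer, role, NetworkPosture(posture))
```

Rejecting an unrecognised value at declaration time, rather than at page time, keeps a typo from silently changing how a page routes.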

The second change was a pre-trip connectivity verification. Five minutes the day before each flight, the engineer runs a three-step handshake test against the staging bastion over both her primary line and her redundant line.
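The log does not name the three steps of the handshake test. The sketch below assumes they are a DNS lookup, a TCP connect and a TLS handshake, and assumes the engineer runs the script once per line (primary, then redundant) and passes a label. The host name, the port and every function name are hypothetical.

```python
import socket
import ssl
import time
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None]]


def bastion_steps(host: str, port: int = 443, timeout: float = 10.0) -> List[Step]:
    """Assumed three steps: DNS lookup, TCP connect, TLS handshake."""
    def dns() -> None:
        socket.getaddrinfo(host, port)

    def tcp() -> None:
        socket.create_connection((host, port), timeout=timeout).close()

    def tls() -> None:
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                pass  # the handshake completes inside wrap_socket

    return [("dns", dns), ("tcp", tcp), ("tls", tls)]


def run_verification(line: str, steps: List[Step]) -> Dict[str, object]:
    """Run each step in order, timing it, and stop at the first failure."""
    results = []
    for name, step in steps:
        started = time.monotonic()
        try:
            step()
            ok, error = True, None
        except Exception as exc:  # any failure leaves the line unverified
            ok, error = False, str(exc)
        results.append({"step": name, "ok": ok,
                        "seconds": time.monotonic() - started, "error": error})
        if not ok:
            break
    verified = (bool(steps) and len(results) == len(steps)
                and all(r["ok"] for r in results))
    return {"line": line, "verified": verified, "results": results}
```

A run such as `run_verification("primary", bastion_steps("bastion.example.internal"))` returns per-step timings, which is what makes a slow DNS lookup visible separately from a slow connect.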

The third change was the triage console. We built a small dashboard exposed over plain HTTPS with hardware-token authentication and no corporate VPN dependency.

The Lisbon Leg: Cold Open from a Hotel Pillow

The Lisbon offsite ran four days at a venue in Parque das Nações, a fifteen-minute Aerobus ride from Humberto Delgado Airport. The on-call primary that week, let us call her M., flew in on the Sunday and acknowledged her first page at 03:17 Monday from a hotel three blocks from the Vasco da Gama tower. Her watch fired, her phone fired, and her travel_verified posture flag routed the page directly to her, with no parallel page to a co-primary.

The triage console resolved in twenty-two seconds over Vodafone Portugal. The staging bastion handshake completed in 81 seconds. The fix, a foreign-key constraint half-applied during a schema migration on the payments-writer-secondary, was a one-line config change that shipped at 03:34. Total time on the line: 17 minutes. Acknowledgment to fix: under the five-minute escalation service-level objective for the customer-visible portion.

The Lisbon leg taught us one thing the previous rotation had not surfaced. The pre-trip connectivity verification is the cheapest reliability investment a Series-B can make. The 17-minute incident would have run 55 minutes had M. been routed travel_unknown without the venue handshake test in advance.

The Singapore Leg: Engineering Offsite in Marina Bay

The Singapore leg ran six working days at a venue on the Marina Bay waterfront, with engineers flown in from Lisbon, Berlin and Ho Chi Minh City. The rotation primary on the Tuesday and Wednesday was the platform engineer who owned the affected service; the secondary was the staff site-reliability engineer who had run the Lisbon leg. Two pages fired across the six days. Neither breached the service-level objective.

Singapore is the leg where the network layer matters most, because the venue rooftop and the in-building behaviour at Marina Bay are uneven across the three national carriers. The post-mortem from this leg produced the coverage notes below.

Staying online across Singapore

The direct answer for any engineering lead planning a comparable working week: no single Singapore carrier delivers uniform coverage across the venues a Marina Bay offsite is likely to touch, but the metropolitan core and the central business district are uniformly strong. The failure points are the rooftop bars, the deeper interior of the integrated-resort towers and the East Coast Park access road. The reliable configuration is a primary line on Singtel for voice and authentication codes, paired with a redundant data line provisioned before departure.

Local-carrier coverage across the Singapore leg

Singtel held the Marina Bay offsite venue at five-bar 5G throughout the daytime sessions. StarHub held Tanjong Pagar and the Telok Ayer working-cafe circuit without measurable degradation, and was the better option inside the Marina Bay Sands tower lifts, where Singtel handed off twice between the lobby and the SkyPark. M1 carried the East Coast Park customer dinner on the Thursday evening, when Singtel had dropped to 4G along the access road. On the Friday-night rooftop hand-off, when the venue lost its primary fibre at 23:40 and the corporate VPN would not negotiate the failover, the platform engineer used HelloRoam's traveller data plan, which routed through Singtel. That mattered on the SkyPark rooftop: the fix window closed at 00:30 local, and the Marina Bay-to-Tanjong Pagar seam was where Singtel reliably outperformed the in-building Wi-Fi.

Coverage at a glance

| Region or route | Primary local carrier | Signal in city core and on site | Notes |
|---|---|---|---|
| Marina Bay offsite venue | Singtel | 5G, reliable | Daytime 5G strong; in-building drop after 23:30 on the Friday |
| Tanjong Pagar working-cafe circuit | StarHub | 5G, reliable | Held across Telok Ayer and Amoy Street |
| Marina Bay Sands tower lifts | StarHub | 4G/5G mixed | Better than Singtel inside the lift core to the SkyPark |
| East Coast Park access road | M1 | 4G, acceptable | Singtel handed off twice along the parkway between Marine Parade and Bedok |
| Orchard Road customer-visit corridor | Singtel | 5G, reliable | Strong from Somerset through Plaza Singapura |
| Changi Airport Terminal 1 to city | Singtel | 5G, reliable | Holds across the East Coast Parkway; StarHub is the better backup |

The operating principle is straightforward. Carriers are planned the same way the offsite agenda is planned. The metropolitan core is rarely the failure point; the seam between the venue and the unannounced rooftop is the failure point, and the redundant layer is what carries the call through it.

The Austin Leg: Customer-Visit Loop and a Product Launch

The Austin leg closed the quarter. Four working days, eight customer visits across the I-35 corridor between downtown Austin and the Domain, and a product launch on the Thursday morning.

One page fired. It fired at 11:42 on the Wednesday, three minutes into a customer demo at a coffee shop on East Sixth Street.

The Austin leg also produced the one piece of negative data worth writing down. The Domain office Wi-Fi runs an aggressive firewall that blocks the deploy-pipeline webhook the team uses for staged rollouts.

Putting It Together: The Field Stack That Held the Quarter

Three transcontinental moves, three offsites, one product launch and a five-minute escalation service-level objective. What the rotation actually ran on, in five bullets:

  • A network posture field on the on-call schema, with travel_unknown shifts auto-paralleling to a co-primary and travel_verified shifts routing normally.
  • A pre-trip connectivity verification: five minutes the day before each flight, against both the staging bastion and the deploy-pipeline webhook from both the primary line and the redundant line.
  • A read-only triage console exposed over plain HTTPS with hardware-token authentication and no corporate VPN dependency.
  • A pre-positioned redundant data line in every country on the itinerary, verified before boarding, with the local carrier named in the runbook for the venue.
  • A quarterly travel-incident game day, scheduled at unpredictable hours, in which one rotation member is paged from a hotel lobby, one from a venue rooftop and one from a transit zone, and time-to-effective-terminal is the metric the retro records.

None of those required a new vendor. All of them required the platform-engineering team to write down what the team already half-knew about how distributed work behaves under a travel cadence.
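This log names time-to-effective-terminal without defining it. One way to record it during a game day is sketched below, assuming it means the elapsed time from the page firing to the engineer's first working console or shell. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillTimeline:
    # Hypothetical event names; map them to whatever the paging tool records.
    paged_at: datetime
    acknowledged_at: datetime
    terminal_ready_at: datetime  # first working console or shell (assumed definition)

    @property
    def time_to_ack(self) -> timedelta:
        return self.acknowledged_at - self.paged_at

    @property
    def time_to_effective_terminal(self) -> timedelta:
        return self.terminal_ready_at - self.paged_at

    @property
    def hidden_friction(self) -> timedelta:
        """The gap that acknowledgment time alone does not show."""
        return self.terminal_ready_at - self.acknowledged_at
```

Recording both numbers side by side lets the retro show how much of the response time sat between the acknowledgment and the first usable terminal.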

Frequently Asked Questions

How long should a follow-the-sun on-call shift run for an engineer who is travelling between Lisbon, Singapore and Austin in the same quarter? A travelling engineer in a primary on-call rotation should not exceed eight working hours per twenty-four-hour window without an automatic hand-off to a co-primary. Cognitive degradation across time zones compounds with the cumulative cost of imperfect network postures, and shifts extended beyond eight hours are the failure mode that produces the worst data on a post-mortem.
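The eight-hour cap can be checked mechanically. The sketch below assumes shifts are recorded as start and end timestamps in UTC, so that travel across time zones does not distort the arithmetic, and assumes the twenty-four-hour window is a rolling one ending at the current time. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

MAX_ON_CALL = timedelta(hours=8)  # cap stated in the answer above
WINDOW = timedelta(hours=24)      # assumed to be a rolling window

Interval = Tuple[datetime, datetime]


def on_call_in_window(shifts: Iterable[Interval], now: datetime) -> timedelta:
    """Sum the on-call time that falls inside the 24 hours ending at `now`."""
    window_start = now - WINDOW
    total = timedelta(0)
    for start, end in shifts:
        # Clip each shift to the window before counting it.
        clipped_start = max(start, window_start)
        clipped_end = min(end, now)
        if clipped_end > clipped_start:
            total += clipped_end - clipped_start
    return total


def needs_handoff(shifts: Iterable[Interval], now: datetime) -> bool:
    """True once the engineer has reached the cap inside the window."""
    return on_call_in_window(shifts, now) >= MAX_ON_CALL
```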

What does a travel-aware paging schema look like in PagerDuty or Opsgenie? It is a custom field on the user profile, set by the engineer themselves before each shift. home_primary and office route normally. travel_verified, meaning the engineer has run a pre-trip connectivity verification, routes normally. travel_unknown auto-pages a co-primary in parallel and the rotation manager receives a notification. The implementation is forty lines of webhook glue.
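The forty lines of webhook glue are not reproduced in this log. The routing decision at their centre might look roughly like the sketch below, which is deliberately independent of any PagerDuty or Opsgenie API. The function and class names are hypothetical, and the handling of home_backup is an assumption, since the answer above does not say how that value routes.

```python
from dataclasses import dataclass, field
from typing import List

# Postures the answer above describes as routing normally.
ROUTES_NORMALLY = {"home_primary", "office", "travel_verified"}
# Assumption: home_backup is not covered above; it is treated as normal here.
ASSUMED_NORMAL = {"home_backup"}


@dataclass
class PagePlan:
    page: List[str] = field(default_factory=list)    # paged in parallel
    notify: List[str] = field(default_factory=list)  # informed, not paged


def plan_page(posture: str, primary: str, co_primary: str, manager: str) -> PagePlan:
    """Decide who is paged and who is notified for a given posture."""
    if posture in ROUTES_NORMALLY or posture in ASSUMED_NORMAL:
        return PagePlan(page=[primary])
    if posture == "travel_unknown":
        # Page the co-primary in parallel and tell the rotation manager.
        return PagePlan(page=[primary, co_primary], notify=[manager])
    raise ValueError(f"unknown network posture: {posture!r}")
```

For example, `plan_page("travel_unknown", "m", "co", "mgr")` pages both engineers and notifies the manager, while `plan_page("travel_verified", "m", "co", "mgr")` pages only the primary.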

How should an SRE lead structure a four-day engineering offsite in Marina Bay so the rotation does not break? Anchor on a single venue with verified wired uplink for the working sessions, schedule the off-site dinners away from the integrated-resort towers where in-building coverage drops after 23:00 local, and rotate the on-call primary off-shift for the rooftop closing event. The seam between the daytime venue and the rooftop is where the rotation actually breaks.

What is the right Austin itinerary for a chief technology officer who is also a primary on-call escalation point? Anchor on downtown Austin for the first half of the week to keep proximity to the customer-visit cluster on East Sixth Street and South Congress, then move to the Domain for the product-launch days. Verify deploy-pipeline egress against the Domain office Wi-Fi the day before the launch; the firewall on enterprise tenants in that complex blocks more outbound traffic than most engineering teams assume.

How do site-reliability engineers stay reachable across a quarterly travel cadence without burning their personal data caps? The reliable configuration is a primary home-country line for SMS authentication codes, paired with a local-carrier data line in each country crossed and a redundant data plan as the fallback whenever the local carrier fails on a venue rooftop or inside a taller building. The seam between the metropolitan core and the offsite venue is the most common failure point, and the redundant layer is what carries the page through it.