Use (End-to-End) Tracing or Correlation IDs (4 mins read).
Why you should be requesting, and logging, a unique identifier per request for better supportability.
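As a rough illustration of the idea, here is a minimal sketch of attaching a per-request identifier to every log line using only the Python standard library. The header name `X-Correlation-ID`, the logger name `app`, and the helpers `build_logger` and `handle_request` are assumptions made for this example and are not taken from the linked post.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def build_logger(stream):
    # Hypothetical helper: a logger that prefixes each line with the ID.
    logger = logging.getLogger("app")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def handle_request(headers, logger):
    # Reuse the caller's ID if one was supplied, otherwise mint a new one.
    request_id = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    token = correlation_id.set(request_id)
    try:
        logger.info("request received")
        # Echo the ID back so the caller can quote it in a support request.
        return {"X-Correlation-ID": request_id}
    finally:
        # Restore the previous value so IDs never leak between requests.
        correlation_id.reset(token)
```

Passing the same identifier on to downstream services, and logging it there too, is what would let one request be followed across the whole system.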
Recommended read: Postmortem: Removing all users from github.com/trivago · trivago tech blog https://tech.trivago.com/2021/10/05/postmortem-removing-all-users-from-github.com/trivago/
Recommended read: An incident response starter-pack: how do you handle production outages? https://blog.lawrencejones.dev/incident-response/
This is a great post by Shubheksha about the right way to talk about production issues.
Having a blameless culture makes it easier for new/junior engineers to get started with working on production systems, and makes everyone more comfortable working on things when they know the blame won't be pointed at them.
I've found that, at work, diagnosing issues in our staging environment has been a great experience. It lets me practice dealing with production-like issues in a non-production environment, which gives me time to breathe, experiment and learn, and it has given me a much greater understanding of the end-to-end system.
Recommended read: Re-framing how we think about production incidents https://shubheksha.com/posts/2019/04/re-framing-how-we-think-about-production-incidents/