These reports can help enterprises improve their resilience, Dai said. However, he pointed out that AWS could do more to help customers minimize downtime and business risk by promoting multi-region architectures, active-active failover, and redundant DNS strategies.
Further, he said that while the reports will help accelerate post-mortem analysis, they are far from enough on their own, and only continuous product improvement, along with practice optimization, can help minimize systemic risks.
Generating an incident report
To take advantage of the new capability, enterprise users need to ask the CloudWatch investigation assistant questions about a particular service’s performance issues or the reason behind its downtime.
Once a user requests such information, the AI-powered assistant scans the system for telemetry that might be relevant to the situation and generates hypotheses based on what it finds.
Once the user accepts the hypotheses, they can ask the assistant to generate an incident report, the company wrote in its documentation.
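The order of these steps can be illustrated with a small sketch. This is only a toy model of the workflow as described above, not AWS code: the `Investigation` class, its methods, and the `Stage` names are all hypothetical and do not correspond to any CloudWatch or SDK API. It simply shows that a report can be requested only after hypotheses have been proposed and accepted.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Stage(Enum):
    """Hypothetical stages mirroring the workflow described in the article."""
    QUESTION_ASKED = auto()
    HYPOTHESES_PROPOSED = auto()
    HYPOTHESES_ACCEPTED = auto()
    REPORT_GENERATED = auto()


@dataclass
class Investigation:
    """Toy model of an investigation; not an AWS API."""
    question: str
    stage: Stage = Stage.QUESTION_ASKED
    hypotheses: List[str] = field(default_factory=list)

    def propose(self, hypotheses: List[str]) -> None:
        # The assistant proposes hypotheses after scanning telemetry.
        if self.stage is not Stage.QUESTION_ASKED:
            raise RuntimeError("hypotheses were already proposed")
        if not hypotheses:
            raise ValueError("at least one hypothesis is required")
        self.hypotheses = list(hypotheses)
        self.stage = Stage.HYPOTHESES_PROPOSED

    def accept(self) -> None:
        # The user must accept the hypotheses before a report is possible.
        if self.stage is not Stage.HYPOTHESES_PROPOSED:
            raise RuntimeError("there are no proposed hypotheses to accept")
        self.stage = Stage.HYPOTHESES_ACCEPTED

    def generate_report(self) -> str:
        # Report generation is only allowed after acceptance.
        if self.stage is not Stage.HYPOTHESES_ACCEPTED:
            raise RuntimeError("accept the hypotheses before requesting a report")
        self.stage = Stage.REPORT_GENERATED
        lines = ["Incident report", "Question: " + self.question, "Accepted hypotheses:"]
        lines += ["- " + h for h in self.hypotheses]
        return "\n".join(lines)


if __name__ == "__main__":
    inv = Investigation("Why did the checkout service slow down?")
    inv.propose(["Database connection pool exhausted"])
    inv.accept()
    print(inv.generate_report())
```

Calling `generate_report()` before `accept()` raises an error in this sketch, which reflects the ordering the documentation describes.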
Currently, the incident report generation feature is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), Europe (Spain), and Europe (Stockholm) regions.
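For teams that want to check a deployment region against this list, a minimal sketch follows. The region codes are the standard AWS codes for the regions named above; the `SUPPORTED_REGIONS` set and the `supports_incident_reports` helper are hypothetical names introduced here for illustration, and the list reflects availability only as reported in this article, so it may change.

```python
# Standard AWS region codes for the regions listed in the article.
# This snapshot is an assumption based on the list above and may go stale.
SUPPORTED_REGIONS = {
    "us-east-1",       # US East (N. Virginia)
    "us-east-2",       # US East (Ohio)
    "us-west-2",       # US West (Oregon)
    "ap-east-1",       # Asia Pacific (Hong Kong)
    "ap-south-1",      # Asia Pacific (Mumbai)
    "ap-southeast-1",  # Asia Pacific (Singapore)
    "ap-southeast-2",  # Asia Pacific (Sydney)
    "ap-northeast-1",  # Asia Pacific (Tokyo)
    "eu-central-1",    # Europe (Frankfurt)
    "eu-west-1",       # Europe (Ireland)
    "eu-south-2",      # Europe (Spain)
    "eu-north-1",      # Europe (Stockholm)
}


def supports_incident_reports(region_code: str) -> bool:
    """Return True if the region code is in the list reported above."""
    return region_code.strip().lower() in SUPPORTED_REGIONS


if __name__ == "__main__":
    for code in ("us-east-1", "us-west-1", "eu-south-2"):
        print(code, supports_incident_reports(code))
```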