This article in Network World details the 12-hour outage AWS recently experienced. It goes to show how important end-user experience monitoring is. You need to track 100% of user transactions, which allows you to stop performance issues before they become systemic:
Amazon Web Services has almost fully recovered from a more than 12-hour event that appears to have started by only impacting a small number of customers but quickly snowballed into a larger issue that took down major sites including Reddit, Imgur and others yesterday.
Read the full post here: Amazon outage started small, snowballed into 12-hour event