free excerpt from:
Monitoring Distributed Systems - O'Reilly Media
"Monitoring is an essential part of a modern production system. If you can’t monitor a service, you don’t know what’s happening, and if you’re blind to what’s happening, your service can’t be reliable.
Author Rob Ewaschuk explains basic principles and best practices that he and other members of Google’s Site Reliability Engineering (SRE) teams use for building successful monitoring and alerting systems."