Site Reliability Engineering: How Google Runs Production Systems

The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?

In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient, lessons directly applicable to your organization.

This book is divided into four sections:

Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices

Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)

Practices: Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems

Management: Explore Google's best practices for training, communication, and meetings that your organization can use

Paperback: 552 pages

Publisher: O'Reilly Media; 1st edition (April 16, 2016)

Language: English

ISBN-10: 149192912X

ISBN-13: 978-1491929124

Product Dimensions: 6.9 x 1.3 x 9.1 inches

Shipping Weight: 2 pounds

Average Customer Review: 4.1 out of 5 stars (17 customer reviews)

Best Sellers Rank: #8,739 in Books

#2 in Books > Computers & Technology > Computer Science > Systems Analysis & Design

#2 in Books > Computers & Technology > Operating Systems > Linux > Networking & System Administration

#2 in Books > Computers & Technology > Networking & Cloud Computing > Network Administration > Linux & UNIX Administration

How to Read This Book

This book is a series of essays written by members and alumni of Google’s Site Reliability Engineering organization. It’s much more like conference proceedings than it is like a standard book by an author or a small number of authors. Each chapter is intended to be read as a part of a coherent whole, but a good deal can be gained by reading on whatever subject particularly interests you. (If there are other articles that support or inform the text, we reference them so you can follow up accordingly.)

You don’t need to read in any particular order, though we’d suggest at least starting with Chapters 2 and 3, which describe Google’s production environment and outline how SRE approaches risk, respectively. (Risk is, in many ways, the key quality of our profession.) Reading cover-to-cover is, of course, also useful and possible; our chapters are grouped thematically, into Principles (Part II), Practices (Part III), and Management (Part IV). Each has a small introduction that highlights what the individual pieces are about, and references other articles published by Google SREs, covering specific topics in more detail. Additionally, there’s a companion website mentioned in the book that has a number of helpful resources.

We hope this will be at least as useful and interesting to you as putting it together was for us.

The Editors
