[PDF/ePUB] Data Science on the Google Cloud Platform

Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning

From the Preface

In this book, we walk through an example of this new transformative, more collaborative way of doing data science. You will learn how to implement an end-to-end data pipeline: we will begin with ingesting the data in a serverless way and work our way through data exploration, dashboards, relational databases, and streaming data all the way to training and making operational a machine learning model. I cover all these aspects of data-based services because data engineers will be involved in designing the services, developing the statistical and machine learning models and implementing them in large-scale production and in real time.

Who This Book Is For

If you use computers to work with data, this book is for you. You might go by the title of data analyst, database administrator, data engineer, data scientist, or systems programmer today. Although your role might be narrower today (perhaps you do only data analysis, or only model building, or only DevOps), you want to stretch your wings a bit: you want to learn how to create data science models as well as how to implement them at scale in production systems.

Google Cloud Platform is designed to make you forget about infrastructure. The marquee data services (Google BigQuery, Cloud Dataflow, Cloud Pub/Sub, and Cloud ML Engine) are all serverless and autoscaling. When you submit a query to BigQuery, it is run on thousands of nodes, and you get your result back; you don’t spin up a cluster or install any software. Similarly, in Cloud Dataflow, when you submit a data pipeline, and in Cloud Machine Learning Engine, when you submit a machine learning job, you can process data at scale and train models at scale without worrying about cluster management or failure recovery. Cloud Pub/Sub is a global messaging service that autoscales to the throughput and number of subscribers and publishers without any work on your part.
Even when you’re running open source software like Apache Spark that’s designed to operate on a cluster, Google Cloud Platform makes it easy. Leave your data on Google Cloud Storage, not in HDFS, and spin up a job-specific cluster to run the Spark job. After the job completes, you can safely delete the cluster. Because of this job-specific infrastructure, there’s no need to fear overprovisioning hardware or running out of capacity to run a job when you need it. Plus, data is encrypted, both at rest and in transit, and kept secure. As a data scientist, not having to manage infrastructure is incredibly liberating.

The reason that you can afford to forget about virtual machines and clusters when running on Google Cloud Platform comes down to networking. The network bisection bandwidth within a Google Cloud Platform datacenter is 1 PBps, and so sustained reads off Cloud Storage are extremely fast. What this means is that you don’t need to shard your data as you would with traditional MapReduce jobs. Instead, Google Cloud Platform can autoscale your compute jobs by shuffling the data onto new compute nodes as needed. Hence, you’re liberated from cluster management when doing data science on Google Cloud Platform.

These autoscaled, fully managed services make it easier to implement data science models at scale, which is why data scientists no longer need to hand off their models to data engineers. Instead, they can write a data science workload, submit it to the cloud, and have that workload executed automatically in an autoscaled manner. At the same time, data science packages are becoming simpler and simpler. So, it has become extremely easy for an engineer to slurp in data and use a canned model to get an initial (and often very good) model up and running.
With well-designed packages and easy-to-consume APIs, you don’t need to know the esoteric details of data science algorithms, only what each algorithm does, and how to link algorithms together to solve realistic problems. This convergence between data science and data engineering is why you can stretch your wings beyond your current role.

Rather than simply read this book cover-to-cover, I strongly encourage you to follow along with me by also trying out the code. The full source code for the end-to-end pipeline I build in this book is on GitHub. Create a Google Cloud Platform project and after reading each chapter, try to repeat what I did by referring to the code and to the Readme file in each folder of the GitHub repository.

✔ Author(s): Valliappa Lakshmanan
✔ Title: Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning
✔ Rating: 4.2 out of 5 (based on 79 reviews)
✔ ISBN-10: 1491974567
✔ ISBN-13: 9781491974568
✔ Language: English
✔ Ebook formats: PDF, EPUB, Kindle, Audio, HTML, and MOBI
✔ Compatible devices: Android, iOS, PC, and Amazon Kindle

Readers' opinions about Data Science on the Google Cloud Platform by Valliappa Lakshmanan

Elaine Leonard
What a rollercoaster of emotions! I laughed, cried, and everything in between. The author's ability to evoke such raw feelings is truly commendable. It's a story that will stay with me forever.
Josephine Alvarez
The depth of character development was astounding. I felt like I knew each person intimately, understanding their hopes, fears, and dreams. It made the story so much more meaningful.
Laura Hill
The characters in this book felt like old friends, and I was sad to say goodbye to them at the end. It's a testament to the author's talent for creating memorable and relatable personas.

