1.

Record No.

UNINA9910300364903321

Author

Vermeulen Andreas François

Title

Practical Data Science : A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets / by Andreas François Vermeulen

Publication/distribution/printing

Berkeley, CA : Apress : Imprint: Apress, 2018

ISBN

9781484230541

148423054X

Edition

[1st ed. 2018.]

Physical description

1 online resource (824 pages)

Classification

005.73

Subjects

Data mining

Big data

Data structures (Computer science)

Data Mining and Knowledge Discovery

Big Data/Analytics

Big Data

Data Storage Representation

Language of publication

English

Format

Printed material

Bibliographic level

Monograph

General notes

Includes index.

Contents note

Chapter 1: Data Science Technology Stack -- Chapter 2: Vermeulen - Krennwallner - Hillman - Clark -- Chapter 3: Layered Framework -- Chapter 4: Business Layer -- Chapter 5: Utility Layer -- Chapter 6: Three Management Layers -- Chapter 7: Retrieve Super Step -- Chapter 8: Assess Super Step -- Chapter 9: Process Super Step -- Chapter 10: Transform Super Step -- Chapter 11: Organize and Report Super Step.

Summary/abstract

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets. The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions. What You'll Learn: Become fluent in the essential concepts and terminology of data science and data engineering; build and use a technology stack that meets industry criteria; master the methods for retrieving actionable business knowledge; coordinate the handling of polyglot data types in a data lake for repeatable results.