Record ID: 9911006763703321
Record updated: 2024-03-13 11:23:25
Record created: 2013-03-07
Language: English

Title: Hadoop real-world solutions cookbook: Realistic, simple code examples to solve problems at scale with Hadoop and related technologies / Jonathan R. Owens, Jon Lentz, Brian Femiano
Edition: 1st edition
Published: Birmingham [England]: Packt Pub., 2013
Description: 1 online resource (316 p.)
Notes: Includes index.

ISBN: 1-62198-910-0; 1-84951-913-7; 1-299-18393-X
Other ISBN: 1-84951-912-9

System control numbers: (CKB)2550000001005718; (EBL)1103987; (OCoLC)828794321; (SSID)ssj0000907255; (PQKBManifestationID)12469492; (PQKBTitleCode)TC0000907255; (PQKBWorkID)10884307; (PQKB)10906268; (MiAaPQ)EBC1103987; (CaSebORM)9781849519120; (Au-PeEL)EBL1103987; (CaPaEBR)ebr10659963; (CaONFJC)MIL449643; (PPN)22799860X; (PPN)167589679; (OCoLC)843959318; (OCoLC)ocn843959318; (EXLCZ)992550000001005718

Contents:

Cover; Copyright; Credits; About the Authors; About the Reviewers; www.packtpub.com; Table of Contents; Preface

Chapter 1: Hadoop Distributed File System - Importing and Exporting Data; Introduction; Importing and exporting data into HDFS using Hadoop shell commands; Moving data efficiently between clusters using Distributed Copy; Importing data from MySQL into HDFS using Sqoop; Exporting data from HDFS into MySQL using Sqoop; Configuring Sqoop for Microsoft SQL Server; Exporting data from HDFS into MongoDB; Importing data from MongoDB into HDFS; Exporting data from HDFS into MongoDB using Pig; Using HDFS in a Greenplum external table; Using Flume to load data into HDFS

Chapter 2: HDFS; Introduction; Reading and writing data to HDFS; Compressing data using LZO; Reading and writing data to SequenceFiles; Using Apache Avro to serialize data; Using Apache Thrift to serialize data; Using Protocol Buffers to serialize data; Setting the replication factor for HDFS; Setting the block size for HDFS

Chapter 3: Extracting and Transforming Data; Introduction; Transforming Apache logs into TSV format using MapReduce; Using Apache Pig to filter bot traffic from web server logs; Using Apache Pig to sort web server log data by timestamp; Using Apache Pig to sessionize web server log data; Using Python to extend Apache Pig functionality; Using MapReduce and secondary sort to calculate page views; Using Hive and Python to clean and transform geographical event data; Using Python and Hadoop Streaming to perform a time series analytic; Using Multiple Outputs in MapReduce to name output files; Creating custom Hadoop Writable and InputFormat to read geographical event data

Chapter 4: Performing Common Tasks Using Hive, Pig, and MapReduce; Introduction; Using Hive to map an external table over weblog data in HDFS; Using Hive to dynamically create tables from the results of a weblog query; Using the Hive string UDFs to concatenate fields in weblog data; Using Hive to intersect weblog IPs and determine the country; Generating n-grams over news archives using MapReduce; Using the distributed cache in MapReduce to find lines that contain matching keywords over news archives; Using Pig to load a table and perform a SELECT operation with GROUP BY

Chapter 5: Advanced Joins; Introduction; Joining data in the Mapper using MapReduce; Joining data using Apache Pig replicated join; Joining sorted data using Apache Pig merge join; Joining skewed data using Apache Pig skewed join; Using a map-side join in Apache Hive to analyze geographical events; Using optimized full outer joins in Apache Hive to analyze geographical events; Joining data using an external key-value store (Redis)

Chapter 6: Big Data Analysis; Introduction; Counting distinct IPs in web log data using MapReduce and Combiners; Using Hive date UDFs to transform and sort event dates from geographic event data

Summary: Realistic, simple code examples to solve problems at scale with Hadoop and related technologies.

Subjects: Electronic data processing -- Distributed processing; Open source software
Classification: 004.6; 005.74

Authors: Owens, Jonathan R. (1823336); Lentz, Jon (733032); Femiano, Brian (1823337)
Cataloging source: MiAaPQ
Material type: BOOK
Local record: Hadoop real-world solutions cookbook, 4389933, UNINA