Clojure data analysis cookbook / Eric Rochester
| Author | Rochester, Eric |
| Edition | [1st edition] |
| Publication/distribution/printing | Birmingham, UK : Packt Pub., c2013 |
| Physical description | 1 online resource (342 p.) |
| Discipline | 005.133 |
| Topical subject | Database searching; Clojure (Computer program language) |
| ISBN | 1-68015-416-8; 1-299-44085-1; 1-78216-265-8 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface;
Chapter 1: Importing Data for Analysis; Introduction; Creating a new project; Reading CSV data into Incanter datasets; Reading JSON data into Incanter datasets; Reading data from Excel with Incanter; Reading data from JDBC databases; Reading XML data into Incanter datasets; Scraping data from tables in web pages; Scraping textual data from web pages; Reading RDF data; Reading RDF data with SPARQL; Aggregating data from different formats;
Chapter 2: Cleaning and Validating Data; Introduction; Cleaning data with regular expressions; Maintaining consistency with synonym maps; Identifying and removing duplicate data; Normalizing numbers; Rescaling values; Normalizing dates and times; Lazily processing very large data sets; Sampling from very large data sets; Fixing spelling errors; Parsing custom data formats; Validating data with Valip;
Chapter 3: Managing Complexity with Concurrent Programming; Introduction; Managing program complexity with STM; Managing program complexity with agents; Getting better performance with commute; Combining agents and STM; Maintaining consistency with ensure; Introducing safe side effects into the STM; Maintaining data consistency with validators; Tracking processing with watchers; Debugging concurrent programs with watchers; Recovering from errors in agents; Managing input with sized queues;
Chapter 4: Improving Performance with Parallel Programming; Introduction; Parallelizing processing with pmap; Parallelizing processing with Incanter; Partitioning Monte Carlo simulations for better pmap performance; Finding the optimal partition size with simulated annealing; Parallelizing with reducers; Generating online summary statistics with reducers; Harnessing your GPU with OpenCL and Calx; Using type hints; Benchmarking with Criterium;
Chapter 5: Distributed Data Processing with Cascalog; Introduction; Distributed processing with Cascalog and Hadoop; Querying data with Cascalog; Distributing data with Apache HDFS; Parsing CSV files with Cascalog; Complex queries with Cascalog; Aggregating data with Cascalog; Defining new Cascalog operators; Composing Cascalog queries; Handling errors in Cascalog workflows; Transforming data with Cascalog; Executing Cascalog queries in the Cloud with Pallet;
Chapter 6: Working with Incanter Datasets; Introduction; Loading Incanter's sample datasets; Loading Clojure data structures into datasets; Viewing datasets interactively with view; Converting datasets to matrices; Using infix formulas in Incanter; Selecting columns with ; Selecting rows with ; Filtering datasets with where; Grouping data with group-by; Saving datasets to CSV and JSON; Projecting from multiple datasets with join;
Chapter 7: Preparing for and Performing Statistical Data Analysis with Incanter; Introduction; Generating summary statistics with rollup |
| Record no. | UNINA-9911006788503321 |
| Available at | Univ. Federico II |
Mastering Clojure data analysis : leverage the power and flexibility of Clojure through this practical guide to data analysis / Eric Rochester ; cover image by Jarosław Blaminsky
| Author | Rochester, Eric |
| Publication/distribution/printing | Birmingham, England : Packt Publishing Ltd, 2014 |
| Physical description | 1 online resource (340 p.) |
| Discipline | 302.3 |
| Series | Community Experience Distilled |
| Topical subject | Social networks - Mathematical models; Geographic information systems - England; Application software - Development |
| Genre/form subject | Electronic books. |
| ISBN | 1-78328-414-5 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface;
Chapter 1: Network Analysis - The Six Degrees of Kevin Bacon; Analyzing social networks; Getting the data; Understanding graphs; Implementing the graph; Loading the data; Measuring social network graphs; Density; Degrees; Paths; The average path length; Network diameter; Clustering coefficient; Centrality; Degrees of separation; Visualizing the graph; Setting up ClojureScript; A force-directed layout; A hive plot; A pie chart; Summary;
Chapter 2: GIS Analysis - Mapping Climate Change; Understanding GIS; Mapping the climate change; Downloading and extracting the data; Downloading the files; Extracting the files; Transforming the data - filtering; Rolling averages; Reading the data; Interpolating sample points and generating heat maps using inverse distance weighting (IDW); Working with map projections; Finding a base map; Working with ArcGIS; Summary;
Chapter 3: Topic Modeling - Changing Concerns in State of the Union Addresses; Understanding data in State of the Union addresses; Understanding topic modeling; Preparing for visualizations; Setting up the project; Getting the data; Loading the data into MALLET; Visualizing with D3 and ClojureScript; Exploring the topics; Exploring topic 43; Exploring topic 26; Exploring topic 42; Summary;
Chapter 4: Classifying UFO Sightings; Getting the data; Extracting the data; Dealing with messy data; Visualizing UFO data; Description; Topic modeling descriptions; Hoaxes; Preparing the data; Reading the data into a sequence of data records; Splitting out the NUFORC comments; Categorizing the documents based on the comments; Partitioning the documents into directories based on the categories; Dividing them into training and test sets; Classifying the data; Coding the classifier interface; Running the classifier and examining the results; Summary;
Chapter 5: Benford's Law - Detecting Natural Progressions of Numbers; Learning about Benford's Law; Applying Benford's law to compound interest; Looking at the world population data; Failing Benford's Law; Case studies; Summary;
Chapter 6: Sentiment Analysis - Categorizing Hotel Reviews; Understanding sentiment analysis; Getting hotel review data; Exploring the data; Preparing the data; Tokenizing; Creating feature vectors; Creating feature vector functions and POS tagging; Cross validating the results; Calculating error rates; Using the Weka machine learning library; Connecting Weka and cross validation; Understanding maximum entropy classifiers; Understanding naive Bayesian classifiers; Running the experiment; Examining the results; Combining the error rates; Improving the results; Summary;
Chapter 7: Null Hypothesis Tests - Analyzing Crime Data; Introducing confirmatory data analysis; Understanding null hypothesis testing; Understanding the process; Formulating an initial hypothesis |
| Record no. | UNINA-9910464796903321 |
| Available at | Univ. Federico II |
Mastering Clojure data analysis : leverage the power and flexibility of Clojure through this practical guide to data analysis / Eric Rochester ; cover image by Jarosław Blaminsky
| Author | Rochester, Eric |
| Publication/distribution/printing | Birmingham, England : Packt Publishing Ltd, 2014 |
| Physical description | 1 online resource (340 p.) |
| Discipline | 302.3 |
| Series | Community Experience Distilled |
| Topical subject | Social networks - Mathematical models; Geographic information systems - England; Application software - Development |
| ISBN | 1-78328-414-5 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface;
Chapter 1: Network Analysis - The Six Degrees of Kevin Bacon; Analyzing social networks; Getting the data; Understanding graphs; Implementing the graph; Loading the data; Measuring social network graphs; Density; Degrees; Paths; The average path length; Network diameter; Clustering coefficient; Centrality; Degrees of separation; Visualizing the graph; Setting up ClojureScript; A force-directed layout; A hive plot; A pie chart; Summary;
Chapter 2: GIS Analysis - Mapping Climate Change; Understanding GIS; Mapping the climate change; Downloading and extracting the data; Downloading the files; Extracting the files; Transforming the data - filtering; Rolling averages; Reading the data; Interpolating sample points and generating heat maps using inverse distance weighting (IDW); Working with map projections; Finding a base map; Working with ArcGIS; Summary;
Chapter 3: Topic Modeling - Changing Concerns in State of the Union Addresses; Understanding data in State of the Union addresses; Understanding topic modeling; Preparing for visualizations; Setting up the project; Getting the data; Loading the data into MALLET; Visualizing with D3 and ClojureScript; Exploring the topics; Exploring topic 43; Exploring topic 26; Exploring topic 42; Summary;
Chapter 4: Classifying UFO Sightings; Getting the data; Extracting the data; Dealing with messy data; Visualizing UFO data; Description; Topic modeling descriptions; Hoaxes; Preparing the data; Reading the data into a sequence of data records; Splitting out the NUFORC comments; Categorizing the documents based on the comments; Partitioning the documents into directories based on the categories; Dividing them into training and test sets; Classifying the data; Coding the classifier interface; Running the classifier and examining the results; Summary;
Chapter 5: Benford's Law - Detecting Natural Progressions of Numbers; Learning about Benford's Law; Applying Benford's law to compound interest; Looking at the world population data; Failing Benford's Law; Case studies; Summary;
Chapter 6: Sentiment Analysis - Categorizing Hotel Reviews; Understanding sentiment analysis; Getting hotel review data; Exploring the data; Preparing the data; Tokenizing; Creating feature vectors; Creating feature vector functions and POS tagging; Cross validating the results; Calculating error rates; Using the Weka machine learning library; Connecting Weka and cross validation; Understanding maximum entropy classifiers; Understanding naive Bayesian classifiers; Running the experiment; Examining the results; Combining the error rates; Improving the results; Summary;
Chapter 7: Null Hypothesis Tests - Analyzing Crime Data; Introducing confirmatory data analysis; Understanding null hypothesis testing; Understanding the process; Formulating an initial hypothesis |
| Record no. | UNINA-9910786552103321 |
| Available at | Univ. Federico II |
Mastering Clojure data analysis : leverage the power and flexibility of Clojure through this practical guide to data analysis / Eric Rochester ; cover image by Jarosław Blaminsky
| Author | Rochester, Eric |
| Publication/distribution/printing | Birmingham, England : Packt Publishing Ltd, 2014 |
| Physical description | 1 online resource (340 p.) |
| Discipline | 302.3 |
| Series | Community Experience Distilled |
| Topical subject | Social networks - Mathematical models; Geographic information systems - England; Application software - Development |
| ISBN | 1-78328-414-5 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface;
Chapter 1: Network Analysis - The Six Degrees of Kevin Bacon; Analyzing social networks; Getting the data; Understanding graphs; Implementing the graph; Loading the data; Measuring social network graphs; Density; Degrees; Paths; The average path length; Network diameter; Clustering coefficient; Centrality; Degrees of separation; Visualizing the graph; Setting up ClojureScript; A force-directed layout; A hive plot; A pie chart; Summary;
Chapter 2: GIS Analysis - Mapping Climate Change; Understanding GIS; Mapping the climate change; Downloading and extracting the data; Downloading the files; Extracting the files; Transforming the data - filtering; Rolling averages; Reading the data; Interpolating sample points and generating heat maps using inverse distance weighting (IDW); Working with map projections; Finding a base map; Working with ArcGIS; Summary;
Chapter 3: Topic Modeling - Changing Concerns in State of the Union Addresses; Understanding data in State of the Union addresses; Understanding topic modeling; Preparing for visualizations; Setting up the project; Getting the data; Loading the data into MALLET; Visualizing with D3 and ClojureScript; Exploring the topics; Exploring topic 43; Exploring topic 26; Exploring topic 42; Summary;
Chapter 4: Classifying UFO Sightings; Getting the data; Extracting the data; Dealing with messy data; Visualizing UFO data; Description; Topic modeling descriptions; Hoaxes; Preparing the data; Reading the data into a sequence of data records; Splitting out the NUFORC comments; Categorizing the documents based on the comments; Partitioning the documents into directories based on the categories; Dividing them into training and test sets; Classifying the data; Coding the classifier interface; Running the classifier and examining the results; Summary;
Chapter 5: Benford's Law - Detecting Natural Progressions of Numbers; Learning about Benford's Law; Applying Benford's law to compound interest; Looking at the world population data; Failing Benford's Law; Case studies; Summary;
Chapter 6: Sentiment Analysis - Categorizing Hotel Reviews; Understanding sentiment analysis; Getting hotel review data; Exploring the data; Preparing the data; Tokenizing; Creating feature vectors; Creating feature vector functions and POS tagging; Cross validating the results; Calculating error rates; Using the Weka machine learning library; Connecting Weka and cross validation; Understanding maximum entropy classifiers; Understanding naive Bayesian classifiers; Running the experiment; Examining the results; Combining the error rates; Improving the results; Summary;
Chapter 7: Null Hypothesis Tests - Analyzing Crime Data; Introducing confirmatory data analysis; Understanding null hypothesis testing; Understanding the process; Formulating an initial hypothesis |
| Record no. | UNINA-9910825645103321 |
| Available at | Univ. Federico II |