Bash for Data Scientists
| Bash for Data Scientists |
| Author | Campesato Oswald |
| Edition | [1st ed.] |
| Publication/distribution/printing | Bloomfield : Mercury Learning & Information, 2022 |
| Physical description | 1 online resource (293 pages) |
| Discipline | 005.43 |
| Topical subject | COMPUTERS / Programming Languages / Python |
| Uncontrolled subject |
Computer Science
Data Science Pandas Programming Python UNIX awk data mining grep sed |
| ISBN |
9781683929710
1683929713 9781683929727 1683929721 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Intro -- Bash for Data Scientists -- CONTENTS -- PREFACE -- WHAT IS THE GOAL? -- IS THIS BOOK IS FOR ME AND WHAT WILL I LEARN? -- HOW WERE THE CODE SAMPLES CREATED? -- WHAT YOU NEED TO KNOW FOR THIS BOOK -- WHICH BASH COMMANDS ARE EXCLUDED? -- HOW DO I SET UP A COMMAND SHELL? -- WHAT ARE THE "NEXT STEPS" AFTER FINISHING THIS BOOK? -- CHAPTER 1 INTRODUCTION -- WHAT IS UNIX? -- Available Shell Types -- WHAT IS BASH? -- Getting Help for Bash Commands -- Navigating Around Directories -- The history Command -- LISTING FILENAMES WITH THE LS COMMAND -- DISPLAYING CONTENTS OF FILES -- The cat Command -- The head and tail Commands -- The Pipe Symbol -- The fold Command -- FILE OWNERSHIP: OWNER, GROUP, AND WORLD -- HIDDEN FILES -- HANDLING PROBLEMATIC FILENAMES -- WORKING WITH ENVIRONMENT VARIABLES -- The env Command -- Useful Environment Variables -- Setting the PATH Environment Variable -- Specifying Aliases and Environment Variables -- FINDING EXECUTABLE FILES -- THE printf COMMAND AND THE echo COMMAND -- THE cut COMMAND -- THE echo COMMAND AND WHITESPACES -- COMMAND SUBSTITUTION ("BACK TICK") -- THE PIPE SYMBOL AND MULTIPLE COMMA -- USING A SEMICOLON TO SEPARATE COMMANDS -- THE paste COMMAND -- Inserting Blank Lines with the paste Command -- A SIMPLE USE CASE WITH THE paste COMMAND -- A SIMPLE USE CASE WITH cut AND paste COMMANDS -- WORKING WITH META CHARACTERS -- WORKING WITH CHARACTER CLASSES -- WHAT ABOUT ZSH? -- Switching between bash and zsh -- Configuring zsh -- SUMMARY -- CHAPTER 2 FILES AND DIRECTORIES -- CREATE, COPY, REMOVE, AND MOVE FILES -- Creating Files -- Copying Files -- Copy Files with Command Substitution -- Deleting Files -- Moving Files -- THE BASENAME, DIRNAME, AND FILE COMMANDS -- THE wc COMMAND -- THE more COMMAND AND THE less COMMAND -- THE head COMMAND -- THE tail COMMAND -- FILE COMPARISON COMMANDS -- THE PARTS OF A FILENA.
WORKING WITH FILE PERMISSIONS -- The chmod Command -- The chown Command -- The chgrp Command -- The umask and ulimit Commands -- WORKING WITH DIRECTORIES -- Absolute and Relative Directories -- Absolute and Relative Path Names -- Creating Directories -- Removing Directories -- Changing Directories -- Renaming Directories -- USING QUOTE CHARACTERS -- STREAMS AND REDIRECTION COMMANDS -- METACHARACTERS AND CHARACTER CLASSES -- Digits and Characters -- Working with "^" and "\" and "!" -- FILENAMES AND METACHARACTERS -- SUMMARY -- CHAPTER 3 USEFUL COMMANDS -- THE join COMMAND -- THE fold COMMAND -- THE split COMMAND -- THE sort COMMAND -- THE uniq COMMAND -- HOW TO COMPARE FILES -- THE od COMMAND -- THE tr COMMAND -- A SIMPLE USE CASE -- THE find COMMAND -- THE tee COMMAND -- FILE COMPRESSION COMMANDS -- The tar command -- The cpio Command -- The gzip and gunzip Commands -- The bunzip2 Command -- The zip Command -- COMMANDS FOR zip FILES AND bz FILES -- INTERNAL FIELD SEPARATOR (IFS) -- DATA FROM A RANGE OF COLUMNS IN A DATASET -- WORKING WITH UNEVEN ROWS IN DATASETS -- THE alias COMMAND -- SUMMARY -- CHAPTER 4 CONDITIONAL LOGIC AND LOOPS -- ARITHMETIC OPERATIONS AND OPERATORS -- WORKING WITH ARRAYS -- ARRAYS AND TEXT FILES -- WORKING WITH VARIABLES -- Assigning Values to Variables -- WORKING WITH OPERATORS FOR STRINGS AND NUMBERS -- THE read COMMAND FOR USER INPUT -- THE test COMMAND FOR VARIABLES, FILES, AND DIRECTORIES -- Relational Operators -- Boolean Operators -- String Operators -- File Test Operators -- CONDITIONAL LOGIC WITH if/else STATEMENTS -- THE case/esac STATEMENT -- ARITHMETIC OPERATORS AND COMPARISONS -- WORKING WITH STRINGS IN SHELL SCRIPTS -- Working with Strings -- WORKING WITH LOOPS -- Using a for loop -- WORKING WITH NESTED LOOPS -- USING A while LOOP -- THE while, case, AND if/elif/fi STATEMENTS -- USING AN UNTIL LOOP. 
USER-DEFINED FUNCTIONS -- CREATING A SIMPLE MENU FROM SHELL COMMANDS -- SUMMARY -- CHAPTER 5 PROCESSING DATASETS WITH GREP AND SED -- WHAT IS THE grep COMMAND? -- METACHARACTERS AND THE grep COMMAND -- ESCAPING METACHARACTERS WITH THE grep COMMAND -- USEFUL OPTIONS FOR THE grep COMMAND -- Character Classes and the grep Command -- WORKING WITH THE -C OPTION IN grep -- MATCHING A RANGE OF LINES -- USING BACK REFERENCES IN THE grep COMMAND -- FINDING EMPTY LINES IN DATASETS -- USING KEYS TO SEARCH DATASETS -- THE BACKSLASH CHARACTER AND THE grep COMMAND -- MULTIPLE MATCHES IN THE GREP COMMAND -- THE grep COMMAND AND THE xargs COMMAND -- Searching zip Files for a String -- CHECKING FOR A UNIQUE KEY VALUE -- Redirecting Error Messages -- THE egrep COMMAND AND fgrep COMMAND -- Displaying "Pure" Words in a Dataset with egrep -- The fgrep Command -- DELETE ROWS WITH MISSING VALUES -- A SIMPLE USE CASE -- WHAT IS THE sed COMMAND? -- The sed Execution Cycle -- MATCHING STRING PATTERNS USING sed -- SUBSTITUTING STRING PATTERNS USING sed -- Replacing Vowels from a String or a File -- Deleting Multiple Digits and Letters from a String -- SEARCH AND REPLACE WITH sed -- DATASETS WITH MULTIPLE DELIMITERS -- USEFUL SWITCHES IN sed -- WORKING WITH DATASETS -- Printing Lines -- Character Classes and sed -- Removing Control Characters -- COUNTING WORDS IN A DATASET -- BACK REFERENCES IN sed -- ONE-LINE sed COMMANDS -- POPULATE MISSING VALUES WITH THE sed COMMAND -- A DATASET WITH 1,000,000 ROWS -- Numeric Comparisons -- Counting Adjacent Digits -- Average Support Rate -- SUMMARY -- CHAPTER 6 PROCESSING DATASETS WITH AWK -- THE awk COMMAND -- Built-in Variables that Control awk -- How Does the awk Command Work? -- ALIGNING TEXT WITH THE printf COMMAND. 
CONDITIONAL LOGIC AND CONTROL STATEMENTS -- The while Statement -- A for loop in awk -- A for loop with a break Statement -- The next and continue Statements -- DELETING ALTERNATE LINES IN DATASETS -- MERGING LINES IN DATASETS -- Printing File Contents as a Single Line -- Joining Groups of Lines in a Text File -- Joining Alternate Lines in a Text File -- MATCHING WITH METACHARACTERS AND CHARACTER SETS -- PRINTING LINES USING CONDITIONAL LOGIC -- SPLITTING FILENAMES WITH awk -- WORKING WITH POSTFIX ARITHMETIC OPERATORS -- NUMERIC FUNCTIONS IN awk -- ONE-LINE awk COMMANDS -- USEFUL SHORT awk SCRIPTS -- PRINTING THE WORDS IN A TEXT STRING IN awk -- COUNT OCCURRENCES OF A STRING IN SPECIFIC ROWS -- PRINTING A STRING IN A FIXED NUMBER OF COLUMNS -- PRINTING A DATASET IN A FIXED NUMBER OF COLUMNS -- ALIGNING COLUMNS IN DATASETS -- ALIGNING COLUMNS AND MULTIPLE ROWS IN DATASETS -- DISPLAYING A SUBSET OF COLUMNS IN A TEXT FILE -- SUBSETS OF COLUMN-ALIGNED ROWS IN DATASETS -- COUNTING WORD FREQUENCY IN DATASETS -- DISPLAYING ONLY "PURE" WORDS IN A DATASET -- DELETE ROWS WITH MISSING VALUES -- WORKING WITH MULTI-LINE RECORDS IN AWK -- A SIMPLE USE CASE -- ANOTHER USE CASE -- A DATASET WITH 1,000,000 ROWS -- Counting Adjacent Digits -- Average Support Rate -- SUMMARY -- CHAPTER 7 PROCESSING DATASETS (PANDAS) -- PREREQUISITES FOR THIS CHAPTER -- ANALYZING MISSING DATA -- Causes of Missing Data -- PANDAS, CSV FILES, AND MISSING DATA -- Single Column CSV Files -- Two Column CSV Files -- MISSING DATA AND IMPUTATION -- Counting Missing Data Values -- Drop Redundant Columns -- Remove Duplicate Rows -- Display Duplicate Rows -- Uniformity of Data Values -- Too Many Missing Data Values -- Categorical Data -- Data Inconsistency -- Mean Value Imputation -- Random Value Imputation -- Multiple Imputation -- Matching and Hot Deck Imputation. Is a Zero Value Valid or Invalid? 
-- SKEWED DATASETS -- CSV FILES WITH MULTI-ROW RECORDS -- COLUMN SUBSET AND ROW SUBRANGE OF THE TITANIC CSV FILE -- DATA NORMALIZATION -- Assigning Classes to Data -- Other Data Cleaning Tasks -- DeepChecks and Data Validation -- HANDLING CATEGORICAL DATA -- Processing Inconsistent Categorical Data -- Mapping Categorical Data to Numeric Values -- Mapping Categorical Data to One Hot Encoded Values -- WORKING WITH CURRENCY -- WORKING WITH DATES -- Find Missing Dates -- Find Unique Dates -- Switch Date Formats -- WORKING WITH IMBALANCED DATASETS -- Data Sampling Techniques -- Removing Noisy Data -- Cost-sensitive Learning -- Detecting Imbalanced Data -- Rebalancing Datasets -- Specify stratify in Data Splits -- WHAT IS SMOTE? -- DATA WRANGLING -- Data Transformation: What Does This Mean? -- A DATASET WITH 1,000,000 ROWS -- Dataset Details -- Numeric Comparisons -- Counting Adjacent Digits -- SAVING CSV DATA TO XML, JSON, AND HTML FILES -- SUMMARY -- CHAPTER 8 NOSQL, SQLITE, AND PYTHON -- NON-RELATIONAL DATABASE SYSTEMS -- Advantages of Non-relational Databases -- WHAT IS NOSQL? -- What is NewSQL? -- RDBMS VERSUS NOSQL: WHICH ONE TO USE? -- Good Data Types for NoSQL -- Some Guidelines for Selecting a Database -- NoSQL Databases -- WHAT IS MONGODB? -- Features of MongoDB -- Installing MongoDB -- Launching MongoDB -- USEFUL MONGO APIS -- Metacharacters in Mongo Queries -- MONGODB COLLECTIONS AND DOCUMENTS -- Document Format in MongoDB -- CREATE A MONGODB COLLECTION -- WORKING WITH MONGODB COLLECTIONS -- Find All Android Phones -- Find All Android Phones in 2018 -- Insert a New Item (Document) -- Update an Existing Item (Document) -- Calculate the Average Price for Each Brand -- Calculate the Average Price for Each Brand in 2019 -- Import Data with mongoimport -- WHAT IS FUGUE? -- WHAT IS COMPASS? -- WHAT IS PYMONGO?. MYSQL, SQLALCHEMY, AND PANDAS. |
| Record no. | UNINA-9911006689403321 |
| Find it here | Univ. Federico II |
Pandas Basics
| Pandas Basics |
| Author | Campesato Oswald |
| Edition | [1st ed.] |
| Publication/distribution/printing | Bloomfield : Mercury Learning & Information, 2022 |
| Physical description | 1 online resource (215 pages) |
| Discipline | 005.133 |
| Topical subject | COMPUTERS / Programming Languages / Python |
| Uncontrolled subject |
Computer Science
Data Science Developers Matplotlib NumPy Programming Python Seaborn data mining |
| ISBN |
9781683928249
1683928245 9781683928256 1683928253 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Cover -- Title Page -- Copyright -- Dedication -- Contents -- Preface -- Chapter 1: Introduction to Python -- Tools for Python -- easy_install and pip -- virtualenv -- IPython -- Python Installation -- Setting the PATH Environment Variable (Windows Only) -- Launching Python on Your Machine -- The Python Interactive Interpreter -- Python Identifiers -- Lines, Indentation, and Multi-lines -- Quotations and Comments -- Saving Your Code in a Module -- Some Standard Modules -- The help() and dir() Functions -- Compile Time and Runtime Code Checking -- Simple Data Types -- Working with Numbers -- Working with Other Bases -- The chr() Function -- The round() Function -- Formatting Numbers -- Working with Fractions -- Unicode and UTF-8 -- Working with Unicode -- Working with Strings -- Comparing Strings -- Formatting Strings -- Uninitialized Variables and the Value None -- Slicing and Splicing Strings -- Testing for Digits and Alphabetic Characters -- Search and Replace a String in Other Strings -- Remove Leading and Trailing Characters -- Printing Text without NewLine Characters -- Text Alignment -- Working with Dates -- Converting Strings to Dates -- Exception Handling -- Handling User Input -- Command-line Arguments -- Summary -- Chapter 2: Working with Data -- Dealing with Data: What Can Go Wrong? -- What is Data Drift? -- What are Datasets? -- Data Preprocessing -- Data Types -- Preparing Datasets -- Discrete Data Versus Continuous Data -- Binning Continuous Data -- Scaling Numeric Data via Normalization -- Scaling Numeric Data via Standardization -- Scaling Numeric Data via Robust Standardization -- What to Look for in Categorical Data -- Mapping Categorical Data to Numeric Values -- Working with Dates -- Working with Currency -- Working with Outliers and Anomalies -- Outlier Detection/Removal -- Finding Outliers with NumPy.
Finding Outliers with Pandas -- Calculating Z-scores to Find Outliers -- Finding Outliers with SkLearn (Optional) -- Working with Missing Data -- Imputing Values: When is Zero a Valid Value? -- Dealing with Imbalanced Datasets -- What is SMOTE? -- SMOTE extensions -- The Bias-Variance Tradeoff -- Types of Bias in Data -- Analyzing Classifiers (Optional) -- What is LIME? -- What is ANOVA? -- Summary -- Chapter 3: Introduction to Probability and Statistics -- What is a Probability? -- Calculating the Expected Value -- Random Variables -- Discrete versus Continuous Random Variables -- Well-known Probability Distributions -- Fundamental Concepts in Statistics -- The Mean -- The Median -- The Mode -- The Variance and Standard Deviation -- Population, Sample, and Population Variance -- Chebyshev's Inequality -- What is a p-value? -- The Moments of a Function (Optional) -- What is Skewness? -- What is Kurtosis? -- Data and Statistics -- The Central Limit Theorem -- Correlation versus Causation -- Statistical Inferences -- Statistical Terms: RSS, TSS, R^2, and F1 Score -- What is an F1 score? -- Gini Impurity, Entropy, and Perplexity -- What is the Gini Impurity? -- What is Entropy? -- Calculating the Gini Impurity and Entropy Values -- Multi-dimensional Gini Index -- What is Perplexity? -- Cross-Entropy and KL Divergence -- What is Cross-Entropy? -- What is KL Divergence? -- What's Their Purpose? -- Covariance and Correlation Matrices -- The Covariance Matrix -- Covariance Matrix: An Example -- The Correlation Matrix -- Eigenvalues and Eigenvectors -- Calculating Eigenvectors: A Simple Example -- Gauss Jordan Elimination (Optional) -- PCA (Principal Component Analysis) -- The New Matrix of Eigenvectors -- Well-known Distance Metrics -- Pearson Correlation Coefficient -- Jaccard Index (or Similarity) -- Local Sensitivity Hashing (Optional). Types of Distance Metrics -- What is Bayesian Inference? -- Bayes' Theorem -- Some Bayesian Terminology -- What is MAP? 
-- Why Use Bayes' Theorem? -- Summary -- Chapter 4: Introduction to Pandas (1) -- What is Pandas? -- Pandas Options and Settings -- Pandas Data Frames -- Data Frames and Data Cleaning Tasks -- Alternatives to Pandas -- A Pandas Data Frame with a NumPy Example -- Describing a Pandas Data Frame -- Pandas Boolean Data Frames -- Transposing a Pandas Data Frame -- Pandas Data Frames and Random Numbers -- Reading CSV Files in Pandas -- Specifying a Separator and Column Sets in Text Files -- Specifying an Index in Text Files -- The loc() and iloc() Methods in Pandas -- Converting Categorical Data to Numeric Data -- Matching and Splitting Strings in Pandas -- Converting Strings to Dates in Pandas -- Working with Date Ranges in Pandas -- Detecting Missing Dates in Pandas -- Interpolating Missing Dates in Pandas -- Other Operations with Dates in Pandas -- Merging and Splitting Columns in Pandas -- Reading HTML Web Pages in Pandas -- Saving a Pandas Data Frame as an HTML Web Page -- Summary -- Chapter 5: Introduction to Pandas (2) -- Combining Pandas Data Frames -- Data Manipulation with Pandas Data Frames (1) -- Data Manipulation with Pandas Data Frames (2) -- Data Manipulation with Pandas Data Frames (3) -- Pandas Data Frames and CSV Files -- Managing Columns in Data Frames -- Switching Columns -- Appending Columns -- Deleting Columns -- Inserting Columns -- Scaling Numeric Columns -- Managing Rows in Pandas -- Selecting a Range of Rows in Pandas -- Finding Duplicate Rows in Pandas -- Inserting New Rows in Pandas -- Handling Missing Data in Pandas -- Multiple Types of Missing Values -- Test for Numeric Values in a Column -- Replacing NaN Values in Pandas -- Summary -- Chapter 6: Introduction to Pandas (3) -- Threshold Values and Outliers. 
The Pandas Pipe Method -- Pandas query() Method for Filtering Data -- Sorting Data Frames in Pandas -- Working with groupby() in Pandas -- Working with apply() and mapapply() in Pandas -- Handling Outliers in Pandas -- Pandas Data Frames and Scatterplots -- Pandas Data Frames and Simple Statistics -- Aggregate Operations in Pandas Data Frames -- Aggregate Operations with the titanic.csv Dataset -- Save Data Frames as CSV Files and Zip Files -- Pandas Data Frames and Excel Spreadsheets -- Working with JSON-based Data -- Python Dictionary and JSON -- Python, Pandas, and JSON -- Window Functions in Pandas -- Useful One-line Commands in Pandas -- What is pandasql? -- What is Method Chaining? -- Pandas and Method Chaining -- Pandas Profiling -- Alternatives to Pandas -- Summary -- Chapter 7: Data Visualization -- What is Data Visualization? -- Types of Data Visualization -- What is Matplotlib? -- Lines in a Grid in Matplotlib -- A Colored Grid in Matplotlib -- Randomized Data Points in Matplotlib -- A Histogram in Matplotlib -- A Set of Line Segments in Matplotlib -- Plotting Multiple Lines in Matplotlib -- Trigonometric Functions in Matplotlib -- Display IQ Scores in Matplotlib -- Plot a Best-Fitting Line in Matplotlib -- The Iris Dataset in Sklearn -- Sklearn, Pandas, and the Iris Dataset -- Working with Seaborn -- Features of Seaborn -- Seaborn Built-in Datasets -- The Iris Dataset in Seaborn -- The Titanic Dataset in Seaborn -- Extracting Data from the Titanic Dataset in Seaborn (1) -- Extracting Data from the Titanic Dataset in Seaborn (2) -- Visualizing a Pandas Dataset in Seaborn -- Data Visualization in Pandas -- What is Bokeh? -- Summary -- Index. |
| Record no. | UNINA-9911006690203321 |
| Find it here | Univ. Federico II |