Bash for Data Scientists
| Author | Campesato Oswald |
| Edition | [1st ed.] |
| Publication/distribution/printing | Bloomfield : Mercury Learning & Information, 2022 |
| Physical description | 1 online resource (293 pages) |
| Discipline | 005.43 |
| Topical subject | COMPUTERS / Programming Languages / Python |
| Uncontrolled subject | Computer Science; Data Science; Pandas; Programming; Python; UNIX; awk; data mining; grep; sed |
| ISBN | 9781683929710; 1683929713; 9781683929727; 1683929721 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Intro -- Bash for Data Scientists -- CONTENTS -- PREFACE -- WHAT IS THE GOAL? -- IS THIS BOOK IS FOR ME AND WHAT WILL I LEARN? -- HOW WERE THE CODE SAMPLES CREATED? -- WHAT YOU NEED TO KNOW FOR THIS BOOK -- WHICH BASH COMMANDS ARE EXCLUDED? -- HOW DO I SET UP A COMMAND SHELL? -- WHAT ARE THE "NEXT STEPS" AFTER FINISHING THIS BOOK? -- CHAPTER 1 INTRODUCTION -- WHAT IS UNIX? -- Available Shell Types -- WHAT IS BASH? -- Getting Help for Bash Commands -- Navigating Around Directories -- The history Command -- LISTING FILENAMES WITH THE LS COMMAND -- DISPLAYING CONTENTS OF FILES -- The cat Command -- The head and tail Commands -- The Pipe Symbol -- The fold Command -- FILE OWNERSHIP: OWNER, GROUP, AND WORLD -- HIDDEN FILES -- HANDLING PROBLEMATIC FILENAMES -- WORKING WITH ENVIRONMENT VARIABLES -- The env Command -- Useful Environment Variables -- Setting the PATH Environment Variable -- Specifying Aliases and Environment Variables -- FINDING EXECUTABLE FILES -- THE printf COMMAND AND THE echo COMMAND -- THE cut COMMAND -- THE echo COMMAND AND WHITESPACES -- COMMAND SUBSTITUTION ("BACK TICK") -- THE PIPE SYMBOL AND MULTIPLE COMMA -- USING A SEMICOLON TO SEPARATE COMMANDS -- THE paste COMMAND -- Inserting Blank Lines with the paste Command -- A SIMPLE USE CASE WITH THE paste COMMAND -- A SIMPLE USE CASE WITH cut AND paste COMMANDS -- WORKING WITH META CHARACTERS -- WORKING WITH CHARACTER CLASSES -- WHAT ABOUT ZSH? -- Switching between bash and zsh -- Configuring zsh -- SUMMARY -- CHAPTER 2 FILES AND DIRECTORIES -- CREATE, COPY, REMOVE, AND MOVE FILES -- Creating Files -- Copying Files -- Copy Files with Command Substitution -- Deleting Files -- Moving Files -- THE BASENAME, DIRNAME, AND FILE COMMANDS -- THE wc COMMAND -- THE more COMMAND AND THE less COMMAND -- THE head COMMAND -- THE tail COMMAND -- FILE COMPARISON COMMANDS -- THE PARTS OF A FILENA.
WORKING WITH FILE PERMISSIONS -- The chmod Command -- The chown Command -- The chgrp Command -- The umask and ulimit Commands -- WORKING WITH DIRECTORIES -- Absolute and Relative Directories -- Absolute and Relative Path Names -- Creating Directories -- Removing Directories -- Changing Directories -- Renaming Directories -- USING QUOTE CHARACTERS -- STREAMS AND REDIRECTION COMMANDS -- METACHARACTERS AND CHARACTER CLASSES -- Digits and Characters -- Working with "^" and "\" and "!" -- FILENAMES AND METACHARACTERS -- SUMMARY -- CHAPTER 3 USEFUL COMMANDS -- THE join COMMAND -- THE fold COMMAND -- THE split COMMAND -- THE sort COMMAND -- THE uniq COMMAND -- HOW TO COMPARE FILES -- THE od COMMAND -- THE tr COMMAND -- A SIMPLE USE CASE -- THE find COMMAND -- THE tee COMMAND -- FILE COMPRESSION COMMANDS -- The tar command -- The cpio Command -- The gzip and gunzip Commands -- The bunzip2 Command -- The zip Command -- COMMANDS FOR zip FILES AND bz FILES -- INTERNAL FIELD SEPARATOR (IFS) -- DATA FROM A RANGE OF COLUMNS IN A DATASET -- WORKING WITH UNEVEN ROWS IN DATASETS -- THE alias COMMAND -- SUMMARY -- CHAPTER 4 CONDITIONAL LOGIC AND LOOPS -- ARITHMETIC OPERATIONS AND OPERATORS -- WORKING WITH ARRAYS -- ARRAYS AND TEXT FILES -- WORKING WITH VARIABLES -- Assigning Values to Variables -- WORKING WITH OPERATORS FOR STRINGS AND NUMBERS -- THE read COMMAND FOR USER INPUT -- THE test COMMAND FOR VARIABLES, FILES, AND DIRECTORIES -- Relational Operators -- Boolean Operators -- String Operators -- File Test Operators -- CONDITIONAL LOGIC WITH if/else STATEMENTS -- THE case/esac STATEMENT -- ARITHMETIC OPERATORS AND COMPARISONS -- WORKING WITH STRINGS IN SHELL SCRIPTS -- Working with Strings -- WORKING WITH LOOPS -- Using a for loop -- WORKING WITH NESTED LOOPS -- USING A while LOOP -- THE while, case, AND if/elif/fi STATEMENTS -- USING AN UNTIL LOOP. 
USER-DEFINED FUNCTIONS -- CREATING A SIMPLE MENU FROM SHELL COMMANDS -- SUMMARY -- CHAPTER 5 PROCESSING DATASETS WITH GREP AND SED -- WHAT IS THE grep COMMAND? -- METACHARACTERS AND THE grep COMMAND -- ESCAPING METACHARACTERS WITH THE grep COMMAND -- USEFUL OPTIONS FOR THE grep COMMAND -- Character Classes and the grep Command -- WORKING WITH THE -C OPTION IN grep -- MATCHING A RANGE OF LINES -- USING BACK REFERENCES IN THE grep COMMAND -- FINDING EMPTY LINES IN DATASETS -- USING KEYS TO SEARCH DATASETS -- THE BACKSLASH CHARACTER AND THE grep COMMAND -- MULTIPLE MATCHES IN THE GREP COMMAND -- THE grep COMMAND AND THE xargs COMMAND -- Searching zip Files for a String -- CHECKING FOR A UNIQUE KEY VALUE -- Redirecting Error Messages -- THE egrep COMMAND AND fgrep COMMAND -- Displaying "Pure" Words in a Dataset with egrep -- The fgrep Command -- DELETE ROWS WITH MISSING VALUES -- A SIMPLE USE CASE -- WHAT IS THE sed COMMAND? -- The sed Execution Cycle -- MATCHING STRING PATTERNS USING sed -- SUBSTITUTING STRING PATTERNS USING sed -- Replacing Vowels from a String or a File -- Deleting Multiple Digits and Letters from a String -- SEARCH AND REPLACE WITH sed -- DATASETS WITH MULTIPLE DELIMITERS -- USEFUL SWITCHES IN sed -- WORKING WITH DATASETS -- Printing Lines -- Character Classes and sed -- Removing Control Characters -- COUNTING WORDS IN A DATASET -- BACK REFERENCES IN sed -- ONE-LINE sed COMMANDS -- POPULATE MISSING VALUES WITH THE sed COMMAND -- A DATASET WITH 1,000,000 ROWS -- Numeric Comparisons -- Counting Adjacent Digits -- Average Support Rate -- SUMMARY -- CHAPTER 6 PROCESSING DATASETS WITH AWK -- THE awk COMMAND -- Built-in Variables that Control awk -- How Does the awk Command Work? -- ALIGNING TEXT WITH THE printf COMMAND --
CONDITIONAL LOGIC AND CONTROL STATEMENTS -- The while Statement -- A for loop in awk -- A for loop with a break Statement -- The next and continue Statements -- DELETING ALTERNATE LINES IN DATASETS -- MERGING LINES IN DATASETS -- Printing File Contents as a Single Line -- Joining Groups of Lines in a Text File -- Joining Alternate Lines in a Text File -- MATCHING WITH METACHARACTERS AND CHARACTER SETS -- PRINTING LINES USING CONDITIONAL LOGIC -- SPLITTING FILENAMES WITH awk -- WORKING WITH POSTFIX ARITHMETIC OPERATORS -- NUMERIC FUNCTIONS IN awk -- ONE-LINE awk COMMANDS -- USEFUL SHORT awk SCRIPTS -- PRINTING THE WORDS IN A TEXT STRING IN awk -- COUNT OCCURRENCES OF A STRING IN SPECIFIC ROWS -- PRINTING A STRING IN A FIXED NUMBER OF COLUMNS -- PRINTING A DATASET IN A FIXED NUMBER OF COLUMNS -- ALIGNING COLUMNS IN DATASETS -- ALIGNING COLUMNS AND MULTIPLE ROWS IN DATASETS -- DISPLAYING A SUBSET OF COLUMNS IN A TEXT FILE -- SUBSETS OF COLUMN-ALIGNED ROWS IN DATASETS -- COUNTING WORD FREQUENCY IN DATASETS -- DISPLAYING ONLY "PURE" WORDS IN A DATASET -- DELETE ROWS WITH MISSING VALUES -- WORKING WITH MULTI-LINE RECORDS IN AWK -- A SIMPLE USE CASE -- ANOTHER USE CASE -- A DATASET WITH 1,000,000 ROWS -- Counting Adjacent Digits -- Average Support Rate -- SUMMARY -- CHAPTER 7 PROCESSING DATASETS (PANDAS) -- PREREQUISITES FOR THIS CHAPTER -- ANALYZING MISSING DATA -- Causes of Missing Data -- PANDAS, CSV FILES, AND MISSING DATA -- Single Column CSV Files -- Two Column CSV Files -- MISSING DATA AND IMPUTATION -- Counting Missing Data Values -- Drop Redundant Columns -- Remove Duplicate Rows -- Display Duplicate Rows -- Uniformity of Data Values -- Too Many Missing Data Values -- Categorical Data -- Data Inconsistency -- Mean Value Imputation -- Random Value Imputation -- Multiple Imputation -- Matching and Hot Deck Imputation. Is a Zero Value Valid or Invalid? 
-- SKEWED DATASETS -- CSV FILES WITH MULTI-ROW RECORDS -- COLUMN SUBSET AND ROW SUBRANGE OF THE TITANIC CSV FILE -- DATA NORMALIZATION -- Assigning Classes to Data -- Other Data Cleaning Tasks -- DeepChecks and Data Validation -- HANDLING CATEGORICAL DATA -- Processing Inconsistent Categorical Data -- Mapping Categorical Data to Numeric Values -- Mapping Categorical Data to One Hot Encoded Values -- WORKING WITH CURRENCY -- WORKING WITH DATES -- Find Missing Dates -- Find Unique Dates -- Switch Date Formats -- WORKING WITH IMBALANCED DATASETS -- Data Sampling Techniques -- Removing Noisy Data -- Cost-sensitive Learning -- Detecting Imbalanced Data -- Rebalancing Datasets -- Specify stratify in Data Splits -- WHAT IS SMOTE? -- DATA WRANGLING -- Data Transformation: What Does This Mean? -- A DATASET WITH 1,000,000 ROWS -- Dataset Details -- Numeric Comparisons -- Counting Adjacent Digits -- SAVING CSV DATA TO XML, JSON, AND HTML FILES -- SUMMARY -- CHAPTER 8 NOSQL, SQLITE, AND PYTHON -- NON-RELATIONAL DATABASE SYSTEMS -- Advantages of Non-relational Databases -- WHAT IS NOSQL? -- What is NewSQL? -- RDBMS VERSUS NOSQL: WHICH ONE TO USE? -- Good Data Types for NoSQL -- Some Guidelines for Selecting a Database -- NoSQL Databases -- WHAT IS MONGODB? -- Features of MongoDB -- Installing MongoDB -- Launching MongoDB -- USEFUL MONGO APIS -- Metacharacters in Mongo Queries -- MONGODB COLLECTIONS AND DOCUMENTS -- Document Format in MongoDB -- CREATE A MONGODB COLLECTION -- WORKING WITH MONGODB COLLECTIONS -- Find All Android Phones -- Find All Android Phones in 2018 -- Insert a New Item (Document) -- Update an Existing Item (Document) -- Calculate the Average Price for Each Brand -- Calculate the Average Price for Each Brand in 2019 -- Import Data with mongoimport -- WHAT IS FUGUE? -- WHAT IS COMPASS? -- WHAT IS PYMONGO?. MYSQL, SQLALCHEMY, AND PANDAS. |
| Record No. | UNINA-9911006689403321 |
| Available at | Univ. Federico II |
Datenwissenschaften und Gesellschaft : Die Genese eines transversalen Wissensfeldes / Philippe Saner
| Author | Saner Philippe |
| Publication/distribution/printing | Bielefeld : transcript Verlag, 2022 |
| Physical description | 1 online resource (320 pages) |
| Discipline | 005.7 |
| Series | Digitale Soziologie |
| Topical subject | Big data; Data mining |
| Uncontrolled subject | Datenwissenschaft; Digitalisierung; Politik; Arbeitsmarkt; Hochschulbildung; Feldtheorie; Wissenschaft; Big Data; Universität; Datengesellschaft; Schweiz; Wissenschaftssoziologie; Wissenssoziologie; Bildungsforschung; Soziologie; Data Science; Digitalization; Politics; Labour Market; University Education; Field Theory; Science; University; Data Society; Switzerland; Sociology of Science; Sociology of Knowledge; Educational Research; Sociology |
| ISBN | 3-8394-6259-2 |
| Classification | AK 26600 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | ger |
| Contents note | Frontmatter -- Editorial -- Inhalt -- Vorwort -- Abbildungsverzeichnis -- Tabellenverzeichnis -- Danksagung -- Kapitel 1 - Einleitung -- Teil I - Grundlagen -- Kapitel 2 - Transversale Wissensgebiete als Räume zwischen Feldern -- Kapitel 3 - »Data Science« als soziales Phänomen: Genese und multiple Perspektiven -- Kapitel 4 - Forschungsdesign -- Teil II - Repräsentationen und Imaginationen von Datenwissenschaften in Arbeitsmarkt und Politik -- Einleitung -- Kapitel 5 - Repräsentationen der Datenwissenschaften im schweizerischen Arbeitsmarkt -- Kapitel 6 - Zukunftsentwürfe der Datenwissenschaften in Diskursen der Bildungs- und Forschungspolitik -- Teil III - Konstruktionen der Datenwissenschaften im akademischen Feld -- Einleitung -- Kapitel 7 - Die Konstruktion der Datenwissenschaften im akademischen Feld durch Begriffsarbeit und boundary work -- Kapitel 8 - Die Verhandlung der Datenwissenschaften in Universitäten und Hochschulen -- Kapitel 9 - Die Strukturlogik datenwissenschaftlicher Curricula -- Kapitel 10 - Die Suche nach den richtigen Kompetenzen -- Teil IV - Schlussbetrachtungen -- Kapitel 11 - Synthese -- Bibliografie -- Anhang |
| Record No. | UNISA-996483168603316 |
| Available at | Univ. di Salerno |
Datenwissenschaften und Gesellschaft : Die Genese eines transversalen Wissensfeldes / Philippe Saner
| Author | Saner Philippe |
| Publication/distribution/printing | Bielefeld : transcript Verlag, 2022 |
| Physical description | 1 online resource (320 pages) |
| Discipline | 005.7 |
| Series | Digitale Soziologie |
| Topical subject | Big data; Data mining |
| Uncontrolled subject | Datenwissenschaft; Digitalisierung; Politik; Arbeitsmarkt; Hochschulbildung; Feldtheorie; Wissenschaft; Big Data; Universität; Datengesellschaft; Schweiz; Wissenschaftssoziologie; Wissenssoziologie; Bildungsforschung; Soziologie; Data Science; Digitalization; Politics; Labour Market; University Education; Field Theory; Science; University; Data Society; Switzerland; Sociology of Science; Sociology of Knowledge; Educational Research; Sociology |
| ISBN | 3-8394-6259-2 |
| Classification | AK 26600 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | ger |
| Contents note | Frontmatter -- Editorial -- Inhalt -- Vorwort -- Abbildungsverzeichnis -- Tabellenverzeichnis -- Danksagung -- Kapitel 1 - Einleitung -- Teil I - Grundlagen -- Kapitel 2 - Transversale Wissensgebiete als Räume zwischen Feldern -- Kapitel 3 - »Data Science« als soziales Phänomen: Genese und multiple Perspektiven -- Kapitel 4 - Forschungsdesign -- Teil II - Repräsentationen und Imaginationen von Datenwissenschaften in Arbeitsmarkt und Politik -- Einleitung -- Kapitel 5 - Repräsentationen der Datenwissenschaften im schweizerischen Arbeitsmarkt -- Kapitel 6 - Zukunftsentwürfe der Datenwissenschaften in Diskursen der Bildungs- und Forschungspolitik -- Teil III - Konstruktionen der Datenwissenschaften im akademischen Feld -- Einleitung -- Kapitel 7 - Die Konstruktion der Datenwissenschaften im akademischen Feld durch Begriffsarbeit und boundary work -- Kapitel 8 - Die Verhandlung der Datenwissenschaften in Universitäten und Hochschulen -- Kapitel 9 - Die Strukturlogik datenwissenschaftlicher Curricula -- Kapitel 10 - Die Suche nach den richtigen Kompetenzen -- Teil IV - Schlussbetrachtungen -- Kapitel 11 - Synthese -- Bibliografie -- Anhang |
| Record No. | UNINA-9910591164703321 |
| Available at | Univ. Federico II |
From Opinion Mining to Financial Argument Mining
| Author | Chen Chung-Chi |
| Publication/distribution/printing | Springer Nature, 2021 |
| Physical description | 1 online resource (102 pages) |
| Other authors (persons) | Huang Hen-Hsen; Chen Hsin-Hsi |
| Series | SpringerBriefs in Computer Science |
| Topical subject | Natural language & machine translation; Data mining; Algorithms & data structures; Artificial intelligence; Information technology: general issues |
| Uncontrolled subject | Natural Language Processing (NLP); Data Mining and Knowledge Discovery; Data Structures and Information Theory; Artificial Intelligence; Computer Applications; Data Science; Computer and Information Systems Applications; Open Access; financial opinion mining; text mining in finance; financial technology application; FinTech; argument mining in finance; opinion quality evaluation; numeral understanding; Natural language & machine translation; Data mining; Expert systems / knowledge-based systems; Algorithms & data structures; Information theory; Information technology: general issues |
| ISBN | 981-16-2881-5 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Record No. | UNISA-996464443103316 |
| Available at | Univ. di Salerno |
From Opinion Mining to Financial Argument Mining
| Author | Chen Chung-Chi |
| Publication/distribution/printing | Springer Nature, 2021 |
| Physical description | 1 online resource (102 pages) |
| Other authors (persons) | Huang Hen-Hsen; Chen Hsin-Hsi |
| Series | SpringerBriefs in Computer Science |
| Topical subject | Natural language & machine translation; Data mining; Algorithms & data structures; Artificial intelligence; Information technology: general issues |
| Uncontrolled subject | Natural Language Processing (NLP); Data Mining and Knowledge Discovery; Data Structures and Information Theory; Artificial Intelligence; Computer Applications; Data Science; Computer and Information Systems Applications; Open Access; financial opinion mining; text mining in finance; financial technology application; FinTech; argument mining in finance; opinion quality evaluation; numeral understanding; Natural language & machine translation; Data mining; Expert systems / knowledge-based systems; Algorithms & data structures; Information theory; Information technology: general issues |
| ISBN | 981-16-2881-5 |
| Classification | COM004000; COM018000; COM021030; COM031000; COM073000 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Record No. | UNINA-9910482868303321 |
| Available at | Univ. Federico II |
Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)
| Author | Jung Christopher [ed.]; Meyer, Jörg [ed.]; Streit, Achim [ed.] |
| Publication/distribution/printing | KIT Scientific Publishing, 2017 |
| Physical description | 1 online resource (V, 259 p.) |
| Uncontrolled subject | Big Data; data analysis; data life cycle; data management; data science; Data Science; Datenanalyse; Datenlebenszyklus; Datenmanagement |
| ISBN | 1000071931 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Other variant titles | Helmholtz Portfolio Theme Large-Scale Data Management and Analysis |
| Record No. | UNINA-9910346960303321 |
| Available at | Univ. Federico II |
Pandas Basics
| Author | Campesato Oswald |
| Edition | [1st ed.] |
| Publication/distribution/printing | Bloomfield : Mercury Learning & Information, 2022 |
| Physical description | 1 online resource (215 pages) |
| Discipline | 005.133 |
| Topical subject | COMPUTERS / Programming Languages / Python |
| Uncontrolled subject | Computer Science; Data Science; Developers; Matplotlib; NumPy; Programming; Python; Seaborn; data mining |
| ISBN | 9781683928249; 1683928245; 9781683928256; 1683928253 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Cover -- Title Page -- Copyright -- Dedication -- Contents -- Preface -- Chapter 1: Introduction to Python -- Tools for Python -- easy_install and pip -- virtualenv -- IPython -- Python Installation -- Setting the PATH Environment Variable (Windows Only) -- Launching Python on Your Machine -- The Python Interactive Interpreter -- Python Identifiers -- Lines, Indentation, and Multi-lines -- Quotations and Comments -- Saving Your Code in a Module -- Some Standard Modules -- The help() and dir() Functions -- Compile Time and Runtime Code Checking -- Simple Data Types -- Working with Numbers -- Working with Other Bases -- The chr() Function -- The round() Function -- Formatting Numbers -- Working with Fractions -- Unicode and UTF-8 -- Working with Unicode -- Working with Strings -- Comparing Strings -- Formatting Strings -- Uninitialized Variables and the Value None -- Slicing and Splicing Strings -- Testing for Digits and Alphabetic Characters -- Search and Replace a String in Other Strings -- Remove Leading and Trailing Characters -- Printing Text without NewLine Characters -- Text Alignment -- Working with Dates -- Converting Strings to Dates -- Exception Handling -- Handling User Input -- Command-line Arguments -- Summary -- Chapter 2: Working with Data -- Dealing with Data: What Can Go Wrong? -- What is Data Drift? -- What are Datasets? -- Data Preprocessing -- Data Types -- Preparing Datasets -- Discrete Data Versus Continuous Data -- Binning Continuous Data -- Scaling Numeric Data via Normalization -- Scaling Numeric Data via Standardization -- Scaling Numeric Data via Robust Standardization -- What to Look for in Categorical Data -- Mapping Categorical Data to Numeric Values -- Working with Dates -- Working with Currency -- Working with Outliers and Anomalies -- Outlier Detection/Removal -- Finding Outliers with NumPy.
Finding Outliers with Pandas -- Calculating Z-scores to Find Outliers -- Finding Outliers with SkLearn (Optional) -- Working with Missing Data -- Imputing Values: When is Zero a Valid Value? -- Dealing with Imbalanced Datasets -- What is SMOTE? -- SMOTE extensions -- The Bias-Variance Tradeoff -- Types of Bias in Data -- Analyzing Classifiers (Optional) -- What is LIME? -- What is ANOVA? -- Summary -- Chapter 3: Introduction to Probability and Statistics -- What is a Probability? -- Calculating the Expected Value -- Random Variables -- Discrete versus Continuous Random Variables -- Well-known Probability Distributions -- Fundamental Concepts in Statistics -- The Mean -- The Median -- The Mode -- The Variance and Standard Deviation -- Population, Sample, and Population Variance -- Chebyshev's Inequality -- What is a p-value? -- The Moments of a Function (Optional) -- What is Skewness? -- What is Kurtosis? -- Data and Statistics -- The Central Limit Theorem -- Correlation versus Causation -- Statistical Inferences -- Statistical Terms: RSS, TSS, R^2, and F1 Score -- What is an F1 score? -- Gini Impurity, Entropy, and Perplexity -- What is the Gini Impurity? -- What is Entropy? -- Calculating the Gini Impurity and Entropy Values -- Multi-dimensional Gini Index -- What is Perplexity? -- Cross-Entropy and KL Divergence -- What is Cross-Entropy? -- What is KL Divergence? -- What's Their Purpose? -- Covariance and Correlation Matrices -- The Covariance Matrix -- Covariance Matrix: An Example -- The Correlation Matrix -- Eigenvalues and Eigenvectors -- Calculating Eigenvectors: A Simple Example -- Gauss Jordan Elimination (Optional) -- PCA (Principal Component Analysis) -- The New Matrix of Eigenvectors -- Well-known Distance Metrics -- Pearson Correlation Coefficient -- Jaccard Index (or Similarity) -- Local Sensitivity Hashing (Optional). Types of Distance Metrics -- What is Bayesian Inference? -- Bayes' Theorem -- Some Bayesian Terminology -- What is MAP? 
-- Why Use Bayes' Theorem? -- Summary -- Chapter 4: Introduction to Pandas (1) -- What is Pandas? -- Pandas Options and Settings -- Pandas Data Frames -- Data Frames and Data Cleaning Tasks -- Alternatives to Pandas -- A Pandas Data Frame with a NumPy Example -- Describing a Pandas Data Frame -- Pandas Boolean Data Frames -- Transposing a Pandas Data Frame -- Pandas Data Frames and Random Numbers -- Reading CSV Files in Pandas -- Specifying a Separator and Column Sets in Text Files -- Specifying an Index in Text Files -- The loc() and iloc() Methods in Pandas -- Converting Categorical Data to Numeric Data -- Matching and Splitting Strings in Pandas -- Converting Strings to Dates in Pandas -- Working with Date Ranges in Pandas -- Detecting Missing Dates in Pandas -- Interpolating Missing Dates in Pandas -- Other Operations with Dates in Pandas -- Merging and Splitting Columns in Pandas -- Reading HTML Web Pages in Pandas -- Saving a Pandas Data Frame as an HTML Web Page -- Summary -- Chapter 5: Introduction to Pandas (2) -- Combining Pandas Data Frames -- Data Manipulation with Pandas Data Frames (1) -- Data Manipulation with Pandas Data Frames (2) -- Data Manipulation with Pandas Data Frames (3) -- Pandas Data Frames and CSV Files -- Managing Columns in Data Frames -- Switching Columns -- Appending Columns -- Deleting Columns -- Inserting Columns -- Scaling Numeric Columns -- Managing Rows in Pandas -- Selecting a Range of Rows in Pandas -- Finding Duplicate Rows in Pandas -- Inserting New Rows in Pandas -- Handling Missing Data in Pandas -- Multiple Types of Missing Values -- Test for Numeric Values in a Column -- Replacing NaN Values in Pandas -- Summary -- Chapter 6: Introduction to Pandas (3) -- Threshold Values and Outliers. 
The Pandas Pipe Method -- Pandas query() Method for Filtering Data -- Sorting Data Frames in Pandas -- Working with groupby() in Pandas -- Working with apply() and mapapply() in Pandas -- Handling Outliers in Pandas -- Pandas Data Frames and Scatterplots -- Pandas Data Frames and Simple Statistics -- Aggregate Operations in Pandas Data Frames -- Aggregate Operations with the titanic.csv Dataset -- Save Data Frames as CSV Files and Zip Files -- Pandas Data Frames and Excel Spreadsheets -- Working with JSON-based Data -- Python Dictionary and JSON -- Python, Pandas, and JSON -- Window Functions in Pandas -- Useful One-line Commands in Pandas -- What is pandasql? -- What is Method Chaining? -- Pandas and Method Chaining -- Pandas Profiling -- Alternatives to Pandas -- Summary -- Chapter 7: Data Visualization -- What is Data Visualization? -- Types of Data Visualization -- What is Matplotlib? -- Lines in a Grid in Matplotlib -- A Colored Grid in Matplotlib -- Randomized Data Points in Matplotlib -- A Histogram in Matplotlib -- A Set of Line Segments in Matplotlib -- Plotting Multiple Lines in Matplotlib -- Trigonometric Functions in Matplotlib -- Display IQ Scores in Matplotlib -- Plot a Best-Fitting Line in Matplotlib -- The Iris Dataset in Sklearn -- Sklearn, Pandas, and the Iris Dataset -- Working with Seaborn -- Features of Seaborn -- Seaborn Built-in Datasets -- The Iris Dataset in Seaborn -- The Titanic Dataset in Seaborn -- Extracting Data from the Titanic Dataset in Seaborn (1) -- Extracting Data from the Titanic Dataset in Seaborn (2) -- Visualizing a Pandas Dataset in Seaborn -- Data Visualization in Pandas -- What is Bokeh? -- Summary -- Index. |
| Record No. | UNINA-9911006690203321 |
| Available at | Univ. Federico II |
Process mining workshops : ICPM 2021 international workshops, Eindhoven, The Netherlands, October 31 - November 4, 2021 : revised selected papers / editors, Jorge Muñoz Gama, Xixi Lu
| Author | Munoz-Gama Jorge |
| Publication/distribution/printing | Cham : Springer Nature, 2022 |
| Physical description | 1 online resource (xiv, 410 pages) : illustrations (chiefly color) |
| Other authors (persons) | Munoz-Gama Jorge; Lu Xixi |
| Series | Lecture notes in business information processing |
| Topical subject | Data mining; Electronic data processing |
| Uncontrolled subject | Process Mining; Process Discovery; Process Analytics; Process Querying; Conformance Checking; Predictive Process Monitoring; Data Science; Event Data; Streaming Analytics; Machine Learning; Decision Support Systems; Business Process Management; Information Systems; Petri Nets; Open Access |
| ISBN | 3-030-98581-4 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note |
Intro -- Preface -- Organization -- Contents -- XES 2.0 Workshop and Survey -- Rethinking the Input for Process Mining: Insights from the XES Survey and Workshop -- 1 Introduction -- 2 XES Standard: A Brief Overview -- 3 Survey Design and Insights -- 4 Adding Context: Reflections from the XES 2.0 Workshop -- 5 Conclusion -- References -- EdbA 2021: 2nd International Workshop on Event Data and Behavioral Analytics -- Second International Workshop on Event Data and Behavioral Analytics (EdbA'21) -- Organization -- Workshop Chairs -- Program Committee -- Probability Estimation of Uncertain Process Trace Realizations -- 1 Introduction -- 2 Related Work -- 3 Running Example -- 4 Preliminaries -- 5 Method -- 6 Validation of Probability Estimates -- 7 Conclusion -- References -- Visualizing Trace Variants from Partially Ordered Event Data -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Visualizing Trace Variants -- 4.1 Approach -- 4.2 Formal Guarantees -- 4.3 Limitations -- 4.4 Implementation -- 5 Evaluation -- 6 Conclusion -- References -- Analyzing Multi-level BOM-Structured Event Data -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Methods -- 4.1 Analysis Methodology -- 4.2 M2BOM-Structured Assembly Processes -- 5 Case Study -- 6 Conclusion -- References -- Linac: A Smart Environment Simulator of Human Activities -- 1 Introduction -- 2 Existing Solutions -- 3 Proposed Simulation Solution -- 3.1 Configuration of the Smart Environment -- 3.2 Configuration of the Agents' Behavior - AIL Language -- 3.3 Simulation Execution -- 3.4 Clock Simulation -- 3.5 MQTT Output -- 4 Implementation -- 5 Evaluation -- 5.1 Configuration -- 5.2 Results -- 6 Conclusions and Future Works -- References -- Root Cause Analysis in Process Mining with Probabilistic Temporal Logic -- 1 Introduction -- 2 Related Work -- 3 The AITIA-PM Algorithm.
3.1 Background -- 3.2 Algorithmic Procedure -- 4 Demonstration -- 5 Conclusion -- References -- xPM: A Framework for Process Mining with Exogenous Data -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 A Framework for Process Mining with Exogenous Data -- 4.1 Linking -- 4.2 Slicing -- 4.3 Transformation -- 4.4 Discovery -- 4.5 Enhancing -- 5 Evaluation -- 5.1 Procedure -- 5.2 Quality Measures -- 5.3 Event Logs and Exogenous Data -- 5.4 Results and Discussion -- 6 Conclusion -- References -- A Bridging Model for Process Mining and IoT -- 1 Introduction -- 2 Background -- 2.1 IoT Ontologies -- 2.2 Business Process Context Modelling -- 3 Conceptual Ambiguity in IoT and PM -- 3.1 IoT Data -- 3.2 Context in PM vs Context in IoT -- 3.3 Process Event vs IoT Event -- 4 Connecting IoT and Process Mining: A Conceptual Model -- 5 Use Case Validation -- 6 Related Work -- 7 Conclusion -- References -- ML4PM 2021: 2nd International Workshop in Leveraging Machine Learning for Process Mining -- 2nd International Workshop in Leveraging Machine Learning for Process Mining (ML4PM 2021) -- Organization -- Workshop Chairs -- Program Committee -- Additional Reviewers -- Exploiting Instance Graphs and Graph Neural Networks for Next Activity Prediction -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 Building Instance Graphs -- 3.2 Data Preprocessing -- 3.3 Deep Graph Convolutional Neural Network -- 4 Experiments -- 4.1 Experimental Setup -- 4.2 Results -- 5 Conclusions and Future Works -- References -- Can Deep Neural Networks Learn Process Model Structure? An Assessment Framework and Analysis -- 1 Introduction -- 2 Related Work -- 3 A Framework for Assessing the Generalisation Capacity of RNNs -- 3.1 The Resampling Procedure -- 3.2 Metrics -- 4 Experimental Evaluation -- 4.1 Process Models -- 4.2 Hyperparameter Search -- 4.3 Results -- 5 Discussion. 
6 Conclusion and Future Work -- References -- Remaining Time Prediction for Processes with Inter-case Dynamics -- 1 Introduction -- 2 Preliminaries and Related Work -- 2.1 Related Work -- 2.2 RTM Background -- 2.3 Performance Spectrum with Error Progression -- 3 Approach -- 3.1 Detecting Uncertain Segments -- 3.2 Identifying Inter-case Dynamics in Uncertain Segments -- 3.3 Inter-case Feature Creation -- 3.4 Predicting the Next Segment -- 3.5 Predicting Waiting Time -- 4 Evaluation -- 4.1 Experimental Setup -- 4.2 Results -- 5 Conclusion -- References -- Event Log Sampling for Predictive Monitoring -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Proposed Sampling Methods -- 5 Evaluation -- 5.1 Event Logs -- 5.2 Implementation -- 5.3 Evaluation Setting -- 5.4 Experimental Results -- 6 Discussion -- 7 Conclusion -- References -- Active Anomaly Detection for Key Item Selection in Process Auditing -- 1 Introduction -- 2 Related Work -- 2.1 Anomaly Detection -- 2.2 Active Anomaly Detection -- 2.3 Trace Visualisation -- 3 Active Selection Approach -- 3.1 Step One: Encode Process Data -- 3.2 Step Two: Assign Anomaly Score -- 3.3 Step Three: Actively Label Exceptions -- 4 Evaluation -- 4.1 Step One: Encode Process Data -- 4.2 Step Two: Assign Anomaly Score -- 4.3 Step Three: Actively Label Exceptions -- 4.4 Performance Results -- 5 Discussion -- 5.1 Cycle One -- 5.2 Cycle Two -- 5.3 Cycle Three -- 6 Limitations -- 7 Conclusion and Future Work -- References -- Prescriptive Process Monitoring Under Resource Constraints: A Causal Inference Approach -- 1 Introduction -- 2 Background and Related Work -- 2.1 Predictive Process Monitoring -- 2.2 Prescriptive Process Monitoring -- 2.3 Causal Inference -- 3 Approach -- 3.1 Log Preprocessing -- 3.2 Predictive Model -- 3.3 Causal Model -- 3.4 Resource Allocator -- 4 Evaluation -- 4.1 Dataset. 
4.2 Experiment Setup -- 4.3 Results -- 4.4 Threats to Validity -- 5 Conclusion -- References -- Quantifying Explainability in Outcome-Oriented Predictive Process Monitoring -- 1 Introduction -- 2 Preliminaries -- 3 Explainability in OOPPM -- 3.1 Explainability Through Interpretability and Faithfulness -- 3.2 Logit Leaf Model -- 3.3 Generalized Logistic Rule Model -- 4 Experimental Evaluation -- 4.1 Benchmark Models -- 4.2 Event Logs -- 4.3 Implementation -- 4.4 Quantitative Metrics Results -- 5 Conclusion -- References -- SA4PM 2021: 2nd International Workshop on Streaming Analytics for Process Mining -- 2nd International Workshop on Streaming Analytics for Process Mining (SA4PM) -- Organization -- Workshop Chairs -- Program Committee -- Online Prediction of Aggregated Retailer Consumer Behaviour -- 1 Introduction -- 2 Framework -- 2.1 Features -- 2.2 Clustering -- 2.3 Training -- 2.4 Predicting -- 3 Experimental Evaluation -- 3.1 Experimental Setup -- 3.2 Results -- 4 Related Work -- 5 Conclusion and Future Work -- References -- PErrCas: Process Error Cascade Mining in Trace Streams -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Online Cascade Mining -- 4.1 Outlier Segment-Level Events -- 4.2 Error Cascade Construction -- 4.3 Cascade Patterns -- 5 Evaluation -- 5.1 Synthetic Data -- 5.2 Travel Reimbursement Process -- 6 Conclusion -- References -- Continuous Performance Evaluation for Business Process Outcome Monitoring -- 1 Introduction -- 2 Related Work -- 3 Continuous Prediction Evaluation Framework -- 4 Performance Evaluation Methods -- 4.1 Evaluating Performance Using a Local Timeline -- 4.2 Real-Time Model Performance -- 5 Experimental Analysis and Results -- 6 Conclusions -- References -- PQMI 2021: 6th International Workshop on Process Querying, Manipulation, and Intelligence. 
6th International Workshop on Process Querying, Manipulation, and Intelligence (PQMI 2021) -- Organization -- Workshop Organizers -- Program Committee -- An Event Data Extraction Approach from SAP ERP for Process Mining -- 1 Introduction -- 2 Background -- 2.1 Object-Centric Event Logs -- 2.2 SAP: Entities and Relationships -- 3 Extracting Event Data from SAP ERP: Approach -- 3.1 Building Graphs of Relations -- 3.2 Extracting Object-Centric Event Logs -- 4 Extracting Event Data from SAP ERP: Tool -- 5 Assessment -- 5.1 Building a Graph of Relations -- 5.2 Extracting Object-Centric Event Logs -- 6 Related Work -- 7 Conclusion -- References -- Towards a Natural Language Conversational Interface for Process Mining -- 1 Introduction -- 2 Related Work -- 3 Proposed Method -- 3.1 Pre-processing and Tagging -- 3.2 Semantic Parsing -- 3.3 PM Tool Interface Mapping -- 4 Sample Questions -- 5 Proof of Concept -- 6 Conclusions and Future Work -- References -- On the Performance Analysis of the Adversarial System Variant Approximation Method to Quantify Process Model Generalization -- 1 Introduction -- 2 Related Work -- 2.1 Generalization Metric -- 2.2 Adversarial System Variant Approximation -- 3 Notations -- 4 Problem Statement -- 5 Experimental Setup -- 5.1 Sampling Parameter -- 5.2 Variant Log Size -- 5.3 Biased Variant Logs -- 6 Results -- 6.1 Sampling Parameter Results -- 6.2 Variant Log Size Results -- 6.3 Biased Variant Log Results -- 7 Conclusion -- References -- PODS4H 2021: 4th International Workshop on Process-Oriented Data Science for Healthcare -- Fourth International Workshop on Process-Oriented Data Science for Healthcare (PODS4H) -- Organization -- Workshop Chairs -- Program Committee -- Verifying Guideline Compliance in Clinical Treatment Using Multi-perspective Conformance Checking: A Case Study -- 1 Introduction -- 2 Background. 3 Research Method. |
| Record No. | UNISA-996464540703316 |
Munoz-Gama Jorge
| Cham : Springer Nature, 2022 |
| Find it here: Univ. di Salerno |
Projection-Based Clustering through Self-Organization and Swarm Intelligence [electronic resource] : Combining Cluster Analysis with the Visualization of High-Dimensional Data / by Michael Christoph Thrun
| Projection-Based Clustering through Self-Organization and Swarm Intelligence [electronic resource] : Combining Cluster Analysis with the Visualization of High-Dimensional Data / by Michael Christoph Thrun |
| Author | Thrun Michael Christoph |
| Edition | [1st ed. 2018.] |
| Publication/distribution/printing | Cham : Springer Nature, 2018 |
| Physical description | 1 online resource (XX, 201 p. 90 illus., 29 illus. in color.) |
| Discipline | 006.4 |
| Topical subject |
Pattern recognition
Data structures (Computer science) Pattern Recognition Data Structures |
| Uncontrolled subject |
Cluster Analysis
Dimensionality Reduction Swarm Intelligence Visualization Unsupervised Machine Learning Data Science Knowledge Discovery 3D Printing Self-Organization Emergence Game Theory Advanced Analytics High-Dimensional Data Multivariate Data Analysis of Structured Data |
| ISBN | 3-658-20540-7 |
| Format | Printed material |
| Bibliographic level | Monograph |
| Language of publication | eng |
| Contents note | Approaches to Unsupervised Machine Learning -- Methods of Visualization of High-Dimensional Data -- Quality Assessments of Visualizations -- Behavior-Based Systems in Data Science -- Databionic Swarm (DBS). |
| Record No. | UNINA-9910293141703321 |
Thrun Michael Christoph
| Cham : Springer Nature, 2018 |
| Find it here: Univ. Federico II |