Bash for Data Scientists
Author Campesato, Oswald
Edition [1st ed.]
Published Bloomfield : Mercury Learning & Information, 2022
Physical description 1 online resource (293 pages)
Dewey class 005.43
Topical subjects COMPUTERS / Programming Languages / Python
Uncontrolled subjects Computer Science
Data Science
Pandas
Programming
Python
UNIX
awk
data mining
grep
sed
ISBN 9781683929710
1683929713
9781683929727
1683929721
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Bash for Data Scientists -- CONTENTS -- PREFACE -- WHAT IS THE GOAL? -- IS THIS BOOK FOR ME AND WHAT WILL I LEARN? -- HOW WERE THE CODE SAMPLES CREATED? -- WHAT YOU NEED TO KNOW FOR THIS BOOK -- WHICH BASH COMMANDS ARE EXCLUDED? -- HOW DO I SET UP A COMMAND SHELL? -- WHAT ARE THE "NEXT STEPS" AFTER FINISHING THIS BOOK? -- CHAPTER 1 INTRODUCTION -- WHAT IS UNIX? -- Available Shell Types -- WHAT IS BASH? -- Getting Help for Bash Commands -- Navigating Around Directories -- The history Command -- LISTING FILENAMES WITH THE LS COMMAND -- DISPLAYING CONTENTS OF FILES -- The cat Command -- The head and tail Commands -- The Pipe Symbol -- The fold Command -- FILE OWNERSHIP: OWNER, GROUP, AND WORLD -- HIDDEN FILES -- HANDLING PROBLEMATIC FILENAMES -- WORKING WITH ENVIRONMENT VARIABLES -- The env Command -- Useful Environment Variables -- Setting the PATH Environment Variable -- Specifying Aliases and Environment Variables -- FINDING EXECUTABLE FILES -- THE printf COMMAND AND THE echo COMMAND -- THE cut COMMAND -- THE echo COMMAND AND WHITESPACES -- COMMAND SUBSTITUTION ("BACK TICK") -- THE PIPE SYMBOL AND MULTIPLE COMMANDS -- USING A SEMICOLON TO SEPARATE COMMANDS -- THE paste COMMAND -- Inserting Blank Lines with the paste Command -- A SIMPLE USE CASE WITH THE paste COMMAND -- A SIMPLE USE CASE WITH cut AND paste COMMANDS -- WORKING WITH META CHARACTERS -- WORKING WITH CHARACTER CLASSES -- WHAT ABOUT ZSH? -- Switching between bash and zsh -- Configuring zsh -- SUMMARY -- CHAPTER 2 FILES AND DIRECTORIES -- CREATE, COPY, REMOVE, AND MOVE FILES -- Creating Files -- Copying Files -- Copy Files with Command Substitution -- Deleting Files -- Moving Files -- THE BASENAME, DIRNAME, AND FILE COMMANDS -- THE wc COMMAND -- THE more COMMAND AND THE less COMMAND -- THE head COMMAND -- THE tail COMMAND -- FILE COMPARISON COMMANDS -- THE PARTS OF A FILENAME.
WORKING WITH FILE PERMISSIONS -- The chmod Command -- The chown Command -- The chgrp Command -- The umask and ulimit Commands -- WORKING WITH DIRECTORIES -- Absolute and Relative Directories -- Absolute and Relative Path Names -- Creating Directories -- Removing Directories -- Changing Directories -- Renaming Directories -- USING QUOTE CHARACTERS -- STREAMS AND REDIRECTION COMMANDS -- METACHARACTERS AND CHARACTER CLASSES -- Digits and Characters -- Working with "^" and "\" and "!" -- FILENAMES AND METACHARACTERS -- SUMMARY -- CHAPTER 3 USEFUL COMMANDS -- THE join COMMAND -- THE fold COMMAND -- THE split COMMAND -- THE sort COMMAND -- THE uniq COMMAND -- HOW TO COMPARE FILES -- THE od COMMAND -- THE tr COMMAND -- A SIMPLE USE CASE -- THE find COMMAND -- THE tee COMMAND -- FILE COMPRESSION COMMANDS -- The tar command -- The cpio Command -- The gzip and gunzip Commands -- The bunzip2 Command -- The zip Command -- COMMANDS FOR zip FILES AND bz FILES -- INTERNAL FIELD SEPARATOR (IFS) -- DATA FROM A RANGE OF COLUMNS IN A DATASET -- WORKING WITH UNEVEN ROWS IN DATASETS -- THE alias COMMAND -- SUMMARY -- CHAPTER 4 CONDITIONAL LOGIC AND LOOPS -- ARITHMETIC OPERATIONS AND OPERATORS -- WORKING WITH ARRAYS -- ARRAYS AND TEXT FILES -- WORKING WITH VARIABLES -- Assigning Values to Variables -- WORKING WITH OPERATORS FOR STRINGS AND NUMBERS -- THE read COMMAND FOR USER INPUT -- THE test COMMAND FOR VARIABLES, FILES, AND DIRECTORIES -- Relational Operators -- Boolean Operators -- String Operators -- File Test Operators -- CONDITIONAL LOGIC WITH if/else STATEMENTS -- THE case/esac STATEMENT -- ARITHMETIC OPERATORS AND COMPARISONS -- WORKING WITH STRINGS IN SHELL SCRIPTS -- Working with Strings -- WORKING WITH LOOPS -- Using a for loop -- WORKING WITH NESTED LOOPS -- USING A while LOOP -- THE while, case, AND if/elif/fi STATEMENTS -- USING AN UNTIL LOOP.
USER-DEFINED FUNCTIONS -- CREATING A SIMPLE MENU FROM SHELL COMMANDS -- SUMMARY -- CHAPTER 5 PROCESSING DATASETS WITH GREP AND SED -- WHAT IS THE grep COMMAND? -- METACHARACTERS AND THE grep COMMAND -- ESCAPING METACHARACTERS WITH THE grep COMMAND -- USEFUL OPTIONS FOR THE grep COMMAND -- Character Classes and the grep Command -- WORKING WITH THE -C OPTION IN grep -- MATCHING A RANGE OF LINES -- USING BACK REFERENCES IN THE grep COMMAND -- FINDING EMPTY LINES IN DATASETS -- USING KEYS TO SEARCH DATASETS -- THE BACKSLASH CHARACTER AND THE grep COMMAND -- MULTIPLE MATCHES IN THE GREP COMMAND -- THE grep COMMAND AND THE xargs COMMAND -- Searching zip Files for a String -- CHECKING FOR A UNIQUE KEY VALUE -- Redirecting Error Messages -- THE egrep COMMAND AND fgrep COMMAND -- Displaying "Pure" Words in a Dataset with egrep -- The fgrep Command -- DELETE ROWS WITH MISSING VALUES -- A SIMPLE USE CASE -- WHAT IS THE sed COMMAND? -- The sed Execution Cycle -- MATCHING STRING PATTERNS USING sed -- SUBSTITUTING STRING PATTERNS USING sed -- Replacing Vowels from a String or a File -- Deleting Multiple Digits and Letters from a String -- SEARCH AND REPLACE WITH sed -- DATASETS WITH MULTIPLE DELIMITERS -- USEFUL SWITCHES IN sed -- WORKING WITH DATASETS -- Printing Lines -- Character Classes and sed -- Removing Control Characters -- COUNTING WORDS IN A DATASET -- BACK REFERENCES IN sed -- ONE-LINE sed COMMANDS -- POPULATE MISSING VALUES WITH THE sed COMMAND -- A DATASET WITH 1,000,000 ROWS -- Numeric Comparisons -- Counting Adjacent Digits -- Average Support Rate -- SUMMARY -- CHAPTER 6 PROCESSING DATASETS WITH AWK -- THE awk COMMAND -- Built-in Variables that Control awk -- How Does the awk Command Work? -- ALIGNING TEXT WITH THE printf COMMAND.
CONDITIONAL LOGIC AND CONTROL STATEMENTS -- The while Statement -- A for loop in awk -- A for loop with a break Statement -- The next and continue Statements -- DELETING ALTERNATE LINES IN DATASETS -- MERGING LINES IN DATASETS -- Printing File Contents as a Single Line -- Joining Groups of Lines in a Text File -- Joining Alternate Lines in a Text File -- MATCHING WITH METACHARACTERS AND CHARACTER SETS -- PRINTING LINES USING CONDITIONAL LOGIC -- SPLITTING FILENAMES WITH awk -- WORKING WITH POSTFIX ARITHMETIC OPERATORS -- NUMERIC FUNCTIONS IN awk -- ONE-LINE awk COMMANDS -- USEFUL SHORT awk SCRIPTS -- PRINTING THE WORDS IN A TEXT STRING IN awk -- COUNT OCCURRENCES OF A STRING IN SPECIFIC ROWS -- PRINTING A STRING IN A FIXED NUMBER OF COLUMNS -- PRINTING A DATASET IN A FIXED NUMBER OF COLUMNS -- ALIGNING COLUMNS IN DATASETS -- ALIGNING COLUMNS AND MULTIPLE ROWS IN DATASETS -- DISPLAYING A SUBSET OF COLUMNS IN A TEXT FILE -- SUBSETS OF COLUMN-ALIGNED ROWS IN DATASETS -- COUNTING WORD FREQUENCY IN DATASETS -- DISPLAYING ONLY "PURE" WORDS IN A DATASET -- DELETE ROWS WITH MISSING VALUES -- WORKING WITH MULTI-LINE RECORDS IN AWK -- A SIMPLE USE CASE -- ANOTHER USE CASE -- A DATASET WITH 1,000,000 ROWS -- Counting Adjacent Digits -- Average Support Rate -- SUMMARY -- CHAPTER 7 PROCESSING DATASETS (PANDAS) -- PREREQUISITES FOR THIS CHAPTER -- ANALYZING MISSING DATA -- Causes of Missing Data -- PANDAS, CSV FILES, AND MISSING DATA -- Single Column CSV Files -- Two Column CSV Files -- MISSING DATA AND IMPUTATION -- Counting Missing Data Values -- Drop Redundant Columns -- Remove Duplicate Rows -- Display Duplicate Rows -- Uniformity of Data Values -- Too Many Missing Data Values -- Categorical Data -- Data Inconsistency -- Mean Value Imputation -- Random Value Imputation -- Multiple Imputation -- Matching and Hot Deck Imputation.
Is a Zero Value Valid or Invalid? -- SKEWED DATASETS -- CSV FILES WITH MULTI-ROW RECORDS -- COLUMN SUBSET AND ROW SUBRANGE OF THE TITANIC CSV FILE -- DATA NORMALIZATION -- Assigning Classes to Data -- Other Data Cleaning Tasks -- DeepChecks and Data Validation -- HANDLING CATEGORICAL DATA -- Processing Inconsistent Categorical Data -- Mapping Categorical Data to Numeric Values -- Mapping Categorical Data to One Hot Encoded Values -- WORKING WITH CURRENCY -- WORKING WITH DATES -- Find Missing Dates -- Find Unique Dates -- Switch Date Formats -- WORKING WITH IMBALANCED DATASETS -- Data Sampling Techniques -- Removing Noisy Data -- Cost-sensitive Learning -- Detecting Imbalanced Data -- Rebalancing Datasets -- Specify stratify in Data Splits -- WHAT IS SMOTE? -- DATA WRANGLING -- Data Transformation: What Does This Mean? -- A DATASET WITH 1,000,000 ROWS -- Dataset Details -- Numeric Comparisons -- Counting Adjacent Digits -- SAVING CSV DATA TO XML, JSON, AND HTML FILES -- SUMMARY -- CHAPTER 8 NOSQL, SQLITE, AND PYTHON -- NON-RELATIONAL DATABASE SYSTEMS -- Advantages of Non-relational Databases -- WHAT IS NOSQL? -- What is NewSQL? -- RDBMS VERSUS NOSQL: WHICH ONE TO USE? -- Good Data Types for NoSQL -- Some Guidelines for Selecting a Database -- NoSQL Databases -- WHAT IS MONGODB? -- Features of MongoDB -- Installing MongoDB -- Launching MongoDB -- USEFUL MONGO APIS -- Metacharacters in Mongo Queries -- MONGODB COLLECTIONS AND DOCUMENTS -- Document Format in MongoDB -- CREATE A MONGODB COLLECTION -- WORKING WITH MONGODB COLLECTIONS -- Find All Android Phones -- Find All Android Phones in 2018 -- Insert a New Item (Document) -- Update an Existing Item (Document) -- Calculate the Average Price for Each Brand -- Calculate the Average Price for Each Brand in 2019 -- Import Data with mongoimport -- WHAT IS FUGUE? -- WHAT IS COMPASS? -- WHAT IS PYMONGO?.
MYSQL, SQLALCHEMY, AND PANDAS.
Record no. UNINA-9911006689403321
Available at: Univ. Federico II
Datenwissenschaften und Gesellschaft : Die Genese eines transversalen Wissensfeldes / Philippe Saner
Author Saner, Philippe
Published Bielefeld : transcript Verlag, 2022
Physical description 1 online resource (320 pages)
Dewey class 005.7
Series Digitale Soziologie
Topical subjects Big data
Data mining
Uncontrolled subjects Datenwissenschaft
Digitalisierung
Politik
Arbeitsmarkt
Hochschulbildung
Feldtheorie
Wissenschaft
Big Data
Universität
Datengesellschaft
Schweiz
Wissenschaftssoziologie
Wissenssoziologie
Bildungsforschung
Soziologie
Data Science
Digitalization
Politics
Labour Market
University Education
Field Theory
Science
University
Data Society
Switzerland
Sociology of Science
Sociology of Knowledge
Educational Research
Sociology
ISBN 3-8394-6259-2
Classification AK 26600
Format Printed material
Bibliographic level Monograph
Language of publication ger
Contents note Frontmatter -- Editorial -- Inhalt -- Vorwort -- Abbildungsverzeichnis -- Tabellenverzeichnis -- Danksagung -- Kapitel 1 - Einleitung -- Teil I - Grundlagen -- Kapitel 2 - Transversale Wissensgebiete als Räume zwischen Feldern -- Kapitel 3 - »Data Science« als soziales Phänomen: Genese und multiple Perspektiven -- Kapitel 4 - Forschungsdesign -- Teil II - Repräsentationen und Imaginationen von Datenwissenschaften in Arbeitsmarkt und Politik -- Einleitung -- Kapitel 5 - Repräsentationen der Datenwissenschaften im schweizerischen Arbeitsmarkt -- Kapitel 6 - Zukunftsentwürfe der Datenwissenschaften in Diskursen der Bildungs- und Forschungspolitik -- Teil III - Konstruktionen der Datenwissenschaften im akademischen Feld -- Einleitung -- Kapitel 7 - Die Konstruktion der Datenwissenschaften im akademischen Feld durch Begriffsarbeit und boundary work -- Kapitel 8 - Die Verhandlung der Datenwissenschaften in Universitäten und Hochschulen -- Kapitel 9 - Die Strukturlogik datenwissenschaftlicher Curricula -- Kapitel 10 - Die Suche nach den richtigen Kompetenzen -- Teil IV - Schlussbetrachtungen -- Kapitel 11 - Synthese -- Bibliografie -- Anhang
Record no. UNISA-996483168603316
Available at: Univ. di Salerno
Record no. UNINA-9910591164703321
Available at: Univ. Federico II
From Opinion Mining to Financial Argument Mining
Author Chen, Chung-Chi
Published Springer Nature, 2021
Physical description 1 online resource (102 pages)
Other authors (persons) Huang, Hen-Hsen
Chen, Hsin-Hsi
Series SpringerBriefs in Computer Science
Topical subjects Natural language & machine translation
Data mining
Algorithms & data structures
Artificial intelligence
Information technology: general issues
Uncontrolled subjects Natural Language Processing (NLP)
Data Mining and Knowledge Discovery
Data Structures and Information Theory
Artificial Intelligence
Computer Applications
Data Science
Computer and Information Systems Applications
Open Access
financial opinion mining
text mining in finance
financial technology application
FinTech
argument mining in finance
opinion quality evaluation
numeral understanding
Natural language & machine translation
Data mining
Expert systems / knowledge-based systems
Algorithms & data structures
Information theory
Information technology: general issues
ISBN 981-16-2881-5
Classification COM004000, COM018000, COM021030, COM031000, COM073000
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNISA-996464443103316
Available at: Univ. di Salerno
Record no. UNINA-9910482868303321
Available at: Univ. Federico II
Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)
Authors Jung, Christopher [ed.]; Meyer, Jörg [ed.]; Streit, Achim [ed.]
Published KIT Scientific Publishing, 2017
Physical description 1 online resource (V, 259 p.)
Uncontrolled subjects Big Data
data analysis
data life cycle
data management
data science
Data Science
Datenanalyse
Datenlebenszyklus
Datenmanagement
ISBN 1000071931
Format Printed material
Bibliographic level Monograph
Language of publication eng
Variant titles Helmholtz Portfolio Theme Large-Scale Data Management and Analysis
Record no. UNINA-9910346960303321
Available at: Univ. Federico II
Pandas Basics
Author Campesato, Oswald
Edition [1st ed.]
Published Bloomfield : Mercury Learning & Information, 2022
Physical description 1 online resource (215 pages)
Dewey class 005.133
Topical subjects COMPUTERS / Programming Languages / Python
Uncontrolled subjects Computer Science
Data Science
Developers
Matplotlib
NumPy
Programming
Python
Seaborn
data mining
ISBN 9781683928249
1683928245
9781683928256
1683928253
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Cover -- Title Page -- Copyright -- Dedication -- Contents -- Preface -- Chapter 1: Introduction to Python -- Tools for Python -- easy_install and pip -- virtualenv -- IPython -- Python Installation -- Setting the PATH Environment Variable (Windows Only) -- Launching Python on Your Machine -- The Python Interactive Interpreter -- Python Identifiers -- Lines, Indentation, and Multi-lines -- Quotations and Comments -- Saving Your Code in a Module -- Some Standard Modules -- The help() and dir() Functions -- Compile Time and Runtime Code Checking -- Simple Data Types -- Working with Numbers -- Working with Other Bases -- The chr() Function -- The round() Function -- Formatting Numbers -- Working with Fractions -- Unicode and UTF-8 -- Working with Unicode -- Working with Strings -- Comparing Strings -- Formatting Strings -- Uninitialized Variables and the Value None -- Slicing and Splicing Strings -- Testing for Digits and Alphabetic Characters -- Search and Replace a String in Other Strings -- Remove Leading and Trailing Characters -- Printing Text without NewLine Characters -- Text Alignment -- Working with Dates -- Converting Strings to Dates -- Exception Handling -- Handling User Input -- Command-line Arguments -- Summary -- Chapter 2: Working with Data -- Dealing with Data: What Can Go Wrong? -- What is Data Drift? -- What are Datasets? -- Data Preprocessing -- Data Types -- Preparing Datasets -- Discrete Data Versus Continuous Data -- Binning Continuous Data -- Scaling Numeric Data via Normalization -- Scaling Numeric Data via Standardization -- Scaling Numeric Data via Robust Standardization -- What to Look for in Categorical Data -- Mapping Categorical Data to Numeric Values -- Working with Dates -- Working with Currency -- Working with Outliers and Anomalies -- Outlier Detection/Removal -- Finding Outliers with NumPy.
Finding Outliers with Pandas -- Calculating Z-scores to Find Outliers -- Finding Outliers with SkLearn (Optional) -- Working with Missing Data -- Imputing Values: When is Zero a Valid Value? -- Dealing with Imbalanced Datasets -- What is SMOTE? -- SMOTE extensions -- The Bias-Variance Tradeoff -- Types of Bias in Data -- Analyzing Classifiers (Optional) -- What is LIME? -- What is ANOVA? -- Summary -- Chapter 3: Introduction to Probability and Statistics -- What is a Probability? -- Calculating the Expected Value -- Random Variables -- Discrete versus Continuous Random Variables -- Well-known Probability Distributions -- Fundamental Concepts in Statistics -- The Mean -- The Median -- The Mode -- The Variance and Standard Deviation -- Population, Sample, and Population Variance -- Chebyshev's Inequality -- What is a p-value? -- The Moments of a Function (Optional) -- What is Skewness? -- What is Kurtosis? -- Data and Statistics -- The Central Limit Theorem -- Correlation versus Causation -- Statistical Inferences -- Statistical Terms: RSS, TSS, R^2, and F1 Score -- What is an F1 score? -- Gini Impurity, Entropy, and Perplexity -- What is the Gini Impurity? -- What is Entropy? -- Calculating the Gini Impurity and Entropy Values -- Multi-dimensional Gini Index -- What is Perplexity? -- Cross-Entropy and KL Divergence -- What is Cross-Entropy? -- What is KL Divergence? -- What's Their Purpose? -- Covariance and Correlation Matrices -- The Covariance Matrix -- Covariance Matrix: An Example -- The Correlation Matrix -- Eigenvalues and Eigenvectors -- Calculating Eigenvectors: A Simple Example -- Gauss Jordan Elimination (Optional) -- PCA (Principal Component Analysis) -- The New Matrix of Eigenvectors -- Well-known Distance Metrics -- Pearson Correlation Coefficient -- Jaccard Index (or Similarity) -- Local Sensitivity Hashing (Optional).
Types of Distance Metrics -- What is Bayesian Inference? -- Bayes' Theorem -- Some Bayesian Terminology -- What is MAP? -- Why Use Bayes' Theorem? -- Summary -- Chapter 4: Introduction to Pandas (1) -- What is Pandas? -- Pandas Options and Settings -- Pandas Data Frames -- Data Frames and Data Cleaning Tasks -- Alternatives to Pandas -- A Pandas Data Frame with a NumPy Example -- Describing a Pandas Data Frame -- Pandas Boolean Data Frames -- Transposing a Pandas Data Frame -- Pandas Data Frames and Random Numbers -- Reading CSV Files in Pandas -- Specifying a Separator and Column Sets in Text Files -- Specifying an Index in Text Files -- The loc() and iloc() Methods in Pandas -- Converting Categorical Data to Numeric Data -- Matching and Splitting Strings in Pandas -- Converting Strings to Dates in Pandas -- Working with Date Ranges in Pandas -- Detecting Missing Dates in Pandas -- Interpolating Missing Dates in Pandas -- Other Operations with Dates in Pandas -- Merging and Splitting Columns in Pandas -- Reading HTML Web Pages in Pandas -- Saving a Pandas Data Frame as an HTML Web Page -- Summary -- Chapter 5: Introduction to Pandas (2) -- Combining Pandas Data Frames -- Data Manipulation with Pandas Data Frames (1) -- Data Manipulation with Pandas Data Frames (2) -- Data Manipulation with Pandas Data Frames (3) -- Pandas Data Frames and CSV Files -- Managing Columns in Data Frames -- Switching Columns -- Appending Columns -- Deleting Columns -- Inserting Columns -- Scaling Numeric Columns -- Managing Rows in Pandas -- Selecting a Range of Rows in Pandas -- Finding Duplicate Rows in Pandas -- Inserting New Rows in Pandas -- Handling Missing Data in Pandas -- Multiple Types of Missing Values -- Test for Numeric Values in a Column -- Replacing NaN Values in Pandas -- Summary -- Chapter 6: Introduction to Pandas (3) -- Threshold Values and Outliers.
The Pandas Pipe Method -- Pandas query() Method for Filtering Data -- Sorting Data Frames in Pandas -- Working with groupby() in Pandas -- Working with apply() and mapapply() in Pandas -- Handling Outliers in Pandas -- Pandas Data Frames and Scatterplots -- Pandas Data Frames and Simple Statistics -- Aggregate Operations in Pandas Data Frames -- Aggregate Operations with the titanic.csv Dataset -- Save Data Frames as CSV Files and Zip Files -- Pandas Data Frames and Excel Spreadsheets -- Working with JSON-based Data -- Python Dictionary and JSON -- Python, Pandas, and JSON -- Window Functions in Pandas -- Useful One-line Commands in Pandas -- What is pandasql? -- What is Method Chaining? -- Pandas and Method Chaining -- Pandas Profiling -- Alternatives to Pandas -- Summary -- Chapter 7: Data Visualization -- What is Data Visualization? -- Types of Data Visualization -- What is Matplotlib? -- Lines in a Grid in Matplotlib -- A Colored Grid in Matplotlib -- Randomized Data Points in Matplotlib -- A Histogram in Matplotlib -- A Set of Line Segments in Matplotlib -- Plotting Multiple Lines in Matplotlib -- Trigonometric Functions in Matplotlib -- Display IQ Scores in Matplotlib -- Plot a Best-Fitting Line in Matplotlib -- The Iris Dataset in Sklearn -- Sklearn, Pandas, and the Iris Dataset -- Working with Seaborn -- Features of Seaborn -- Seaborn Built-in Datasets -- The Iris Dataset in Seaborn -- The Titanic Dataset in Seaborn -- Extracting Data from the Titanic Dataset in Seaborn (1) -- Extracting Data from the Titanic Dataset in Seaborn (2) -- Visualizing a Pandas Dataset in Seaborn -- Data Visualization in Pandas -- What is Bokeh? -- Summary -- Index.
Record no. UNINA-9911006690203321
Available at: Univ. Federico II
Process mining workshops : ICPM 2021 international workshops, Eindhoven, The Netherlands, October 31 - November 4, 2021 : revised selected papers / editors, Jorge Muñoz Gama, Xixi Lu
Author Munoz-Gama, Jorge
Published Cham : Springer Nature, 2022
Physical description 1 online resource (xiv, 410 pages) : illustrations (chiefly color)
Other authors (persons) Munoz-Gama, Jorge
Lu, Xixi
Series Lecture notes in business information processing
Topical subjects Data mining
Electronic data processing
Uncontrolled subjects Process Mining
Process Discovery
Process Analytics
Process Querying
Conformance Checking
Predictive Process Monitoring
Data Science
Event Data
Streaming Analytics
Machine Learning
Decision Support Systems
Business Process Management
Information Systems
Petri Nets
Open Access
ISBN 3-030-98581-4
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Organization -- Contents -- XES 2.0 Workshop and Survey -- Rethinking the Input for Process Mining: Insights from the XES Survey and Workshop -- 1 Introduction -- 2 XES Standard: A Brief Overview -- 3 Survey Design and Insights -- 4 Adding Context: Reflections from the XES 2.0 Workshop -- 5 Conclusion -- References -- EdbA 2021: 2nd International Workshop on Event Data and Behavioral Analytics -- Second International Workshop on Event Data and Behavioral Analytics (EdbA'21) -- Organization -- Workshop Chairs -- Program Committee -- Probability Estimation of Uncertain Process Trace Realizations -- 1 Introduction -- 2 Related Work -- 3 Running Example -- 4 Preliminaries -- 5 Method -- 6 Validation of Probability Estimates -- 7 Conclusion -- References -- Visualizing Trace Variants from Partially Ordered Event Data -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Visualizing Trace Variants -- 4.1 Approach -- 4.2 Formal Guarantees -- 4.3 Limitations -- 4.4 Implementation -- 5 Evaluation -- 6 Conclusion -- References -- Analyzing Multi-level BOM-Structured Event Data -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Methods -- 4.1 Analysis Methodology -- 4.2 M2BOM-Structured Assembly Processes -- 5 Case Study -- 6 Conclusion -- References -- Linac: A Smart Environment Simulator of Human Activities -- 1 Introduction -- 2 Existing Solutions -- 3 Proposed Simulation Solution -- 3.1 Configuration of the Smart Environment -- 3.2 Configuration of the Agents' Behavior - AIL Language -- 3.3 Simulation Execution -- 3.4 Clock Simulation -- 3.5 MQTT Output -- 4 Implementation -- 5 Evaluation -- 5.1 Configuration -- 5.2 Results -- 6 Conclusions and Future Works -- References -- Root Cause Analysis in Process Mining with Probabilistic Temporal Logic -- 1 Introduction -- 2 Related Work -- 3 The AITIA-PM Algorithm.
3.1 Background -- 3.2 Algorithmic Procedure -- 4 Demonstration -- 5 Conclusion -- References -- xPM: A Framework for Process Mining with Exogenous Data -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 A Framework for Process Mining with Exogenous Data -- 4.1 Linking -- 4.2 Slicing -- 4.3 Transformation -- 4.4 Discovery -- 4.5 Enhancing -- 5 Evaluation -- 5.1 Procedure -- 5.2 Quality Measures -- 5.3 Event Logs and Exogenous Data -- 5.4 Results and Discussion -- 6 Conclusion -- References -- A Bridging Model for Process Mining and IoT -- 1 Introduction -- 2 Background -- 2.1 IoT Ontologies -- 2.2 Business Process Context Modelling -- 3 Conceptual Ambiguity in IoT and PM -- 3.1 IoT Data -- 3.2 Context in PM vs Context in IoT -- 3.3 Process Event vs IoT Event -- 4 Connecting IoT and Process Mining: A Conceptual Model -- 5 Use Case Validation -- 6 Related Work -- 7 Conclusion -- References -- ML4PM 2021: 2nd International Workshop in Leveraging Machine Learning for Process Mining -- 2nd International Workshop in Leveraging Machine Learning for Process Mining (ML4PM 2021) -- Organization -- Workshop Chairs -- Program Committee -- Additional Reviewers -- Exploiting Instance Graphs and Graph Neural Networks for Next Activity Prediction -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 Building Instance Graphs -- 3.2 Data Preprocessing -- 3.3 Deep Graph Convolutional Neural Network -- 4 Experiments -- 4.1 Experimental Setup -- 4.2 Results -- 5 Conclusions and Future Works -- References -- Can Deep Neural Networks Learn Process Model Structure? An Assessment Framework and Analysis -- 1 Introduction -- 2 Related Work -- 3 A Framework for Assessing the Generalisation Capacity of RNNs -- 3.1 The Resampling Procedure -- 3.2 Metrics -- 4 Experimental Evaluation -- 4.1 Process Models -- 4.2 Hyperparameter Search -- 4.3 Results -- 5 Discussion.
6 Conclusion and Future Work -- References -- Remaining Time Prediction for Processes with Inter-case Dynamics -- 1 Introduction -- 2 Preliminaries and Related Work -- 2.1 Related Work -- 2.2 RTM Background -- 2.3 Performance Spectrum with Error Progression -- 3 Approach -- 3.1 Detecting Uncertain Segments -- 3.2 Identifying Inter-case Dynamics in Uncertain Segments -- 3.3 Inter-case Feature Creation -- 3.4 Predicting the Next Segment -- 3.5 Predicting Waiting Time -- 4 Evaluation -- 4.1 Experimental Setup -- 4.2 Results -- 5 Conclusion -- References -- Event Log Sampling for Predictive Monitoring -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Proposed Sampling Methods -- 5 Evaluation -- 5.1 Event Logs -- 5.2 Implementation -- 5.3 Evaluation Setting -- 5.4 Experimental Results -- 6 Discussion -- 7 Conclusion -- References -- Active Anomaly Detection for Key Item Selection in Process Auditing -- 1 Introduction -- 2 Related Work -- 2.1 Anomaly Detection -- 2.2 Active Anomaly Detection -- 2.3 Trace Visualisation -- 3 Active Selection Approach -- 3.1 Step One: Encode Process Data -- 3.2 Step Two: Assign Anomaly Score -- 3.3 Step Three: Actively Label Exceptions -- 4 Evaluation -- 4.1 Step One: Encode Process Data -- 4.2 Step Two: Assign Anomaly Score -- 4.3 Step Three: Actively Label Exceptions -- 4.4 Performance Results -- 5 Discussion -- 5.1 Cycle One -- 5.2 Cycle Two -- 5.3 Cycle Three -- 6 Limitations -- 7 Conclusion and Future Work -- References -- Prescriptive Process Monitoring Under Resource Constraints: A Causal Inference Approach -- 1 Introduction -- 2 Background and Related Work -- 2.1 Predictive Process Monitoring -- 2.2 Prescriptive Process Monitoring -- 2.3 Causal Inference -- 3 Approach -- 3.1 Log Preprocessing -- 3.2 Predictive Model -- 3.3 Causal Model -- 3.4 Resource Allocator -- 4 Evaluation -- 4.1 Dataset.
4.2 Experiment Setup -- 4.3 Results -- 4.4 Threats to Validity -- 5 Conclusion -- References -- Quantifying Explainability in Outcome-Oriented Predictive Process Monitoring -- 1 Introduction -- 2 Preliminaries -- 3 Explainability in OOPPM -- 3.1 Explainability Through Interpretability and Faithfulness -- 3.2 Logit Leaf Model -- 3.3 Generalized Logistic Rule Model -- 4 Experimental Evaluation -- 4.1 Benchmark Models -- 4.2 Event Logs -- 4.3 Implementation -- 4.4 Quantitative Metrics Results -- 5 Conclusion -- References -- SA4PM 2021: 2nd International Workshop on Streaming Analytics for Process Mining -- 2nd International Workshop on Streaming Analytics for Process Mining (SA4PM) -- Organization -- Workshop Chairs -- Program Committee -- Online Prediction of Aggregated Retailer Consumer Behaviour -- 1 Introduction -- 2 Framework -- 2.1 Features -- 2.2 Clustering -- 2.3 Training -- 2.4 Predicting -- 3 Experimental Evaluation -- 3.1 Experimental Setup -- 3.2 Results -- 4 Related Work -- 5 Conclusion and Future Work -- References -- PErrCas: Process Error Cascade Mining in Trace Streams -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Online Cascade Mining -- 4.1 Outlier Segment-Level Events -- 4.2 Error Cascade Construction -- 4.3 Cascade Patterns -- 5 Evaluation -- 5.1 Synthetic Data -- 5.2 Travel Reimbursement Process -- 6 Conclusion -- References -- Continuous Performance Evaluation for Business Process Outcome Monitoring -- 1 Introduction -- 2 Related Work -- 3 Continuous Prediction Evaluation Framework -- 4 Performance Evaluation Methods -- 4.1 Evaluating Performance Using a Local Timeline -- 4.2 Real-Time Model Performance -- 5 Experimental Analysis and Results -- 6 Conclusions -- References -- PQMI 2021: 6th International Workshop on Process Querying, Manipulation, and Intelligence.
6th International Workshop on Process Querying, Manipulation, and Intelligence (PQMI 2021) -- Organization -- Workshop Organizers -- Program Committee -- An Event Data Extraction Approach from SAP ERP for Process Mining -- 1 Introduction -- 2 Background -- 2.1 Object-Centric Event Logs -- 2.2 SAP: Entities and Relationships -- 3 Extracting Event Data from SAP ERP: Approach -- 3.1 Building Graphs of Relations -- 3.2 Extracting Object-Centric Event Logs -- 4 Extracting Event Data from SAP ERP: Tool -- 5 Assessment -- 5.1 Building a Graph of Relations -- 5.2 Extracting Object-Centric Event Logs -- 6 Related Work -- 7 Conclusion -- References -- Towards a Natural Language Conversational Interface for Process Mining -- 1 Introduction -- 2 Related Work -- 3 Proposed Method -- 3.1 Pre-processing and Tagging -- 3.2 Semantic Parsing -- 3.3 PM Tool Interface Mapping -- 4 Sample Questions -- 5 Proof of Concept -- 6 Conclusions and Future Work -- References -- On the Performance Analysis of the Adversarial System Variant Approximation Method to Quantify Process Model Generalization -- 1 Introduction -- 2 Related Work -- 2.1 Generalization Metric -- 2.2 Adversarial System Variant Approximation -- 3 Notations -- 4 Problem Statement -- 5 Experimental Setup -- 5.1 Sampling Parameter -- 5.2 Variant Log Size -- 5.3 Biased Variant Logs -- 6 Results -- 6.1 Sampling Parameter Results -- 6.2 Variant Log Size Results -- 6.3 Biased Variant Log Results -- 7 Conclusion -- References -- PODS4H 2021: 4th International Workshop on Process-Oriented Data Science for Healthcare -- Fourth International Workshop on Process-Oriented Data Science for Healthcare (PODS4H) -- Organization -- Workshop Chairs -- Program Committee -- Verifying Guideline Compliance in Clinical Treatment Using Multi-perspective Conformance Checking: A Case Study -- 1 Introduction -- 2 Background.
3 Research Method.
Record No. UNISA-996464540703316
Munoz-Gama Jorge  
Cham: Springer Nature, 2022
Printed material
Available at: Univ. di Salerno
Projection-Based Clustering through Self-Organization and Swarm Intelligence [electronic resource]: Combining Cluster Analysis with the Visualization of High-Dimensional Data / by Michael Christoph Thrun
Author Thrun Michael Christoph
Edition [1st ed. 2018.]
Publication Cham: Springer Nature, 2018
Physical description 1 online resource (XX, 201 p., 90 illus., 29 illus. in color)
Discipline 006.4
Topical subject Pattern recognition
Data structures (Computer science)
Pattern Recognition
Data Structures
Uncontrolled subject Cluster Analysis
Dimensionality Reduction
Swarm Intelligence
Visualization
Unsupervised Machine Learning
Data Science
Knowledge Discovery
3D Printing
Self-Organization
Emergence
Game Theory
Advanced Analytics
High-Dimensional Data
Multivariate Data
Analysis of Structured Data
ISBN 3-658-20540-7
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Approaches to Unsupervised Machine Learning -- Methods of Visualization of High-Dimensional Data -- Quality Assessments of Visualizations -- Behavior-Based Systems in Data Science -- Databionic Swarm (DBS).
Record No. UNINA-9910293141703321
Thrun Michael Christoph  
Cham: Springer Nature, 2018
Printed material
Available at: Univ. Federico II