Title: | Computer vision - ECCV 2022. Part IV : 17th European Conference, Tel Aviv, Israel, October 23-27, 2022 : proceedings / Shai Avidan [and four others] |
Publication: | Cham, Switzerland : Springer, [2022] |
©2022 |
Physical description: | 1 online resource (801 pages) |
Discipline: | 006.37 |
Topical subject: | Computer vision |
Pattern recognition systems |
Person (secondary resp.): | Avidan, Shai |
Contents note: | Intro -- Foreword -- Preface -- Organization -- Contents - Part IV -- Expanding Language-Image Pretrained Models for General Video Recognition -- 1 Introduction -- 2 Related Work -- 3 Approach -- 3.1 Overview -- 3.2 Video Encoder -- 3.3 Text Encoder -- 4 Experiments -- 4.1 Experimental Setup -- 4.2 Fully-Supervised Experiments -- 4.3 Zero-Shot Experiments -- 4.4 Few-Shot Experiments -- 4.5 Ablation and Analysis -- 5 Conclusion -- References -- Hunting Group Clues with Transformers for Social Group Activity Recognition -- 1 Introduction -- 2 Related Works -- 2.1 Group Member Identification and Activity Recognition -- 2.2 Detection Transformer -- 3 Proposed Method -- 3.1 Overall Architecture -- 3.2 Loss Calculation -- 3.3 Group Member Identification -- 4 Experiments -- 4.1 Datasets and Evaluation Metrics -- 4.2 Implementation Details -- 4.3 Group Activity Recognition -- 4.4 Social Group Activity Recognition -- 5 Conclusions -- References -- Contrastive Positive Mining for Unsupervised 3D Action Representation Learning -- 1 Introduction -- 2 Related Works -- 2.1 Unsupervised Contrastive Learning -- 2.2 Unsupervised 3D Action Recognition -- 3 Proposed Method -- 3.1 Overview -- 3.2 Similarity Distribution and Positive Mining -- 3.3 Positive-Enhanced Learning -- 3.4 Learning of CPM -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation -- 4.3 Results and Comparison -- 4.4 Ablation Study -- 5 Conclusion -- References -- Target-Absent Human Attention -- 1 Introduction -- 2 Related Work -- 3 Approach -- 3.1 Foveated Feature Maps (FFMs) -- 3.2 Reward and Policy Learning -- 4 Experiments -- 4.1 Semantic Sequence Score -- 4.2 Implementation Details -- 4.3 Comparing Scanpath Prediction Methods -- 4.4 Group Model Versus Individual Model -- 4.5 Ablation Study -- 4.6 Generalization to Target-Present Search -- 5 Conclusions and Discussion -- References. |
Uncertainty-Based Spatial-Temporal Attention for Online Action Detection -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Problem Setup -- 3.2 Uncertainty Quantification -- 3.3 Uncertainty-Based Spatial-Temporal Attention -- 3.4 Mechanism of Uncertainty-Based Attention -- 3.5 Two-Stream Framework -- 4 Experiments -- 4.1 Datasets -- 4.2 Evaluation Metrics -- 4.3 Implementation Details -- 4.4 Main Experimental Results -- 4.5 Qualitative Results -- 4.6 Ablation Studies -- 5 Conclusion and Future Work -- References -- Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Architecture Overview -- 3.2 Irregular Window Partition -- 3.3 Irregular-Window-Based Token Representation Learning -- 3.4 Irregular-Window-Based Token Agglomeration -- 4 Experiments -- 4.1 Datasets and Evaluation Metric -- 4.2 Implementation Details -- 4.3 Comparisons with State-of-the-Art -- 4.4 Ablation Study -- 5 Conclusion -- References -- Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions -- 1 Introduction -- 2 Related Work -- 3 Proposed Approach -- 3.1 Group Alignment (GA) Module -- 3.2 Consistency Loss -- 3.3 Group Excitation (GE) Module -- 3.4 Jigsaw Network (JigsawNet) -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Comparison with State-of-the-Art Methods -- 4.4 Ablation Studies -- 4.5 Visualization -- 5 Conclusion -- References -- Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Overview -- 3.2 Constructing Body-Part Saliency Map -- 3.3 Progressively Body-Part Masking for Flexibility -- 3.4 One-Time Passing via Body-Parts Filtering and Merging -- 3.5 Training and Inference. |
4 Discussion: Sparse vs. Crowded Scene -- 5 Experiment -- 5.1 Dataset and Metric -- 5.2 Implementation Details -- 5.3 Results -- 5.4 Visualization -- 5.5 Ablation Studies -- 6 Conclusions -- References -- Collaborating Domain-Shared and Target-Specific Feature Clustering for Cross-domain 3D Action Recognition -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Problem Formulation -- 3.2 Overview -- 3.3 Base Module -- 3.4 Online Clustering Module -- 3.5 Collaborative Clustering Module -- 3.6 Training and Test -- 4 Experiment -- 4.1 Datasets and Metrics -- 4.2 Comparison with Different Baselines -- 4.3 Ablation Study -- 5 Conclusion and Future Work -- References -- Is Appearance Free Action Recognition Possible? -- 1 Introduction -- 1.1 Motivation -- 1.2 Related Work -- 1.3 Contributions -- 2 Appearance Free Dataset -- 2.1 Dataset Generation Methodology -- 2.2 Implementation Details -- 2.3 Is Optical Flow Appearance Free? -- 2.4 AFD5: A Reduced Subset Suitable for Small Scale Exploration -- 3 Psychophysical Study: Human Performance on AFD5 -- 3.1 Experiment Design -- 3.2 Human Results -- 4 Computational Study: Model Performances on AFD -- 4.1 Training Procedure -- 4.2 Training Details -- 4.3 Architecture Results -- 5 Two-Stream Strikes Back -- 5.1 E2S-X3D: Design of a Novel Action Recognition Architecture -- 5.2 E2S-X3D: Empirical Evaluation -- 6 Conclusions -- References -- Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition -- 1 Introduction -- 2 Related Work -- 2.1 Few-Shot Action Recognition -- 2.2 Graph Representation and Matching -- 2.3 Temporal Alignment -- 3 Preliminary -- 4 Proposed Framework -- 4.1 Spatial Disentanglement and Activation -- 4.2 Temporal Matching -- 4.3 The Learning Objective -- 5 Experiments -- 5.1 Datasets -- 5.2 Baselines -- 5.3 Implementation Details -- 5.4 Results. |
5.5 Ablation Studies -- 5.6 Hyper-Parameter Analyses -- 6 Conclusion and Future Works -- References -- Dual-Evidential Learning for Weakly-supervised Temporal Action Localization -- 1 Introduction -- 2 Related Work -- 3 Proposed Approach -- 3.1 Background of Evidential Deep Learning -- 3.2 Notations and Preliminaries -- 3.3 Generalizing EDL for Video-level WS-Multi Classification -- 3.4 Snippet-level Progressive Learning -- 3.5 Learning and Inference -- 4 Experimental Results -- 4.1 Experimental Setup -- 4.2 Comparison with State-of-the-Art Methods -- 4.3 Ablation Study -- 4.4 Evaluation for Insights -- 5 Conclusions -- References -- Global-Local Motion Transformer for Unsupervised Skeleton-Based Action Learning -- 1 Introduction -- 2 Related Works -- 3 Proposed Method -- 3.1 Overall Scheme -- 3.2 Model Architecture -- 3.3 Multi-interval Pose Displacement Prediction (MPDP) Strategy -- 4 Experiments -- 4.1 Datasets and Evaluation Protocol -- 4.2 Implementation Details -- 4.3 Ablation Study -- 4.4 Analysis of Learned Attention -- 4.5 Comparison with State-of-the-Art Methods -- 5 Conclusions -- References -- AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition -- 1 Introduction -- 2 Related Works -- 3 Method -- 3.1 Overview -- 3.2 Network Architecture -- 3.3 Training Algorithm -- 3.4 Implementation Details -- 4 Experiment -- 4.1 Comparisons with State-of-the-Art Baselines -- 4.2 Deploying on Top of Light-weighted Models -- 4.3 Analytical Results -- 5 Conclusion -- References -- Panoramic Human Activity Recognition -- 1 Introduction -- 2 Related Work -- 3 The Proposed Method -- 3.1 Overview -- 3.2 Basic Graph Construction -- 3.3 Hierarchical Graph Network Architecture -- 3.4 Implementation Details -- 4 Experiments -- 4.1 Datasets -- 4.2 Metrics -- 4.3 Results -- 4.4 Ablation Study -- 4.5 Experimental Analysis -- 5 Conclusion -- References. |
Delving into Details: Synopsis-to-Detail Networks for Video Recognition -- 1 Introduction -- 2 Related Works -- 3 Synopsis-to-Detail Network -- 3.1 Synopsis Network (SNet) -- 3.2 Detail Network (DNet) -- 3.3 Instantiations -- 4 Experiments -- 4.1 Setups -- 4.2 Comparison with State-of-the-Arts -- 4.3 Ablation Experiments -- 4.4 Comparison with Efficient Action Recognition Methods -- 4.5 Efficiency Analysis -- 4.6 Visualization -- 5 Conclusion -- References -- A Generalized and Robust Framework for Timestamp Supervision in Temporal Action Segmentation -- 1 Introduction -- 2 Related Works -- 3 Preliminaries -- 3.1 Temporal Action Segmentation Task -- 3.2 E-M Algorithm -- 4 Method -- 4.1 Segment-Based Notation -- 4.2 Timestamp Supervision -- 4.3 Timestamp Supervision Under Missed Timestamps -- 4.4 Prior Distribution -- 4.5 Loss Function -- 4.6 SkipTag Supervision -- 5 Experiments -- 5.1 Timestamp Supervision Results -- 5.2 Performance with Missed Segments -- 5.3 SkipTag Supervision Results -- 5.4 Additional Results -- 5.5 Training Complexity -- 6 Conclusion -- References -- Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Video Encoder -- 3.2 Zoom-in Matching Module -- 3.3 Mixed-Supervised Hierarchical Contrastive Learning -- 4 Experiments -- 4.1 Experimental Setup -- 4.2 Ablation Study -- 4.3 Comparison with State-of-the-Art Methods -- 4.4 Cross-Domain Evaluation -- 4.5 Quality Analysis -- 5 Conclusion -- References -- PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens -- 1 Introduction -- 2 Related Work -- 3 Privacy-Preserving Action Recognition -- 3.1 Optical Component -- 3.2 Action Recognition Component -- 3.3 Adversarial Component and Training Algorithm -- 4 Experimental Results -- 4.1 Metrics and Evaluation Method -- 4.2 Simulation Experiments. | |
4.3 Hardware Experiments. |
Authorized title: | Computer Vision – ECCV 2022 |
ISBN: | 3-031-19772-0 |
Format: | Printed material |
Bibliographic level: | Monograph |
Language of publication: | English |
Record no.: | 9910624384903321 |
Find it here: | Univ. Federico II |
OPAC: | Check availability here |