Computer Vision - ECCV 2022, Part XXV: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings / Shai Avidan [and four others] (editors)

Publication: Cham, Switzerland : Springer, [2022], ©2022
Description: 1 online resource (815 pages)
Series: Lecture Notes in Computer Science ; Volume 13685
Language: English
ISBN: 3-031-19806-9 (electronic)
Other identifiers: (MiAaPQ)EBC7119892 ; (Au-PeEL)EBL7119892 ; (CKB)25179630400041 ; (OCoLC)1348488963 ; (PPN)265855845 ; (EXLCZ)9925179630400041
Print version: Avidan, Shai. Computer Vision - ECCV 2022. Cham : Springer, ©2022. ISBN 978-3-031-19805-2
Note: Includes bibliographical references and index.

Contents:
Intro -- Foreword -- Preface -- Organization -- Contents - Part XXV

Cross-domain Ensemble Distillation for Domain Generalization
  1 Introduction -- 2 Related Work -- 3 Our Method -- 3.1 Cross-Domain Ensemble Distillation -- 3.2 UniStyle: Removing and Unifying Style Bias -- 3.3 Analysis of Our Method -- 4 Experiments -- 4.1 Generalization in Image Classification -- 4.2 Generalization in Person Re-ID -- 4.3 Generalization in Semantic Segmentation -- 4.4 In-depth Analysis -- 5 Conclusion -- References

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels
  1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Overview -- 3.2 Feature-based Clustering -- 3.3 Consistency-based Classification -- 3.4 Training Procedure -- 4 Experiment -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Experimental Results -- 4.4 Ablation Study -- 4.5 Performance Against Class Imbalance -- 4.6 AUC of Noisy vs. Clean Classification -- 5 Conclusion -- References

Hyperspherical Learning in Multi-Label Classification
  1 Introduction -- 2 Related Works -- 2.1 Learning from Noisy Labels -- 2.2 Hyperspherical Learning -- 2.3 Label Correlation -- 3 Method -- 3.1 Preliminaries -- 3.2 Learning in Hyperspherical Space -- 3.3 Learning from Single Positive Labels -- 3.4 Adaptive Learning -- 3.5 Label Correlation -- 4 Experiments -- 4.1 Settings -- 4.2 Single Positive Labels -- 4.3 Partial Positive Labels -- 4.4 Full Labels -- 4.5 Ablation Study -- 4.6 Label Correlation -- 5 Conclusion -- References

When Active Learning Meets Implicit Semantic Data Augmentation
  1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Implicit Data Augmentation via Semantic Transformation -- 3.2 The Proposed Expected Partial Model Change Maximization -- 4 Experiments -- 4.1 Active Learning for Image Classification -- 4.2 Active Learning for Semantic Segmentation -- 4.3 Ablation Study and Discussion -- 4.4 Timing Analysis -- 5 Conclusion and Future Work -- References

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
  1 Introduction -- 2 Related Work -- 2.1 Long-Tailed Visual Recognition -- 2.2 Visual-Linguistic Model -- 3 Methodology -- 3.1 Overall Architecture -- 3.2 Class-Wise Visual-Linguistic Pre-training -- 3.3 Language-Guided Recognition -- 3.4 Loss Function -- 4 Experiments -- 4.1 Datasets -- 4.2 Evaluation Protocol -- 4.3 Experiments on ImageNet-LT -- 4.4 Experiments on Places-LT -- 4.5 iNaturalist 2018 -- 4.6 Ablation Study -- 4.7 Limitations -- 5 Conclusions -- References

Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization
  1 Introduction -- 2 Related Work -- 3 Common Pipeline: Invariance as Class -- 3.1 Empirical Risk Minimization (ERM) -- 3.2 Invariant Risk Minimization (IRM) -- 3.3 Inverse Probability Weighting (IPW) -- 4 Our Approach: Invariance as Context -- 5 Experiments -- 5.1 Datasets and Settings -- 5.2 Results and Analyses -- 6 Conclusions -- References

Hierarchical Semi-supervised Contrastive Learning for Contamination-Resistant Anomaly Detection
  1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Problem Description -- 3.2 Hierarchical Semi-Supervised Contrastive Learning -- 3.3 Training and Inference -- 4 Experiments -- 4.1 Scenario Setup -- 4.2 Implementation Details -- 4.3 Performance Comparison -- 4.4 Ablation Study -- 5 Conclusion -- References

Tracking by Associating Clips
  1 Introduction -- 2 Related Work -- 3 Tracking by Associating Clips -- 3.1 Intra Clip Association -- 3.2 Inter Clip Association -- 3.3 Clip Tracker -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Main Results -- 4.4 Ablation Studies -- 4.5 Qualitative Results -- 5 Conclusion -- References

RealPatch: A Statistical Matching Framework for Model Patching with Real Samples
  1 Introduction -- 2 Our RealPatch Framework -- 2.1 Stage 1: Statistical Matching -- 2.2 Stage 2: Target Prediction -- 3 Experiments -- 3.1 Reducing Subgroup Performance Gap -- 3.2 Reducing Dataset and Model Leakage -- 4 Limitations and Intended Use -- 5 Conclusions -- References

Background-Insensitive Scene Text Recognition with Text Semantic Segmentation
  1 Introduction -- 2 Related Work -- 2.1 Scene Text Recognition -- 2.2 Text Semantic Segmentation -- 2.3 Training Strategy -- 3 STR with Text Segmentation -- 3.1 Overview -- 3.2 Semantic Segmentation Network -- 3.3 Segmentation Refinement -- 3.4 Segmentation Embedding -- 3.5 Optimization -- 4 Experiments -- 4.1 Datasets and Implementation Details -- 4.2 Comparing with State-of-the-Art Methods -- 4.3 Ablation Study -- 5 Conclusion -- References

Semantic Novelty Detection via Relational Reasoning
  1 Introduction -- 2 Related Works -- 3 Method -- 3.1 Notation and Background -- 3.2 Representation Learning via Relational Reasoning -- 3.3 Evaluation Process -- 3.4 Relational Module -- 4 Experimental Setup -- 5 Experiments -- 5.1 Intra-domain Analysis -- 5.2 Cross-domain Analysis -- 5.3 OOD with Budget-Limited Finetuning -- 5.4 Open-Set Domain Generalization -- 6 Further Analysis and Discussions -- 7 Conclusions -- References

Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers
  1 Introduction -- 2 Related Work -- 3 Attribute Data Preparation -- 4 TAP - Transformer for Attribute Prediction -- 4.1 Model Architecture -- 4.2 Training and Loss Functions -- 5 Experiments -- 5.1 Closed-set Attribute Prediction -- 5.2 Open-vocabulary Attribute Prediction -- 5.3 Ablation Studies -- 5.4 Closed-set Human-Object Interaction Classification -- 6 Conclusions -- References

Training Vision Transformers with only 2040 Images
  1 Introduction -- 2 Related Works -- 3 Method -- 3.1 Analyses on Instance Discrimination -- 3.2 Gradient Analysis -- 4 Experiments -- 4.1 Why Training from Scratch? -- 4.2 Training from Scratch Results -- 4.3 Transfer Ability of Small Datasets -- 4.4 Ablation Studies -- 5 Conclusions -- References

Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection
  1 Introduction -- 2 Related Work -- 3 Proposed Method -- 3.1 Learn to Track in LVIS -- 3.2 Learn to Unforget in TAO -- 3.3 Regularizing Semantic Flickering -- 3.4 Unified Learning -- 4 Experiments -- 4.1 Main Results -- 4.2 Ablation Studies -- 4.3 Image to Video Transfer Learning -- 5 Conclusion -- References

TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs
  1 Introduction -- 2 Related Work -- 2.1 Attention Modules for CNNs and Feedforward Attention Mechanisms -- 2.2 Top-down Feedback Computation in CNNs -- 3 Top-down (TD) Attention Module -- 4 Experimental Results and Discussion -- 4.1 Large-scale Object Classification (ImageNet-1k) -- 4.2 Attention Visualization and Weakly-Supervised Object Localization -- 4.3 Fine-grained and Multi-label Classification -- 4.4 Ablative Analysis of Feedback Computation -- 5 Conclusions -- References

Automatic Check-Out via Prototype-Based Classifier Learning from Single-Product Exemplars
  1 Introduction -- 2 Related Work -- 2.1 Automatic Check-Out -- 2.2 Object Detection -- 2.3 Classifier Boundary Transformation -- 3 Methodology -- 3.1 Overall Framework and Notations -- 3.2 Prototype-Based Classifier Generation -- 3.3 Discriminative Re-Ranking -- 3.4 Loss Functions -- 4 Experiments -- 4.1 Dataset -- 4.2 Baseline Methods and Implementation Details -- 4.3 Main Results -- 4.4 Ablation Studies -- 5 Conclusion -- References

Overcoming Shortcut Learning in a Target Domain by Generalizing Basic Visual Factors from a Source Domain
  1 Introduction -- 2 Related Works -- 3 Methodology -- 3.1 Problem Formulation -- 3.2 Network Architecture -- 3.3 Additional Constraints -- 3.4 Learning Factor Association Matrix A -- 4 Experiments -- 4.1 Compositional Generalization in Fully-Correlated Scenario -- 4.2 Impact of Additional Constraints -- 4.3 Learning Association Matrix for Semi-correlated Scenario -- 4.4 Properties of the Source Domains -- 5 Conclusion -- References

Photo-realistic Neural Domain Randomization
  1 Introduction -- 2 Related Work -- 3 Photo-Realistic Neural Domain Randomization -- 3.1 Geometric Scene Representation -- 3.2 Neural Ray Tracer Approximator -- 4 Downstream Tasks -- 5 Experiments -- 5.1 Evaluation Metrics -- 6 Results -- 6.1 HB Dynamic Lighting Benchmark -- 6.2 HB-LM Cross-Domain Adaptation Benchmark -- 6.3 HB Generalization Benchmark -- 6.4 Ablation Study -- 6.5 Monocular Depth Estimation -- 6.6 Object Material and Light Recovery -- 7 Conclusion -- References

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning
  1 Introduction -- 2 Related Work -- 3 Our Approach: Wavelet Vision Transformer -- 3.1 Preliminaries -- 3.2 Wavelets Block -- 3.3 Wavelet Vision Transformer -- 4 Experiments -- 4.1 Image Recognition on ImageNet1K -- 4.2 Object Detection and Instance Segmentation on COCO -- 4.3 Semantic Segmentation on ADE20K -- 4.4 Ablation Study -- 4.5 Visualization of Learnt Visual Representation -- 5 Conclusions -- References

Tailoring Self-Supervision for Supervised Learning
  1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 Desired Properties in Supervised Learning -- 3.2 Localizable Rotation (LoRot)

Subjects: Computer vision -- Congresses ; Image processing -- Digital techniques -- Congresses ; Optical pattern recognition -- Congresses
Uncontrolled subjects: Computer vision ; Image processing -- Digital techniques ; Optical pattern recognition
Dewey classification: 006.37
Added entry: Avidan, Shai (editor)