1.

Record No.

UNINA9910619267503321

Title

Computer vision - ECCV 2022. Part XXVIII : 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, proceedings / Shai Avidan [and four others] (editors)

Publication/distribution/printing

Cham, Switzerland : Springer, [2022]

©2022

ISBN

3-031-19815-8

Physical description

1 online resource (806 pages)

Series

Lecture notes in computer science ; Volume 13688

Discipline

006.37

Subjects

Computer vision

Pattern recognition systems

Language of publication

English

Format

Printed material

Bibliographic level

Monograph

Bibliography note

Includes bibliographical references and index.

Contents note

Intro -- Foreword -- Preface -- Organization -- Contents - Part XXVIII -- Salient Object Detection for Point Clouds -- 1 Introduction -- 2 Related Work -- 3 Proposed Dataset -- 3.1 Dataset Construction -- 3.2 Dataset Statistics -- 4 Proposed Method -- 4.1 Overall Architecture -- 4.2 Proposed Modules -- 5 Experiments -- 5.1 Experimental Setup -- 5.2 Comparison and Analysis -- 5.3 Ablation Study -- 6 Conclusion -- References -- Learning Semantic Segmentation from Multiple Datasets with Label Shifts -- 1 Introduction -- 2 Related Work -- 3 Proposed Method -- 3.1 Multi-dataset Semantic Segmentation -- 3.2 Revisited Binary Cross-Entropy Loss -- 3.3 Class-Relational Binary Cross-Entropy Loss -- 3.4 Model Training and Implementation Details -- 4 Experimental Results -- 4.1 Datasets and Experimental Setting -- 4.2 Overall Performance -- 4.3 Results on WildDash2 Benchmark -- 4.4 Qualitative Analysis -- 5 Conclusion -- References -- Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination -- 1 Introduction -- 2 Related Work -- 3 Proposed Methodology -- 3.1 Unsupervised Region-Level Boundary Awareness -- 3.2 Unsupervised Region-Level Instance Discrimination -- 3.3 Supervised Learning for Labeled Data -- 3.4 The Overall Optimization Loss Function -- 4 Experiments -- 4.1 Experimental Settings -- 4.2 WSL-Based 3D Semantic Segmentation -- 4.3 WSL-Based 3D Instance Segmentation -- 4.4 Ablation Study -- 5 Conclusion -- References -- Towards Open-Vocabulary Scene Graph Generation with Prompt-Based Finetuning -- 1 Introduction -- 2 Related Work -- 3 Problem Definition -- 4 Method -- 4.1 Pretrained Context-Aware Visual-Relation Model -- 4.2 Prompt-Based Finetuning for Ov-SGG -- 5 Experiments -- 5.1 Datasets -- 5.2 Evaluation Settings -- 5.3 Results and Analysis -- 6 Conclusion -- References.

Variance-Aware Weight Initialization for Point Convolutional Neural Networks -- 1 Introduction -- 2 Related Work -- 3 Formalizing Point Cloud Convolution -- 3.1 Discrete Convolution -- 3.2 Continuous Convolution -- 3.3 Zoo Axis 1: Basis -- 3.4 Zoo Axis 2: Integral Estimation -- 4 Weight Initialization -- 4.1 Discrete Convolutions -- 4.2 Continuous Convolutions -- 4.3 Variance Computation -- 5 Experiments -- 5.1 Operators -- 5.2 Variance Evaluation -- 5.3 Classification -- 5.4 Semantic Segmentation -- 6 Limitations -- 7 Conclusions -- References -- Break and Make: Interactive Structural Understanding Using LEGO Bricks -- 1 Introduction -- 2 Related Work -- 2.1 Understanding Compositional Structures -- 2.2 Building 3D Structures -- 3 Task and Data -- 3.1 LEGO Bricks -- 3.2 Environment -- 3.3 Evaluation -- 3.4 Dataset -- 4 Methods -- 4.1 Model -- 4.2 Training -- 4.3 Limitations -- 5 Experiments -- 5.1 Break and Make -- 5.2 Ablations and Failure Analysis -- 6 Conclusion -- References -- Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation -- 1 Introduction -- 2 Related Work -- 2.1 Scene Flow Estimation -- 2.2 Bidirectional Models -- 3 Problem Statement -- 4 Bi-PointFlowNet -- 4.1 Hierarchical Feature Extraction -- 4.2 Bidirectional Flow Embedding -- 4.3 Decomposed Form of Bidirectional Flow Embedding -- 4.4 Upsampling and Warping -- 4.5 Scene Flow Prediction -- 4.6 Loss Function -- 5 Experiments -- 5.1 Experimental Settings -- 5.2 Evaluation Metrics -- 5.3 Training and Evaluation on FlyingThings3D -- 5.4 Generalization on KITTI -- 5.5 Ablation Study -- 5.6 Runtime -- 6 Conclusion -- References -- 3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching -- 1 Introduction -- 2 Related Work -- 2.1 Learning-Based Dense Local Feature Matching -- 2.2 Student-Teacher Learning -- 3 Method.

3.1 Transformer-Based Local Feature Matching -- 3.2 Coarse-Level Knowledge Distillation -- 3.3 Fine-Level Attentive Knowledge Transfer -- 3.4 Supervision -- 3.5 Implementation Details -- 4 Experiments -- 4.1 Indoor Pose Estimation -- 4.2 Outdoor Pose Estimation -- 4.3 Homography Estimation -- 4.4 Student-Teacher Learning Understanding -- 5 Conclusion -- References -- Video Restoration Framework and Its Meta-adaptations to Data-Poor Conditions -- 1 Introduction -- 2 Literature Survey -- 3 Proposed Video Restoration Framework -- 3.1 Network Architecture -- 3.2 Learning the Model Parameters -- 3.3 Meta-learning Based Adaptation -- 4 Multi-weather Database Generation -- 5 Experiments -- 5.1 Implementation Details -- 5.2 Analysis of Proposed Architecture -- 5.3 Ablation Study on Proposed Architecture -- 5.4 Analysis of Meta-adaptation -- 6 Conclusion -- References -- MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud -- 1 Introduction -- 2 Related Work -- 2.1 Cuboid Fitting on Point Clouds -- 2.2 Solution Search for Scene Understanding -- 3 Method -- 3.1 Generating Cuboid Proposals from Noisy Scans -- 3.2 The Cuboids Arrangement Search Problem -- 3.3 Solution Search Baseline Algorithms -- 3.4 Our Algorithm: MonteBoxFinder -- 4 Experiments -- 4.1 Dataset -- 4.2 Metrics -- 4.3 Evaluation Protocol -- 4.4 Quantitative Results -- 4.5 Qualitative Results -- 5 Conclusion -- References -- Scene Text Recognition with Permuted Autoregressive Sequence Models -- 1 Introduction -- 2 Related Work -- 3 Permuted Autoregressive Sequence Models -- 3.1 Model Architecture -- 3.2 Permutation Language Modeling -- 3.3 Decoding Schemes -- 4 Results and Analysis -- 4.1 Datasets -- 4.2 Training Protocol and Model Selection -- 4.3 Evaluation Protocol and Metrics -- 4.4 Ablation on Training Permutations vs Test Accuracy.

4.5 Comparison to State-of-the-Art (SOTA) -- 5 Conclusion -- References -- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition -- 1 Introduction -- 2 Related Work -- 2.1 HMER -- 2.2 Object Counting -- 3 Methodology -- 3.1 Overview -- 3.2 Multi-Scale Counting Module -- 3.3 Counting-Combined Attentional Decoder -- 3.4 Loss Function -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Evaluation Metrics -- 4.4 Comparison with State-of-the-Art -- 4.5 Results on the HME100K Dataset -- 4.6 Inference Speed -- 4.7 Ablation Study -- 4.8 Case Study with Maps -- 4.9 Limitation -- 5 Conclusion -- References -- Detecting Tampered Scene Text in the Wild -- 1 Introduction -- 2 Related Work -- 2.1 Scene Text Detection -- 2.2 Scene Text Editing and Tampering Detection -- 3 Our Method -- 3.1 The S3R Strategy -- 3.2 The Parallel-branch Feature Extractor -- 3.3 Tampered-IC13 Dataset -- 4 Experiment -- 4.1 Evaluation Metric -- 4.2 Implementation Details -- 4.3 The Evaluation of S3R Strategy -- 4.4 The Effectiveness of Parallel-branch Feature Extractor -- 4.5 Discussion -- 5 Conclusion -- References -- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning -- 1 Introduction -- 2 Related Works -- 3 Methodology -- 3.1 State and Action Space -- 3.2 Text Recognition-based Reward -- 3.3 BoxDQN Model -- 3.4 Domain Adaptation -- 3.5 Training BoxDQN Model -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Qualitative Results -- 4.4 Quantitative Results -- 4.5 Domain Adaption -- 4.6 Ablation Study -- 4.7 Exploration on Arbitrarily-shaped Text Based on Bezier Curves -- 5 Conclusion and Future Work -- References -- GLASS: Global to Local Attention for Scene-Text Spotting -- 1 Introduction -- 2 Background and Related Work -- 3 Method.

3.1 GLASS Fusion Module -- 3.2 Orientation Prediction -- 3.3 Global to Local End-to-end Text Spotting -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Comparison with State-of-the-Art -- 4.4 Incorporating Glass into Other Methods -- 4.5 Ablation Study -- 5 Discussion -- References -- COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts -- 1 Introduction -- 2 COO: Comic Onomatopoeia Dataset -- 2.1 Why Use Onomatopoeias of Japanese Comics? -- 2.2 Label Annotation -- 2.3 Dataset Analysis -- 2.4 Comparison with Existing Arbitrary Scene Text Datasets -- 2.5 Truncated Texts in English -- 3 Methods for Three Tasks -- 3.1 Text Detection -- 3.2 Text Recognition -- 3.3 Link Prediction -- 4 Experiment and Analysis -- 4.1 Implementation Detail -- 4.2 Text Detection -- 4.3 Text Recognition -- 4.4 Link Prediction -- 5 Conclusion -- References -- Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting -- 1 Introduction -- 2 Related Work -- 2.1 Scene Text Detection and Spotting -- 2.2 Vision-Language Pre-training -- 3 Methodology -- 3.1 Character-Aware Text Encoder -- 3.2 Visual-Textual Decoder -- 3.3 Network Optimization -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Experimental Results -- 4.4 Ablation Studies -- 5 Conclusion -- References -- Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 Overview -- 3.2 Corner-Guided Encoder -- 3.3 Character Contrastive Loss -- 4 Experiments -- 4.1 WordArt Dataset -- 4.2 Implementation Details -- 4.3 Ablation Study -- 4.4 Performance for Artistic Text Recognition -- 4.5 Evaluation on STR Benchmarks -- 4.6 Further Visualization and Analysis -- 4.7 Limitations -- 5 Conclusion -- References -- Levenshtein OCR.

1 Introduction.