LEADER 10887nam 2200505 450
001 996495567003316
005 20230315124751.0
010 $a3-031-19769-0
035 $a(MiAaPQ)EBC7120767
035 $a(Au-PeEL)EBL7120767
035 $a(CKB)25188970000041
035 $a(PPN)265855934
035 $a(EXLCZ)9925188970000041
100 $a20230315d2022 uy 0
101 0 $aeng
135 $aurcnu||||||||
181 $ctxt$2rdacontent
182 $cc$2rdamedia
183 $acr$2rdacarrier
200 00$aComputer vision - ECCV 2022$hPart I $e17th European Conference, Tel Aviv, Israel, October 23-27, 2022 : proceedings /$fShai Avidan [and four others]
210 1$aCham, Switzerland :$cSpringer,$d[2022]
210 4$d©2022
215 $a1 online resource (803 pages)
225 1 $aLecture Notes in Computer Science
311 08$aPrint version: Avidan, Shai Computer Vision - ECCV 2022 Cham : Springer,c2022 9783031197680
327 $aIntro -- Foreword -- Preface -- Organization -- Contents - Part I -- Learning Depth from Focus in the Wild -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 A Network for Defocus Image Alignment -- 3.2 Focal Stack-oriented Feature Extraction -- 3.3 Aggregation and Refinement -- 4 Evaluation -- 4.1 Comparisons to State-of-the-art Methods -- 4.2 Ablation Studies -- 5 Conclusion -- References -- Learning-Based Point Cloud Registration for 6D Object Pose Estimation in the Real World -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 Problem Formulation -- 3.2 Method Overview -- 3.3 Match Normalization for Robust Feature Extraction -- 3.4 NLL Loss Function for Stable Training -- 3.5 Network Architectures -- 4 Experiments -- 4.1 Datasets and Training Parameters -- 4.2 Evaluation Metrics -- 4.3 Comparison with Existing Methods -- 4.4 Ablation Study -- 5 Conclusion -- References -- An End-to-End Transformer Model for Crowd Localization -- 1 Introduction -- 2 Related Works -- 2.1 Detection-Based Methods -- 2.2 Map-Based Methods -- 2.3 Regression-Based Methods -- 2.4 Visual Transformer -- 3 Our Method -- 3.1 Transformer Encoder -- 3.2 Transformer Decoder -- 3.3 KMO-Based Matcher -- 3.4 Loss Function -- 4 Experiments -- 4.1 Implementation Details -- 4.2 Dataset -- 4.3 Evaluation Metrics -- 5 Results and Analysis -- 5.1 Crowd Localization -- 5.2 Crowd Counting -- 5.3 Visualizations -- 5.4 Ablation Studies -- 5.5 Limitations -- 6 Conclusion -- References -- Few-Shot Single-View 3D Reconstruction with Memory Prior Contrastive Network -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Memory Network -- 3.2 Prior Module -- 3.3 3D-Aware Contrastive Learning Method -- 3.4 Training Procedure in Few-Shot Settings -- 3.5 Architecture -- 3.6 Loss Function -- 4 Experiment -- 4.1 Experimental Setup -- 4.2 Results on ShapeNet Dataset.
327 $a4.3 Results on Real-world Dataset -- 5 Ablation Study -- 6 Conclusion -- References -- DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection -- 1 Introduction -- 2 Related Work -- 2.1 LiDAR-Based 3D Object Detection -- 2.2 Monocular 3D Object Detection -- 2.3 Estimation of Instance Depth -- 3 Overview and Framework -- 4 Decoupled Instance Depth -- 4.1 Visual Depth -- 4.2 Attribute Depth -- 4.3 Data Augmentation -- 4.4 Depth Uncertainty and Aggregation -- 4.5 Loss Functions -- 5 Experiments -- 5.1 Implementation Details -- 5.2 Dataset and Metrics -- 5.3 Performance on KITTI Benchmark -- 5.4 Ablation Study -- 6 Conclusion -- References -- Adaptive Co-teaching for Unsupervised Monocular Depth Estimation -- 1 Introduction -- 2 Related Work -- 2.1 Unsupervised Monocular Depth Estimation -- 2.2 Knowledge Distillation -- 3 Problem Formulation -- 4 Methodology -- 4.1 MUSTNet: MUlti-STream Ensemble Network -- 4.2 Adaptive Co-teaching Framework -- 4.3 Implementation -- 5 Experiments -- 5.1 Datasets -- 5.2 Quantitative Evaluation -- 5.3 Ablation Studies -- 5.4 Extension of Our Work -- 5.5 Discussion -- 6 Conclusion -- References -- Fusing Local Similarities for Retrieval-Based 3D Orientation Estimation of Unseen Objects -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Problem Formulation -- 3.2 Motivation -- 3.3 Multi-scale Patch-Level Image Comparison -- 3.4 Fast Retrieval -- 3.5 Training and Testing -- 4 Experiments -- 4.1 Implementation Details -- 4.2 Experimental Setup -- 4.3 Experiments on LineMOD and LineMOD-O -- 4.4 Experiments on T-LESS -- 4.5 Ablation Studies -- 5 Conclusion -- References -- Lidar Point Cloud Guided Monocular 3D Object Detection -- 1 Introduction -- 2 Related Work -- 2.1 LiDAR-based 3D Object Detection -- 2.2 Image-Only-Based Monocular 3D Object Detection -- 2.3 Depth-Map-Based Monocular 3D Object Detection.
327 $a3 LiDAR Guided Monocular 3D Detection -- 3.1 High Accuracy Mode -- 3.2 Low Cost Mode -- 4 Applications in Real-World Self-driving System -- 5 Experiments -- 5.1 Implementation Details -- 5.2 Dataset and Metrics -- 5.3 Results on KITTI -- 5.4 Results on Waymo -- 5.5 Comparisons on Pseudo Labels and Manually Annotated Labels -- 5.6 Ablation Studies -- 6 Conclusion -- References -- Structural Causal 3D Reconstruction -- 1 Introduction -- 2 Related Work -- 3 Causal Ordering of Latent Factors Matters -- 3.1 A Motivating Example from Function Approximation -- 3.2 Expressiveness of Representing Conditional Distributions -- 3.3 Modeling Causality in Rendering-Based Decoding -- 3.4 Empirical Evidence on 3D Reconstruction -- 4 Learning Causal Ordering for 3D Reconstruction -- 4.1 General SCR Framework -- 4.2 Learning Dense SCR via Bayesian Optimization -- 4.3 Learning Generic SCR via Optimization Unrolling -- 4.4 Learning Dynamic SCR via Masked Self-attention -- 4.5 Insights and Discussion -- 5 Experiments and Results -- 5.1 Quantitative Results -- 5.2 Qualitative Results -- References -- 3D Human Pose Estimation Using Möbius Graph Convolutional Networks -- 1 Introduction -- 2 Related Work -- 3 Spectral Graph Convolutional Network -- 3.1 Graph Definitions -- 3.2 Graph Fourier Transform -- 3.3 Spectral Graph Convolutional Network -- 3.4 Spectral Graph Filter -- 4 MöbiusGCN -- 4.1 Möbius Transformation -- 4.2 MöbiusGCN -- 4.3 Why MöbiusGCN is a Light Architecture -- 4.4 Discontinuity -- 5 Experimental Results -- 5.1 Datasets and Evaluation Protocols -- 5.2 Implementation Details -- 5.3 Fully-Supervised MöbiusGCN -- 5.4 MöbiusGCN with Reduced Dataset -- 6 Conclusion and Discussion -- References -- Learning to Train a Point Cloud Reconstruction Network Without Matching -- 1 Introduction -- 2 Related Works -- 2.1 Optimization-Based Matching Losses.
327 $a2.2 Generative Adversarial Network -- 3 Methodology -- 3.1 The Architecture of PCLossNet -- 3.2 Training of the Reconstruction Network -- 3.3 Algorithm Analysis -- 4 Experiments -- 4.1 Datasets and Implementation Details -- 4.2 Comparisons with Basic Matching-Based Losses -- 4.3 Comparisons with Discriminators-Based Losses -- 4.4 Comparisons on Training Efficiency -- 4.5 How Is the Training Process Going? -- 4.6 Ablation Study -- 5 Conclusion -- References -- PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation -- 1 Introduction -- 2 Related Work -- 2.1 Panoramic Depth Estimation -- 2.2 Vision Transformer -- 3 PanoFormer -- 3.1 Architecture Overview -- 3.2 Transformer-Customized Spherical Token -- 3.3 Relative Position Embedding -- 3.4 Panorama Self-attention with Token Flow -- 3.5 Objective Function -- 4 Panorama-Specific Metrics -- 5 Experiments -- 5.1 Datasets and Implementations -- 5.2 Comparison Results -- 5.3 Ablation Study -- 5.4 Extensibility -- 6 Conclusion -- References -- Self-supervised Human Mesh Recovery with Cross-Representation Alignment -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Prerequisites -- 3.2 Training Data Synthesis -- 3.3 Individual Coarse-to-Fine Regression -- 3.4 Evidential Cross-Representation Alignment -- 3.5 Loss Function -- 4 Experiments -- 4.1 Datasets -- 4.2 Implementation Details -- 4.3 Quantitative Results -- 4.4 Qualitative Results -- 5 Conclusion -- References -- AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Hand Pose Estimation -- 3.2 Object Pose Estimation -- 3.3 Hand and Object Shape Reconstruction -- 4 Experiments -- 4.1 Benchmarks -- 4.2 Evaluation Metrics -- 4.3 Implementation Details -- 4.4 Hand-Only Experiments on ObMan -- 4.5 Hand-Object Experiments on ObMan -- 4.6 Hand-Object Experiments on DexYCB.
327 $a5 Conclusion -- References -- A Reliable Online Method for Joint Estimation of Focal Length and Camera Rotation -- 1 Introduction -- 2 Prior Work -- 2.1 Image Features and Deviation Measures -- 2.2 Benchmarks -- 2.3 State-of-the-Art Systems -- 3 fR Method -- 3.1 Probabilistic Model -- 3.2 Parameter Search -- 3.3 Error Prediction -- 4 Datasets -- 5 Experiments -- 5.1 Evaluating Deviation Measures -- 5.2 Evaluating Line Segment Detectors -- 5.3 Comparison with State of the Art -- 5.4 Predicting Reliability -- 5.5 Run Time -- 6 Limitations -- 7 Conclusions -- References -- PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 Overview -- 3.2 Stage I: Initial Shape Modeling -- 3.3 Stage II: Joint Optimization with Inverse Rendering -- 4 Experiments -- 4.1 Implementation Details -- 4.2 Dataset -- 4.3 Comparison with MVPS Methods -- 4.4 Comparison with Neural Rendering Based Methods -- 4.5 Method Analysis -- 5 Conclusions -- References -- Share with Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency -- 1 Introduction -- 2 Related Work -- 3 Approach -- 3.1 Structured Autoencoding -- 3.2 Unsupervised Learning with Cross-Instance Consistency -- 3.3 Alternate 3D and Pose Learning -- 4 Experiments -- 4.1 Evaluation on the ShapeNet Benchmark -- 4.2 Results on Real Images -- 4.3 Ablation Study -- 5 Conclusion -- References -- Towards Comprehensive Representation Enhancement in Semantics-Guided Self-supervised Monocular Depth Estimation -- 1 Introduction -- 2 Related Work -- 2.1 Self-supervised Monocular Depth Estimation -- 2.2 Vision Transformer -- 2.3 Deep Metric Learning -- 3 Methods -- 3.1 Proposed Model -- 3.2 Photometric Loss and Edge-Aware Smoothness Loss -- 3.3 Hardest Non-boundary Triplet Loss with Minimum-Distance Based Candidate Mining Strategy.
327 $a4 Experiments.
410 0$aLecture notes in computer science.
606 $aComputer vision$vCongresses
606 $aPattern recognition systems$vCongresses
615 0$aComputer vision
615 0$aPattern recognition systems
676 $a006.37
702 $aShai Avidan
801 0$bMiAaPQ
801 1$bMiAaPQ
801 2$bMiAaPQ
906 $aBOOK
912 $a996495567003316
996 $aComputer Vision - ECCV 2022$92952264
997 $aUNISA