My Arxiv Daily
Author: baiyucraft
Blog: baiyucraft’s Home
Updated on 2023.10.25 09:31
MOT
Publish Date | Title | Authors | arxiv | | Code
---|---|---|---|---|---|
2023-10-23 | Achromatic, planar Fresnel-Reflector for a Single-beam Magneto-optical Trap | Saskia Bondza et.al. | 2310.14861 | :mortar_board: | None |
2023-10-23 | Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning | Ji Youn Lee et.al. | 2310.14506 | :mortar_board: | None |
2023-10-23 | Player Re-Identification Using Body Part Appearances | Mahesh Bhosale et.al. | 2310.14469 | :mortar_board: | None
2023-10-22 | Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis | Anthi Papadopoulou et.al. | 2310.14312 | :mortar_board: | None |
2023-10-22 | Deep MDP: A Modular Framework for Multi-Object Tracking | Abhineet Singh et.al. | 2310.14294 | :mortar_board: | Code |
2023-10-20 | EarlyBird: Early-Fusion for Multi-View Tracking in the Bird’s Eye View | Torben Teepe et.al. | 2310.13350 | :mortar_board: | Code |
2023-10-19 | Deep Learning Techniques for Video Instance Segmentation: A Survey | Chenhao Xu et.al. | 2310.12393 | :mortar_board: | None |
2023-10-18 | Runner re-identification from single-view video in the open-world setting | Tomohiro Suzuki et.al. | 2310.11700 | :mortar_board: | None |
2023-10-17 | Learning Comprehensive Representations with Richer Self for Text-to-Image Person Re-Identification | Shuanglin Yan et.al. | 2310.11210 | :mortar_board: | None |
2023-10-13 | Pairwise Similarity Learning is SimPLE | Yandong Wen et.al. | 2310.09449 | :mortar_board: | None |
2023-10-12 | Progress towards ultracold Sr for the AION project – sub-microkelvin atoms and an optical-heterodyne diagnostic tool for injection-locked laser diodes | E. Pasatembou et.al. | 2310.08500 | :mortar_board: | None |
2023-10-12 | Beyond Sharing Weights in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification | Xingyue Liu et.al. | 2310.08026 | :mortar_board: | None |
2023-10-11 | ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification | Guiwei Zhang et.al. | 2310.07552 | :mortar_board: | None |
2023-10-10 | Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination | Siyuan Jiang et.al. | 2310.06339 | :mortar_board: | None |
2023-10-09 | Joint object detection and re-identification for 3D obstacle multi-camera systems | Irene Cortés et.al. | 2310.05785 | :mortar_board: | None |
2023-10-08 | Multi-Ship Tracking by Robust Similarity metric | Hongyu Zhao et.al. | 2310.05171 | :mortar_board: | None |
2023-10-07 | Comparative study of multi-person tracking methods | Denis Mbey Akola et.al. | 2310.04825 | :mortar_board: | None |
2023-10-06 | Alice Benchmarks: Connecting Real World Object Re-Identification with the Synthetic | Xiaoxiao Sun et.al. | 2310.04416 | :mortar_board: | None |
2023-10-06 | VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for Single Modality Labeled Visible-Infrared Person Re-identification | Han Huang et.al. | 2310.04122 | :mortar_board: | None |
2023-10-04 | COOLer: Class-Incremental Learning for Appearance-Based Multiple Object Tracking | Zhizheng Liu et.al. | 2310.03006 | :mortar_board: | Code |
2023-10-04 | ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking | Tara Sadjadpour et.al. | 2310.02532 | :mortar_board: | None |
2023-10-03 | DARTH: Holistic Test-time Adaptation for Multiple Object Tracking | Mattia Segu et.al. | 2310.01926 | :mortar_board: | Code |
2023-10-02 | Offline Tracking with Object Permanence | Xianzhong Liu et.al. | 2310.01288 | :mortar_board: | None |
2023-10-02 | Strength in Diversity: Multi-Branch Representation Learning for Vehicle Re-Identification | Eurico Almeida et.al. | 2310.01129 | :mortar_board: | Code |
2023-10-02 | LoCUS: Learning Multiscale 3D-consistent Features from Posed Images | Dominik A. Kloepfer et.al. | 2310.01095 | :mortar_board: | None |
2023-09-30 | Magneto-optical trap reaction microscope for photoionization of cold strontium atoms | Shushu Ruan et.al. | 2310.00389 | :mortar_board: | None
2023-09-30 | Walking = Traversable? : Traversability Prediction via Multiple Human Object Tracking under Occlusion | Jonathan Tay Yu Liang et.al. | 2310.00242 | :mortar_board: | None |
2023-09-29 | Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification | Tiantian Gong et.al. | 2309.17104 | :mortar_board: | None |
2023-09-29 | SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features | Song Wang et.al. | 2309.16987 | :mortar_board: | None |
2023-09-28 | Hyperfine structure of the state of AlCl and its relevance to laser cooling and trapping | J. R. Daniel et.al. | 2309.16835 | :mortar_board: | None |
2023-09-27 | AaP-ReID: Improved Attention-Aware Person Re-identification | Vipin Gautam et.al. | 2309.15780 | :mortar_board: | None |
2023-09-27 | 3D Multiple Object Tracking on Autonomous Driving: A Literature Review | Peng Zhang et.al. | 2309.15411 | :mortar_board: | None |
2023-09-26 | A Quantitative Information Flow Analysis of the Topics API | Mário S. Alvim et.al. | 2309.14746 | :mortar_board: | None |
2023-09-25 | Magneto-optical trap performance for high-bandwidth applications | Benjamin Adams et.al. | 2309.14026 | :mortar_board: | None |
2023-09-24 | Combining Two Adversarial Attacks Against Person Re-Identification Systems | Eduardo de O. Andrade et.al. | 2309.13763 | :mortar_board: | None |
2023-09-24 | Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset | Arthur Zhang et.al. | 2309.13549 | :mortar_board: | Code |
2023-09-23 | AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture | Leonardo Saraceni et.al. | 2309.13393 | :mortar_board: | None |
2023-09-23 | YOLORe-IDNet: An Efficient Multi-Camera System for Person-Tracking | Vipin Gautam et.al. | 2309.13387 | :mortar_board: | None |
2023-09-21 | Human Following in Mobile Platforms with Person Re-Identification | Mario Srouji et.al. | 2309.12479 | :mortar_board: | None |
2023-09-21 | DIOR: Dataset for Indoor-Outdoor Reidentification – Long Range 3D/2D Skeleton Gait Collection Pipeline, Semi-Automated Gait Keypoint Labeling and Baseline Evaluation Methods | Yuyang Chen et.al. | 2309.12429 | :mortar_board: | None |
2023-09-21 | BASE: Probably a Better Approach to Multi-Object Tracking | Martin Vonheim Larsen et.al. | 2309.12035 | :mortar_board: | None |
2023-09-21 | Person Re-Identification for Robot Person Following with Online Continual Learning | Hanjing Ye et.al. | 2309.11727 | :mortar_board: | None |
2023-09-20 | PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement | Chengyou Jia et.al. | 2309.11125 | :mortar_board: | None |
2023-09-19 | OccluTrack: Rethinking Awareness of Occlusion for Enhancing Multiple Pedestrian Tracking | Jianjun Gao et.al. | 2309.10360 | :mortar_board: | None |
2023-09-18 | Localization-Guided Track: A Deep Association Multi-Object Tracking Framework Based on Localization Confidence of Detections | Ting Meng et.al. | 2309.09765 | :mortar_board: | Code |
2023-09-18 | Moving Object Detection and Tracking with 4D Radar Point Cloud | Zhijun Pan et.al. | 2309.09737 | :mortar_board: | None |
2023-09-15 | Beyond Domain Gap: Exploiting Subjectivity in Sketch-Based Person Retrieval | Kejun Lin et.al. | 2309.08372 | :mortar_board: | Code |
2023-09-13 | Tracking Particles Ejected From Active Asteroid Bennu With Event-Based Vision | Loïc J. Azzalini et.al. | 2309.06819 | :mortar_board: | None |
2023-09-12 | The Influence of Contrast and Temporal Expansion on the Marching-on-in-Time Contrast Current Density Volume Integral Equation | Petrus W. N. van Diepen et.al. | 2309.06321 | :mortar_board: | None |
2023-09-12 | Modality Unifying Network for Visible-Infrared Person Re-Identification | Hao Yu et.al. | 2309.06262 | :mortar_board: | None |
2023-09-12 | Which Framework is Suitable for Online 3D Multi-Object Tracking for Autonomous Driving with Automotive 4D Imaging Radar? | Jianan Liu et.al. | 2309.06036 | :mortar_board: | None |
2023-09-12 | SoccerNet 2023 Challenges Results | Anthony Cioppa et.al. | 2309.06006 | :mortar_board: | Code |
2023-09-09 | DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions | Teng Fu et.al. | 2309.04682 | :mortar_board: | None |
2023-09-09 | BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification | Takuro Fujii et.al. | 2309.04675 | :mortar_board: | None |
2023-09-07 | Region Generation and Assessment Network for Occluded Person Re-Identification | Shuting He et.al. | 2309.03558 | :mortar_board: | None |
2023-09-07 | Genericity of singularities in spacetimes with weakly trapped submanifolds | Ivan Pontual Costa e Silva et.al. | 2309.03421 | :mortar_board: | None |
2023-09-06 | FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching | Shuo Liu et.al. | 2309.02975 | :mortar_board: | Code |
2023-09-06 | Fast and Resource-Efficient Object Tracking on Edge Devices: A Measurement Study | Sanjana Vijay Ganesh et.al. | 2309.02666 | :mortar_board: | None |
2023-09-04 | Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification | Zhiyin Shao et.al. | 2309.01420 | :mortar_board: | None |
2023-09-03 | Spatial-temporal Vehicle Re-identification | Hye-Geun Kim et.al. | 2309.01166 | :mortar_board: | None |
2023-09-03 | UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance | Son Tran et.al. | 2309.01078 | :mortar_board: | None |
2023-09-02 | Tracking without Label: Unsupervised Multiple Object Tracking via Contrastive Similarity Learning | Sha Meng et.al. | 2309.00942 | :mortar_board: | None |
2023-09-01 | Object-Centric Multiple Object Tracking | Zixu Zhao et.al. | 2309.00233 | :mortar_board: | Code |
2023-08-31 | Illumination Distillation Framework for Nighttime Person Re-Identification and A New Benchmark | Andong Lu et.al. | 2308.16486 | :mortar_board: | Code |
2023-08-30 | Occlusion-Aware Detection and Re-ID Calibrated Network for Multi-Object Tracking | Yukun Su et.al. | 2308.15795 | :mortar_board: | None |
2023-08-29 | Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification | Haichao Shi et.al. | 2308.15063 | :mortar_board: | None |
2023-08-27 | Semantic-aware Consistency Network for Cloth-changing Person Re-Identification | Peini Guo et.al. | 2308.14113 | :mortar_board: | Code |
2023-08-25 | ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking | Cheng-Che Cheng et.al. | 2308.13229 | :mortar_board: | Code |
2023-08-23 | Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification | Geon Lee et.al. | 2308.11901 | :mortar_board: | None |
2023-08-23 | HashReID: Dynamic Network with Binary Codes for Efficient Person Re-identification | Kshitij Nikhal et.al. | 2308.11900 | :mortar_board: | None |
2023-08-23 | Multi-object Detection, Tracking and Prediction in Rugged Dynamic Environments | Shixing Huang et.al. | 2308.11870 | :mortar_board: | None |
2023-08-22 | (Un)fair Exposure in Deep Face Rankings at a Distance | Andrea Atzori et.al. | 2308.11732 | :mortar_board: | None |
2023-08-22 | Delving into Motion-Aware Matching for Monocular 3D Object Tracking | Kuan-Chih Huang et.al. | 2308.11607 | :mortar_board: | Code |
2023-08-22 | TrackFlow: Multi-Object Tracking with Normalizing Flows | Gianluca Mancusi et.al. | 2308.11513 | :mortar_board: | None |
2023-08-22 | TOPIC: A Parallel Association Paradigm for Multi-Object Tracking under Complex Motions and Diverse Scenes | Xiaoyan Cao et.al. | 2308.11157 | :mortar_board: | Code |
2023-08-22 | Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models | Alex Nyffenegger et.al. | 2308.11103 | :mortar_board: | Code |
2023-08-21 | Rethinking Person Re-identification from a Projection-on-Prototypes Perspective | Qizao Wang et.al. | 2308.10717 | :mortar_board: | None |
2023-08-21 | Color Prompting for Data-Free Continual Unsupervised Domain Adaptive Person Re-Identification | Jianyang Gu et.al. | 2308.10716 | :mortar_board: | Code |
2023-08-21 | Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification | Qizao Wang et.al. | 2308.10692 | :mortar_board: | None |
2023-08-21 | Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification | Feng Liu et.al. | 2308.10658 | :mortar_board: | None |
2023-08-19 | Noisy-Correspondence Learning for Text-to-Image Person Re-identification | Yang Qin et.al. | 2308.09911 | :mortar_board: | Code |
2023-08-19 | LEGO: Learning and Graph-Optimized Modular Tracker for Online Multi-Object Tracking with Point Clouds | Zhenrong Zhang et.al. | 2308.09908 | :mortar_board: | None |
2023-08-19 | DiffusionTrack: Diffusion Model For Multi-Object Tracking | Run Luo et.al. | 2308.09905 | :mortar_board: | None |
2023-08-17 | Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification | Gürkan Soykan et.al. | 2308.09096 | :mortar_board: | None |
2023-08-17 | Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification | Zhaopeng Dou et.al. | 2308.08887 | :mortar_board: | Code |
2023-08-17 | BOTT: Box Only Transformer Tracker for 3D Object Tracking | Lubing Zhou et.al. | 2308.08753 | :mortar_board: | None |
2023-08-16 | Privacy at Risk: Exploiting Similarities in Health Data for Identity Inference | Lucas Lange et.al. | 2308.08310 | :mortar_board: | None |
2023-08-15 | AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes | Yunhao Li et.al. | 2308.07537 | :mortar_board: | None |
2023-08-14 | FOLT: Fast Multiple Object Tracking from UAV-captured Videos Based on Optical Flow | Mufeng Yao et.al. | 2308.07207 | :mortar_board: | None |
2023-08-12 | 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking | Shuxiao Ding et.al. | 2308.06635 | :mortar_board: | Code |
2023-08-11 | Combining feature aggregation and geometric similarity for re-identification of patterned animals | Veikka Immonen et.al. | 2308.06335 | :mortar_board: | None |
2023-08-11 | Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking | Yiheng Liu et.al. | 2308.05911 | :mortar_board: | None |
2023-08-09 | An End-to-End Framework of Road User Detection, Tracking, and Prediction from Monocular Images | Hao Cheng et.al. | 2308.05026 | :mortar_board: | None |
2023-08-09 | Tracking Players in a Badminton Court by Two Cameras | Young-Ching Chou et.al. | 2308.04872 | :mortar_board: | None |
2023-08-08 | 1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges | Kaer Huang et.al. | 2308.04598 | :mortar_board: | None |
2023-08-08 | Person Re-Identification without Identification via Event Anonymization | Shafiq Ahmad et.al. | 2308.04402 | :mortar_board: | Code |
2023-08-08 | Multi-level Map Construction for Dynamic Scenes | Xinggang Hu et.al. | 2308.04000 | :mortar_board: | Code |
2023-08-07 | Video-based Person Re-identification with Long Short-Term Representation Learning | Xuehu Liu et.al. | 2308.03703 | :mortar_board: | None |
2023-08-07 | Part-Aware Transformer for Generalizable Person Re-identification | Hao Ni et.al. | 2308.03322 | :mortar_board: | None |
2023-08-04 | Exploring Part-Informed Visual-Language Learning for Person Re-Identification | Yin Lin et.al. | 2308.02738 | :mortar_board: | None |
2023-08-03 | ReIDTrack: Multi-Object Track and Segmentation Without Motion | Kaer Huang et.al. | 2308.01622 | :mortar_board: | None |
2023-08-02 | A Hybrid Approach To Real-Time Multi-Object Tracking | Vincenzo Mariano Scarrica et.al. | 2308.01248 | :mortar_board: | None |
2023-08-02 | Towards Discriminative Representation with Meta-learning for Colonoscopic Polyp Re-Identification | Suncheng Xiang et.al. | 2308.00929 | :mortar_board: | None |
2023-08-01 | Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking | Mingzhan Yang et.al. | 2308.00783 | :mortar_board: | Code |
2023-08-01 | Loading of a large Yb MOT on the S-P transition | Hector Letellier et.al. | 2308.00387 | :mortar_board: | None |
2023-08-01 | Advancing Frame-Dropping in Multi-Object Tracking-by-Detection Systems Through Event-Based Detection Triggering | Matti Henning et.al. | 2308.00330 | :mortar_board: | None |
2023-07-31 | A Trajectory K-Anonymity Model Based on Point Density and Partition | Wanshu Yu et.al. | 2307.16849 | :mortar_board: | None |
2023-07-31 | Poly-MOT: A Polyhedral Framework For 3D Multi-Object Tracking | Xiaoyu Li et.al. | 2307.16675 | :mortar_board: | Code |
2023-07-28 | MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking | Ruopeng Gao et.al. | 2307.15700 | :mortar_board: | None |
2023-07-28 | Uncertainty-aware Unsupervised Multi-Object Tracking | Kai Liu et.al. | 2307.15409 | :mortar_board: | None |
2023-07-27 | The detection and rectification for identity-switch based on unfalsified control | Junchao Huang et.al. | 2307.14591 | :mortar_board: | None |
2023-07-26 | Large-scale Fully-Unsupervised Re-Identification | Gabriel Bertocco et.al. | 2307.14278 | :mortar_board: | None |
2023-07-24 | Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification | Haocong Rao et.al. | 2307.12917 | :mortar_board: | Code |
2023-07-24 | CTVIS: Consistent Training for Online Video Instance Segmentation | Kaining Ying et.al. | 2307.12616 | :mortar_board: | Code |
2023-07-20 | Learning Discriminative Visual-Text Representation for Polyp Re-Identification | Suncheng Xiang et.al. | 2307.10625 | :mortar_board: | Code |
2023-07-18 | Balancing Privacy and Progress in Artificial Intelligence: Anonymization in Histopathology for Biomedical Research and Education | Neel Kanwal et.al. | 2307.09426 | :mortar_board: | None |
2023-07-18 | Pixel-wise Graph Attention Networks for Person Re-identification | Wenyu Zhang et.al. | 2307.09183 | :mortar_board: | Code |
2023-07-17 | Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for Visible-Infrared Person Re-Identification | Tengfei Liang et.al. | 2307.08316 | :mortar_board: | None |
2023-07-14 | Implementing an electronic sideband offset lock for precision spectroscopy in radium | Tenzin Rabga et.al. | 2307.07646 | :mortar_board: | None |
2023-07-14 | Erasing, Transforming, and Noising Defense Network for Occluded Person Re-Identification | Neng Dong et.al. | 2307.07187 | :mortar_board: | None |
2023-07-14 | TVPR: Text-to-Video Person Retrieval and a New Benchmark | Fan Ni et.al. | 2307.07184 | :mortar_board: | None |
2023-07-13 | Domain-adaptive Person Re-identification without Cross-camera Paired Samples | Huafeng Li et.al. | 2307.06533 | :mortar_board: | None |
2023-07-12 | Multi-Object Tracking as Attention Mechanism | Hiroshi Fukui et.al. | 2307.05874 | :mortar_board: | None |
2023-07-09 | HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Hao Zheng et.al. | 2307.05721 | :mortar_board: | None |
2023-07-11 | High density loading and collisional loss of laser cooled molecules in an optical trap | Varun Jorapur et.al. | 2307.05347 | :mortar_board: | None |
2023-07-11 | MinkSORT: A 3D deep feature extractor using sparse convolutions to improve 3D multi-object tracking in greenhouse tomato plants | David Rapado-Rincon et.al. | 2307.05219 | :mortar_board: | None |
2023-07-08 | Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification | Huafeng Li et.al. | 2307.03903 | :mortar_board: | None |
2023-07-06 | Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification | Mahdi Alehdaghi et.al. | 2307.03240 | :mortar_board: | None |
2023-07-06 | Smartphones in a Microwave: Formal and Experimental Feasibility Study on Fingerprinting the Corona-Warn-App | Henrik Graßhoff et.al. | 2307.02931 | :mortar_board: | None |
2023-07-05 | Multi Object Tracking for Predictive Collision Avoidance | Bruk Gebregziabher et.al. | 2307.02161 | :mortar_board: | None |
2023-07-01 | Improving CNN-based Person Re-identification using score Normalization | Ammar Chouchane et.al. | 2307.00397 | :mortar_board: | None |
2023-06-29 | MotionTrack: End-to-End Transformer-based Multi-Object Tracing with LiDAR-Camera Fusion | Ce Zhang et.al. | 2306.17000 | :mortar_board: | None |
2023-06-29 | Trajectory Poisson multi-Bernoulli mixture filter for traffic monitoring using a drone | Ángel F. García-Fernández et.al. | 2306.16890 | :mortar_board: | None |
2023-06-27 | DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-bit CNNs | Yanjing Li et.al. | 2306.15390 | :mortar_board: | None |
2023-06-27 | On Gibbs Sampling Architecture for Labeled Random Finite Sets Multi-Object Tracking | Anthony Trezza et.al. | 2306.15135 | :mortar_board: | None |
2023-06-25 | A Novel Dual-pooling Attention Module for UAV Vehicle Re-identification | Xiaoyan Guo et.al. | 2306.14104 | :mortar_board: | None |
2023-06-23 | Segmentation and Tracking of Vegetable Plants by Exploiting Vegetable Shape Feature for Precision Spray of Agricultural Robots | Nan Hu et.al. | 2306.13518 | :mortar_board: | Code |
2023-06-23 | Deep macroscopic pure-optical potential for laser cooling and trapping of neutral atoms without using a magneto-optical trap | O. N. Prudnikov et.al. | 2306.13294 | :mortar_board: | None |
2023-06-22 | Iterative Scale-Up ExpansionIoU and Deep Features Association for Multi-Object Tracking in Sports | Hsiang-Wei Huang et.al. | 2306.13074 | :mortar_board: | Code |
2023-06-21 | Generalizable Metric Network for Cross-domain Person Re-identification | Lei Qi et.al. | 2306.11991 | :mortar_board: | None |
2023-06-20 | Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification via Full-Body Person Synthesis | Maxim Maximov et.al. | 2306.11710 | :mortar_board: | None |
2023-06-16 | Lightweight Attribute Localizing Models for Pedestrian Attribute Recognition | Ashish Jha et.al. | 2306.09822 | :mortar_board: | Code |
2023-06-16 | UTOPIA: Unconstrained Tracking Objects without Preliminary Examination via Cross-Domain Adaptation | Pha Nguyen et.al. | 2306.09613 | :mortar_board: | None |
2023-06-15 | Knowledge Assembly: Semi-Supervised Multi-Task Learning from Multiple Datasets with Disjoint Labels | Federica Spinola et.al. | 2306.08839 | :mortar_board: | None |
2023-06-15 | Graph Convolution Based Efficient Re-Ranking for Visual Retrieval | Yuqi Zhang et.al. | 2306.08792 | :mortar_board: | Code |
2023-06-14 | Self-Supervised Polyp Re-Identification in Colonoscopy | Yotam Intrator et.al. | 2306.08591 | :mortar_board: | None |
2023-06-13 | Marking anything: application of point cloud in extracting video target features | Xiangchun Xu et.al. | 2306.07559 | :mortar_board: | None |
2023-06-13 | Retrieve Anyone: A General-purpose Person Re-identification Task with Instructions | Weizhen He et.al. | 2306.07520 | :mortar_board: | None |
2023-06-10 | Vista-Morph: Unsupervised Image Registration of Visible-Thermal Facial Pairs | Catherine Ordun et.al. | 2306.06505 | :mortar_board: | None |
2023-06-09 | TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses | Xuesong Chen et.al. | 2306.05888 | :mortar_board: | None |
2023-06-09 | A Dual-Source Attention Transformer for Multi-Person Pose Tracking | Andreas Doering et.al. | 2306.05807 | :mortar_board: | None |
2023-06-08 | Tracking Objects with 3D Representation from Videos | Jiawei He et.al. | 2306.05416 | :mortar_board: | None |
2023-06-08 | SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth | Zelin Liu et.al. | 2306.05238 | :mortar_board: | Code |
2023-06-08 | Population-Based Evolutionary Gaming for Unsupervised Person Re-identification | Yunpeng Zhai et.al. | 2306.05236 | :mortar_board: | None |
2023-06-08 | On the Robustness of Topics API to a Re-Identification Attack | Nikhil Jha et.al. | 2306.05094 | :mortar_board: | Code |
2023-06-06 | Real-Time Online Unsupervised Domain Adaptation for Real-World Person Re-identification | Christopher Neff et.al. | 2306.03993 | :mortar_board: | None |
2023-06-05 | Differentially Private Cross-camera Person Re-identification | Lucas Maris et.al. | 2306.02765 | :mortar_board: | None |
2023-06-05 | MotionTrack: Learning Motion Predictor for Multiple Object Tracking | Changcheng Xiao et.al. | 2306.02585 | :mortar_board: | None |
2023-06-02 | Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work | Qiangchang Wang et.al. | 2306.01929 | :mortar_board: | None |
2023-06-02 | Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models | Virginia Fernandez et.al. | 2306.01322 | :mortar_board: | None |
2023-06-01 | Design and simulation of a source of cold cadmium for atom interferometry | Satvika Bandarupally et.al. | 2306.00782 | :mortar_board: | None |
2023-05-31 | Dictionary Learning under Symmetries via Group Representations | Subhroshekhar Ghosh et.al. | 2305.19557 | :mortar_board: | None |
2023-05-28 | Z-GMOT: Zero-shot Generic Multiple Object Tracking | Kim Hoang Tran et.al. | 2305.17648 | :mortar_board: | None |
2023-05-26 | Linear Object Detection in Document Images using Multiple Object Tracking | Philippe Bernet et.al. | 2305.16968 | :mortar_board: | None |
2023-05-26 | Fast refacing of MR images with a generative neural network lowers re-identification risk and preserves volumetric consistency | Nataliia Molchanova et.al. | 2305.16922 | :mortar_board: | Code |
2023-05-26 | Blue-detuned molecular magneto-optical trap schemes based on bayesian optimization | S. Xu et.al. | 2305.16576 | :mortar_board: | None |
2023-05-26 | Tree-Based Diffusion Schrödinger Bridge with Applications to Wasserstein Barycenters | Maxence Noble et.al. | 2305.16557 | :mortar_board: | Code |
2023-05-25 | Camera-Incremental Object Re-Identification with Identity Knowledge Evolution | Hantao Yao et.al. | 2305.15909 | :mortar_board: | Code |
2023-05-25 | Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language | Nicola Messina et.al. | 2305.15842 | :mortar_board: | Code |
2023-05-25 | Multi-query Vehicle Re-identification: Viewpoint-conditioned Network, Unified Dataset and New Metric | Aihua Zheng et.al. | 2305.15764 | :mortar_board: | None |
2023-05-25 | Dynamic Enhancement Network for Partial Multi-modality Person Re-identification | Aihua Zheng et.al. | 2305.15762 | :mortar_board: | None |
2023-05-24 | Reducing Rydberg state dc polarizability by microwave dressing | J. C. Bohorquez et.al. | 2305.15200 | :mortar_board: | None |
2023-05-23 | MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking | En Yu et.al. | 2305.14298 | :mortar_board: | None |
2023-05-23 | Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle Re-identification | Aihua Zheng et.al. | 2305.13659 | :mortar_board: | None |
2023-05-23 | MaskCL: Semantic Mask-Driven Contrastive Learning for Unsupervised Person Re-Identification with Clothes Change | Mingkun Li et.al. | 2305.13600 | :mortar_board: | None |
2023-05-22 | Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking | Feng Yan et.al. | 2305.12724 | :mortar_board: | Code |
2023-05-22 | Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement | De Cheng et.al. | 2305.12711 | :mortar_board: | None |
2023-05-22 | Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID | De Cheng et.al. | 2305.12673 | :mortar_board: | None
2023-05-17 | Towards Object Re-Identification from Point Clouds for 3D MOT | Benjamin Thérien et.al. | 2305.10210 | :mortar_board: | None |
2023-05-17 | STrack: Self-supervised Tracking with Soft Assignment Flow | Fatemeh Azimi et.al. | 2305.09981 | :mortar_board: | None |
2023-05-16 | SCTracker: Multi-object tracking with shape and confidence constraints | Huan Mao et.al. | 2305.09523 | :mortar_board: | None |
2023-05-15 | DopUS-Net: Quality-Aware Robotic Ultrasound Imaging based on Doppler Signal | Zhongliang Jiang et.al. | 2305.08938 | :mortar_board: | Code |
2023-05-15 | GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training | Xiaoyu Tian et.al. | 2305.08808 | :mortar_board: | Code |
2023-05-15 | Non-Separable Multi-Dimensional Network Flows for Visual Computing | Viktoria Ehm et.al. | 2305.08628 | :mortar_board: | None |
2023-05-12 | Grating magneto-optical traps with complicated level structures | D. S. Barker et.al. | 2305.07732 | :mortar_board: | None |
2023-05-10 | Clothes-Invariant Feature Learning by Causal Intervention for Clothes-Changing Person Re-identification | Xulin Li et.al. | 2305.06145 | :mortar_board: | None |
2023-05-09 | MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts | Xiaonan Li et.al. | 2305.05181 | :mortar_board: | None |
2023-05-08 | Simulations of a frequency-chirped magneto-optical trap of MgF | Kayla J. Rodriguez et.al. | 2305.04879 | :mortar_board: | None |
2023-05-05 | A Race Track Trapped-Ion Quantum Processor | S. A. Moses et.al. | 2305.03828 | :mortar_board: | Code |
2023-05-03 | Imaging a Li Atom In An Optical Tweezer 2000 Times with Λ-Enhanced Gray Molasses | Karl N. Blodgett et.al. | 2305.02405 | :mortar_board: | None
2023-04-30 | LIMOT: A Tightly-Coupled System for LiDAR-Inertial Odometry and Multi-Object Tracking | Zhongyang Zhu et.al. | 2305.00406 | :mortar_board: | None |
2023-04-29 | Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data | Arthur Josi et.al. | 2305.00320 | :mortar_board: | Code |
2023-04-27 | Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification | Xuehu Liu et.al. | 2304.14122 | :mortar_board: | None |
2023-04-25 | Self-Supervised Multi-Object Tracking From Consistency Across Timescales | Christopher Lang et.al. | 2304.13147 | :mortar_board: | None |
2023-04-25 | Pseudo Labels Refinement with Intra-camera Similarity for Unsupervised Person Re-identification | Pengna Li et.al. | 2304.12634 | :mortar_board: | None |
2023-04-24 | MOTLEE: Distributed Mobile Multi-Object Tracking with Localization Error Elimination | Mason B. Peterson et.al. | 2304.12175 | :mortar_board: | None |
2023-04-19 | Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification | Suncheng Xiang et.al. | 2304.09498 | :mortar_board: | Code |
2023-04-19 | Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and Spatio-Temporal Consistency ID Re-Assignment | Hsiang-Wei Huang et.al. | 2304.09471 | :mortar_board: | Code |
2023-04-18 | You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking | Xiyang Wang et.al. | 2304.08709 | :mortar_board: | Code |
2023-04-17 | OVTrack: Open-Vocabulary Multiple Object Tracking | Siyuan Li et.al. | 2304.08408 | :mortar_board: | None |
2023-04-17 | The Impact of Frame-Dropping on Performance and Energy Consumption for Multi-Object Tracking | Matti Henning et.al. | 2304.08152 | :mortar_board: | None |
2023-04-16 | Ontology for Healthcare Artificial Intelligence Privacy in Brazil | Tiago Andres Vaz et.al. | 2304.07889 | :mortar_board: | None |
2023-04-16 | Bent & Broken Bicycles: Leveraging synthetic data for damaged object re-identification | Luca Piano et.al. | 2304.07883 | :mortar_board: | None |
2023-04-16 | A Novel end-to-end Framework for Occluded Pixel Reconstruction with Spatio-temporal Features for Improved Person Re-identification | Prathistith Raj Medi et.al. | 2304.07721 | :mortar_board: | None |
2023-04-16 | Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads | Yu Zhang et.al. | 2304.07705 | :mortar_board: | None |
2023-04-12 | Measuring Re-identification Risk | CJ Carey et.al. | 2304.07210 | :mortar_board: | Code |
2023-04-10 | Analysing Fairness of Privacy-Utility Mobility Models | Yuting Zhan et.al. | 2304.06469 | :mortar_board: | None |
2023-04-12 | TopTrack: Tracking Objects By Their Top | Jacob Meilleur et.al. | 2304.06114 | :mortar_board: | Code |
2023-04-12 | Learning Transferable Pedestrian Representation from Multimodal Information Supervision | Liping Bao et.al. | 2304.05554 | :mortar_board: | Code |
2023-04-11 | SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes | Yutao Cui et.al. | 2304.05170 | :mortar_board: | Code |
2023-04-10 | Multi-Object Tracking by Iteratively Associating Detections with Uniform Appearance for Trawl-Based Fishing Bycatch Monitoring | Cheng-Yen Yang et.al. | 2304.04816 | :mortar_board: | None |
2023-04-09 | Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification | Jiawei Feng et.al. | 2304.04205 | :mortar_board: | Code |
2023-04-07 | PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift | Gaojie Wu et.al. | 2304.03481 | :mortar_board: | None |
2023-04-04 | PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification | Minsu Kim et.al. | 2304.01537 | :mortar_board: | None |
2023-04-04 | Attention Map Guided Transformer Pruning for Edge Device | Junzhu Mao et.al. | 2304.01452 | :mortar_board: | Code |
2023-04-03 | A Scale-Invariant Trajectory Simplification Method for Efficient Data Collection in Videos | Yang Liu et.al. | 2304.01340 | :mortar_board: | Code |
2023-04-03 | Navigating to Objects Specified by Images | Jacob Krantz et.al. | 2304.01192 | :mortar_board: | None |
2023-03-31 | Adaptive Sparse Pairwise Loss for Object Re-Identification | Xiao Zhou et.al. | 2303.18247 | :mortar_board: | Code |
2023-03-27 | PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences | Zeyd Boukhers et.al. | 2303.18200 | :mortar_board: | None |
2023-03-30 | Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks | Weihua Chen et.al. | 2303.17602 | :mortar_board: | Code |
2023-03-30 | Streaming Video Model | Yucheng Zhao et.al. | 2303.17228 | :mortar_board: | Code |
2023-03-28 | Large-scale Training Data Search for Object Re-identification | Yue Yao et.al. | 2303.16186 | :mortar_board: | None |
2023-03-28 | Mask-Free Video Instance Segmentation | Lei Ke et.al. | 2303.15904 | :mortar_board: | Code |
2023-03-27 | Learnable Graph Matching: A Practical Paradigm for Data Association | Jiawei He et.al. | 2303.15414 | :mortar_board: | Code |
2023-03-27 | ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box | Yifu Zhang et.al. | 2303.15334 | :mortar_board: | None |
2023-03-26 | SDTracker: Synthetic Data Based Multi-Object Tracking | Yingda Guan et.al. | 2303.14653 | :mortar_board: | None |
2023-03-26 | MRCN: A Novel Modality Restitution and Compensation Network for Visible-Infrared Person Re-identification | Yukang Zhang et.al. | 2303.14626 | :mortar_board: | None |
2023-03-25 | Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification | Yukang Zhang et.al. | 2303.14481 | :mortar_board: | Code |
2023-03-25 | Collaborative Multi-Object Tracking with Conformal Uncertainty Propagation | Sanbao Su et.al. | 2303.14346 | :mortar_board: | None |
2023-03-24 | A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System (AIS) Data | Md Asif Bin Syed et.al. | 2303.14068 | :mortar_board: | None |
2023-03-24 | Multimodal Adaptive Fusion of Face and Gait Features using Keyless attention based Deep Neural Networks for Human Identification | Ashwin Prakash et.al. | 2303.13814 | :mortar_board: | None |
2023-03-22 | Man vs the machine: The Struggle for Effective Text Anonymisation in the Age of Large Language Models | Constantinos Patsakis et.al. | 2303.12429 | :mortar_board: | None |
2023-03-21 | OmniTracker: Unifying Object Tracking by Tracking-with-Detection | Junke Wang et.al. | 2303.12079 | :mortar_board: | None |
2023-03-21 | CLIP-ReIdent: Contrastive Training for Player Re-Identification | Konrad Habel et.al. | 2303.11855 | :mortar_board: | None |
2023-03-21 | Deep Learning for Video-based Person Re-Identification: A Survey | Khawar Islam et.al. | 2303.11332 | :mortar_board: | None |
2023-03-20 | Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-Identification | Jiaer Xia et.al. | 2303.10976 | :mortar_board: | None |
2023-03-20 | Open-World Pose Transfer via Sequential Test-Time Adaption | Junyang Chen et.al. | 2303.10945 | :mortar_board: | None |
2023-03-18 | Report of the Medical Image De-Identification (MIDI) Task Group – Best Practices and Recommendations | David A. Clunie et.al. | 2303.10473 | :mortar_board: | None |
2023-03-18 | MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking | Zheng Qin et.al. | 2303.10404 | :mortar_board: | None |
2023-03-17 | GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates | Bingqi Shen et.al. | 2303.09800 | :mortar_board: | None |
2023-03-16 | Rt-Track: Robust Tricks for Multi-Pedestrian Tracking | Yukuan Zhang et.al. | 2303.09668 | :mortar_board: | None |
2023-03-15 | Mining False Positive Examples for Text-Based Person Re-identification | Wenhao Xu et.al. | 2303.08466 | :mortar_board: | Code |
2023-03-15 | Real-time Multi-Object Tracking Based on Bi-directional Matching | Huilan Luo et.al. | 2303.08444 | :mortar_board: | None |
2023-03-13 | MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID | Jianyang Gu et.al. | 2303.07065 | :mortar_board: | None |
2023-03-13 | TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification | Haocong Rao et.al. | 2303.06819 | :mortar_board: | Code |
2023-03-13 | Dynamic Clustering and Cluster Contrastive Learning for Unsupervised Person Re-identification | Ziqi He et.al. | 2303.06810 | :mortar_board: | None |
2023-03-11 | PRSNet: A Masked Self-Supervised Learning Pedestrian Re-Identification Method | Zhijie Xiao et.al. | 2303.06330 | :mortar_board: | Code |
2023-03-09 | A 2D MOT of dysprosium atoms as a compact source for efficient loading of a narrow-line 3D MOT | Shuwei Jin et.al. | 2303.05191 | :mortar_board: | None |
2023-03-06 | Memory Maps for Video Object Detection and Tracking on UAVs | Benjamin Kiefer et.al. | 2303.03508 | :mortar_board: | None |
2023-03-06 | Referring Multi-Object Tracking | Dongming Wu et.al. | 2303.03366 | :mortar_board: | Code |
2023-03-06 | Efficient Skill Acquisition for Complex Manipulation Tasks in Obstructed Environments | Jun Yamada et.al. | 2303.03365 | :mortar_board: | None |
2023-03-06 | UniHCP: A Unified Model for Human-Centric Perceptions | Yuanzheng Ci et.al. | 2303.02936 | :mortar_board: | None |
2023-03-03 | 3D Multi-Object Tracking Based on Uncertainty-Guided Data Association | Jiawei He et.al. | 2303.01786 | :mortar_board: | Code |
2023-03-03 | Feature Completion Transformer for Occluded Person Re-identification | Tao Wang et.al. | 2303.01656 | :mortar_board: | None |
2023-02-28 | DFR-FastMOT: Detection Failure Resistant Tracker for Fast Multi-Object Tracking Based on Sensor Fusion | Mohamed Nagy et.al. | 2302.14807 | :mortar_board: | Code |
2023-02-28 | Membership Inference Attack for Beluga Whales Discrimination | Voncarlos Marcelo Araújo et.al. | 2302.14769 | :mortar_board: | None |
2023-02-28 | Focus On Details: Online Multi-object Tracking with Diverse Fine-grained Representation | Hao Ren et.al. | 2302.14589 | :mortar_board: | None |
2023-02-28 | A Little Bit Attention Is All You Need for Person Re-Identification | Markus Eisenbach et.al. | 2302.14574 | :mortar_board: | None |
2023-02-28 | Mesh-SORT: Simple and effective of location-wise tracker | ZongTan Li et.al. | 2302.14415 | :mortar_board: | None |
2023-02-28 | DC-Former: Diverse and Compact Transformer for Person Re-Identification | Wen Li et.al. | 2302.14335 | :mortar_board: | Code |
2023-02-28 | Ultra-high vacuum pressure measurement using cold atoms | S. Supakar et.al. | 2302.14305 | :mortar_board: | None |
2023-02-25 | DeepBrainPrint: A Novel Contrastive Framework for Brain MRI Re-Identification | Lemuel Puglisi et.al. | 2302.13057 | :mortar_board: | None |
2023-02-23 | Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification | Gerard Maggiolino et.al. | 2302.11813 | :mortar_board: | Code |
2023-02-21 | BrackishMOT: The Brackish Multi-Object Tracking Dataset | Malte Pedersen et.al. | 2302.10645 | :mortar_board: | Code |
2023-02-20 | On the Stability and Generalization of Triplet Learning | Jun Chen et.al. | 2302.09815 | :mortar_board: | None |
2023-02-17 | A Review on Generative Adversarial Networks for Data Augmentation in Person Re-Identification Systems | Victor Uc-Cetina et.al. | 2302.09119 | :mortar_board: | None |
2023-02-17 | Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences | Christopher Lang et.al. | 2302.09043 | :mortar_board: | None |
2023-02-16 | Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning | Zhihao Qian et.al. | 2302.08212 | :mortar_board: | None |
2023-02-15 | DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes | Shenghao Hao et.al. | 2302.07676 | :mortar_board: | Code |
2023-02-11 | DaliID: Distortion-Adaptive Learned Invariance for Identification Models | Wes Robbins et.al. | 2302.05753 | :mortar_board: | None |
2023-02-11 | ConMAE: Contour Guided MAE for Unsupervised Vehicle Re-Identification | Jing Yang et.al. | 2302.05673 | :mortar_board: | None |
2023-02-10 | Tensor-to-scalar ratio forecasts for extended LiteBIRD frequency configurations | U. Fuskeland et.al. | 2302.05228 | :mortar_board: | None |
2023-02-09 | Deep Intra-Image Contrastive Learning for Weakly Supervised One-Step Person Search | Jiabei Wang et.al. | 2302.04607 | :mortar_board: | Code |
2023-02-07 | Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking | Ziqi Pang et.al. | 2302.03802 | :mortar_board: | Code |
2023-02-07 | Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction | Yangxiao Lu et.al. | 2302.03793 | :mortar_board: | None |
2023-02-05 | Spatio-Temporal Point Process for Multiple Object Tracking | Tao Wang et.al. | 2302.02444 | :mortar_board: | None |
2023-02-04 | X-ReID: Cross-Instance Transformer for Identity-Level Person Re-Identification | Leqi Shen et.al. | 2302.02075 | :mortar_board: | None |
2023-02-03 | Spectral Aware Softmax for Visible-Infrared Person Re-Identification | Lei Tan et.al. | 2302.01512 | :mortar_board: | None |
2023-02-02 | Exploring Invariant Representation for Visible-Infrared Person Re-Identification | Lei Tan et.al. | 2302.00884 | :mortar_board: | None |
2023-01-29 | Unsupervised Domain Adaptation on Person Re-Identification via Dual-level Asymmetric Mutual Learning | Qiong Wu et.al. | 2301.12439 | :mortar_board: | None |
2023-01-25 | An Efficient Semi-Automated Scheme for Infrastructure LiDAR Annotation | Aotian Wu et.al. | 2301.10732 | :mortar_board: | None |
2023-01-25 | Tracking Different Ant Species: An Unsupervised Domain Adaptation Framework and a Dataset for Multi-object Tracking | Chamath Abeysinghe et.al. | 2301.10559 | :mortar_board: | None |
2023-01-24 | A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data | Meenatchi Sundaram Muthu Selva Annamalai et.al. | 2301.10053 | :mortar_board: | None |
2023-01-23 | Illumination Variation Correction Using Image Synthesis For Unsupervised Domain Adaptive Person Re-Identification | Jiaqi Guo et.al. | 2301.09702 | :mortar_board: | None |
2023-01-23 | Triplet Contrastive Learning for Unsupervised Vehicle Re-identification | Fei Shen et.al. | 2301.09498 | :mortar_board: | Code |
2023-01-18 | Robust Knowledge Adaptation for Federated Unsupervised Person ReID | Jianfeng Weng et.al. | 2301.07320 | :mortar_board: | None |
2023-01-17 | Database Matching Under Noisy Synchronization Errors | Serhat Bakirtas et.al. | 2301.06796 | :mortar_board: | None |
2023-01-16 | Meta Generative Attack on Person Reidentification | A V Subramanyam et.al. | 2301.06286 | :mortar_board: | None |
2023-01-14 | Arcade Processes for Informed Martingale Interpolation and Transport | Georges Kassis et.al. | 2301.05936 | :mortar_board: | None |
2023-01-05 | Learning Feature Recovery Transformer for Occluded Person Re-identification | Boqiang Xu et.al. | 2301.01879 | :mortar_board: | Code |
2023-01-02 | Learning Invariance from Generated Variance for Unsupervised Person Re-identification | Hao Chen et.al. | 2301.00725 | :mortar_board: | Code |
2023-01-02 | A contrastive learning approach for individual re-identification in a wild fish population | Ørjan Langøy Olsen et.al. | 2301.00596 | :mortar_board: | None |
2023-01-02 | Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification | Ziyi Tang et.al. | 2301.00531 | :mortar_board: | None |
2022-12-31 | Tracking Passengers and Baggage Items using Multiple Overhead Cameras at Security Checkpoints | Abubakar Siddique et.al. | 2301.00190 | :mortar_board: | None |
2022-12-30 | Unsupervised 4D LiDAR Moving Object Segmentation in Stationary Settings with Multivariate Occupancy Time Series | Thomas Kreutz et.al. | 2212.14750 | :mortar_board: | Code |
2022-12-30 | Multisensor Multiobject Tracking With High-Dimensional Object States | Wenyu Zhang et.al. | 2212.14556 | :mortar_board: | None |
2022-12-30 | Estimating Latent Population Flows from Aggregated Data via Inversing Multi-Marginal Optimal Transport | Sikun Yang et.al. | 2212.14527 | :mortar_board: | None |
2022-12-28 | Joint Discriminative and Metric Embedding Learning for Person Re-Identification | Sinan Sabri et.al. | 2212.14107 | :mortar_board: | None |
2022-12-24 | DiP: Learning Discriminative Implicit Parts for Person Re-Identification | Dengjie Li et.al. | 2212.13906 | :mortar_board: | Code |
2022-12-25 | Human Health Indicator Prediction from Gait Video | Ziqing Li et.al. | 2212.12948 | :mortar_board: | None |
2022-12-25 | Understanding Ethics, Privacy, and Regulations in Smart Video Surveillance for Public Safety | Babak Rahimi Ardabili et.al. | 2212.12936 | :mortar_board: | None |
2022-12-23 | Mesh of Things (MoT) Network-Driven Anomaly Detection in Connected Objects | Rathinamala Vijay et.al. | 2212.12221 | :mortar_board: | None |
2022-12-22 | Spatio-Visual Fusion-Based Person Re-Identification for Overhead Fisheye Images | Mertcan Cokbas et.al. | 2212.11477 | :mortar_board: | None |
2022-12-21 | Photonic integrated beam delivery in a rubidium 3D magneto-optical trap | Andrei Isichenko et.al. | 2212.11417 | :mortar_board: | None |
2022-12-20 | Dain’s invariant for black hole initial data | Robert Sansom et.al. | 2212.10270 | :mortar_board: | None |
2022-12-20 | Tracking by Associating Clips | Sanghyun Woo et.al. | 2212.10149 | :mortar_board: | None |
2022-12-20 | On the Applicability of Synthetic Data for Re-Identification | Jérôme Rutinowski et.al. | 2212.10105 | :mortar_board: | Code |
2022-12-20 | Benchmarking person re-identification datasets and approaches for practical real-world implementations | Jose Huaman et.al. | 2212.09981 | :mortar_board: | Code |
2022-12-16 | Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification | Minjung Kim et.al. | 2212.09498 | :mortar_board: | None |
2022-12-17 | A Brief Survey on Person Recognition at a Distance | Chrisopher B. Nalty et.al. | 2212.08969 | :mortar_board: | None |
2022-12-16 | Nonequilibrium steady state in a large magneto-optical trap | Marius Gaudesius et.al. | 2212.08705 | :mortar_board: | None |
2022-12-16 | Detection-aware multi-object tracking evaluation | Juan C. SanMiguel et.al. | 2212.08536 | :mortar_board: | None |
2022-12-16 | Neural Enhanced Belief Propagation for Multiobject Tracking | Mingchao Liang et.al. | 2212.08340 | :mortar_board: | None |
2022-12-15 | Writer Retrieval and Writer Identification in Greek Papyri | Vincent Christlein et.al. | 2212.07664 | :mortar_board: | None |
2022-12-15 | Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration | Liqi Yan et.al. | 2212.07592 | :mortar_board: | None |
2022-12-14 | Blue-Detuned Magneto-Optical Trap of Molecules | Justin J. Burau et.al. | 2212.07472 | :mortar_board: | None |
2022-12-12 | CountingMOT: Joint Counting, Detection and Re-Identification for Multiple Object Tracking | Weihong Ren et.al. | 2212.05861 | :mortar_board: | None |
2022-12-11 | Mutimodal Ranking Optimization for Heterogeneous Face Re-identification | Hui Hu et.al. | 2212.05510 | :mortar_board: | None |
2022-12-09 | Occluded Person Re-Identification via Relational Adaptive Feature Correction Learning | Minjung Kim et.al. | 2212.04712 | :mortar_board: | None |
2022-12-08 | Steady-State Ultracold Plasma | B. B. Zelener et.al. | 2212.04389 | :mortar_board: | None |
2022-12-08 | Complete Solution for Vehicle Re-ID in Surround-view Camera System | Zizhang Wu et.al. | 2212.04126 | :mortar_board: | None |
2022-12-07 | Multiple Object Tracking Challenge Technical Report for Team MT_IoT | Feng Yan et.al. | 2212.03586 | :mortar_board: | None |
2022-12-06 | Sparse Message Passing Network with Feature Integration for Online Multiple Object Tracking | Bisheng Wang et.al. | 2212.02992 | :mortar_board: | None |
2022-12-05 | Generalizable Person Re-Identification via Viewpoint Alignment and Fusion | Bingliang Jiao et.al. | 2212.02398 | :mortar_board: | None |
2022-12-03 | Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation | En Yu et.al. | 2212.01568 | :mortar_board: | None |
2022-12-02 | CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion | Tobias Fischer et.al. | 2212.01247 | :mortar_board: | None |
2022-12-01 | Privacy-Preserving Data Synthetisation for Secure Information Sharing | Tânia Carvalho et.al. | 2212.00484 | :mortar_board: | None |
2022-12-01 | Learning Progressive Modality-shared Transformers for Effective Visible-Infrared Person Re-identification | Hu Lu et.al. | 2212.00226 | :mortar_board: | Code |
2022-11-30 | Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification | De Cheng et.al. | 2211.16847 | :mortar_board: | None |
2022-11-29 | Lifelong Person Re-Identification via Knowledge Refreshing and Consolidation | Chunlin Yu et.al. | 2211.16201 | :mortar_board: | Code |
2022-11-29 | Similarity Distribution based Membership Inference Attack on Person Re-identification | Junyao Gao et.al. | 2211.15918 | :mortar_board: | None |
2022-11-27 | Dynamic Feature Pruning and Consolidation for Occluded Person Re-Identification | Yuteng Ye et.al. | 2211.14742 | :mortar_board: | None |
2022-11-24 | Hard to Track Objects with Irregular Motions and Similar Appearances? Make It Easier by Buffering the Matching Space | Fan Yang et.al. | 2211.14317 | :mortar_board: | None |
2022-11-25 | CLIP-ReID: Exploiting Vision-Language Model for Image Re-Identification without Concrete Text Labels | Siyuan Li et.al. | 2211.13977 | :mortar_board: | Code |
2022-11-24 | ReFace: Improving Clothes-Changing Re-Identification With Face Features | Daniel Arkushin et.al. | 2211.13807 | :mortar_board: | Code |
2022-11-24 | Automated Driving Systems Data Acquisition and Processing Platform | Xin Xia et.al. | 2211.13425 | :mortar_board: | None |
2022-11-22 | Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification | Jiachen Li et.al. | 2211.12280 | :mortar_board: | None |
2022-11-22 | Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data | Arthur Josi et.al. | 2211.11925 | :mortar_board: | None |
2022-11-22 | Confidence-guided Centroids for Unsupervised Person Re-Identification | Yunqi Miao et.al. | 2211.11921 | :mortar_board: | None |
2022-11-21 | A Benchmark of Video-Based Clothes-Changing Person Re-Identification | Likai Wang et.al. | 2211.11165 | :mortar_board: | None |
2022-11-20 | A Unified Model for Tracking and Image-Video Detection Has More Power | Peirong Liu et.al. | 2211.11077 | :mortar_board: | None |
2022-11-20 | Invisible Backdoor Attack with Dynamic Triggers against Person Re-identification | Wenli Sun et.al. | 2211.10933 | :mortar_board: | None |
2022-11-18 | SeaTurtleID: A novel long-span dataset highlighting the importance of timestamps in wildlife re-identification | Kostas Papafitsoros et.al. | 2211.10307 | :mortar_board: | None |
2022-11-17 | MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors | Yuang Zhang et.al. | 2211.09791 | :mortar_board: | Code |
2022-11-17 | Machine learning opens a doorway for microrheology with optical tweezers in living systems | Matthew G. Smith et.al. | 2211.09689 | :mortar_board: | None |
2022-11-17 | Multi-Camera Multi-Object Tracking on the Move via Single-Stage Global Association Approach | Pha Nguyen et.al. | 2211.09663 | :mortar_board: | None |
2022-11-17 | Targeted Attention for Generalized- and Zero-Shot Learning | Abhijit Suprem et.al. | 2211.09322 | :mortar_board: | None |
2022-11-16 | Robust Online Video Instance Segmentation with Track Queries | Zitong Zhan et.al. | 2211.09108 | :mortar_board: | None |
2022-11-16 | SMILEtrack: SiMIlarity LEarning for Multiple Object Tracking | Yu-Hsiang Wang et.al. | 2211.08824 | :mortar_board: | None |
2022-11-15 | Using Auxiliary Information for Person Re-Identification – A Tutorial Overview | Tharindu Fernando et.al. | 2211.08565 | :mortar_board: | None |
2022-11-14 | SportsTrack: An Innovative Method for Tracking Athletes in Sports Scenes | Jie Wang et.al. | 2211.07173 | :mortar_board: | Code |
2022-11-13 | Learning from partially labeled data for multi-organ and tumor segmentation | Yutong Xie et.al. | 2211.06894 | :mortar_board: | None |
2022-11-12 | TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data | Florimond Houssiau et.al. | 2211.06550 | :mortar_board: | Code |
2022-11-09 | Efficient Joint Detection and Multiple Object Tracking with Spatially Aware Transformer | Siddharth Sagar Nijhawan et.al. | 2211.05654 | :mortar_board: | None |
2022-11-10 | HSGNet: Object Re-identification with Hierarchical Similarity Graph Network | Fei Shen et.al. | 2211.05486 | :mortar_board: | None |
2022-11-09 | MEVID: Multi-view Extended Videos with Identities for Video Person Re-Identification | Daniel Davila et.al. | 2211.04656 | :mortar_board: | None |
2022-11-06 | Sequential Transformer for End-to-End Person Search | Long Chen et.al. | 2211.04323 | :mortar_board: | None |
2022-11-08 | ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking | Tara Sadjadpour et.al. | 2211.03919 | :mortar_board: | None |
2022-11-07 | Body Part-Based Representation Learning for Occluded Person Re-Identification | Vladimir Somers et.al. | 2211.03679 | :mortar_board: | None |
2022-11-07 | Generalizable Re-Identification from Videos with Cycle Association | Zhongdao Wang et.al. | 2211.03663 | :mortar_board: | None |
2022-11-07 | Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID | Djebril Mekhazni et.al. | 2211.03626 | :mortar_board: | None |
2022-11-04 | Development and evaluation of automated localization and reconstruction of all fruits on tomato plants in a greenhouse based on multi-view perception and 3D multi-object tracking | David Rapado Rincon et.al. | 2211.02760 | :mortar_board: | None |
2022-10-30 | PhysioGait: Context-Aware Physiological Context Modeling for Person Re-identification Attack on Wearable Sensing | James O Sullivan et.al. | 2211.02622 | :mortar_board: | None |
2022-11-03 | Large Scale Real-World Multi-Person Tracking | Bing Shuai et.al. | 2211.02175 | :mortar_board: | None |
2022-11-03 | Privacy-preserving Deep Learning based Record Linkage | Thilina Ranbaduge et.al. | 2211.02161 | :mortar_board: | None |
2022-11-02 | Generation of Anonymous Chest Radiographs Using Latent Diffusion Models for Training Thoracic Abnormality Classification Systems | Kai Packhäuser et.al. | 2211.01323 | :mortar_board: | None |
2022-11-02 | Deep Multimodal Fusion for Generalizable Person Re-identification | Suncheng Xiang et.al. | 2211.00933 | :mortar_board: | Code |
2022-10-29 | SearchTrack: Multiple Object Tracking with Object-Customized Search and Motion-Aware Features | Zhong-Min Tsai et.al. | 2210.16572 | :mortar_board: | Code |
2022-10-26 | End-to-end Tracking with a Multi-query Transformer | Bruno Korbar et.al. | 2210.14601 | :mortar_board: | None |
2022-10-25 | Towards improved loading, cooling, and trapping of molecules in magneto-optical traps | Thomas K. Langin et.al. | 2210.14223 | :mortar_board: | None |
2022-10-25 | Jet-Loaded Cold Atomic Beam Source for Strontium | Minho Kwon et.al. | 2210.14186 | :mortar_board: | None |
2022-10-25 | Fast loading of a cold mixture of Sodium and Potassium atoms from compact and versatile cold atomic beam sources | Sagar Sutradhar et.al. | 2210.14084 | :mortar_board: | None |
2022-10-25 | Unsupervised domain-adaptive person re-identification with multi-camera constraints | S. Takeuchi et.al. | 2210.13999 | :mortar_board: | None |
2022-10-24 | Strong-TransCenter: Improved Multi-Object Tracking based on Transformers with Dense Representations | Amit Galor et.al. | 2210.13570 | :mortar_board: | Code |
2022-10-23 | DMODE: Differential Monocular Object Distance Estimation Module without Class Specific Information | Pedram Agand et.al. | 2210.12596 | :mortar_board: | Code |
2022-10-23 | A dichotomous behavior of Guttman-Kaiser criterion from equi-correlated normal population | Yohji Akama et.al. | 2210.12580 | :mortar_board: | None |
2022-10-20 | End-to-End Context-Aided Unicity Matching for Person Re-identification | Min Cao et.al. | 2210.12008 | :mortar_board: | None |
2022-10-19 | RT-MOT: Confidence-Aware Real-Time Scheduling Framework for Multi-Object Tracking Tasks | Donghwa Kang et.al. | 2210.11946 | :mortar_board: | None |
2022-10-19 | RLM-Tracking: Online Multi-Pedestrian Tracking Supported by Relative Location Mapping | Kai Ren et.al. | 2210.10477 | :mortar_board: | None |
2022-10-19 | Domain generalization Person Re-identification on Attention-aware multi-operation strategery | Yingchun Guo et.al. | 2210.10409 | :mortar_board: | None |
2022-10-19 | CLIP-Driven Fine-grained Text-Image Person Re-identification | Shuanglin Yan et.al. | 2210.10276 | :mortar_board: | None |
2022-10-18 | Optical Two-dimensional Coherent Spectroscopy of Cold Atoms | Danfu Liang et.al. | 2210.10115 | :mortar_board: | None |
2022-10-18 | Risk of re-identification for shared clinical speech recordings | Daniela A. Wiepert et.al. | 2210.09975 | :mortar_board: | None |
2022-10-17 | Track Targets by Dense Spatio-Temporal Position Encoding | Jinkun Cao et.al. | 2210.09455 | :mortar_board: | None |
2022-10-17 | Joint Plasticity Learning for Camera Incremental Person Re-Identification | Zexian Yang et.al. | 2210.08710 | :mortar_board: | None |
2022-10-16 | AttTrack: Online Deep Attention Transfer for Multi-object Tracking | Keivan Nalaie et.al. | 2210.08648 | :mortar_board: | None |
2022-10-16 | Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices | Yimeng Zhang et.al. | 2210.08578 | :mortar_board: | None |
2022-10-14 | Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking? | Patrick Dendorfer et.al. | 2210.07681 | :mortar_board: | None |
2022-10-12 | QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking | Tobias Fischer et.al. | 2210.06984 | :mortar_board: | None |
2022-10-11 | Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification | Zi wang et.al. | 2210.05438 | :mortar_board: | None |
2022-10-11 | EnsembleMOT: A Step towards Ensemble Learning of Multiple Object Tracking | Yunhao Du et.al. | 2210.05278 | :mortar_board: | Code |
2022-10-07 | Specialized Re-Ranking: A Novel Retrieval-Verification Framework for Cloth Changing Person Re-Identification | Renjie Zhang et.al. | 2210.03592 | :mortar_board: | None |
2022-10-07 | PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search | Mustansar Fiaz et.al. | 2210.03433 | :mortar_board: | Code |
2022-10-07 | Multiple Object Tracking from appearance by hierarchically clustering tracklets | Andreu Girbau et.al. | 2210.03355 | :mortar_board: | Code |
2022-10-07 | Dual Clustering Co-teaching with Consistent Sample Mining for Unsupervised Person Re-Identification | Zeqi Chen et.al. | 2210.03339 | :mortar_board: | None |
2022-10-05 | SoccerNet 2022 Challenges Results | Silvio Giancola et.al. | 2210.02365 | :mortar_board: | None |
2022-10-05 | MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimation | Hanwei Zhang et.al. | 2210.02038 | :mortar_board: | None |
2022-10-04 | Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification | Kai Wang et.al. | 2210.01600 | :mortar_board: | Code |
2022-10-04 | How Image Generation Helps Visible-to-Infrared Person Re-Identification? | Honghu Pan et.al. | 2210.01585 | :mortar_board: | None |
2022-10-04 | FRIDA: Fisheye Re-Identification Dataset with Annotations | Mertcan Cokbas et.al. | 2210.01582 | :mortar_board: | None |
2022-10-03 | Interpretable Deep Tracking | Benjamin Thérien et.al. | 2210.01266 | :mortar_board: | None |
2022-09-30 | Robust Person Identification: A WiFi Vision-based Approach | Yili Ren et.al. | 2210.00127 | :mortar_board: | None |
2022-09-30 | Transformers for Object Detection in Large Point Clouds | Felicia Ruppel et.al. | 2209.15258 | :mortar_board: | None |
2022-09-30 | Physical Adversarial Attack meets Computer Vision: A Decade Survey | Hui Wei et.al. | 2209.15179 | :mortar_board: | Code |
2022-09-29 | DirectTracker: 3D Multi-Object Tracking Using Direct Image Alignment and Photometric Bundle Adjustment | Mariia Gladkova et.al. | 2209.14965 | :mortar_board: | None |
2022-09-27 | Observation Centric and Central Distance Recovery on Sports Player Tracking | Hsiang-Wei Huang et.al. | 2209.13154 | :mortar_board: | None |
2022-09-25 | D³: Duplicate Detection Decontaminator for Multi-Athlete Tracking in Sports Videos | Rui He et.al. | 2209.12248 | :mortar_board: | Code |
2022-09-25 | BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video | Ali Athar et.al. | 2209.12118 | :mortar_board: | Code |
2022-09-24 | Super-resolution atomic microscopy using orbit angular momentum laser with temporal modulation | Yuan Liu et.al. | 2209.11917 | :mortar_board: | None |
2022-09-23 | Multi-Granularity Graph Pooling for Video-based Person Re-Identification | Honghu Pan et.al. | 2209.11584 | :mortar_board: | None |
2022-09-23 | Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network | Honghu Pan et.al. | 2209.11582 | :mortar_board: | None |
2022-09-23 | Deep Learning-based Anonymization of Chest Radiographs: A Utility-preserving Measure for Patient Privacy | Kai Packhäuser et.al. | 2209.11531 | :mortar_board: | None |
2022-09-23 | Grouped Adaptive Loss Weighting for Person Search | Yanling Tian et.al. | 2209.11492 | :mortar_board: | None |
2022-09-23 | Towards Frame Rate Agnostic Multi-Object Tracking | Weitao Feng et.al. | 2209.11404 | :mortar_board: | Code |
2022-09-23 | Horizon area bound and MOTS stability in locally rotationally symmetric solutions | Abbas M. Sherif et.al. | 2209.11358 | :mortar_board: | None |
2022-09-20 | Sampling Agnostic Feature Representation for Long-Term Person Re-identification | Seongyeop Yang et.al. | 2209.09574 | :mortar_board: | Code |
2022-09-19 | Visible-Infrared Person Re-Identification Using Privileged Intermediate Information | Mahdi Alehdaghi et.al. | 2209.09348 | :mortar_board: | Code |
2022-09-19 | Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification | Syeda Nyma Ferdous et.al. | 2209.08686 | :mortar_board: | None |
2022-09-18 | RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning | Wei-Ting Chen et.al. | 2209.08630 | :mortar_board: | Code |
2022-09-18 | Bi-color atomic beam slower and magnetic field compensation for ultracold gases | Jianing Li et.al. | 2209.08479 | :mortar_board: | None |
2022-09-14 | TrADe Re-ID – Live Person Re-Identification using Tracking and Anomaly Detection | Luigy Machaca et.al. | 2209.06452 | :mortar_board: | None |
2022-09-12 | Style Variable and Irrelevant Learning for Generalizable Person Re-identification | Haobo Chen et.al. | 2209.05235 | :mortar_board: | Code |
2022-09-12 | Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification? | Cuicui Kang et.al. | 2209.05047 | :mortar_board: | None |
2022-09-11 | Local-Aware Global Attention Network for Person Re-Identification | Nathanael L. Baisa et.al. | 2209.04821 | :mortar_board: | None |
2022-09-11 | Multiple Object Tracking in Recent Times: A Literature Review | Mk Bashar et.al. | 2209.04796 | :mortar_board: | None |
2022-09-08 | PixTrack: Precise 6DoF Object Pose Tracking using NeRF Templates and Feature-metric Alignment | Prajwal Chidananda et.al. | 2209.03910 | :mortar_board: | None |
2022-09-06 | CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking with Camera-LiDAR Fusion | Li Wang et.al. | 2209.02540 | :mortar_board: | None |
2022-09-04 | On the Risks of Collecting Multidimensional Data Under Local Differential Privacy | Héber H. Arcolezi et.al. | 2209.01684 | :mortar_board: | Code |
2022-09-01 | Which anonymization technique is best for which NLP task? – It depends. A Systematic Study on Clinical Text Processing | Iyadh Ben Cheikh Larbi et.al. | 2209.00262 | :mortar_board: | None |
2022-08-30 | The Athena X-ray Integral Field Unit: a consolidated design for the system requirement review of the preliminary definition phase | Didier Barret et.al. | 2208.14562 | :mortar_board: | None |
2022-08-30 | Synthehicle: Multi-Vehicle Multi-Camera Tracking in Virtual Cities | Fabian Herzog et.al. | 2208.14167 | :mortar_board: | Code |
2022-08-27 | Actor-identified Spatiotemporal Action Detection – Detecting Who Is Doing What in Videos | Fan Yang et.al. | 2208.12940 | :mortar_board: | Code |
2022-08-25 | Identity-Sensitive Knowledge Propagation for Cloth-Changing Person Re-identification | Jianbing Wu et.al. | 2208.12023 | :mortar_board: | Code |
2022-08-25 | Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification | Haocong Rao et.al. | 2208.11814 | :mortar_board: | Code |
2022-08-24 | Dynamic Template Initialization for Part-Aware Person Re-ID | Kalana Abeywardena et.al. | 2208.11440 | :mortar_board: | None |
2022-08-23 | Quality Matters: Embracing Quality Clues for Robust 3D Multi-Object Tracking | Jinrong Yang et.al. | 2208.10976 | :mortar_board: | None |
2022-08-22 | Information-Theoretic Equivalence of Entropic Multi-Marginal Optimal Transport: A Theory for Multi-Agent Communication | Shuchan Wang et.al. | 2208.10256 | :mortar_board: | None |
2022-08-22 | Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking | JunYoung Gwak et.al. | 2208.10056 | :mortar_board: | None |
2022-08-21 | CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification | Qiong Wu et.al. | 2208.09844 | :mortar_board: | None |
2022-08-19 | Synthetic Data in Human Analysis: A Survey | Indu Joshi et.al. | 2208.09191 | :mortar_board: | None |
2022-08-18 | Domain Camera Adaptation and Collaborative Multiple Feature Clustering for Unsupervised Person Re-ID | Yuanpeng Tu et.al. | 2208.08624 | :mortar_board: | None |
2022-08-17 | DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations | Gabriel Van Zandycke et.al. | 2208.08190 | :mortar_board: | Code |
2022-08-17 | InterTrack: Interaction Transformer for 3D Multi-Object Tracking | John Willes et.al. | 2208.08041 | :mortar_board: | None |
2022-08-13 | Enhanced Vehicle Re-identification for ITS: A Feature Fusion approach using Deep Learning | Ashutosh Holla B et.al. | 2208.06579 | :mortar_board: | None |
2022-08-09 | Privacy-Aware Adversarial Network in Human Mobility Prediction | Yuting Zhan et.al. | 2208.05009 | :mortar_board: | None |
2022-08-08 | Occlusion-Aware Instance Segmentation via BiLayer Network Architectures | Lei Ke et.al. | 2208.04438 | :mortar_board: | Code |
2022-08-07 | Robust Multi-Object Tracking by Marginal Inference | Yifu Zhang et.al. | 2208.03727 | :mortar_board: | None |
2022-08-06 | Transformer-based assignment decision network for multiple object tracking | Athena Psalta et.al. | 2208.03571 | :mortar_board: | Code |
2022-08-05 | Accelerating the Sinkhorn algorithm for sparse multi-marginal optimal transport by fast Fourier transforms | Fatima Antarou Ba et.al. | 2208.03120 | :mortar_board: | Code |
2022-08-04 | SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset | Fatih Emre Simsek et.al. | 2208.02580 | :mortar_board: | None |
2022-08-04 | Learning Modal-Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification | Xinyu Lin et.al. | 2208.02450 | :mortar_board: | Code |
2022-08-03 | PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking? | Aleksandr Kim et.al. | 2208.01957 | :mortar_board: | None |
2022-08-01 | Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification | Xulin Li et.al. | 2208.00967 | :mortar_board: | None |
2022-08-01 | Multi-spectral Vehicle Re-identification with Cross-directional Consistency Network and a High-quality Benchmark | Aihua Zheng et.al. | 2208.00632 | :mortar_board: | None |
2022-07-30 | Towards Privacy-Preserving, Real-Time and Lossless Feature Matching | Qiang Meng et.al. | 2208.00214 | :mortar_board: | None |
2022-07-29 | A Transfer Learning-Based Approach to Marine Vessel Re-Identification | Guangmiao Zeng et.al. | 2207.14500 | :mortar_board: | None |
2022-07-29 | Significant changes in EEG neural oscillations during different phases of three-dimensional multiple object tracking task (3D-MOT) imply different roles for attention and working memory | Yannick Roy et.al. | 2207.14470 | :mortar_board: | None |
2022-07-29 | Deep Learning-based Occluded Person Re-identification: A Survey | Yunjie Peng et.al. | 2207.14452 | :mortar_board: | None |
2022-07-28 | The One Where They Reconstructed 3D Humans and Environments in TV Shows | Georgios Pavlakos et.al. | 2207.14279 | :mortar_board: | None |
2022-07-28 | Video Mask Transfiner for High-Quality Video Instance Segmentation | Lei Ke et.al. | 2207.14012 | :mortar_board: | None |
2022-07-27 | Look Closer to Your Enemy: Learning to Attack via Teacher-student Mimicking | Mingejie Wang et.al. | 2207.13381 | :mortar_board: | None |
2022-07-27 | Portrait Interpretation and a Benchmark | Yixuan Fan et.al. | 2207.13315 | :mortar_board: | None |
2022-07-26 | Tracking Every Thing in the Wild | Siyuan Li et.al. | 2207.12978 | :mortar_board: | None |
2022-07-26 | TransFiner: A Full-Scale Refinement Approach for Multiple Object Tracking | Bin Sun et.al. | 2207.12967 | :mortar_board: | None |
2022-07-25 | Video object tracking based on YOLOv7 and DeepSORT | Feng Yang et.al. | 2207.12202 | :mortar_board: | None |
2022-07-25 | Domain Adaptive Person Search | Junjie Li et.al. | 2207.11898 | :mortar_board: | Code |
2022-07-24 | Spatial-Temporal Federated Learning for Lifelong Person Re-identification on Distributed Edges | Lei Zhang et.al. | 2207.11759 | :mortar_board: | Code |
2022-07-24 | Learnable Privacy-Preserving Anonymization for Pedestrian Images | Junwu Zhang et.al. | 2207.11677 | :mortar_board: | Code |
2022-07-22 | PieTrack: An MOT solution based on synthetic data training and self-supervised domain adaptation | Yirui Wang et.al. | 2207.11325 | :mortar_board: | None |
2022-07-21 | UFO: Unified Feature Optimization | Teng Xi et.al. | 2207.10341 | :mortar_board: | None |
2022-07-21 | OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search | Sanghoon Lee et.al. | 2207.10320 | :mortar_board: | None |
2022-07-20 | MOTCOM: The Multi-Object Tracking Dataset Complexity Metric | Malte Pedersen et.al. | 2207.10031 | :mortar_board: | None |
2022-07-20 | Negative Samples are at Large: Leveraging Hard-distance Elastic Loss for Re-identification | Hyungtae Lee et.al. | 2207.09884 | :mortar_board: | None |
2022-07-19 | The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting | Justin Kay et.al. | 2207.09295 | :mortar_board: | Code |
2022-07-19 | Dynamic Prototype Mask for Occluded Person Re-Identification | Lei Tan et.al. | 2207.09046 | :mortar_board: | Code |
2022-07-18 | A Certifiable Security Patch for Object Tracking in Self-Driving Systems via Historical Deviation Modeling | Xudong Pan et.al. | 2207.08556 | :mortar_board: | None |
2022-07-18 | A Semantic-aware Attention and Visual Shielding Network for Cloth-changing Person Re-identification | Zan Gao et.al. | 2207.08387 | :mortar_board: | None |
2022-07-16 | Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars | Dongjiang Cao et.al. | 2207.07896 | :mortar_board: | None |
2022-07-16 | Learning Granularity-Unified Representations for Text-to-Image Person Re-identification | Zhiyin Shao et.al. | 2207.07802 | :mortar_board: | None |
2022-07-15 | Multi-Object Tracking and Segmentation via Neural Message Passing | Guillem Braso et.al. | 2207.07454 | :mortar_board: | Code |
2022-07-15 | Towards Privacy-Preserving Person Re-identification via Person Identify Shift | Shuguang Dou et.al. | 2207.07311 | :mortar_board: | None |
2022-07-14 | Towards Grand Unification of Object Tracking | Bin Yan et.al. | 2207.07078 | :mortar_board: | Code |
2022-07-13 | Rapid Person Re-Identification via Sub-space Consistency Regularization | Qingze Yin et.al. | 2207.05933 | :mortar_board: | None |
2022-07-12 | SpOT: Spatiotemporal Modeling for 3D Object Tracking | Colton Stearns et.al. | 2207.05856 | :mortar_board: | None |
2022-07-12 | Dynamic Gradient Reactivation for Backward Compatible Person Re-identification | Xiao Pan et.al. | 2207.05658 | :mortar_board: | None |
2022-07-12 | Tracking Objects as Pixel-wise Distributions | Zelin Zhao et.al. | 2207.05518 | :mortar_board: | Code |
2022-07-10 | Depth Perspective-aware Multiple Object Tracking | Kha Gia Quach et.al. | 2207.04551 | :mortar_board: | None |
2022-07-08 | TGRMPT: A Head-Shoulder Aided Multi-Person Tracker and a New Large-Scale Dataset for Tour-Guide Robot | Wen Wang et.al. | 2207.03726 | :mortar_board: | Code |
2022-07-08 | Frequency-based Randomization for Guaranteeing Differential Privacy in Spatial Trajectories | Fengmei Jin et.al. | 2207.03722 | :mortar_board: | None |
2022-07-07 | Privacy-Preserving Synthetic Educational Data Generation | Jill-Jênn Vie et.al. | 2207.03202 | :mortar_board: | Code |
2022-07-07 | Style Interleaved Learning for Generalizable Person Re-identification | Wentao Tan et.al. | 2207.03132 | :mortar_board: | None |
2022-07-06 | Context Sensing Attention Network for Video-based Person Re-identification | Kan Wang et.al. | 2207.02631 | :mortar_board: | None |
2022-07-06 | Unsupervised Learning for Human Sensing Using Radio Signals | Tianhong Li et.al. | 2207.02370 | :mortar_board: | None |
2022-07-05 | Video-based Surgical Skills Assessment using Long term Tool Tracking | Mona Fathollahi et.al. | 2207.02247 | :mortar_board: | None |
2022-07-04 | Adversarial Pairwise Reverse Attention for Camera Performance Imbalance in Person Re-identification: New Dataset and Metrics | Eugene P. W. Ang et.al. | 2207.01204 | :mortar_board: | None |
2022-06-29 | BoT-SORT: Robust Associations Multi-Pedestrian Tracking | Nir Aharon et.al. | 2206.14651 | :mortar_board: | Code |
2022-06-29 | SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving | Yining Shi et.al. | 2206.14451 | :mortar_board: | Code |
2022-06-28 | 3D Multi-Object Tracking with Differentiable Pose Estimation | Dominik Schmauser et.al. | 2206.13785 | :mortar_board: | None |
2022-06-27 | A compact setup for loading magneto-optical trap in ultrahigh vacuum environment | Kavish Bharadwaj et.al. | 2206.13271 | :mortar_board: | None |
2022-06-23 | Learning Towards the Largest Margins | Xiong Zhou et.al. | 2206.11589 | :mortar_board: | None |
2022-06-21 | GNN-PMB: A Simple but Effective Online 3D Multi-Object Tracker without Bells and Whistles | Jianan Liu et.al. | 2206.10255 | :mortar_board: | Code |
2022-06-19 | mvHOTA: A multi-view higher order tracking accuracy metric to measure spatial and temporal associations in multi-point detection | Lalith Sharan et.al. | 2206.09372 | :mortar_board: | None |
2022-06-19 | Towards Generalizable Person Re-identification with a Bi-stream Generative Model | Xin Xu et.al. | 2206.09362 | :mortar_board: | None |
2022-06-14 | Plug-and-Play Pseudo Label Correction Network for Unsupervised Person Re-identification | Tianyi Yan et.al. | 2206.06607 | :mortar_board: | None |
2022-06-13 | A novel reconstruction attack on foreign-trade official statistics, with a Brazilian case study | Danilo Fabrino Favato et.al. | 2206.06493 | :mortar_board: | None |
2022-06-10 | An Image Processing Pipeline for Camera Trap Time-Lapse Recordings | Michael L. Hilton et.al. | 2206.05159 | :mortar_board: | Code |
2022-06-09 | Simple Cues Lead to a Strong Multi-Object Tracker | Jenny Seidenschwarz et.al. | 2206.04656 | :mortar_board: | None |
2022-06-09 | Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification | Xiaohong Wang et.al. | 2206.04401 | :mortar_board: | None |
2022-06-08 | Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking | Longlong Jing et.al. | 2206.03666 | :mortar_board: | None |
2022-06-06 | NORPPA: NOvel Ringed seal re-identification by Pelage Pattern Aggregation | Ekaterina Nepovinnykh et.al. | 2206.02498 | :mortar_board: | Code |
2022-06-06 | Sports Re-ID: Improving Re-Identification Of Players In Broadcast Videos Of Team Sports | Bharath Comandur et.al. | 2206.02373 | :mortar_board: | None |
2022-06-05 | Towards Individual Grevy’s Zebra Identification via Deep 3D Fitting and Metric Learning | Maria Stennett et.al. | 2206.02261 | :mortar_board: | None |
2022-06-05 | SealID: Saimaa ringed seal re-identification dataset | Ekaterina Nepovinnykh et.al. | 2206.02260 | :mortar_board: | None |
VIT
Publish Date | Title | Authors | arxiv | Code | |
---|---|---|---|---|---|
2023-10-20 | Feature Selection and Hyperparameter Fine-tuning in Artificial Neural Networks for Wood Quality Classification | Mateus Roder et.al. | 2310.13490 | :mortar_board: | None |
2023-10-12 | UniPose: Detecting Any Keypoints | Jie Yang et.al. | 2310.08530 | :mortar_board: | Code |
2023-10-10 | l-dyno: framework to learn consistent visual features using robot’s motion | Kartikeya Singh et.al. | 2310.06249 | :mortar_board: | None |
2023-10-08 | Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face | Hao Zhang et.al. | 2310.05056 | :mortar_board: | None |
2023-10-02 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404 | :mortar_board: | Code |
2023-10-01 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527 | :mortar_board: | Code |
2023-09-26 | ObVi-SLAM: Long-Term Object-Visual SLAM | Amanda Adkins et.al. | 2309.15268 | :mortar_board: | Code |
2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436 | :mortar_board: | Code |
2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563 | :mortar_board: | None |
2023-09-17 | CryoAlign: feature-based method for global and local 3D alignment of EM density maps | Bintao He et.al. | 2309.09217 | :mortar_board: | None |
2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471 | :mortar_board: | Code |
2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750 | :mortar_board: | None |
2023-09-07 | InstructDiffusion: A Generalist Modeling Interface for Vision Tasks | Zigang Geng et.al. | 2309.03895 | :mortar_board: | None |
2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324 | :mortar_board: | None |
2023-09-01 | Improving the matching of deformable objects by learning to detect keypoints | Felipe Cadar et.al. | 2309.00434 | :mortar_board: | Code |
2023-08-31 | SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation | Jiaben Chen et.al. | 2308.16876 | :mortar_board: | None |
2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984 | :mortar_board: | None |
2023-08-29 | A lightweight 3D dense facial landmark estimation model from position map data | Shubhajit Basak et.al. | 2308.15170 | :mortar_board: | None |
2023-08-27 | Automatic coarse co-registration of point clouds from diverse scan geometries: a test of detectors and descriptors | Francesco Pirotti et.al. | 2308.14047 | :mortar_board: | None |
2023-08-24 | VNI-Net: Vector Neurons-based Rotation-Invariant Descriptor for LiDAR Place Recognition | Gengxuan Tian et.al. | 2308.12870 | :mortar_board: | None |
2023-08-22 | LDP-Feat: Image Features with Local Differential Privacy | Francesco Pittaluga et.al. | 2308.11223 | :mortar_board: | None |
2023-08-20 | Neural Interactive Keypoint Detection | Jie Yang et.al. | 2308.10174 | :mortar_board: | Code |
2023-08-19 | ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment | Bingyang Zhou et.al. | 2308.09987 | :mortar_board: | None |
2023-08-16 | DeDoDe: Detect, Don’t Describe – Describe, Don’t Detect for Local Feature Matching | Johan Edstedt et.al. | 2308.08479 | :mortar_board: | Code |
2023-08-15 | CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | Hao Ouyang et.al. | 2308.07926 | :mortar_board: | Code |
2023-08-15 | ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition | Wenyuan Xue et.al. | 2308.07743 | :mortar_board: | None |
2023-08-14 | DELO: Deep Evidential LiDAR Odometry using Partial Optimal Transport | Sk Aziz Ali et.al. | 2308.07153 | :mortar_board: | None |
2023-08-10 | 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds | Minhao Li et.al. | 2308.05667 | :mortar_board: | None |
2023-07-29 | Automated Hit-frame Detection for Badminton Match Analysis | Yu-Hang Chien et.al. | 2307.16000 | :mortar_board: | Code |
2023-07-25 | Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perception | Chuanyu Luo et.al. | 2307.13300 | :mortar_board: | None |
2023-07-20 | Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Sahar Almahfouz Nasser et.al. | 2307.10698 | :mortar_board: | Code |
2023-07-19 | SAMConvex: Fast Discrete Optimization for CT Registration using Self-supervised Anatomical Embedding and Correlation Pyramid | Zi Li et.al. | 2307.09727 | :mortar_board: | None |
2023-07-01 | SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation | Fabian Duffhauss et.al. | 2307.00306 | :mortar_board: | Code |
2023-06-27 | Detector-Free Structure from Motion | Xingyi He et.al. | 2306.15669 | :mortar_board: | Code |
2023-06-26 | CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild | Li Ding et.al. | 2306.15073 | :mortar_board: | None |
2023-06-12 | Topology Repairing of Disconnected Pulmonary Airways and Vessels: Baselines and a Dataset | Ziqiao Weng et.al. | 2306.07089 | :mortar_board: | Code |
2023-06-07 | Learning Probabilistic Coordinate Fields for Robust Correspondences | Weiyue Zhao et.al. | 2306.04231 | :mortar_board: | None |
2023-06-03 | LDEB – Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues | Amitabha Dey et.al. | 2306.02193 | :mortar_board: | None |
2023-06-02 | Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images | Marcela Mera-Trujillo et.al. | 2306.01938 | :mortar_board: | None |
2023-06-01 | A Probabilistic Relaxation of the Two-Stage Object Pose Estimation Paradigm | Onur Beker et.al. | 2306.00892 | :mortar_board: | None |
2023-05-30 | Align, Perturb and Decouple: Toward Better Leverage of Difference Information for RSI Change Detection | Supeng Wang et.al. | 2305.18714 | :mortar_board: | Code |
2023-05-23 | Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence | Grace Luo et.al. | 2305.14334 | :mortar_board: | None |
2023-05-15 | Non-Separable Multi-Dimensional Network Flows for Visual Computing | Viktoria Ehm et.al. | 2305.08628 | :mortar_board: | None |
2023-05-13 | Illumination-insensitive Binary Descriptor for Visual Measurement Based on Local Inter-patch Invariance | Xinyu Lin et.al. | 2305.07943 | :mortar_board: | Code |
2023-05-05 | HD2Reg: Hierarchical Descriptors and Detectors for Point Cloud Registration | Canhui Tang et.al. | 2305.03487 | :mortar_board: | Code |
2023-04-17 | Human Pose Estimation in Monocular Omnidirectional Top-View Images | Jingrui Yu et.al. | 2304.08186 | :mortar_board: | None |
2023-04-14 | CoPR: Towards Accurate Visual Localization With Continuous Place-descriptor Regression | Mubariz Zaffar et.al. | 2304.07426 | :mortar_board: | None |
2023-04-12 | SiLK – Simple Learned Keypoints | Pierre Gleize et.al. | 2304.06194 | :mortar_board: | Code |
2023-04-06 | From Saliency to DINO: Saliency-guided Vision Transformer for Few-shot Keypoint Detection | Changsheng Lu et.al. | 2304.03140 | :mortar_board: | None |
2023-03-29 | NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud | Xiangyu Zhu et.al. | 2303.16465 | :mortar_board: | None |
2023-03-24 | PanoVPR: Towards Unified Perspective-to-Equirectangular Visual Place Recognition via Sliding Windows across the Panoramic View | Ze Shi et.al. | 2303.14095 | :mortar_board: | Code |
2023-03-23 | Semantic Image Attack for Visual Model Diagnosis | Jinqi Luo et.al. | 2303.13010 | :mortar_board: | None |
2023-03-22 | Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation | Heng Yang et.al. | 2303.12246 | :mortar_board: | None |
2023-03-19 | RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network | Sangmin Yoo et.al. | 2303.10770 | :mortar_board: | None |
2023-03-17 | ShaRPy: Shape Reconstruction and Hand Pose Estimation from RGB-D with Uncertainty | Vanessa Wirth et.al. | 2303.10042 | :mortar_board: | None |
2023-03-15 | Descriptor Distillation for Efficient Multi-Robot SLAM | Xiyue Guo et.al. | 2303.08420 | :mortar_board: | None |
2023-03-15 | From Local Binary Patterns to Pixel Difference Networks for Efficient Visual Representation Learning | Zhuo Su et.al. | 2303.08414 | :mortar_board: | None |
2023-03-09 | KGNv2: Separating Scale and Pose Prediction for Keypoint-based 6-DoF Grasp Synthesis on RGB-D input | Yiye Chen et.al. | 2303.05617 | :mortar_board: | None |
2023-03-07 | External Camera-based Mobile Robot Pose Estimation for Collaborative Perception with Smart Edge Sensors | Simon Bultmann et.al. | 2303.03797 | :mortar_board: | None |
2023-02-26 | PaRK-Detect: Towards Efficient Multi-Task Satellite Imagery Road Extraction via Patch-Wise Keypoints Detection | Shenwei Xie et.al. | 2302.13263 | :mortar_board: | None |
2023-02-24 | Hybrid machine-learned homogenization: Bayesian data mining and convolutional neural networks | Julian Lißner et.al. | 2302.12545 | :mortar_board: | None |
2023-02-21 | Deep Reinforcement Learning Based on Local GNN for Goal-conditioned Deformable Object Rearranging | Yuhong Deng et.al. | 2302.10446 | :mortar_board: | None |
2023-02-12 | A Correct-and-Certify Approach to Self-Supervise Object Pose Estimators via Ensemble Self-Training | Jingnan Shi et.al. | 2302.06019 | :mortar_board: | None |
2023-02-11 | Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing | Zitong Yu et.al. | 2302.05744 | :mortar_board: | None |
2023-02-09 | MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection | Yuhe Ding et.al. | 2302.04589 | :mortar_board: | None |
2023-02-03 | Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation | Jie Yang et.al. | 2302.01593 | :mortar_board: | Code |
2023-02-03 | Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization | Yingying Zhu et.al. | 2302.01572 | :mortar_board: | Code |
2023-01-21 | Vision Aided Environment Semantics Extraction and Its Application in mmWave Beam Selection | Feiyang Wen et.al. | 2301.08973 | :mortar_board: | None |
2023-01-18 | OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models | Xingyi He et.al. | 2301.07673 | :mortar_board: | None |
2023-01-12 | Towards High Performance One-Stage Human Pose Estimation | Ling Li et.al. | 2301.04842 | :mortar_board: | None |
2022-12-31 | Rethinking Rotation Invariance with Point Cloud Registration | Jianhui Yu et.al. | 2301.00149 | :mortar_board: | None |
2022-12-29 | Fruit Ripeness Classification: a Survey | Matteo Rizzo et.al. | 2212.14441 | :mortar_board: | None |
2022-12-28 | NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action | Kuan-Chieh Wang et.al. | 2212.13660 | :mortar_board: | None |
2022-12-24 | HandsOff: Labeled Dataset Generation With No Additional Human Annotations | Austin Xu et.al. | 2212.12645 | :mortar_board: | None |
2022-12-13 | Learning to Detect Good Keypoints to Match Non-Rigid Objects in RGB Images | Welerson Melo et.al. | 2212.09589 | :mortar_board: | None |
2022-12-15 | Learning Markerless Robot-Depth Camera Calibration and End-Effector Pose Estimation | Bugra C. Sefercik et.al. | 2212.07567 | :mortar_board: | None |
2022-12-08 | DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization | Xiangyu Xu et.al. | 2212.04575 | :mortar_board: | None |
2022-12-07 | ViTPose+: Vision Transformer Foundation Model for Generic Body Pose Estimation | Yufei Xu et.al. | 2212.04246 | :mortar_board: | Code |
2022-12-07 | Designing Feature Vector Representations: A case study from Chemistry | Signe Sidwall Thygesen et.al. | 2212.03731 | :mortar_board: | None |
2022-12-06 | DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model | Jeongjun Choi et.al. | 2212.02796 | :mortar_board: | None |
2022-12-05 | Images Speak in Images: A Generalist Painter for In-Context Visual Learning | Xinlong Wang et.al. | 2212.02499 | :mortar_board: | Code |
2022-12-05 | R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Image via Repeatable Feature Detector and Rotation-invariant Feature Descriptor | Bai Zhu et.al. | 2212.02277 | :mortar_board: | None |
2022-11-28 | FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network | Xinjiang Wang et.al. | 2211.15069 | :mortar_board: | None |
2022-11-27 | BALF: Simple and Efficient Blur Aware Local Feature Detector | Zhenjun Zhao et.al. | 2211.14731 | :mortar_board: | None |
2022-11-21 | Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching | Paul Roetzer et.al. | 2211.11589 | :mortar_board: | None |
2022-11-07 | Learning Feature Descriptors for Pre- and Intra-operative Point Cloud Matching for Laparoscopic Liver Registration | Zixin Yang et.al. | 2211.03688 | :mortar_board: | None |
2022-10-31 | Tree Detection and Diameter Estimation Based on Deep Learning | Vincent Grondin et.al. | 2210.17424 | :mortar_board: | Code |
2022-10-26 | Learning a Task-specific Descriptor for Robust Matching of 3D Point Clouds | Zhiyuan Zhang et.al. | 2210.14899 | :mortar_board: | None |
2022-10-23 | Few-Shot Meta Learning for Recognizing Facial Phenotypes of Genetic Disorders | Ömer Sümer et.al. | 2210.12705 | :mortar_board: | None |
2022-10-21 | Real-time Detection of 2D Tool Landmarks with Synthetic Training Data | Bram Vanherle et.al. | 2210.11991 | :mortar_board: | None |
2022-10-09 | Fusing Event-based Camera and Radar for SLAM Using Spiking Neural Networks with Continual STDP Learning | Ali Safa et.al. | 2210.04236 | :mortar_board: | None |
2022-10-04 | Centroid Distance Keypoint Detector for Colored Point Clouds | Hanzhe Teng et.al. | 2210.01298 | :mortar_board: | Code |
2022-09-28 | Category-Level Global Camera Pose Estimation with Multi-Hypothesis Point Cloud Correspondences | Jun-Jee Chao et.al. | 2209.14419 | :mortar_board: | None |
2022-09-28 | USEEK: Unsupervised SE(3)-Equivariant 3D Keypoints for Generalizable Manipulation | Zhengrong Xue et.al. | 2209.13864 | :mortar_board: | None |
2022-09-27 | Suture Thread Spline Reconstruction from Endoscopic Images for Robotic Surgery with Reliability-driven Keypoint Detection | Neelay Joglekar et.al. | 2209.13657 | :mortar_board: | Code |
2022-09-27 | Learning-Based Dimensionality Reduction for Computing Compact and Effective Local Feature Descriptors | Hao Dong et.al. | 2209.13586 | :mortar_board: | Code |
2022-09-26 | Performance Evaluation of 3D Keypoint Detectors and Descriptors on Coloured Point Clouds in Subsea Environments | Kyungmin Jung et.al. | 2209.12881 | :mortar_board: | None |
2022-09-21 | Long-Lived Accurate Keypoints in Event Streams | Philippe Chiberre et.al. | 2209.10385 | :mortar_board: | None |
2022-09-19 | Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence | Sunghwan Hong et.al. | 2209.08742 | :mortar_board: | None |
2022-09-15 | Online Marker-free Extrinsic Camera Calibration using Person Keypoint Detections | Bastian Pätzold et.al. | 2209.07393 | :mortar_board: | Code |
2022-09-07 | Deep Learning-Based Automatic Diagnosis System for Developmental Dysplasia of the Hip | Yang Li et.al. | 2209.03440 | :mortar_board: | None |
2022-08-27 | Learning to SLAM on the Fly in Unknown Environments: A Continual Learning Approach for Drones in Visually Ambiguous Scenes | Ali Safa et.al. | 2208.12997 | :mortar_board: | None |
2022-08-24 | Self-Supervised Endoscopic Image Key-Points Matching | Manel Farhat et.al. | 2208.11424 | :mortar_board: | Code |
2022-08-17 | Blind-Spot Collision Detection System for Commercial Vehicles Using Multi Deep CNN Architecture | Muhammad Muzammel et.al. | 2208.08224 | :mortar_board: | None |
2022-08-08 | MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis | Maximilian Gilles et.al. | 2208.03963 | :mortar_board: | None |
2022-08-07 | CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization | Yujiao Shi et.al. | 2208.03660 | :mortar_board: | None |
2022-07-29 | Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation | Qihao Liu et.al. | 2208.00090 | :mortar_board: | None |
2022-07-25 | Translating a Visual LEGO Manual to a Machine-Executable Plan | Ruocheng Wang et.al. | 2207.12572 | :mortar_board: | None |
2022-07-21 | Multi-modal Retinal Image Registration Using a Keypoint-Based Vessel Structure Aligning Network | Aline Sindel et.al. | 2207.10506 | :mortar_board: | None |
2022-07-15 | Human keypoint detection for close proximity human-robot interaction | Jan Docekal et.al. | 2207.07742 | :mortar_board: | None |
2022-07-15 | Adversarial Focal Loss: Asking Your Discriminator for Hard Examples | Chen Liu et.al. | 2207.07739 | :mortar_board: | None |
2022-07-13 | Rapid Person Re-Identification via Sub-space Consistency Regularization | Qingze Yin et.al. | 2207.05933 | :mortar_board: | None |
2022-07-07 | RWT-SLAM: Robust Visual SLAM for Highly Weak-textured Environments | Qihao Peng et.al. | 2207.03539 | :mortar_board: | None |
2022-07-06 | Semi-supervised Human Pose Estimation in Art-historical Images | Matthias Springstein et.al. | 2207.02976 | :mortar_board: | None |
2022-07-01 | Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature Decoupling | Jiamin Liang et.al. | 2207.00474 | :mortar_board: | None |
2022-06-24 | Motion Estimation for Large Displacements and Deformations | Qiao Chen et.al. | 2206.12464 | :mortar_board: | None |
2022-06-24 | Deep embedded clustering algorithm for clustering PACS repositories | Teo Manojlović et.al. | 2206.12417 | :mortar_board: | None |
2022-06-21 | KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences | Xuanhan Wang et.al. | 2206.10090 | :mortar_board: | Code |
2022-06-20 | Self-Supervised Consistent Quantization for Fully Unsupervised Image Retrieval | Guile Wu et.al. | 2206.09806 | :mortar_board: | None |
2022-06-15 | A Unified Sequence Interface for Vision Tasks | Ting Chen et.al. | 2206.07669 | :mortar_board: | None |
2022-06-09 | Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields | Mingtong Zhang et.al. | 2206.04669 | :mortar_board: | None |
2022-06-03 | SNAKE: Shape-aware Neural 3D Keypoint Field | Chengliang Zhong et.al. | 2206.01724 | :mortar_board: | Code |
2022-05-17 | MulT: An End-to-End Multitask Learning Transformer | Deblina Bhattacharjee et.al. | 2205.08303 | :mortar_board: | None |
2022-05-10 | ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild | Chirag Raman et.al. | 2205.05177 | :mortar_board: | None |
2022-04-28 | Polarimetric imaging for the detection of synthetic models of SARS-CoV-2: a proof of concept | Emilio Gomez-Gonzalez et.al. | 2204.14050 | :mortar_board: | None |
2022-04-28 | GRIT: General Robust Image Task Benchmark | Tanmay Gupta et.al. | 2204.13653 | :mortar_board: | Code |
2022-04-26 | ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation | Yufei Xu et.al. | 2204.12484 | :mortar_board: | Code |
2022-04-26 | Unified GCNs: Towards Connecting GCNs with CNNs | Ziyan Zhang et.al. | 2204.12300 | :mortar_board: | None |
2022-04-19 | Self-Supervised Equivariant Learning for Oriented Keypoint Detection | Jongmin Lee et.al. | 2204.08613 | :mortar_board: | Code |
3D Object Detection
Publish Date | Title | Authors | arxiv | Code | |
---|---|---|---|---|---|
2023-10-21 | Concept-based Anomaly Detection in Retail Stores for Automatic Correction using Mobile Robots | Aditya Kapoor et.al. | 2310.14063 | :mortar_board: | None |
2023-10-21 | Ophthalmic Biomarker Detection Using Ensembled Vision Transformers – Winning Solution to IEEE SPS VIP Cup 2023 | H. A. Z. Sameen Shahgir et.al. | 2310.14005 | :mortar_board: | None |
2023-10-21 | Exploring Driving Behavior for Autonomous Vehicles Based on Gramian Angular Field Vision Transformer | Junwei You et.al. | 2310.13906 | :mortar_board: | None |
2023-10-19 | A Car Model Identification System for Streamlining the Automobile Sales Process | Said Togru et.al. | 2310.13198 | :mortar_board: | None |
2023-10-19 | LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning | Pedram Agand et.al. | 2310.13135 | :mortar_board: | Code |
2023-10-18 | Tailoring Adversarial Attacks on Deep Neural Networks for Targeted Class Manipulation Using DeepFool Algorithm | S. M. Fazle Rabby Labib et.al. | 2310.13019 | :mortar_board: | None |
2023-10-19 | Predicting Ovarian Cancer Treatment Response in Histopathology using Hierarchical Vision Transformers and Multiple Instance Learning | Jack Breen et.al. | 2310.12866 | :mortar_board: | Code |
2023-10-19 | Model Merging by Uncertainty-Based Gradient Matching | Nico Daheim et.al. | 2310.12808 | :mortar_board: | None |
2023-10-19 | Mixing Histopathology Prototypes into Robust Slide-Level Representations for Cancer Subtyping | Joshua Butke et.al. | 2310.12769 | :mortar_board: | Code |
2023-10-19 | Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers | Yuanduo Hong et.al. | 2310.12755 | :mortar_board: | Code |
2023-10-19 | Heart Disease Detection using Vision-Based Transformer Models from ECG Images | Zeynep Hilal Kilimci et.al. | 2310.12630 | :mortar_board: | None |
2023-10-19 | Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps | Sidi Wu et.al. | 2310.12616 | :mortar_board: | None |
2023-10-16 | Interpreting and Controlling Vision Foundation Models via Text Explanations | Haozhe Chen et.al. | 2310.10591 | :mortar_board: | Code |
2023-10-15 | Top-K Pooling with Patch Contrastive Learning for Weakly-Supervised Semantic Segmentation | Wangyu Wu et.al. | 2310.09828 | :mortar_board: | None |
2023-10-15 | MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection | David C. Jeong et.al. | 2310.09757 | :mortar_board: | None |
2023-10-13 | Tackling Heterogeneity in Medical Federated learning via Vision Transformers | Erfan Darzi et.al. | 2310.09444 | :mortar_board: | None |
2023-10-13 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | Xi Chen et.al. | 2310.09199 | :mortar_board: | None |
2023-10-13 | Faster 3D cardiac CT segmentation with Vision Transformers | Lee Jollans et.al. | 2310.09099 | :mortar_board: | Code |
2023-10-12 | LEMON: Lossless model expansion | Yite Wang et.al. | 2310.07999 | :mortar_board: | None |
2023-10-11 | 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers | Jieneng Chen et.al. | 2310.07781 | :mortar_board: | Code |
2023-10-11 | Accelerating Vision Transformers Based on Heterogeneous Attention Patterns | Deli Yu et.al. | 2310.07664 | :mortar_board: | None |
2023-10-11 | ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification | Guiwei Zhang et.al. | 2310.07552 | :mortar_board: | None |
2023-10-11 | ViT-A*: Legged Robot Path Planning using Vision Transformer A* | Jianwei Liu et.al. | 2310.07525 | :mortar_board: | None |
2023-10-11 | PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction | Weijie Gan et.al. | 2310.07504 | :mortar_board: | None |
2023-10-11 | Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation | Xu Zheng et.al. | 2310.07265 | :mortar_board: | None |
2023-10-10 | EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention | Yulong Shi et.al. | 2310.06629 | :mortar_board: | None |
2023-10-10 | Machine Eye for Defects: Machine Learning-Based Solution to Identify and Characterize Topological Defects in Textured Images of Nematic Materials | Haijie Ren et.al. | 2310.06406 | :mortar_board: | None |
2023-10-10 | Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling | Huangjie Zheng et.al. | 2310.06389 | :mortar_board: | None |
2023-10-10 | Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing | Wei Dong et.al. | 2310.06234 | :mortar_board: | Code |
2023-10-09 | DiPS: Discriminative Pseudo-Label Sampling with Self-Supervised Transformers for Weakly Supervised Object Localization | Shakeeb Murtaza et.al. | 2310.06196 | :mortar_board: | Code |
2023-10-09 | SimPLR: A Simple and Plain Transformer for Object Detection and Segmentation | Duy-Kien Nguyen et.al. | 2310.05920 | :mortar_board: | None |
2023-10-09 | Transformer Fusion with Optimal Transport | Moritz Imfeld et.al. | 2310.05719 | :mortar_board: | None |
2023-10-09 | ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain | Md Sohag Mia et.al. | 2310.05664 | :mortar_board: | None |
2023-10-09 | No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling | Xuwei Xu et.al. | 2310.05654 | :mortar_board: | None |
2023-10-09 | Plug n’ Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers | Xuwei Xu et.al. | 2310.05642 | :mortar_board: | None |
2023-10-09 | A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers | Matteo Bastico et.al. | 2310.05572 | :mortar_board: | Code |
2023-10-09 | RetSeg: Retention-based Colorectal Polyps Segmentation Network | Khaled ELKarazle et.al. | 2310.05446 | :mortar_board: | None |
2023-10-09 | Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers | Shiyue Cao et.al. | 2310.05400 | :mortar_board: | None |
2023-10-09 | Hierarchical Side-Tuning for Vision Transformers | Weifeng Lin et.al. | 2310.05393 | :mortar_board: | None |
2023-10-08 | Low-Resolution Self-Attention for Semantic Segmentation | Yu-Huan Wu et.al. | 2310.05026 | :mortar_board: | None |
2023-10-06 | FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning | Peiran Xu et.al. | 2310.04412 | :mortar_board: | Code |
2023-10-06 | TiC: Exploring Vision Transformer in Convolution | Song Zhang et.al. | 2310.04134 | :mortar_board: | Code |
2023-10-06 | Sub-token ViT Embedding via Stochastic Resonance Transformers | Dong Lao et.al. | 2310.03967 | :mortar_board: | None |
2023-10-05 | ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures | Haoxuan Liu et.al. | 2310.03841 | :mortar_board: | None |
2023-10-05 | Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery | Joseph A. Gallego-Mejia et.al. | 2310.03513 | :mortar_board: | None |
2023-10-05 | Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet | Hossein Jafari et.al. | 2310.03365 | :mortar_board: | None |
2023-10-04 | Neural architecture impact on identifying temporally extended Reinforcement Learning tasks | Victor Vadakechirayath George et.al. | 2310.03161 | :mortar_board: | None |
2023-10-04 | Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition | Hamid Mohammadi et.al. | 2310.03108 | :mortar_board: | None |
2023-10-04 | Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer | Hongruixuan Chen et.al. | 2310.02674 | :mortar_board: | Code |
2023-10-04 | GET: Group Event Transformer for Event-Based Vision | Yansong Peng et.al. | 2310.02642 | :mortar_board: | Code |
2023-10-04 | ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer | Seok-Yong Byun et.al. | 2310.02588 | :mortar_board: | None |
2023-10-04 | Improving Drumming Robot Via Attention Transformer Network | Yang Yi et.al. | 2310.02565 | :mortar_board: | None |
2023-10-04 | SlowFormer: Universal Adversarial Patch for Attack on Compute and Energy Efficiency of Inference Efficient Vision Transformers | KL Navaneet et.al. | 2310.02544 | :mortar_board: | Code |
2023-10-03 | Selective Feature Adapter for Dense Vision Transformers | Xueqing Deng et.al. | 2310.01843 | :mortar_board: | None |
2023-10-03 | PPT: Token Pruning and Pooling for Efficient Vision Transformers | Xinjian Wu et.al. | 2310.01812 | :mortar_board: | None |
2023-10-02 | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Size Wu et.al. | 2310.01403 | :mortar_board: | Code |
2023-10-02 | Self-distilled Masked Attention guided masked image modeling with noise Regularized Teacher (SMART) for medical image analysis | Jue Jiang et.al. | 2310.01209 | :mortar_board: | None |
2023-10-01 | RegBN: Batch Normalization of Multimodal Data with Regularization | Morteza Ghahremani et.al. | 2310.00641 | :mortar_board: | Code |
2023-10-01 | Win-Win: Training High-Resolution Vision Transformers from Two Windows | Vincent Leroy et.al. | 2310.00632 | :mortar_board: | None |
2023-09-30 | MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images | Huyen Tran et.al. | 2310.00418 | :mortar_board: | None |
2023-09-30 | Distilling Inductive Bias: Knowledge Distillation Beyond Model Compression | Gousia Habib et.al. | 2310.00369 | :mortar_board: | None |
2023-09-30 | Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation | Jingliang Deng et.al. | 2310.00307 | :mortar_board: | None |
2023-09-29 | SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation | Zhongang Cai et.al. | 2309.17448 | :mortar_board: | None |
2023-09-28 | FLIP: Cross-domain Face Anti-spoofing with Language Guidance | Koushik Srivatsan et.al. | 2309.16649 | :mortar_board: | Code |
2023-09-28 | Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit | Blake Bordelon et.al. | 2309.16620 | :mortar_board: | None |
2023-09-28 | Vision Transformers Need Registers | Timothée Darcet et.al. | 2309.16588 | :mortar_board: | None |
2023-09-28 | HTC-DC Net: Monocular Height Estimation from Single Remote Sensing Images | Sining Chen et.al. | 2309.16486 | :mortar_board: | Code |
2023-09-28 | Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words | Yujia Bao et.al. | 2309.16108 | :mortar_board: | None |
2023-09-26 | GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes | Chaoqiang Zhao et.al. | 2309.16019 | :mortar_board: | None |
2023-09-27 | CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs | Ao Wang et.al. | 2309.15755 | :mortar_board: | None |
2023-09-27 | Improving Facade Parsing with Vision Transformers and Line Integration | Bowen Wang et.al. | 2309.15523 | :mortar_board: | Code |
2023-09-26 | Efficient Low-rank Backpropagation for Vision Transformer Adaptation | Yuedong Yang et.al. | 2309.15275 | :mortar_board: | None |
2023-09-25 | Assessment of IBM and NASA’s geospatial foundation model in flood inundation mapping | Wenwen Li et.al. | 2309.14500 | :mortar_board: | None |
2023-09-25 | Masked Image Residual Learning for Scaling Deeper Vision Transformers | Guoxi Huang et.al. | 2309.14136 | :mortar_board: | None |
2023-09-25 | PARTICLE: Part Discovery and Contrastive Learning for Fine-grained Recognition | Oindrila Saha et.al. | 2309.13822 | :mortar_board: | Code |
2023-09-24 | MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP | Prajwal Ganugula et.al. | 2309.13716 | :mortar_board: | None |
2023-09-24 | Multi-Dimensional Hyena for Spatial Inductive Bias | Itamar Zimerman et.al. | 2309.13600 | :mortar_board: | None |
2023-09-24 | Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction | Zechuan Zhang et.al. | 2309.13524 | :mortar_board: | Code |
2023-09-23 | Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers | Adam Pardyl et.al. | 2309.13353 | :mortar_board: | None |
2023-09-23 | RBFormer: Improve Adversarial Robustness of Transformer by Robust Bias | Hao Cheng et.al. | 2309.13245 | :mortar_board: | None |
2023-09-22 | BayesDLL: Bayesian Deep Learning Library | Minyoung Kim et.al. | 2309.12928 | :mortar_board: | Code |
2023-09-22 | Associative Transformer Is A Sparse Representation Learner | Yuwei Sun et.al. | 2309.12862 | :mortar_board: | None |
2023-09-22 | Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where | Zhi-Yi Chin et.al. | 2309.12757 | :mortar_board: | None |
2023-09-22 | Vision Transformers for Computer Go | Amani Sagri et.al. | 2309.12675 | :mortar_board: | None |
2023-09-21 | DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion | Zhenzhen Chu et.al. | 2309.12424 | :mortar_board: | None |
2023-09-21 | Adaptive Input-image Normalization for Solving Mode Collapse Problem in GAN-based X-ray Images | Muhammad Muneeb Saad et.al. | 2309.12245 | :mortar_board: | None |
2023-09-21 | Bayesian sparsification for deep neural networks with Bayesian model reduction | Dimitrije Marković et.al. | 2309.12095 | :mortar_board: | None |
2023-09-21 | ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers | Philipp Ausserlechner et.al. | 2309.11986 | :mortar_board: | None |
2023-09-20 | RMT: Retentive Networks Meet Vision Transformers | Qihang Fan et.al. | 2309.11523 | :mortar_board: | None |
2023-09-20 | Forgery-aware Adaptive Vision Transformer for Face Forgery Detection | Anwei Luo et.al. | 2309.11092 | :mortar_board: | None |
2023-09-19 | Interpret Vision Transformers as ConvNets with Dynamic Convolutions | Chong Zhou et.al. | 2309.10713 | :mortar_board: | None |
2023-09-19 | Latent Space Energy-based Model for Fine-grained Open Set Recognition | Wentao Bao et.al. | 2309.10711 | :mortar_board: | None |
2023-09-19 | Self-Supervised Super-Resolution Approach for Isotropic Reconstruction of 3D Electron Microscopy Images from Anisotropic Acquisition | Mohammad Khateri et.al. | 2309.10646 | :mortar_board: | None |
2023-09-19 | Exploring the Influence of Information Entropy Change in Learning Systems | Xiaowei Yu et.al. | 2309.10625 | :mortar_board: | None |
2023-09-19 | LineMarkNet: Line Landmark Detection for Valet Parking | Zizhang Wu et.al. | 2309.10475 | :mortar_board: | None |
2023-09-18 | TransientViT: A novel CNN - Vision Transformer hybrid real/bogus transient classifier for the Kilodegree Automatic Transient Survey | Zhuoyang Chen et.al. | 2309.09937 | :mortar_board: | Code |
2023-09-18 | Selective Volume Mixup for Video Action Recognition | Yi Tan et.al. | 2309.09534 | :mortar_board: | None |
2023-09-17 | MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification | Junjie Zhu et.al. | 2309.09276 | :mortar_board: | None |
2023-09-17 | Image-level supervision and self-training for transformer-based cross-modality tumor segmentation | Malo de Boisredon et.al. | 2309.09246 | :mortar_board: | None |
2023-09-16 | MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer | Fudong Lin et.al. | 2309.09067 | :mortar_board: | Code |
2023-09-16 | RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework | Yuelei Wang et.al. | 2309.09003 | :mortar_board: | None |
2023-09-15 | Biased Attention: Do Vision Transformers Amplify Gender Bias More than Convolutional Neural Networks? | Abhishek Mandal et.al. | 2309.08760 | :mortar_board: | Code |
2023-09-15 | Replacing softmax with ReLU in Vision Transformers | Mitchell Wortsman et.al. | 2309.08586 | :mortar_board: | None |
2023-09-15 | SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels | Henry Hengyuan Zhao et.al. | 2309.08513 | :mortar_board: | Code |
2023-09-15 | Cross-Modal Synthesis of Structural MRI and Functional Connectivity Networks via Conditional ViT-GANs | Yuda Bi et.al. | 2309.08160 | :mortar_board: | None |
2023-09-15 | AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT | Fangbo Qin et.al. | 2309.08134 | :mortar_board: | None |
2023-09-14 | Interpretability-Aware Vision Transformer | Yao Qiang et.al. | 2309.08035 | :mortar_board: | Code |
2023-09-13 | Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? | Bill Psomas et.al. | 2309.06891 | :mortar_board: | Code |
2023-09-12 | Exploring Non-additive Randomness on ViT against Query-Based Black-Box Attacks | Jindong Gu et.al. | 2309.06438 | :mortar_board: | None |
2023-09-12 | Jersey Number Recognition using Keyframe Identification from Low-Resolution Broadcast Videos | Bavesh Balaji et.al. | 2309.06285 | :mortar_board: | None |
2023-09-12 | A 3M-Hybrid Model for the Restoration of Unique Giant Murals: A Case Study on the Murals of Yongle Palace | Jing Yang et.al. | 2309.06194 | :mortar_board: | None |
2023-09-12 | Feature Aggregation Network for Building Extraction from High-resolution Remote Sensing Images | Xuan Zhou et.al. | 2309.06017 | :mortar_board: | None |
2023-09-11 | Mobile Vision Transformer-based Visual Object Tracking | Goutam Yelluru Gopal et.al. | 2309.05829 | :mortar_board: | Code |
2023-09-11 | Divergences in Color Perception between Deep Neural Networks and Humans | Ethan O. Nadler et.al. | 2309.05809 | :mortar_board: | None |
2023-09-11 | CNN or ViT? Revisiting Vision Transformers Through the Lens of Convolution | Chenghao Li et.al. | 2309.05375 | :mortar_board: | None |
2023-09-10 | DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices | Guanyu Xu et.al. | 2309.05015 | :mortar_board: | None |
2023-09-09 | How to Evaluate Semantic Communications for Images with ViTScore Metric? | Tingting Zhu et.al. | 2309.04891 | :mortar_board: | None |
2023-09-09 | Unified Language-Vision Pretraining with Dynamic Discrete Visual Tokenization | Yang Jin et.al. | 2309.04669 | :mortar_board: | None |
2023-09-09 | Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis | Nikhil J. Dhinagar et.al. | 2309.04651 | :mortar_board: | None |
2023-09-08 | Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | Erik Daxberger et.al. | 2309.04354 | :mortar_board: | None |
2023-09-07 | S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens | Rizhao Cai et.al. | 2309.04038 | :mortar_board: | None |
2023-09-07 | DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | Haochen Wang et.al. | 2309.03576 | :mortar_board: | None |
2023-09-06 | Combining pre-trained Vision Transformers and CIDER for Out Of Domain Detection | Grégor Jouet et.al. | 2309.03047 | :mortar_board: | None |
2023-09-06 | Improving diagnosis and prognosis of lung cancer using vision transformers: A scoping review | Hazrat Ali et.al. | 2309.02783 | :mortar_board: | None |
2023-09-05 | Compressing Vision Transformers for Low-Resource Visual Learning | Eric Youn et.al. | 2309.02617 | :mortar_board: | None |
2023-09-05 | Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images | Teru Nagamori et.al. | 2309.02556 | :mortar_board: | None |
2023-09-05 | A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking | Lorenzo Papa et.al. | 2309.02031 | :mortar_board: | None |
2023-09-04 | Locality-Aware Hyperspectral Classification | Fangqin Zhou et.al. | 2309.01561 | :mortar_board: | Code |
2023-09-04 | Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN | Kin Wai Lau et.al. | 2309.01439 | :mortar_board: | None |
2023-09-04 | DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | Zhuofan Xia et.al. | 2309.01430 | :mortar_board: | Code |
2023-09-04 | Leveraging Self-Supervised Vision Transformers for Neural Transfer Function Design | Dominik Engel et.al. | 2309.01408 | :mortar_board: | None |
2023-09-04 | Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization | Yiwen Cao et.al. | 2309.01331 | :mortar_board: | None |
2023-09-04 | ExMobileViT: Lightweight Classifier Extension for Mobile Vision Transformer | Gyeongdong Yang et.al. | 2309.01310 | :mortar_board: | None |
2023-09-02 | Contrastive Feature Masking Open-Vocabulary Vision Transformer | Dahun Kim et.al. | 2309.00775 | :mortar_board: | None |
2023-09-01 | Deep Joint Source-Channel Coding for Adaptive Image Transmission over MIMO Channels | Haotian Wu et.al. | 2309.00470 | :mortar_board: | None |
2023-09-01 | Interpretable Medical Imagery Diagnosis with Self-Attentive Transformers: A Review of Explainable AI for Health Care | Tin Lai et.al. | 2309.00252 | :mortar_board: | None |
2023-08-31 | Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation | Reza Azad et.al. | 2309.00121 | :mortar_board: | None |
2023-08-31 | Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection | Reza Azad et.al. | 2309.00108 | :mortar_board: | None |
2023-08-31 | Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation | Ramtin Mojtahedi et.al. | 2308.16598 | :mortar_board: | Code |
2023-08-30 | Learning Diverse Features in Vision Transformers for Improved Generalization | Armand Mihai Nicolicioiu et.al. | 2308.16274 | :mortar_board: | Code |
2023-08-30 | Emergence of Segmentation with Minimalistic White-Box Transformers | Yaodong Yu et.al. | 2308.16271 | :mortar_board: | Code |
2023-08-29 | Efficient Model Personalization in Federated Learning via Client-Specific Prompt Generation | Fu-En Yang et.al. | 2308.15367 | :mortar_board: | None |
2023-08-29 | Imperceptible Adversarial Attack on Deep Neural Networks from Image Boundary | Fahad Alrasheedi et.al. | 2308.15344 | :mortar_board: | None |
2023-08-29 | TKwinFormer: Top k Window Attention in Vision Transformers for Feature Matching | Yun Liao et.al. | 2308.15144 | :mortar_board: | None |
2023-08-28 | PanoSwin: a Pano-style Swin Transformer for Panorama Understanding | Zhixin Ling et.al. | 2308.14726 | :mortar_board: | None |
2023-08-28 | Fast Feedforward Networks | Peter Belcak et.al. | 2308.14711 | :mortar_board: | Code |
2023-08-28 | FIRE: Food Image to REcipe generation | Prateek Chhikara et.al. | 2308.14391 | :mortar_board: | None |
2023-08-28 | GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition | Ruijie Yao et.al. | 2308.14378 | :mortar_board: | None |
2023-08-27 | A comprehensive review on Plant Leaf Disease detection using Deep learning | Sumaya Mustofa et.al. | 2308.14087 | :mortar_board: | None |
2023-08-27 | Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation | Sunandini Sanyal et.al. | 2308.14023 | :mortar_board: | None |
2023-08-26 | Fixating on Attention: Integrating Human Eye Tracking into Vision Transformers | Sharath Koorathota et.al. | 2308.13969 | :mortar_board: | None |
2023-08-26 | Unified Single-Stage Transformer Network for Efficient RGB-T Tracking | Jianqiang Xia et.al. | 2308.13764 | :mortar_board: | None |
2023-08-25 | ACC-UNet: A Completely Convolutional UNet model for the 2020s | Nabil Ibtehaz et.al. | 2308.13680 | :mortar_board: | Code |
2023-08-25 | Enhancing Landmark Detection in Cluttered Real-World Scenarios with Vision Transformers | Mohammad Javad Rajabi et.al. | 2308.13671 | :mortar_board: | None |
2023-08-25 | Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers | Matthew Dutson et.al. | 2308.13494 | :mortar_board: | Code |
2023-08-25 | An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation | Tiarna Lee et.al. | 2308.13415 | :mortar_board: | None |
2023-08-25 | CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing | Jonathan Cui et.al. | 2308.13363 | :mortar_board: | None |
2023-08-25 | A Re-Parameterized Vision Transformer (ReVT) for Domain-Generalized Semantic Segmentation | Jan-Aike Termöhlen et.al. | 2308.13331 | :mortar_board: | None |
2023-08-24 | Full-dose PET Synthesis from Low-dose PET Using High-efficiency Diffusion Denoising Probabilistic Model | Shaoyan Pan et.al. | 2308.13072 | :mortar_board: | None |
2023-08-24 | Spherical Vision Transformer for 360-degree Video Saliency Prediction | Mert Cokelek et.al. | 2308.13004 | :mortar_board: | None |
2023-08-24 | Towards Hierarchical Regional Transformer-based Multiple Instance Learning | Josef Cersovsky et.al. | 2308.12634 | :mortar_board: | None |
2023-08-24 | SieveNet: Selecting Point-Based Features for Mesh Networks | Shengchao Yuan et.al. | 2308.12530 | :mortar_board: | None |
2023-08-23 | MOFO: MOtion FOcused Self-Supervision for Video Understanding | Mona Ahmadian et.al. | 2308.12447 | :mortar_board: | None |
2023-08-23 | BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection | Tinghao Xie et.al. | 2308.12439 | :mortar_board: | None |
2023-08-23 | Vision Transformer Adapters for Generalizable Multitask Learning | Deblina Bhattacharjee et.al. | 2308.12372 | :mortar_board: | None |
2023-08-23 | SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation | Qing Xu et.al. | 2308.12231 | :mortar_board: | Code |
2023-08-23 | SG-Former: Self-guided Transformer with Evolving Token Reallocation | Sucheng Ren et.al. | 2308.12216 | :mortar_board: | None |
2023-08-23 | Masking Strategies for Background Bias Removal in Computer Vision Models | Ananthu Aniraj et.al. | 2308.12127 | :mortar_board: | Code |
2023-08-23 | Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment | Kangmin Xu et.al. | 2308.12001 | :mortar_board: | None |
2023-08-22 | SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation | Lixiong Qin et.al. | 2308.11509 | :mortar_board: | Code |
2023-08-22 | Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding | Jiantao Wu et.al. | 2308.11448 | :mortar_board: | None |
2023-08-22 | SAIPy: A Python Package for single station Earthquake Monitoring using Deep Learning | Wei Li et.al. | 2308.11428 | :mortar_board: | None |
2023-08-22 | TurboViT: Generating Fast Vision Transformers via Generative Architecture Search | Alexander Wong et.al. | 2308.11421 | :mortar_board: | None |
2023-08-22 | Exemplar-Free Continual Transformer with Convolutions | Anurag Roy et.al. | 2308.11357 | :mortar_board: | None |
2023-08-21 | Vision Transformer Pruning Via Matrix Decomposition | Tianyi Sun et.al. | 2308.10839 | :mortar_board: | None |
2023-08-21 | Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers | Natalia Frumkin et.al. | 2308.10814 | :mortar_board: | None |
2023-08-21 | Patch Is Not All You Need | Changzhen Li et.al. | 2308.10729 | :mortar_board: | None |
2023-08-21 | Spatial Transform Decoupling for Oriented Object Detection | Hongtian Yu et.al. | 2308.10561 | :mortar_board: | Code |
2023-08-21 | Joint learning of images and videos with a single Vision Transformer | Shuki Shimizu et.al. | 2308.10533 | :mortar_board: | None |
2023-08-20 | Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting | Qidong Huang et.al. | 2308.10315 | :mortar_board: | Code |
2023-08-20 | FedSIS: Federated Split Learning with Intermediate Representation Sampling for Privacy-preserving Generalized Face Presentation Attack Detection | Naif Alkhunaizi et.al. | 2308.10236 | :mortar_board: | Code |
2023-08-20 | TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective | Jun Dan et.al. | 2308.10133 | :mortar_board: | Code |
2023-08-19 | Towards a High-Performance Object Detector: Insights from Drone Detection Using ViT and CNN-based Deep Learning Models | Junyang Zhang et.al. | 2308.09899 | :mortar_board: | None |
2023-08-18 | On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers | Thomas De Min et.al. | 2308.09610 | :mortar_board: | Code |
2023-08-18 | Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | Tobias Christian Nauen et.al. | 2308.09372 | :mortar_board: | Code |
2023-08-17 | FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning | Guangyu Sun et.al. | 2308.09160 | :mortar_board: | Code |
2023-08-17 | SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning | Hao Feng et.al. | 2308.09040 | :mortar_board: | None |
2023-08-16 | SkinDistilViT: Lightweight Vision Transformer for Skin Lesion Classification | Vlad-Constantin Lungu-Stan et.al. | 2308.08669 | :mortar_board: | None |
2023-08-15 | SEDA: Self-Ensembling ViT with Defensive Distillation and Adversarial Training for robust Chest X-rays Classification | Raza Imam et.al. | 2308.07874 | :mortar_board: | Code |
2023-08-15 | Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening | Jack Foster et.al. | 2308.07707 | :mortar_board: | Code |
2023-08-15 | Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images | Soroosh Tayebi Arasteh et.al. | 2308.07688 | :mortar_board: | None |
2023-08-15 | Block-Wise Encryption for Reliable Vision Transformer models | Hitoshi Kiya et.al. | 2308.07612 | :mortar_board: | None |
2023-08-14 | Wide-Area Geolocalization with a Limited Field of View Camera in Challenging Urban Environments | Lena M. Downes et.al. | 2308.07432 | :mortar_board: | None |
2023-08-14 | A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis | Esteve Valls Mascaro et.al. | 2308.07301 | :mortar_board: | None |
2023-08-14 | A Robust Approach Towards Distinguishing Natural and Computer Generated Images using Multi-Colorspace fused and Enriched Vision Transformer | Manjary P Gangan et.al. | 2308.07279 | :mortar_board: | Code |
2023-08-14 | Large-kernel Attention for Efficient and Robust Brain Lesion Segmentation | Liam Chalcroft et.al. | 2308.07251 | :mortar_board: | Code |
2023-08-14 | SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers | Xijun Wang et.al. | 2308.07110 | :mortar_board: | None |
2023-08-14 | Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking | Ben Kang et.al. | 2308.06904 | :mortar_board: | Code |
2023-08-13 | Modified Topological Image Preprocessing for Skin Lesion Classifications | Hong Cheng et.al. | 2308.06796 | :mortar_board: | None |
2023-08-12 | Revisiting Vision Transformer from the View of Path Ensemble | Shuning Chang et.al. | 2308.06548 | :mortar_board: | None |
2023-08-12 | Performance Analysis for Resource Constrained Decentralized Federated Learning Over Wireless Networks | Zhigang Yan et.al. | 2308.06496 | :mortar_board: | None |
2023-08-11 | Experts Weights Averaging: A New General Training Scheme for Vision Transformers | Yongqi Huang et.al. | 2308.06093 | :mortar_board: | None |
2023-08-10 | Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention | Liang Shang et.al. | 2308.05872 | :mortar_board: | None |
2023-08-10 | Temporally-Adaptive Models for Efficient Video Understanding | Ziyuan Huang et.al. | 2308.05787 | :mortar_board: | Code |
2023-08-10 | Surface Masked AutoEncoder: Self-Supervision for Cortical Imaging Data | Simon Dahan et.al. | 2308.05474 | :mortar_board: | Code |
2023-08-09 | Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping | Ali Jamali et.al. | 2308.05235 | :mortar_board: | Code |
2023-08-09 | A degree of image identification at sub-human scales could be possible with more advanced clusters | Prateek Y J et.al. | 2308.05092 | :mortar_board: | Code |
2023-08-09 | Which Tokens to Use? Investigating Token Reduction in Vision Transformers | Joakim Bruslund Haurum et.al. | 2308.04657 | :mortar_board: | None |
2023-08-08 | Unsupervised Camouflaged Object Segmentation as Domain Adaptation | Yi Zhang et.al. | 2308.04528 | :mortar_board: | Code |
2023-08-08 | All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation | Weixuan Sun et.al. | 2308.04321 | :mortar_board: | None |
2023-08-08 | Class-level Structural Relation Modelling and Smoothing for Visual Representation Learning | Zitan Chen et.al. | 2308.04142 | :mortar_board: | Code |
2023-08-07 | Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission | Bingyan Xie et.al. | 2308.03713 | :mortar_board: | None |
2023-08-07 | Scaling may be all you need for achieving human-level object recognition capacity with human-like visual experience | A. Emin Orhan et.al. | 2308.03712 | :mortar_board: | Code |
2023-08-07 | Improving FHB Screening in Wheat Breeding Using an Efficient Transformer Model | Babak Azad et.al. | 2308.03670 | :mortar_board: | None |
2023-08-07 | DiT: Efficient Vision Transformers with Dynamic Token Routing | Yuchen Ma et.al. | 2308.03409 | :mortar_board: | Code |
2023-08-07 | Part-Aware Transformer for Generalizable Person Re-identification | Hao Ni et.al. | 2308.03322 | :mortar_board: | None |
2023-08-07 | FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search | Jordan Dotzel et.al. | 2308.03290 | :mortar_board: | None |
2023-08-06 | TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment | Chaofeng Chen et.al. | 2308.03060 | :mortar_board: | Code |
2023-08-06 | High-Resolution Vision Transformers for Pixel-Level Identification of Structural Components and Damage | Kareem Eltouny et.al. | 2308.03006 | :mortar_board: | None |
2023-08-06 | MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation | Lian Xu et.al. | 2308.03005 | :mortar_board: | Code |
2023-08-05 | Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution | Yong Liu et.al. | 2308.02794 | :mortar_board: | Code |
2023-08-04 | M2Former: Multi-Scale Patch Selection for Fine-Grained Visual Recognition | Jiyong Moon et.al. | 2308.02161 | :mortar_board: | None |
2023-08-04 | Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network | Bryar Shareef et.al. | 2308.02101 | :mortar_board: | None |
2023-08-03 | A Multidimensional Analysis of Social Biases in Vision Transformers | Jannik Brinkmann et.al. | 2308.01948 | :mortar_board: | None |
2023-08-03 | Dynamic Token-Pass Transformers for Semantic Segmentation | Yuang Liu et.al. | 2308.01944 | :mortar_board: | None |
2023-08-02 | A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models | Bilel Guetarni et.al. | 2308.01328 | :mortar_board: | None |
2023-08-02 | Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation | Quan Tang et.al. | 2308.01045 | :mortar_board: | None |
2023-08-01 | DINO-CXR: A self supervised method based on vision transformer for chest X-ray classification | Mohammadreza Shakouri et.al. | 2308.00475 | :mortar_board: | None |
2023-08-01 | ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data | Ruiqi Yang et.al. | 2308.00454 | :mortar_board: | Code |
2023-08-01 | FLatten Transformer: Vision Transformer using Focused Linear Attention | Dongchen Han et.al. | 2308.00442 | :mortar_board: | Code |
2023-08-01 | Enhanced Security with Encrypted Vision Transformer in Federated Learning | Rei Aso et.al. | 2308.00271 | :mortar_board: | None |
2023-08-01 | Improving Pixel-based MIM by Reducing Wasted Modeling Capability | Yuan Liu et.al. | 2308.00261 | :mortar_board: | Code |
2023-08-01 | LGViT: Dynamic Early Exiting for Accelerating Vision Transformer | Guanyu Xu et.al. | 2308.00255 | :mortar_board: | None |
2023-07-31 | Performance Evaluation of Swin Vision Transformer Model using Gradient Accumulation Optimization Technique | Sanad Aburass et.al. | 2308.00197 | :mortar_board: | None |
2023-07-30 | StylePrompter: All Styles Need Is Attention | Chenyi Zhuang et.al. | 2307.16151 | :mortar_board: | Code |
2023-07-29 | HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation | Zuyan Liu et.al. | 2307.16061 | :mortar_board: | None |
2023-07-29 | CoVid-19 Detection leveraging Vision Transformers and Explainable AI | Pangoth Santhosh Kumar et.al. | 2307.16033 | :mortar_board: | None |
2023-07-27 | Self-Supervised Graph Transformer for Deepfake Detection | Aminollah Khormali et.al. | 2307.15019 | :mortar_board: | None |
2023-07-27 | IML-ViT: Image Manipulation Localization by Vision Transformer | Xiaochen Ma et.al. | 2307.14863 | :mortar_board: | Code |
2023-07-27 | Pre-training Vision Transformers with Very Limited Synthesized Images | Ryo Nakamura et.al. | 2307.14710 | :mortar_board: | Code |
2023-07-26 | MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation | Reiner Birkl et.al. | 2307.14460 | :mortar_board: | Code |
2023-07-26 | Sparse Double Descent in Vision Transformers: real or phantom threat? | Victor Quétu et.al. | 2307.14253 | :mortar_board: | Code |
2023-07-26 | Boon: A Neural Search Engine for Cross-Modal Information Retrieval | Yan Gong et.al. | 2307.14240 | :mortar_board: | None |
2023-07-26 | Adaptive Frequency Filters As Efficient Global Token Mixers | Zhipeng Huang et.al. | 2307.14008 | :mortar_board: | None |
2023-07-26 | Enhanced Security against Adversarial Examples Using a Random Ensemble of Encrypted Vision Transformer Models | Ryota Iijima et.al. | 2307.13985 | :mortar_board: | None |
2023-07-26 | Understanding Deep Neural Networks via Linear Separability of Hidden Layers | Chao Zhang et.al. | 2307.13962 | :mortar_board: | None |
2023-07-26 | Visual Prompt Flexible-Modal Face Anti-Spoofing | Zitong Yu et.al. | 2307.13958 | :mortar_board: | None |
2023-07-26 | AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets | Siyi Du et.al. | 2307.13897 | :mortar_board: | Code |
2023-07-25 | On the unreasonable vulnerability of transformers for image restoration – and an easy fix | Shashank Agnihotri et.al. | 2307.13856 | :mortar_board: | None |
2023-07-25 | Optical Flow boosts Unsupervised Localization and Segmentation | Xinyu Zhang et.al. | 2307.13640 | :mortar_board: | Code |
2023-07-25 | Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network | Chull Hwan Song et.al. | 2307.13254 | :mortar_board: | None |
2023-07-25 | Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition | Cheng Da et.al. | 2307.13244 | :mortar_board: | Code |
2023-07-24 | AMAE: Adaptation of Pre-Trained Masked Autoencoder for Dual-Distribution Anomaly Detection in Chest X-Rays | Behzad Bozorgtabar et.al. | 2307.12721 | :mortar_board: | None |
2023-07-24 | SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation | Yiqing Wang et.al. | 2307.12591 | :mortar_board: | Code |
2023-07-24 | A Good Student is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation | Jinjing Zhu et.al. | 2307.12574 | :mortar_board: | None |
2023-07-24 | Robust face anti-spoofing framework with Convolutional Vision Transformer | Yunseung Lee et.al. | 2307.12459 | :mortar_board: | None |
2023-07-23 | Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision | Menghao Li et.al. | 2307.12392 | :mortar_board: | Code |
2023-07-22 | Sparse then Prune: Toward Efficient Vision Transformers | Yogi Prasetyo et.al. | 2307.11988 | :mortar_board: | Code |
2023-07-21 | Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition | Isack Lee et.al. | 2307.11404 | :mortar_board: | Code |
2023-07-20 | Towards General Game Representations: Decomposing Games Pixels into Content and Style | Chintan Trivedi et.al. | 2307.11141 | :mortar_board: | None |
2023-07-20 | Comparison between transformers and convolutional models for fine-grained classification of insects | Rita Pucci et.al. | 2307.11112 | :mortar_board: | None |
2023-07-20 | GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos | Nisarg A. Shah et.al. | 2307.11081 | :mortar_board: | Code |
2023-07-20 | Learned Thresholds Token Merging and Pruning for Vision Transformers | Maxim Bonnaerens et.al. | 2307.10780 | :mortar_board: | Code |
2023-07-20 | Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Sahar Almahfouz Nasser et.al. | 2307.10698 | :mortar_board: | Code |
2023-07-20 | Quantized Feature Distillation for Network Quantization | Ke Zhu et.al. | 2307.10638 | :mortar_board: | None |
2023-07-17 | Study of Vision Transformers for Covid-19 Detection from Chest X-rays | Sandeep Angara et.al. | 2307.09402 | :mortar_board: | None |
2023-07-18 | MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments | Spyros Gidaris et.al. | 2307.09361 | :mortar_board: | None |
2023-07-18 | RepViT: Revisiting Mobile CNN From ViT Perspective | Ao Wang et.al. | 2307.09283 | :mortar_board: | Code |
2023-07-18 | Light-Weight Vision Transformer with Parallel Local and Global Self-Attention | Nikolas Ebert et.al. | 2307.09120 | :mortar_board: | None |
2023-07-18 | NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF | Stefan Lionar et.al. | 2307.09112 | :mortar_board: | None |
2023-07-18 | R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut | Yingjie Niu et.al. | 2307.09050 | :mortar_board: | None |
2023-07-18 | Human Action Recognition in Still Images Using ConViT | Seyed Rohollah Hosseyni et.al. | 2307.08994 | :mortar_board: | None |
2023-07-17 | Scale-Aware Modulation Meet Transformer | Weifeng Lin et.al. | 2307.08579 | :mortar_board: | Code |
2023-07-17 | BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization | Chaoya Jiang et.al. | 2307.08504 | :mortar_board: | None |
2023-07-17 | Cumulative Spatial Knowledge Distillation for Vision Transformers | Borui Zhao et.al. | 2307.08500 | :mortar_board: | None |
2023-07-17 | ShiftNAS: Improving One-shot NAS via Probability Shift | Mingyang Zhang et.al. | 2307.08300 | :mortar_board: | Code |
2023-07-17 | Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting | Wentao Bao et.al. | 2307.08243 | :mortar_board: | None |
2023-07-16 | Domain Generalisation with Bidirectional Encoder Representations from Vision Transformers | Hamza Riaz et.al. | 2307.08117 | :mortar_board: | None |
2023-07-16 | Dense Multitask Learning to Reconfigure Comics | Deblina Bhattacharjee et.al. | 2307.08071 | :mortar_board: | None |
2023-07-16 | A Survey of Techniques for Optimizing Transformer Inference | Krishna Teja Chitty-Venkata et.al. | 2307.07982 | :mortar_board: | None |
2023-07-16 | S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality | Jinlong Li et.al. | 2307.07935 | :mortar_board: | None |
2023-07-14 | TALL: Thumbnail Layout for Deepfake Video Detection | Yuting Xu et.al. | 2307.07494 | :mortar_board: | None |
2023-07-14 | Multimodal Distillation for Egocentric Action Recognition | Gorjan Radevski et.al. | 2307.07483 | :mortar_board: | None |
2023-07-14 | BiGSeT: Binary Mask-Guided Separation Training for DNN-based Hyperspectral Anomaly Detection | Haijun Liu et.al. | 2307.07428 | :mortar_board: | None |
2023-07-14 | HEAL-SWIN: A Vision Transformer On The Sphere | Oscar Carlsson et.al. | 2307.07313 | :mortar_board: | Code |
2023-07-14 | MaxSR: Image Super-Resolution Using Improved MaxViT | Bincheng Yang et.al. | 2307.07240 | :mortar_board: | None |
2023-07-13 | Deepfake Video Detection Using Generative Convolutional Vision Transformer | Deressa Wodajo et.al. | 2307.07036 | :mortar_board: | Code |
2023-07-12 | Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Mostafa Dehghani et.al. | 2307.06304 | :mortar_board: | None |
2023-07-12 | What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation | Gabriele Merlin et.al. | 2307.06006 | :mortar_board: | None |
2023-07-11 | PIGEON: Predicting Image Geolocations | Lukas Haas et.al. | 2307.05845 | :mortar_board: | None |
2023-07-11 | Image Reconstruction using Enhanced Vision Transformer | Nikhil Verma et.al. | 2307.05616 | :mortar_board: | None |
2023-07-10 | MiVOLO: Multi-input Transformer for Age and Gender Estimation | Maksim Kuprashevich et.al. | 2307.04616 | :mortar_board: | Code |
2023-07-10 | Source-Free Open-Set Domain Adaptation for Histopathological Images via Distilling Self-Supervised Vision Transformer | Guillaume Vray et.al. | 2307.04596 | :mortar_board: | None |
2023-07-10 | One-Shot Pruning for Fast-adapting Pre-trained Models on Devices | Haiyan Zhao et.al. | 2307.04365 | :mortar_board: | None |
2023-07-09 | Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers | Zhiyu Zhu et.al. | 2307.04129 | :mortar_board: | None |
2023-07-09 | Random Position Adversarial Patch for Vision Transformers | Mingzhen Shao et.al. | 2307.04066 | :mortar_board: | None |
2023-07-08 | Understanding the Efficacy of U-Net & Vision Transformer for Groundwater Numerical Modelling | Maria Luisa Taccari et.al. | 2307.04010 | :mortar_board: | None |
2023-07-07 | INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers | Lakshmi Nair et.al. | 2307.03712 | :mortar_board: | Code |
2023-07-07 | HoughLaneNet: Lane Detection with Deep Hough Transform and Dynamic Convolution | Jia-Qi Zhang et.al. | 2307.03494 | :mortar_board: | None |
2023-07-07 | Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation | Dahyun Kang et.al. | 2307.03407 | :mortar_board: | None |
2023-07-06 | Origin-Destination Travel Time Oracle for Map-based Services | Yan Lin et.al. | 2307.03048 | :mortar_board: | None |
2023-07-06 | Art Authentication with Vision Transformers | Ludovica Schaerf et.al. | 2307.03039 | :mortar_board: | None |
2023-07-05 | MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers | Jakob Drachmann Havtorn et.al. | 2307.02321 | :mortar_board: | None |
2023-07-05 | Interactive Image Segmentation with Cross-Modality Vision Transformers | Kun Li et.al. | 2307.02280 | :mortar_board: | Code |
2023-07-05 | MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition | Licai Sun et.al. | 2307.02227 | :mortar_board: | Code |
2023-07-05 | Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency | Md Abdul Kadir et.al. | 2307.02150 | :mortar_board: | None |
2023-07-05 | MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets | Siyi Du et.al. | 2307.02100 | :mortar_board: | Code |
2023-07-05 | Make A Long Image Short: Adaptive Token Length for Vision Transformers | Qiqi Zhou et.al. | 2307.02092 | :mortar_board: | None |
2023-07-04 | Deep Features for Contactless Fingerprint Presentation Attack Detection: Can They Be Generalized? | Hailin Li et.al. | 2307.01845 | :mortar_board: | None |
2023-07-04 | In-Domain Self-Supervised Learning Can Lead to Improvements in Remote Sensing Image Classification | Ivica Dimitrovski et.al. | 2307.01645 | :mortar_board: | None |
2023-07-03 | Streamlined Lensed Quasar Identification in Multiband Images via Ensemble Networks | Irham Taufik Andika et.al. | 2307.01090 | :mortar_board: | None |
2023-07-02 | X-MLP: A Patch Embedding-Free MLP Architecture for Vision | Xinyue Wang et.al. | 2307.00592 | :mortar_board: | None |
2023-07-01 | WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting | Pranav Jeevan et.al. | 2307.00407 | :mortar_board: | Code |
2023-07-01 | MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications | Mustafa Munir et.al. | 2307.00395 | :mortar_board: | Code |
2023-07-01 | Variation-aware Vision Transformer Quantization | Xijie Huang et.al. | 2307.00331 | :mortar_board: | Code |
2023-07-01 | More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data | Andrew Kean Gao et.al. | 2307.00213 | :mortar_board: | None |
2023-06-30 | Stitched ViTs are Flexible Vision Backbones | Zizheng Pan et.al. | 2307.00154 | :mortar_board: | Code |
2023-06-30 | Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing | Ariel N. Lee et.al. | 2306.17848 | :mortar_board: | None |
2023-06-30 | HVTSurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image | Zhuchen Shao et.al. | 2306.17373 | :mortar_board: | Code |
2023-06-29 | An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | Zitian Chen et.al. | 2306.17165 | :mortar_board: | None |
2023-06-29 | Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation | Zhongwei Qiu et.al. | 2306.17074 | :mortar_board: | None |
2023-06-29 | Spatial Reasoning via Deep Vision Models for Robotic Sequential Manipulation | Hongyou Zhou et.al. | 2306.17053 | :mortar_board: | None |
2023-06-29 | BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models | Phuoc-Hoan Charles Le et.al. | 2306.16678 | :mortar_board: | Code |
2023-06-27 | CellViT: Vision Transformers for Precise Cell Segmentation and Classification | Fabian Hörst et.al. | 2306.15350 | :mortar_board: | Code |
2023-06-27 | Novel Hybrid-Learning Algorithms for Improved Millimeter-Wave Imaging Systems | Josiah Smith et.al. | 2306.15341 | :mortar_board: | None |
2023-06-27 | Towards predicting Pedestrian Evacuation Time and Density from Floorplans using a Vision Transformer | Patrick Berggold et.al. | 2306.15318 | :mortar_board: | None |
2023-06-26 | FeSViBS: Federated Split Learning of Vision Transformer with Block Sampling | Faris Almalik et.al. | 2306.14638 | :mortar_board: | Code |
2023-06-25 | Adaptive Window Pruning for Efficient Local Motion Deblurring | Haoying Li et.al. | 2306.14268 | :mortar_board: | None |
2023-06-23 | Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window | Jinkyu Koo et.al. | 2306.13776 | :mortar_board: | None |
2023-06-23 | ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration | Jiaqi Ma et.al. | 2306.13653 | :mortar_board: | Code |
2023-06-22 | Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing | Yelysei Bondarenko et.al. | 2306.12929 | :mortar_board: | None |
2023-06-21 | Inter-Instance Similarity Modeling for Contrastive Learning | Chengchao Shen et.al. | 2306.12243 | :mortar_board: | Code |
2023-06-21 | ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining | Dezhi Peng et.al. | 2306.12106 | :mortar_board: | Code |
2023-06-19 | RaViTT: Random Vision Transformer Tokens | Felipe A. Quezada et.al. | 2306.10959 | :mortar_board: | None |
2023-06-19 | TeleViT: Teleconnection-driven Transformers Improve Subseasonal to Seasonal Wildfire Forecasting | Ioannis Prapas et.al. | 2306.10940 | :mortar_board: | Code |
2023-06-19 | B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers | Moritz Böhle et.al. | 2306.10898 | :mortar_board: | None |
2023-06-19 | Vision Transformer with Attention Map Hallucination and FFN Compaction | Haiyang Xu et.al. | 2306.10875 | :mortar_board: | None |
2023-06-16 | Group Orthogonalization Regularization For Vision Models Adaptation and Robustness | Yoav Kurtz et.al. | 2306.10001 | :mortar_board: | Code |
2023-06-16 | LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning | Jifan Zhang et.al. | 2306.09910 | :mortar_board: | Code |
2023-06-15 | Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Dominick Reilly et.al. | 2306.09331 | :mortar_board: | Code |
2023-06-15 | Neural Fine-Tuning Search for Few-Shot Learning | Panagiotis Eustratiadis et.al. | 2306.09295 | :mortar_board: | Code |
2023-06-15 | ViP: A Differentially Private Foundation Model for Computer Vision | Yaodong Yu et.al. | 2306.08842 | :mortar_board: | None |
2023-06-14 | Hippocampus Substructure Segmentation Using Morphological Vision Transformer Learning | Yang Lei et.al. | 2306.08723 | :mortar_board: | None |
2023-06-13 | Rethinking Polyp Segmentation from an Out-of-Distribution Perspective | Ge-Peng Ji et.al. | 2306.07792 | :mortar_board: | None |
2023-06-13 | Reviving Shift Equivariance in Vision Transformers | Peijian Ding et.al. | 2306.07470 | :mortar_board: | None |
2023-06-12 | Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training | Lorenzo Baraldi et.al. | 2306.07346 | :mortar_board: | Code |
2023-06-12 | Revisiting Token Pruning for Object Detection and Instance Segmentation | Yifei Liu et.al. | 2306.07050 | :mortar_board: | None |
2023-06-12 | Enhancing COVID-19 Diagnosis through Vision Transformer-Based Analysis of Chest X-ray Images | Sultan Zavrak et.al. | 2306.06914 | :mortar_board: | None |
2023-06-12 | Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection | Sayantan Das et.al. | 2306.06881 | :mortar_board: | None |
2023-06-11 | E(2)-Equivariant Vision Transformer | Renjun Xu et.al. | 2306.06722 | :mortar_board: | Code |
2023-06-11 | 2-D SSM: A General Spatial Layer for Visual Transformers | Ethan Baron et.al. | 2306.06635 | :mortar_board: | Code |
2023-06-10 | Vista-Morph: Unsupervised Image Registration of Visible-Thermal Facial Pairs | Catherine Ordun et.al. | 2306.06505 | :mortar_board: | None |
2023-06-10 | ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Haoran You et.al. | 2306.06446 | :mortar_board: | None |
2023-06-09 | SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers | Bowen Zhang et.al. | 2306.06289 | :mortar_board: | Code |
2023-06-09 | FLSL: Feature-level Self-supervised Learning | Qing Su et.al. | 2306.06203 | :mortar_board: | None |
2023-06-09 | FasterViT: Fast Vision Transformers with Hierarchical Attention | Ali Hatamizadeh et.al. | 2306.06189 | :mortar_board: | Code |
2023-06-09 | Customizing General-Purpose Foundation Models for Medical Report Generation | Bang Yang et.al. | 2306.05642 | :mortar_board: | None |
2023-06-08 | Is Attentional Channel Processing Design Required? Comprehensive Analysis Of Robustness Between Vision Transformers And Fully Attentional Networks | Abhishri Ajit Medewar et.al. | 2306.05495 | :mortar_board: | None |
2023-06-08 | Connectional-Style-Guided Contextual Representation Learning for Brain Disease Diagnosis | Gongshu Wang et.al. | 2306.05297 | :mortar_board: | None |
2023-06-08 | Improving Visual Prompt Tuning for Self-supervised Vision Transformers | Seungryong Yoo et.al. | 2306.05067 | :mortar_board: | Code |
2023-06-08 | Neighborhood Attention Makes the Encoder of ResUNet Stronger for Accurate Road Extraction | Ali Jamali et.al. | 2306.04947 | :mortar_board: | None |
2023-06-08 | Muti-Scale And Token Mergence: Make Your ViT More Efficient | Zhe Bian et.al. | 2306.04897 | :mortar_board: | None |
2023-06-07 | Optimizing ViViT Training: Time and Memory Reduction for Action Recognition | Shreyank N Gowda et.al. | 2306.04822 | :mortar_board: | None |
2023-06-07 | Revising deep learning methods in parking lot occupancy detection | Anastasia Martynova et.al. | 2306.04288 | :mortar_board: | Code |
2023-06-07 | Normalization Layers Are All That Sharpness-Aware Minimization Needs | Maximilian Mueller et.al. | 2306.04226 | :mortar_board: | Code |
2023-06-07 | Efficient Vision Transformer for Human Pose Estimation via Patch Selection | Kaleab A. Kinfu et.al. | 2306.04225 | :mortar_board: | None |
2023-06-07 | TEC-Net: Vision Transformer Embrace Convolutional Neural Networks for Medical Image Segmentation | Tao Lei et.al. | 2306.04086 | :mortar_board: | Code |
2023-06-06 | Human-imperceptible, Machine-recognizable Images | Fusheng Hao et.al. | 2306.03679 | :mortar_board: | Code |
2023-06-06 | LegoNet: Alternating Model Blocks for Medical Image Segmentation | Ikboljon Sobirov et.al. | 2306.03494 | :mortar_board: | None |
2023-06-06 | Efficient Anomaly Detection with Budget Annotation Using Semi-Supervised Residual Transformer | Hanxi Li et.al. | 2306.03492 | :mortar_board: | None |
2023-06-06 | Clinical-Inspired Cytological Whole Slide Image Screening with Just Slide-Level Labels | Beidi Zhao et.al. | 2306.03407 | :mortar_board: | None |
2023-06-06 | CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation | Tao Lei et.al. | 2306.03373 | :mortar_board: | Code |
2023-06-05 | A Vessel-Segmentation-Based CycleGAN for Unpaired Multi-modal Retinal Image Synthesis | Aline Sindel et.al. | 2306.02901 | :mortar_board: | None |
2023-06-05 | Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance | Jinwoo Kim et.al. | 2306.02866 | :mortar_board: | Code |
2023-06-05 | On the Role of ViT and CNN in Semantic Communications: Analysis and Prototype Validation | Hanju Yoo et.al. | 2306.02759 | :mortar_board: | None |
2023-06-03 | TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Sagar Chakraborty et.al. | 2306.02142 | :mortar_board: | Code |
2023-06-03 | Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers | Chenyang Lu et.al. | 2306.02095 | :mortar_board: | Code |
2023-06-03 | Memorization Capacity of Multi-Head Attention in Transformers | Sadegh Mahdavi et.al. | 2306.02010 | :mortar_board: | Code |
2023-06-02 | Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work | Qiangchang Wang et.al. | 2306.01929 | :mortar_board: | None |
2023-06-02 | A Novel Vision Transformer with Residual in Self-attention for Biomedical Image Classification | Arun K. Sharma et.al. | 2306.01594 | :mortar_board: | None |
2023-06-02 | NNMobile-Net: Rethinking CNN Design for Deep Learning-Based Retinopathy Research | Wenhui Zhu et.al. | 2306.01289 | :mortar_board: | Code |
2023-06-01 | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | Chaitanya Ryali et.al. | 2306.00989 | :mortar_board: | Code |
2023-06-01 | DeepFake-Adapter: Dual-Level Adapter for DeepFake Detection | Rui Shao et.al. | 2306.00863 | :mortar_board: | Code |
2023-06-01 | Auto-Spikformer: Spikformer Architecture Search | Kaiwei Che et.al. | 2306.00807 | :mortar_board: | None |
2023-06-01 | DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers | Tamer Saleh et.al. | 2306.00704 | :mortar_board: | Code |
2023-06-01 | Lightweight Vision Transformer with Bidirectional Interaction | Qihang Fan et.al. | 2306.00396 | :mortar_board: | Code |
2023-06-01 | Affinity-based Attention in Self-supervised Transformers Predicts Dynamics of Object Grouping in Humans | Hossein Adeli et.al. | 2306.00294 | :mortar_board: | Code |
2023-05-31 | Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects | Stefan Thalhammer et.al. | 2306.00129 | :mortar_board: | Code |
2023-05-31 | Diagnosis and Prognosis of Head and Neck Cancer Patients using Artificial Intelligence | Ikboljon Sobirov et.al. | 2306.00034 | :mortar_board: | None |
2023-05-31 | LOWA: Localize Objects in the Wild with Attributes | Xiaoyuan Guo et.al. | 2305.20047 | :mortar_board: | None |
2023-05-31 | DOTA: A Dynamically-Operated Photonic Tensor Core for Energy-Efficient Transformer Accelerator | Hanqing Zhu et.al. | 2305.19533 | :mortar_board: | None |
2023-05-31 | CVSNet: A Computer Implementation for Central Visual System of The Brain | Ruimin Gao et.al. | 2305.19492 | :mortar_board: | None |
2023-05-30 | Are Large Kernels Better Teachers than Transformers for ConvNets? | Tianjin Huang et.al. | 2305.19412 | :mortar_board: | Code |
2023-05-30 | Contextual Vision Transformers for Robust Representation Learning | Yujia Bao et.al. | 2305.19402 | :mortar_board: | None |
2023-05-30 | Vision Transformers for Mobile Applications: A Short Survey | Nahid Alam et.al. | 2305.19365 | :mortar_board: | None |
2023-05-30 | Prompt-based Tuning of Transformer Models for Multi-Center Medical Image Segmentation | Numan Saeed et.al. | 2305.18948 | :mortar_board: | None |
2023-05-30 | Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | Rishov Sarkar et.al. | 2305.18691 | :mortar_board: | Code |
2023-05-29 | Solar Irradiance Anticipative Transformer | Thomas M. Mercier et.al. | 2305.18487 | :mortar_board: | Code |
2023-05-29 | DiffRate : Differentiable Compression Rate for Efficient Vision Transformers | Mengzhao Chen et.al. | 2305.17997 | :mortar_board: | Code |
2023-05-29 | Streaming Audio Transformers for Online Audio Tagging | Heinrich Dinkel et.al. | 2305.17834 | :mortar_board: | Code |
2023-05-28 | LowDINO – A Low Parameter Self Supervised Learning Model | Sai Krishna Prathapaneni et.al. | 2305.17791 | :mortar_board: | Code |
2023-05-27 | Vision Transformers for Small Histological Datasets Learned through Knowledge Distillation | Neel Kanwal et.al. | 2305.17370 | :mortar_board: | None |
2023-05-27 | Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers | Hongjie Wang et.al. | 2305.17328 | :mortar_board: | None |
2023-05-26 | COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models | Jinqi Xiao et.al. | 2305.17235 | :mortar_board: | Code |
2023-05-26 | Do We Really Need a Large Number of Visual Prompts? | Youngeun Kim et.al. | 2305.17223 | :mortar_board: | None |
2023-05-25 | Making Vision Transformers Truly Shift-Equivariant | Renan A. Rojas-Gomez et.al. | 2305.16316 | :mortar_board: | None |
2023-05-25 | Sharpness-Aware Minimization Leads to Low-Rank Features | Maksym Andriushchenko et.al. | 2305.16292 | :mortar_board: | Code |
2023-05-25 | Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification | Saisai Ding et.al. | 2305.15773 | :mortar_board: | None |
2023-05-24 | ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers | Jingfeng Yao et.al. | 2305.15272 | :mortar_board: | Code |
2023-05-24 | ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents | Christoph Auer et.al. | 2305.14962 | :mortar_board: | None |
2023-05-24 | Predicting Token Impact Towards Efficient Vision Transformer | Hong Wang et.al. | 2305.14840 | :mortar_board: | None |
2023-05-24 | Dual Path Transformer with Partition Attention | Zhengkai Jiang et.al. | 2305.14768 | :mortar_board: | None |
2023-05-24 | BinaryViT: Towards Efficient and Accurate Binary Vision Transformers | Junrui Xiao et.al. | 2305.14730 | :mortar_board: | None |
2023-05-24 | Quantifying Character Similarity with Vision Transformers | Xinmei Yang et.al. | 2305.14672 | :mortar_board: | Code |
2023-05-24 | Reinforcement Learning finetuned Vision-Code Transformer for UI-to-Code Generation | Davit Soselia et.al. | 2305.14637 | :mortar_board: | None |
2023-05-23 | Source-Free Domain Adaptation for RGB-D Semantic Segmentation with Vision Transformers | Giulia Rizzoli et.al. | 2305.14269 | :mortar_board: | None |
2023-05-22 | Efficient Large-Scale Vision Representation Learning | Eden Dolev et.al. | 2305.13399 | :mortar_board: | None |
2023-05-22 | U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech | Xin Jing et.al. | 2305.13195 | :mortar_board: | None |
2023-05-22 | DeepJSCC-l++: Robust and Bandwidth-Adaptive Wireless Image Transmission | Chenghong Bian et.al. | 2305.13161 | :mortar_board: | None |
2023-05-22 | Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design | Ibrahim Alabdulmohsin et.al. | 2305.13035 | :mortar_board: | None |
2023-05-22 | HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation | Jian Ding et.al. | 2305.13031 | :mortar_board: | None |
2023-05-22 | Why current rain denoising models fail on CycleGAN created rain images in autonomous driving | Michael Kranl et.al. | 2305.12983 | :mortar_board: | None |
2023-05-22 | VanillaNet: the Power of Minimalism in Deep Learning | Hanting Chen et.al. | 2305.12972 | :mortar_board: | Code |
2023-05-22 | TSPTQ-ViT: Two-scaled post-training quantization for vision transformer | Yu-Shan Tai et.al. | 2305.12901 | :mortar_board: | None |
2023-05-22 | Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition | Nan Li et.al. | 2305.12796 | :mortar_board: | None |
2023-05-21 | Your smartphone could act as a pulse-oximeter and as a single-lead ECG | Ahsan Mehmood et.al. | 2305.12583 | :mortar_board: | None |
2023-05-21 | Bi-ViT: Pushing the Limit of Vision Transformer Quantization | Yanjing Li et.al. | 2305.12354 | :mortar_board: | None |
2023-05-19 | Multimodal Web Navigation with Instruction-Finetuned Foundation Models | Hiroki Furuta et.al. | 2305.11854 | :mortar_board: | None |
2023-05-19 | Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Long Bai et.al. | 2305.11692 | :mortar_board: | Code |
2023-05-19 | SurgMAE: Masked Autoencoders for Long Surgical Video Analysis | Muhammad Abdullah Jamal et.al. | 2305.11451 | :mortar_board: | None |
2023-05-18 | How Deep Learning Sees the World: A Survey on Adversarial Attacks & Defenses | Joana C. Costa et.al. | 2305.10862 | :mortar_board: | None |
2023-05-18 | Boost Vision Transformer with GPU-Friendly Sparsity and Quantization | Chong Yu et.al. | 2305.10727 | :mortar_board: | None |
2023-05-17 | CageViT: Convolutional Activation Guided Efficient Vision Transformer | Hao Zheng et.al. | 2305.09924 | :mortar_board: | None |
2023-05-17 | A survey of the Vision Transformers and its CNN-Transformer based Variants | Asifullah Khan et.al. | 2305.09880 | :mortar_board: | None |
2023-05-16 | Blind Image Quality Assessment via Transformer Predicted Error Map and Perceptual Quality Token | Jinsong Shi et.al. | 2305.09353 | :mortar_board: | Code |
2023-05-16 | CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images | Momina Liaqat Ali et.al. | 2305.09211 | :mortar_board: | None |
2023-05-16 | Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data | Keyu Li et.al. | 2305.09169 | :mortar_board: | None |
2023-05-13 | MDAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer | Yunsheng Ma et.al. | 2305.08877 | :mortar_board: | Code |
2023-05-15 | AutoRecon: Automated 3D Object Discovery and Reconstruction | Yuang Wang et.al. | 2305.08810 | :mortar_board: | None |
2023-05-15 | Enhancing Performance of Vision Transformers on Small Datasets through Local Inductive Bias Incorporation | Ibrahim Batuhan Akkaya et.al. | 2305.08551 | :mortar_board: | None |
2023-05-15 | MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation | Abdul Rehman et.al. | 2305.08396 | :mortar_board: | None |
2023-05-14 | On enhancing the robustness of Vision Transformers: Defensive Diffusion | Raza Imam et.al. | 2305.08031 | :mortar_board: | None |
2023-05-13 | GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples | Tian Gao et.al. | 2305.07931 | :mortar_board: | Code |
2023-05-13 | Meta-Polyp: a baseline for efficient Polyp segmentation | Quoc-Huy Trinh et.al. | 2305.07848 | :mortar_board: | Code |
2023-05-12 | ViT Unified: Joint Fingerprint Recognition and Presentation Attack Detection | Steven A. Grosz et.al. | 2305.07602 | :mortar_board: | None |
2023-05-11 | OneCAD: One Classifier for All image Datasets using multimodal learning | Shakti N. Wadekar et.al. | 2305.07167 | :mortar_board: | None |
2023-05-11 | Salient Mask-Guided Vision Transformer for Fine-Grained Classification | Dmitry Demidov et.al. | 2305.07102 | :mortar_board: | Code |
2023-05-11 | EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention | Xinyu Liu et.al. | 2305.07027 | :mortar_board: | Code |
2023-05-11 | Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | Dahun Kim et.al. | 2305.07011 | :mortar_board: | None |
2023-05-11 | Extending Audio Masked Autoencoders Toward Audio Restoration | Zhi Zhong et.al. | 2305.06701 | :mortar_board: | None |
2023-05-11 | Undercover Deepfakes: Detecting Fake Segments in Videos | Sanjay Saha et.al. | 2305.06564 | :mortar_board: | Code |
2023-05-11 | Patch-wise Mixed-Precision Quantization of Vision Transformer | Junrui Xiao et.al. | 2305.06559 | :mortar_board: | None |
2023-05-08 | Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | Richard Luo et.al. | 2305.04961 | :mortar_board: | Code |
2023-05-08 | BiRT: Bio-inspired Replay in Vision Transformers for Continual Learning | Kishaan Jeeveswaran et.al. | 2305.04769 | :mortar_board: | Code |
2023-05-08 | Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields | Bum Jun Kim et.al. | 2305.04722 | :mortar_board: | None |
2023-05-08 | Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting | Zhicheng Wang et.al. | 2305.04440 | :mortar_board: | None |
2023-05-05 | FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing | Ajian Liu et.al. | 2305.03277 | :mortar_board: | None |
2023-05-05 | Semantic Segmentation using Vision Transformers: A survey | Hans Thisanke et.al. | 2305.03273 | :mortar_board: | None |
2023-05-04 | AttentionViz: A Global View of Transformer Attention | Catherine Yeh et.al. | 2305.03210 | :mortar_board: | None |
2023-05-03 | Real-Time Radiance Fields for Single-Image Portrait View Synthesis | Alex Trevithick et.al. | 2305.02310 | :mortar_board: | None |
2023-05-03 | Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models | Qiufeng Wang et.al. | 2305.02279 | :mortar_board: | None |
2023-05-03 | A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution | Josiah Smith et.al. | 2305.02074 | :mortar_board: | None |
2023-05-03 | “Glitch in the Matrix!”: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization | Zhixi Cai et.al. | 2305.01979 | :mortar_board: | Code |
2023-05-02 | High-Resolution Synthetic RGB-D Datasets for Monocular Depth Estimation | Aakash Rajpal et.al. | 2305.01732 | :mortar_board: | None |
2023-05-02 | ARBEx: Attentive Feature Extraction with Reliability Balancing for Robust Facial Expression Learning | Azmine Toushik Wasi et.al. | 2305.01486 | :mortar_board: | Code |
2023-05-02 | AxWin Transformer: A Context-Aware Vision Transformer Backbone with Axial Windows | Fangjian Lin et.al. | 2305.01280 | :mortar_board: | None |
2023-05-02 | Exploring vision transformer layer choosing for semantic segmentation | Fangjian Lin et.al. | 2305.01279 | :mortar_board: | None |
2023-05-01 | What Do Self-Supervised Vision Transformers Learn? | Namuk Park et.al. | 2305.00729 | :mortar_board: | Code |
2023-05-01 | Rethinking Boundary Detection in Deep Learning Models for Medical Image Segmentation | Yi Lin et.al. | 2305.00678 | :mortar_board: | Code |
2023-04-30 | Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation | Tianxiang Hao et.al. | 2305.00603 | :mortar_board: | None |
2023-04-28 | MMViT: Multiscale Multiview Vision Transformers | Yuchen Liu et.al. | 2305.00104 | :mortar_board: | None |
2023-04-28 | An automated end-to-end deep learning-based framework for lung cancer diagnosis by detecting and classifying the lung nodules | Samiul Based Shuvo et.al. | 2305.00046 | :mortar_board: | None |
2023-04-28 | Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers | Johannes Czech et.al. | 2304.14918 | :mortar_board: | None |
2023-04-28 | PreNAS: Preferred One-Shot Learning Towards Efficient Neural Architecture Search | Haibin Wang et.al. | 2304.14636 | :mortar_board: | None |
2023-04-28 | DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation | Yousef Yeganeh et.al. | 2304.14571 | :mortar_board: | None |
2023-04-27 | Vision Conformer: Incorporating Convolutions into Vision Transformer Layers | Brian Kenji Iwana et.al. | 2304.13991 | :mortar_board: | Code |
2023-04-26 | UniNeXt: Exploring A Unified Architecture for Vision Recognition | Fangjian Lin et.al. | 2304.13700 | :mortar_board: | None |
2023-04-25 | Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations | Shashank Shekhar et.al. | 2304.13089 | :mortar_board: | None |
2023-04-25 | CompletionFormer: Depth Completion with Convolutions and Vision Transformers | Zhang Youmin et.al. | 2304.13030 | :mortar_board: | Code |
2023-04-25 | Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning | Zhongzhi Yu et.al. | 2304.12520 | :mortar_board: | None |
2023-04-24 | Rank Flow Embedding for Unsupervised and Semi-Supervised Manifold Learning | Lucas Pascotti Valem et.al. | 2304.12448 | :mortar_board: | Code |
2023-04-24 | Augmentation-based Domain Generalization for Semantic Segmentation | Manuel Schwonberg et.al. | 2304.12122 | :mortar_board: | None |
2023-04-24 | MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | Qihao Zhao et.al. | 2304.12043 | :mortar_board: | Code |
2023-04-24 | Transformer-based stereo-aware 3D object detection from binocular images | Hanqing Sun et.al. | 2304.11906 | :mortar_board: | None |
2023-04-24 | Universal Domain Adaptation via Compressive Attention Matching | Didi Zhu et.al. | 2304.11862 | :mortar_board: | None |
2023-04-23 | Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification | Smriti Regmi et.al. | 2304.11529 | :mortar_board: | None |
2023-04-22 | Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights | Ibrahim Fayad et.al. | 2304.11487 | :mortar_board: | None |
2023-04-22 | Self-supervised Learning by View Synthesis | Shaoteng Liu et.al. | 2304.11330 | :mortar_board: | None |
2023-04-21 | Deep-Learning-based Fast and Accurate 3D CT Deformable Image Registration in Lung Cancer | Yuzhen Ding et.al. | 2304.11135 | :mortar_board: | None |
2023-04-21 | DeformableFormer: Classification of Endoscopic Ultrasound Guided Fine Needle Biopsy in Pancreatic Diseases | Taiji Kurami et.al. | 2304.10791 | :mortar_board: | None |
2023-04-21 | Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers | Siyuan Wei et.al. | 2304.10716 | :mortar_board: | Code |
2023-04-20 | HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer | Hao Xiang et.al. | 2304.10628 | :mortar_board: | None |
2023-04-20 | Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget | Johannes Lehner et.al. | 2304.10520 | :mortar_board: | Code |
2023-04-19 | LipsFormer: Introducing Lipschitz Continuity to Vision Transformers | Xianbiao Qi et.al. | 2304.09856 | :mortar_board: | Code |
2023-04-19 | Transformer-Based Visual Segmentation: A Survey | Xiangtai Li et.al. | 2304.09854 | :mortar_board: | Code |
2023-04-19 | CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding | Dilxat Muhtar et.al. | 2304.09670 | :mortar_board: | Code |
2023-04-19 | Boosting Semantic Segmentation with Semantic Boundaries | Haruya Ishikawa et.al. | 2304.09427 | :mortar_board: | Code |
2023-04-18 | Fibroglandular Tissue Segmentation in Breast MRI using Vision Transformers – A multi-institutional evaluation | Gustav Müller-Franzes et.al. | 2304.08972 | :mortar_board: | Code |
2023-04-18 | AutoTaskFormer: Searching Vision Transformers for Multi-task Learning | Yang Liu et.al. | 2304.08756 | :mortar_board: | None |
2023-04-17 | Synthetic Data from Diffusion Models Improves ImageNet Classification | Shekoofeh Azizi et.al. | 2304.08466 | :mortar_board: | None |
2023-04-17 | Efficient Video Action Detection with Token Dropout and Context Refinement | Lei Chen et.al. | 2304.08451 | :mortar_board: | None |
2023-04-17 | Transformer with Selective Shuffled Position Embedding using ROI-Exchange Strategy for Early Detection of Knee Osteoarthritis | Zhe Wang et.al. | 2304.08364 | :mortar_board: | None |
2023-04-17 | The Universe is worth 64³ pixels: Convolution Neural Network and Vision Transformers for Cosmology | Se Yeon Hwang et.al. | 2304.08192 | :mortar_board: | None |
2023-04-17 | ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | Jeeseung Park et.al. | 2304.08114 | :mortar_board: | None |
2023-04-16 | A Data-Centric Solution to NonHomogeneous Dehazing via Vision Transformer | Yangyi Liu et.al. | 2304.07874 | :mortar_board: | Code |
2023-04-15 | MA-ViT: Modality-Agnostic Vision Transformers for Face Anti-Spoofing | Ajian Liu et.al. | 2304.07549 | :mortar_board: | None |
2023-04-14 | Uncovering the Inner Workings of STEGO for Safe Unsupervised Semantic Segmentation | Alexander Koenig et.al. | 2304.07314 | :mortar_board: | None |
2023-04-14 | CAD-RADS scoring of coronary CT angiography with Multi-Axis Vision Transformer: a clinically-inspired deep learning pipeline | Alessia Gerbasi et.al. | 2304.07277 | :mortar_board: | None |
2023-04-14 | Sub-meter resolution canopy height maps using self-supervised learning and a vision transformer trained on Aerial and GEDI Lidar | Jamie Tolan et.al. | 2304.07213 | :mortar_board: | None |
2023-04-14 | Preserving Locality in Vision Transformers for Class Incremental Learning | Bowen Zheng et.al. | 2304.06971 | :mortar_board: | None |
2023-04-13 | SpectFormer: Frequency and Attention is what you need in a Vision Transformer | Badri N. Patro et.al. | 2304.06446 | :mortar_board: | None |
2023-04-13 | VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking | Angelos Nalmpantis et.al. | 2304.06391 | :mortar_board: | Code |
2023-04-13 | Converting ECG Signals to Images for Efficient Image-text Retrieval via Encoding | Jielin Qiu et.al. | 2304.06286 | :mortar_board: | None |
2023-04-13 | RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows | Zhemin Zhang et.al. | 2304.06250 | :mortar_board: | None |
2023-04-12 | Towards Evaluating Explanations of Vision Transformers for Medical Imaging | Piotr Komorowski et.al. | 2304.06133 | :mortar_board: | Code |
2023-04-12 | RECLIP: Resource-efficient CLIP by Training with Small Images | Runze Li et.al. | 2304.06028 | :mortar_board: | None |
2023-04-12 | Rail Detection: An Efficient Row-based Network and A New Benchmark | Xinpeng Li et.al. | 2304.05667 | :mortar_board: | Code |
2023-04-12 | RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer | Jiahao Wang et.al. | 2304.05659 | :mortar_board: | None |
2023-04-12 | CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks | Yi Li et.al. | 2304.05653 | :mortar_board: | Code |
2023-04-11 | MC-ViViT: Multi-branch Classifier-ViViT to Detect Mild Cognitive Impairment in Older Adults using Facial Videos | Jian Sun et.al. | 2304.05292 | :mortar_board: | None |
2023-04-11 | A Billion-scale Foundation Model for Remote Sensing Images | Keumgang Cha et.al. | 2304.05215 | :mortar_board: | None |
2023-04-11 | Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture | Jun Wang et.al. | 2304.05212 | :mortar_board: | None |
2023-04-11 | WEAR: A Multimodal Dataset for Wearable and Egocentric Video Activity Recognition | Marius Bock et.al. | 2304.05088 | :mortar_board: | None |
2023-04-11 | Life Regression based Patch Slimming for Vision Transformers | Jiawei Chen et.al. | 2304.04926 | :mortar_board: | None |
2023-04-10 | ViT-Calibrator: Decision Stream Calibration for Vision Transformer | Lin Chen et.al. | 2304.04354 | :mortar_board: | None |
2023-04-09 | ForamViT-GAN: Exploring New Paradigms in Deep Learning for Micropaleontological Image Analysis | Ivan Ferreira-Chacua et.al. | 2304.04291 | :mortar_board: | None |
2023-04-09 | Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention | Xuran Pan et.al. | 2304.04237 | :mortar_board: | Code |
2023-04-07 | A Cross-Scale Hierarchical Transformer with Correspondence-Augmented Attention for inferring Bird’s-Eye-View Semantic Segmentation | Naiyu Fang et.al. | 2304.03650 | :mortar_board: | None |
2023-04-07 | PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift | Gaojie Wu et.al. | 2304.03481 | :mortar_board: | None |
2023-04-06 | R2Former: Unified Retrieval and Reranking Transformer for Place Recognition | Sijie Zhu et.al. | 2304.03410 | :mortar_board: | None |
2023-04-06 | From Saliency to DINO: Saliency-guided Vision Transformer for Few-shot Keypoint Detection | Changsheng Lu et.al. | 2304.03140 | :mortar_board: | None |
2023-04-06 | InterFormer: Real-time Interactive Image Segmentation | You Huang et.al. | 2304.02942 | :mortar_board: | Code |
2023-04-06 | Towards an Effective and Efficient Transformer for Rain-by-snow Weather Removal | Tao Gao et.al. | 2304.02860 | :mortar_board: | Code |
2023-04-06 | MULLER: Multilayer Laplacian Resizer for Vision | Zhengzhong Tu et.al. | 2304.02859 | :mortar_board: | None |
2023-04-05 | Training Strategies for Vision Transformers for Object Detection | Apoorv Singh et.al. | 2304.02186 | :mortar_board: | None |
2023-04-04 | Strong Baselines for Parameter Efficient Few-Shot Fine-tuning | Samyadeep Basu et.al. | 2304.01917 | :mortar_board: | None |
2023-04-04 | EPVT: Environment-aware Prompt Vision Transformer for Domain Generalization in Skin Lesion Recognition | Siyuan Yan et.al. | 2304.01508 | :mortar_board: | None |
2023-04-04 | Attention Map Guided Transformer Pruning for Edge Device | Junzhu Mao et.al. | 2304.01452 | :mortar_board: | Code |
2023-04-03 | WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation | Lianghui Zhu et.al. | 2304.01184 | :mortar_board: | Code |
2023-04-03 | ViT-DAE: Transformer-driven Diffusion Autoencoder for Histopathology Image Analysis | Xuan Xu et.al. | 2304.01053 | :mortar_board: | None |
2023-04-01 | Vision Transformers with Mixed-Resolution Tokenization | Tomer Ronen et.al. | 2304.00287 | :mortar_board: | Code |
2023-03-31 | Hierarchical Vision Transformers for Cardiac Ejection Fraction Estimation | Lhuqita Fazry et.al. | 2304.00177 | :mortar_board: | Code |
2023-03-31 | Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? | Arjun Majumdar et.al. | 2303.18240 | :mortar_board: | None |
2023-03-31 | LaCViT: A Label-aware Contrastive Training Framework for Vision Transformers | Zijun Long et.al. | 2303.18013 | :mortar_board: | None |
2023-03-31 | Exploring the Limits of Deep Image Clustering using Pretrained Models | Nikolas Adaloglou et.al. | 2303.17896 | :mortar_board: | None |
2023-03-31 | Visual Anomaly Detection via Dual-Attention Transformer and Discriminative Flow | Haiming Yao et.al. | 2303.17882 | :mortar_board: | None |
2023-03-31 | Rethinking Local Perception in Lightweight Vision Transformer | Qihang Fan et.al. | 2303.17803 | :mortar_board: | None |
2023-03-30 | If At First You Don’t Succeed: Test Time Re-ranking for Zero-shot, Cross-domain Retrieval | Finlay G. C. Hudson et.al. | 2303.17703 | :mortar_board: | None |
2023-03-30 | Whether and When does Endoscopy Domain Pretraining Make Sense? | Dominik Batić et.al. | 2303.17636 | :mortar_board: | None |
2023-03-30 | SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer | Xuanyao Chen et.al. | 2303.17605 | :mortar_board: | None |
2023-03-30 | MobileInst: Video Instance Segmentation on the Mobile | Renhong Zhang et.al. | 2303.17594 | :mortar_board: | None |
2023-03-30 | Streaming Video Model | Yucheng Zhao et.al. | 2303.17228 | :mortar_board: | Code |
2023-03-30 | ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Xiaodan Li et.al. | 2303.17096 | :mortar_board: | Code |
2023-03-29 | Visually Wired NFTs: Exploring the Role of Inspiration in Non-Fungible Tokens | Lucio La Cava et.al. | 2303.17031 | :mortar_board: | None |
2023-03-29 | T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals | James Giroux et.al. | 2303.16940 | :mortar_board: | None |
2023-03-29 | Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation | Md Mostafijur Rahman et.al. | 2303.16892 | :mortar_board: | Code |
2023-03-29 | Self-accumulative Vision Transformer for Bone Age Assessment Using the Sauvegrain Method | Hong-Jun Choi et.al. | 2303.16557 | :mortar_board: | None |
2023-03-28 | ASIC: Aligning Sparse in-the-wild Image Collections | Kamal Gupta et.al. | 2303.16201 | :mortar_board: | None |
2023-03-28 | Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization | Jianping Zhang et.al. | 2303.15754 | :mortar_board: | None |
2023-03-28 | TFS-ViT: Token-Level Feature Stylization for Domain Generalization | Mehrdad Noori et.al. | 2303.15698 | :mortar_board: | Code |
2023-03-27 | Learning Expressive Prompting With Residuals for Vision Transformers | Rajshekhar Das et.al. | 2303.15591 | :mortar_board: | None |
2023-03-27 | Core-Periphery Principle Guided Redesign of Self-Attention in Transformers | Xiaowei Yu et.al. | 2303.15569 | :mortar_board: | None |
2023-03-27 | MoViT: Memorizing Vision Transformers for Medical Image Analysis | Yiqing Shen et.al. | 2303.15553 | :mortar_board: | None |
2023-03-24 | Image Deblurring by Exploring In-depth Properties of Transformer | Pengwei Liang et.al. | 2303.15198 | :mortar_board: | None |
2023-03-27 | Vision Transformer with Quadrangle Attention | Qiming Zhang et.al. | 2303.15105 | :mortar_board: | Code |
2023-03-27 | Leveraging Hidden Positives for Unsupervised Semantic Segmentation | Hyun Seok Seong et.al. | 2303.15014 | :mortar_board: | Code |
2023-03-27 | Transformer-based Multi-Instance Learning for Weakly Supervised Object Detection | Zhaofei Wang et.al. | 2303.14999 | :mortar_board: | None |
2023-03-26 | Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers | Zhou Huang et.al. | 2303.14816 | :mortar_board: | Code |
2023-03-26 | Contrastive Transformer: Contrastive Learning Scheme with Transformer innate Patches | Sander Riisøen Jyhne et.al. | 2303.14806 | :mortar_board: | None |
2023-03-25 | Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection | Hwanjun Song et.al. | 2303.14386 | :mortar_board: | None |
2023-03-25 | Multi-view knowledge distillation transformer for human action recognition | Ying-Chen Lin et.al. | 2303.14358 | :mortar_board: | None |
2023-03-25 | Towards Accurate Post-Training Quantization for Vision Transformer | Yifu Ding et.al. | 2303.14341 | :mortar_board: | None |
2023-03-24 | FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | Pavan Kumar Anasosalu Vasu et.al. | 2303.14189 | :mortar_board: | None |
2023-03-24 | Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers | Cong Wei et.al. | 2303.13755 | :mortar_board: | None |
2023-03-24 | How Does Attention Work in Vision Transformers? A Visual Analytics Attempt | Yiran Li et.al. | 2303.13731 | :mortar_board: | None |
2023-03-23 | Scaled Quantization for the Vision Transformer | Yangyang Chang et.al. | 2303.13601 | :mortar_board: | None |
2023-03-23 | Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective | Jinjing Zhu et.al. | 2303.13434 | :mortar_board: | None |
2023-03-23 | A Permutable Hybrid Network for Volumetric Medical Image Segmentation | Yi Lin et.al. | 2303.13111 | :mortar_board: | None |
2023-03-23 | MMFormer: Multimodal Transformer Using Multiscale Self-Attention for Remote Sensing Image Classification | Bo Zhang et.al. | 2303.13101 | :mortar_board: | None |
2023-03-23 | Top-Down Visual Attention from Analysis by Synthesis | Baifeng Shi et.al. | 2303.13043 | :mortar_board: | None |
2023-03-23 | MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer | Yunsong Zhou et.al. | 2303.13018 | :mortar_board: | None |
2023-03-22 | TRON: Transformer Neural Network Acceleration with Non-Coherent Silicon Photonics | Salma Afifi et.al. | 2303.12914 | :mortar_board: | None |
2023-03-22 | Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction | Jemin Lee et.al. | 2303.12557 | :mortar_board: | None |
2023-03-22 | Multiscale Attention via Wavelet Neural Operators for Vision Transformers | Anahita Nekoozadeh et.al. | 2303.12398 | :mortar_board: | None |
2023-03-21 | Machine Learning for Brain Disorders: Transformers and Visual Transformers | Robin Courant et.al. | 2303.12068 | :mortar_board: | None |
2023-03-18 | Vision Transformer-based Model for Severity Quantification of Lung Pneumonia Using Chest X-ray Images | Bouthaina Slika et.al. | 2303.11935 | :mortar_board: | None |
2023-03-21 | The Multiscale Surface Vision Transformer | Simon Dahan et.al. | 2303.11909 | :mortar_board: | Code |
2023-03-21 | CLIP-ReIdent: Contrastive Training for Player Re-Identification | Konrad Habel et.al. | 2303.11855 | :mortar_board: | None |
2023-03-20 | Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding | Jihao Liu et.al. | 2303.11325 | :mortar_board: | None |
2023-03-20 | Robustifying Token Attention for Vision Transformers | Yong Guo et.al. | 2303.11126 | :mortar_board: | None |
2023-03-17 | LION: Implicit Vision Prompt Tuning | Haixin Wang et.al. | 2303.09992 | :mortar_board: | None |
2023-03-16 | Vision Transformer for Action Units Detection | Tu Vu et.al. | 2303.09917 | :mortar_board: | None |
2023-03-16 | Rehearsal-Free Domain Continual Face Anti-Spoofing: Generalize More and Forget Less | Rizhao Cai et.al. | 2303.09914 | :mortar_board: | None |
2023-03-17 | Dual-path Adaptation from Image to Video Transformers | Jungin Park et.al. | 2303.09857 | :mortar_board: | Code |
2023-03-17 | Denoising Diffusion Autoencoders are Unified Self-supervised Learners | Weilai Xiang et.al. | 2303.09769 | :mortar_board: | None |
2023-03-17 | ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices | Chen Tang et.al. | 2303.09730 | :mortar_board: | None |
2023-03-15 | ViTO: Vision Transformer-Operator | Oded Ovadia et.al. | 2303.08891 | :mortar_board: | None |
2023-03-15 | DeepMIM: Deep Supervision for Masked Image Modeling | Sucheng Ren et.al. | 2303.08817 | :mortar_board: | Code |
2023-03-15 | BiFormer: Vision Transformer with Bi-Level Routing Attention | Lei Zhu et.al. | 2303.08810 | :mortar_board: | Code |
2023-03-15 | Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch | Aditay Tripathi et.al. | 2303.08784 | :mortar_board: | None |
2023-03-15 | Making Vision Transformers Efficient from A Token Sparsification View | Shuning Chang et.al. | 2303.08685 | :mortar_board: | None |
2023-03-14 | Learning to Grow Artificial Hippocampi in Vision Transformers for Resilient Lifelong Learning | Chinmay Savadikar et.al. | 2303.08250 | :mortar_board: | None |
2023-03-14 | Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer’s Disease Detection | Nikhil J. Dhinagar et.al. | 2303.08216 | :mortar_board: | None |
2023-03-14 | Quaternion Orthogonal Transformer for Facial Expression Recognition in the Wild | Yu Zhou et.al. | 2303.07831 | :mortar_board: | Code |
2023-03-14 | OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav | Karmesh Yadav et.al. | 2303.07798 | :mortar_board: | None |
2023-03-14 | CAT: Causal Audio Transformer for Audio Classification | Xiaoyu Liu et.al. | 2303.07626 | :mortar_board: | None |
2023-03-14 | AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+ | Xiao Wang et.al. | 2303.07598 | :mortar_board: | Code |
2023-03-14 | WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminative Analysis | Yiye Chen et.al. | 2303.07543 | :mortar_board: | None |
2023-03-13 | Pretrained ViTs Yield Versatile Representations For Medical Images | Christos Matsoukas et.al. | 2303.07034 | :mortar_board: | Code |
2023-03-13 | CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention | Wenxiao Wang et.al. | 2303.06908 | :mortar_board: | Code |
2023-03-13 | ST360IQ: No-Reference Omnidirectional Image Quality Assessment with Spherical Vision Transformers | Nafiseh Jabbari Tofighi et.al. | 2303.06907 | :mortar_board: | Code |
2023-03-13 | Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning | Yun-Hao Cao et.al. | 2303.06870 | :mortar_board: | Code |
2023-03-11 | Token Sparsification for Faster Medical Image Segmentation | Lei Zhou et.al. | 2303.06522 | :mortar_board: | Code |
2023-03-11 | Xformer: Hybrid X-Shaped Transformer for Image Denoising | Jiale Zhang et.al. | 2303.06440 | :mortar_board: | None |
2023-03-11 | Stabilizing Transformer Training by Preventing Attention Entropy Collapse | Shuangfei Zhai et.al. | 2303.06296 | :mortar_board: | None |
2023-03-10 | Contrastive Language-Image Pretrained (CLIP) Models are Powerful Out-of-Distribution Detectors | Felix Michels et.al. | 2303.05828 | :mortar_board: | None |
2023-03-10 | Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation | Ho Hin Lee et.al. | 2303.05785 | :mortar_board: | None |
2023-03-10 | Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers | Vandad Davoodnia et.al. | 2303.05691 | :mortar_board: | None |
2023-03-08 | UT-Net: Combining U-Net and Transformer for Joint Optic Disc and Cup Segmentation and Glaucoma Detection | Rukhshanda Hussain et.al. | 2303.04939 | :mortar_board: | None |
2023-03-08 | X-Pruner: eXplainable Pruning for Vision Transformers | Lu Yu et.al. | 2303.04935 | :mortar_board: | None |
2023-03-08 | Centroid-centered Modeling for Efficient Vision Transformer Pre-training | Xin Yan et.al. | 2303.04664 | :mortar_board: | None |
2023-03-08 | HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices | Lotfi Abdelkrim Mecharbat et.al. | 2303.04440 | :mortar_board: | None |
2023-03-08 | SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking | Liangliang Yao et.al. | 2303.04378 | :mortar_board: | Code |
2023-03-08 | SANDFORMER: CNN and Transformer under Gated Fusion for Sand Dust Image Restoration | Jun Shi et.al. | 2303.04365 | :mortar_board: | None |
2023-03-07 | Prediction of transonic flow over supercritical airfoils using geometric-encoding and deep-learning strategies | Zhiwen Deng et.al. | 2303.03695 | :mortar_board: | None |
2023-03-07 | Weakly Supervised Caveline Detection For AUV Navigation Inside Underwater Caves | Boxiao Yu et.al. | 2303.03670 | :mortar_board: | None |
2023-03-07 | PreFallKD: Pre-Impact Fall Detection via CNN-ViT Knowledge Distillation | Tin-Han Chi et.al. | 2303.03634 | :mortar_board: | None |
2023-03-06 | ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents | Sana Khamekhem Jemni et.al. | 2303.03127 | :mortar_board: | None |
2023-03-06 | UniHCP: A Unified Model for Human-Centric Perceptions | Yuanzheng Ci et.al. | 2303.02936 | :mortar_board: | None |
2023-03-04 | A Fast Training-Free Compression Framework for Vision Transformers | Jung Hwan Heo et.al. | 2303.02331 | :mortar_board: | Code |
2023-03-05 | DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network | Xuan Shen et.al. | 2303.02165 | :mortar_board: | Code |
2023-03-03 | Retinal Image Restoration using Transformer and Cycle-Consistent Generative Adversarial Network | Alnur Alimanov et.al. | 2303.01939 | :mortar_board: | Code |
2023-03-03 | Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification | Alessandro Wollek et.al. | 2303.01871 | :mortar_board: | None |
2023-03-02 | Self-attention in Vision Transformers Performs Perceptual Grouping, Not Attention | Paria Mehrani et.al. | 2303.01542 | :mortar_board: | None |
2023-03-02 | Image as Set of Points | Xu Ma et.al. | 2303.01494 | :mortar_board: | Code |
2023-03-02 | Token Contrast for Weakly-Supervised Semantic Segmentation | Lixiang Ru et.al. | 2303.01267 | :mortar_board: | Code |
2023-03-02 | Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves | Sora Takashima et.al. | 2303.01112 | :mortar_board: | None |
2023-03-02 | Learning to Grow Pretrained Models for Efficient Transformer Training | Peihao Wang et.al. | 2303.00980 | :mortar_board: | None |
2023-03-02 | Enhancing General Face Forgery Detection via Vision Transformer with Low-Rank Adaptation | Chenqi Kong et.al. | 2303.00917 | :mortar_board: | None |
2023-03-01 | AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images | Ramin Nakhli et.al. | 2303.00865 | :mortar_board: | Code |
2023-02-28 | Generic-to-Specific Distillation of Masked Autoencoders | Wei Huang et.al. | 2302.14771 | :mortar_board: | Code |
2023-02-28 | Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors | Ji Hou et.al. | 2302.14746 | :mortar_board: | None |
2023-02-28 | DC-Former: Diverse and Compact Transformer for Person Re-Identification | Wen Li et.al. | 2302.14335 | :mortar_board: | Code |
2023-02-28 | Rethink Long-tailed Recognition with Vision Transforms | Zhengzhuo Xu et.al. | 2302.14284 | :mortar_board: | None |
2023-02-28 | Augmented Transformers with Adaptive n-grams Embedding for Multilingual Scene Text Recognition | Xueming Yan et.al. | 2302.14261 | :mortar_board: | None |
2023-02-28 | Remote Sensing Scene Classification with Masked Image Modeling (MIM) | Liya Wang et.al. | 2302.14256 | :mortar_board: | None |
2023-02-27 | UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction | Zhenwei Zhu et.al. | 2302.13987 | :mortar_board: | None |
2023-02-27 | Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution | Long Sun et.al. | 2302.13800 | :mortar_board: | Code |
2023-02-26 | Autonomous Intelligent Navigation for Flexible Endoscopy Using Monocular Depth Guidance and 3-D Shape Planning | Yiang Lu et.al. | 2302.13219 | :mortar_board: | None |
2023-02-24 | Amortised Invariance Learning for Contrastive Self-Supervision | Ruchika Chavhan et.al. | 2302.12712 | :mortar_board: | None |
2023-02-24 | A Convolutional Vision Transformer for Semantic Segmentation of Side-Scan Sonar Data | Hayat Rajani et.al. | 2302.12416 | :mortar_board: | Code |
2023-02-23 | Boosting Adversarial Transferability using Dynamic Cues | Muzammal Naseer et.al. | 2302.12252 | :mortar_board: | None |
2023-02-23 | StudyFormer : Attention-Based and Dynamic Multi View Classifier for X-ray images | Lucas Wannenmacher et.al. | 2302.11840 | :mortar_board: | None |
2023-02-22 | Magnification Invariant Medical Image Analysis: A Comparison of Convolutional Networks, Vision Transformers, and Token Mixers | Pranav Jeevan et.al. | 2302.11488 | :mortar_board: | None |
2023-02-22 | Transformer-Based Sensor Fusion for Autonomous Driving: A Survey | Apoorv Singh et.al. | 2302.11481 | :mortar_board: | None |
2023-02-22 | Human MotionFormer: Transferring Human Motions with Vision Transformers | Hongyu Liu et.al. | 2302.11306 | :mortar_board: | None |
2023-02-22 | A residual dense vision transformer for medical image super-resolution with segmentation-based perceptual loss fine-tuning | Jin Zhu et.al. | 2302.11184 | :mortar_board: | None |
2023-02-22 | Deep Active Learning in the Presence of Label Noise: A Survey | Moseli Mots’oehli et.al. | 2302.11075 | :mortar_board: | None |
2023-02-21 | SF2Former: Amyotrophic Lateral Sclerosis Identification From Multi-center MRI Data Using Spatial and Frequency Fusion Transformer | Rafsanjany Kushol et.al. | 2302.10859 | :mortar_board: | Code |
2023-02-21 | Bokeh Rendering Based on Adaptive Depth Calibration Network | Lu Liu et.al. | 2302.10808 | :mortar_board: | None |
2023-02-21 | MaskedKD: Efficient Distillation of Vision Transformers with Masked Images | Seungwoo Son et.al. | 2302.10494 | :mortar_board: | None |
2023-02-21 | ApproxABFT: Approximate Algorithm-Based Fault Tolerance for Vision Transformers | Xinghua Xue et.al. | 2302.10469 | :mortar_board: | None |
2023-02-21 | Reliability Analysis of Vision Transformers | Xinghua Xue et.al. | 2302.10468 | :mortar_board: | None |
2023-02-21 | Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines | Min Cen et.al. | 2302.10406 | :mortar_board: | None |
2023-02-19 | MedViT: A Robust Vision Transformer for Generalized Medical Image Classification | Omid Nejati Manzari et.al. | 2302.09462 | :mortar_board: | None |
2023-02-18 | VITAL: Vision Transformer Neural Networks for Accurate Smartphone Heterogeneity Resilient Indoor Localization | Danish Gufran et.al. | 2302.09443 | :mortar_board: | None |
2023-02-18 | Hyneter: Hybrid Network Transformer for Object Detection | Dong Chen et.al. | 2302.09365 | :mortar_board: | None |
2023-02-18 | Meta Style Adversarial Training for Cross-Domain Few-Shot Learning | Yuqian Fu et.al. | 2302.09309 | :mortar_board: | None |
2023-02-17 | ViTA: A Vision Transformer Inference Accelerator for Edge Applications | Shashank Nag et.al. | 2302.09108 | :mortar_board: | None |
2023-02-17 | MCAE: Masked Contrastive Autoencoder for Face Anti-Spoofing | Tianyi Zheng et.al. | 2302.08674 | :mortar_board: | None |
2023-02-16 | Efficiency 360: Efficient Vision Transformers | Badri N. Patro et.al. | 2302.08374 | :mortar_board: | Code |
2023-02-16 | TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation | Yunliang Jiang et.al. | 2302.08047 | :mortar_board: | None |
2023-02-15 | TFormer: A Transmission-Friendly ViT Model for IoT Devices | Zhichao Lu et.al. | 2302.07734 | :mortar_board: | None |
2023-02-14 | Robust Representation Learning with Self-Distillation for Domain Generalization | Ankur Singh et.al. | 2302.06874 | :mortar_board: | None |
2023-02-14 | DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models | Shidong Cao et.al. | 2302.06826 | :mortar_board: | Code |
2023-02-13 | A Comprehensive Study of Modern Architectures and Regularization Approaches on CheXpert5000 | Sontje Ihler et.al. | 2302.06684 | :mortar_board: | None |
2023-02-12 | A Theoretical Understanding of shallow Vision Transformers: Learning, Generalization, and Sample Complexity | Hongkang Li et.al. | 2302.06015 | :mortar_board: | None |
2023-02-12 | Self-supervised Pseudo-colorizing of Masked Cells | Royden Wagner et.al. | 2302.05968 | :mortar_board: | Code |
2023-02-12 | Generalized Few-Shot Continual Learning with Contrastive Mixture of Adapters | Yawen Cui et.al. | 2302.05936 | :mortar_board: | Code |
2023-02-11 | Synaptic Stripping: How Pruning Can Bring Dead Neurons Back To Life | Tim Whitaker et.al. | 2302.05818 | :mortar_board: | None |
2023-02-11 | Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing | Zitong Yu et.al. | 2302.05744 | :mortar_board: | None |
2023-02-10 | Scaling Vision Transformers to 22 Billion Parameters | Mostafa Dehghani et.al. | 2302.05442 | :mortar_board: | None |
2023-02-09 | Reversible Vision Transformers | Karttikeya Mangalam et.al. | 2302.04869 | :mortar_board: | Code |
2023-02-09 | IH-ViT: Vision Transformer-based Integrated Circuit Appear-ance Defect Detection | Xiaoibin Wang et.al. | 2302.04521 | :mortar_board: | None |
2023-02-08 | Adapting Pre-trained Vision Transformers from 2D to 3D through Weight Inflation Improves Medical Image Segmentation | Yuhui Zhang et.al. | 2302.04303 | :mortar_board: | Code |
2023-02-08 | Predicting Thrombectomy Recanalization from CT Imaging Using Deep Learning Models | Haoyue Zhang et.al. | 2302.04143 | :mortar_board: | None |
2023-02-08 | Cross-Layer Retrospective Retrieving via Layer Attention | Yanwen Fang et.al. | 2302.03985 | :mortar_board: | Code |
2023-02-08 | SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images | Gary Y. Li et.al. | 2302.03861 | :mortar_board: | None |
2023-02-07 | Understanding Why ViT Trains Badly on Small Datasets: An Intuitive Perspective | Haoran Zhu et.al. | 2302.03751 | :mortar_board: | Code |
2023-02-07 | Deep Class-Incremental Learning: A Survey | Da-Wei Zhou et.al. | 2302.03648 | :mortar_board: | Code |
2023-02-06 | Spatial Functa: Scaling Functa to ImageNet Classification and Generation | Matthias Bauer et.al. | 2302.03130 | :mortar_board: | None |
2023-02-06 | AIM: Adapting Image Models for Efficient Video Action Recognition | Taojiannan Yang et.al. | 2302.03024 | :mortar_board: | None |
2023-02-06 | V1T: large-scale mouse V1 response prediction using a Vision Transformer | Bryan M. Li et.al. | 2302.03023 | :mortar_board: | None |
2023-02-04 | Oscillation-free Quantization for Low-bit Vision Transformers | Shih-Yang Liu et.al. | 2302.02210 | :mortar_board: | None |
2023-02-04 | Knowledge Distillation in Vision Transformers: A Critical Review | Gousia Habib et.al. | 2302.02108 | :mortar_board: | None |
2023-02-03 | DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition | Jiayu Jiao et.al. | 2302.01791 | :mortar_board: | Code |
2023-02-02 | Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective | Michael E. Sander et.al. | 2302.01425 | :mortar_board: | None |
2023-02-02 | Dual PatchNorm | Manoj Kumar et.al. | 2302.01327 | :mortar_board: | None |
2023-02-02 | Mnemosyne: Learning to Train Transformers with Transformers | Deepali Jain et.al. | 2302.01128 | :mortar_board: | None |
2023-02-02 | LesionAid: Vision Transformers-based Skin Lesion Generation and Classification | Ghanta Sai Krishna et.al. | 2302.01104 | :mortar_board: | None |
2023-02-02 | Vision Transformer-based Feature Extraction for Generalized Zero-Shot Learning | Jiseob Kim et.al. | 2302.00875 | :mortar_board: | None |
2023-02-01 | Efficient Scopeformer: Towards Scalable and Rich Feature Extraction for Intracranial Hemorrhage Detection | Yassine Barhoumi et.al. | 2302.00220 | :mortar_board: | None |
2023-01-31 | Real Estate Property Valuation using Self-Supervised Vision Transformers | Mahdieh Yazdani et.al. | 2302.00117 | :mortar_board: | None |
2023-01-31 | Fairness-aware Vision Transformer via Debiased Self-Attention | Yao Qiang et.al. | 2301.13803 | :mortar_board: | None |
2023-01-31 | Inference Time Evidences of Adversarial Attacks for Forensic on Transformers | Hugo Lemarchant et.al. | 2301.13356 | :mortar_board: | None |
2023-01-30 | SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation | Qiang Wan et.al. | 2301.13156 | :mortar_board: | Code |
2023-01-30 | DepGraph: Towards Any Structural Pruning | Gongfan Fang et.al. | 2301.12900 | :mortar_board: | Code |
2023-01-29 | Graph Mixer Networks | Ahmet Sarıgün et.al. | 2301.12493 | :mortar_board: | Code |
2023-01-29 | Towards Verifying the Geometric Robustness of Large-scale Neural Networks | Fu Wang et.al. | 2301.12456 | :mortar_board: | Code |
2023-01-29 | PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer | Jiayu Shang et.al. | 2301.12422 | :mortar_board: | Code |
2023-01-29 | Towards Vision Transformer Unrolling Fixed-Point Algorithm: a Case Study on Image Restoration | Peng Qiao et.al. | 2301.12332 | :mortar_board: | None |
2023-01-28 | Aerial Image Object Detection With Vision Transformer Detector (ViTDet) | Liya Wang et.al. | 2301.12058 | :mortar_board: | None |
2023-01-27 | Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks | Haiyan Zhao et.al. | 2301.11560 | :mortar_board: | None |
2023-01-27 | Robust Transformer with Locality Inductive Bias and Feature Normalization | Omid Nejati Manzari et.al. | 2301.11553 | :mortar_board: | None |
2023-01-26 | Compact Transformer Tracker with Correlative Masked Modeling | Zikai Song et.al. | 2301.10938 | :mortar_board: | Code |
2023-01-26 | Facial Emotion Recognition | Arpita Vats et.al. | 2301.10906 | :mortar_board: | None |
2023-01-25 | Out of Distribution Performance of State of Art Vision Model | Md Salman Rahman et.al. | 2301.10750 | :mortar_board: | None |
2023-01-25 | Connecting metrics for shape-texture knowledge in computer vision | Tiago Oliveira et.al. | 2301.10608 | :mortar_board: | None |
2023-01-24 | RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving | Angelika Ando et.al. | 2301.10222 | :mortar_board: | None |
2023-01-24 | Model soups to increase inference without increasing compute time | Charles Dansereau et.al. | 2301.10092 | :mortar_board: | Code |
2023-01-23 | Combined Use of Federated Learning and Image Encryption for Privacy-Preserving Image Classification with Vision Transformer | Teru Nagamori et.al. | 2301.09255 | :mortar_board: | None |
2023-01-20 | Holistically Explainable Vision Transformers | Moritz Böhle et.al. | 2301.08669 | :mortar_board: | None |
2023-01-20 | Image Memorability Prediction with Vision Transformers | Thomas Hagen et.al. | 2301.08647 | :mortar_board: | None |
2023-01-19 | Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture | Mahmoud Assran et.al. | 2301.08243 | :mortar_board: | None |
2023-01-18 | ViT-AE++: Improving Vision Transformer Autoencoder for Self-supervised Medical Image Representations | Chinmay Prabhakar et.al. | 2301.07382 | :mortar_board: | None |
2023-01-17 | Long Range Pooling for 3D Large-Scale Scene Understanding | Xiang-Li Li et.al. | 2301.06962 | :mortar_board: | None |
2023-01-16 | Flow imaging as an alternative to pressure transducers through vision transformers and convolutional neural networks | Renato F. Miotto et.al. | 2301.06410 | :mortar_board: | None |
2023-01-15 | TextileNet: A Material Taxonomy-based Fashion Textile Dataset | Shu Zhong et.al. | 2301.06160 | :mortar_board: | Code |
2023-01-13 | Efficient Activation Function Optimization through Surrogate Modeling | Garrett Bingham et.al. | 2301.05785 | :mortar_board: | Code |
2023-01-13 | GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer | Miao Yin et.al. | 2301.05345 | :mortar_board: | None |
2023-01-12 | ViTs for SITS: Vision Transformers for Satellite Image Time Series | Michail Tarasiou et.al. | 2301.04944 | :mortar_board: | None |
2023-01-11 | Head-Free Lightweight Semantic Segmentation with Linear Transformer | Bo Dong et.al. | 2301.04648 | :mortar_board: | Code |
2023-01-11 | Dynamic Background Reconstruction via Transformer for Infrared Small Target Detection | Jingchao Peng et.al. | 2301.04497 | :mortar_board: | None |
2023-01-11 | Deep Learning Model with Attention Mechanism for Super-resolution of Wireless Channel Characteristics | Haoyang Zhang et.al. | 2301.04479 | :mortar_board: | None |
2023-01-10 | Vision Transformers Are Good Mask Auto-Labelers | Shiyi Lan et.al. | 2301.03992 | :mortar_board: | None |
2023-01-10 | Dynamic Grained Encoder for Vision Transformers | Lin Song et.al. | 2301.03831 | :mortar_board: | Code |
2023-01-09 | Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review | Reza Azad et.al. | 2301.03505 | :mortar_board: | Code |
2023-01-08 | STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition | Ming Li et.al. | 2301.03046 | :mortar_board: | None |
2023-01-06 | Exploring Efficient Few-shot Adaptation for Vision Transformers | Chengming Xu et.al. | 2301.02419 | :mortar_board: | Code |
2023-01-05 | Skip-Attention: Improving Vision Transformers by Paying Less Attention | Shashanka Venkataramanan et.al. | 2301.02240 | :mortar_board: | None |
2023-01-05 | MS-DINO: Efficient Distributed Training of Vision Transformer Foundation Model in Medical Domain through Masked Sampling | Sangjoon Park et.al. | 2301.02064 | :mortar_board: | None |
2023-01-05 | Enabling Augmented Segmentation and Registration in Ultrasound-Guided Spinal Surgery via Realistic Ultrasound Synthesis from Diagnostic CT Volume | Ang Li et.al. | 2301.01940 | :mortar_board: | None |
2023-01-04 | Semi-MAE: Masked Autoencoders for Semi-supervised Vision Transformers | Haojie Yu et.al. | 2301.01431 | :mortar_board: | None |
2023-01-03 | Explainability and Robustness of Deep Visual Classification Models | Jindong Gu et.al. | 2301.01343 | :mortar_board: | None |
2023-01-03 | TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models | Sucheng Ren et.al. | 2301.01296 | :mortar_board: | Code |
2023-01-03 | A New Perspective to Boost Vision Transformer for Medical Image Classification | Yuexiang Li et.al. | 2301.00989 | :mortar_board: | None |
2023-01-03 | Detecting Severity of Diabetic Retinopathy from Fundus Images using Ensembled Transformers | Chandranath Adak et.al. | 2301.00973 | :mortar_board: | None |
2023-01-02 | Lightweight Image Inpainting by Stripe Window Transformer with Joint Attention to CNN | Tsung-Jung Liu et.al. | 2301.00553 | :mortar_board: | Code |
2023-01-01 | Goal-guided Transformer-enabled Reinforcement Learning for Efficient Autonomous Navigation | Wenhui Huang et.al. | 2301.00362 | :mortar_board: | None |
2022-12-29 | AttEntropy: Segmenting Unknown Objects in Complex Scenes using the Spatial Attention Entropy of Semantic Segmentation Transformers | Krzysztof Lis et.al. | 2212.14397 | :mortar_board: | None |
2022-12-28 | RevealED: Uncovering Pro-Eating Disorder Content on Twitter Using Deep Learning | Jonathan Feldman et.al. | 2212.13949 | :mortar_board: | None |
2022-12-28 | Exploring Vision Transformers as Diffusion Learners | He Cao et.al. | 2212.13771 | :mortar_board: | None |
2022-12-28 | OVO: One-shot Vision Transformer Search with Online distillation | Zimian Wei et.al. | 2212.13766 | :mortar_board: | None |
2022-12-28 | Representation Separation for Semantic Segmentation with Vision Transformers | Yuanduo Hong et.al. | 2212.13764 | :mortar_board: | None |
2022-12-27 | Semi-supervised multiscale dual-encoding method for faulty traffic data detection | Yongcan Huang et.al. | 2212.13596 | :mortar_board: | None |
2022-12-26 | SMMix: Self-Motivated Image Mixing for Vision Transformers | Mengzhao Chen et.al. | 2212.12977 | :mortar_board: | Code |
2022-12-23 | A Close Look at Spatial Modeling: From Attention to Convolution | Xu Ma et.al. | 2212.12552 | :mortar_board: | Code |
2022-12-23 | PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image | Weichao Shen et.al. | 2212.12156 | :mortar_board: | None |
2022-12-21 | What Makes for Good Tokenizers in Vision Transformer? | Shengju Qian et.al. | 2212.11115 | :mortar_board: | None |
2022-12-21 | Investigation of Network Architecture for Multimodal Head-and-Neck Tumor Segmentation | Ye Li et.al. | 2212.10724 | :mortar_board: | None |
2022-12-20 | Visual Transformers for Primates Classification and Covid Detection | Steffen Illium et.al. | 2212.10093 | :mortar_board: | None |
2022-12-20 | Conditioned Generative Transformers for Histopathology Image Synthetic Augmentation | Meng Li et.al. | 2212.09977 | :mortar_board: | None |
2022-12-16 | Rethinking Cooking State Recognition with Vision Transformers | Akib Mohammed Khan et.al. | 2212.08586 | :mortar_board: | None |
2022-12-16 | Morphological Classification of Radio Galaxies with wGAN-supported Augmentation | Lennart Rustige et.al. | 2212.08504 | :mortar_board: | Code |
2022-12-16 | RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers | Zhikai Li et.al. | 2212.08254 | :mortar_board: | None |
2022-12-15 | Rethinking Vision Transformers for MobileNet Size and Speed | Yanyu Li et.al. | 2212.08059 | :mortar_board: | Code |
2022-12-15 | FlexiViT: One Model for All Patch Sizes | Lucas Beyer et.al. | 2212.08013 | :mortar_board: | Code |
2022-12-15 | Vision Transformers are Parameter-Efficient Audio-Visual Learners | Yan-Bo Lin et.al. | 2212.07983 | :mortar_board: | Code |
2022-12-15 | Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation | Loic Themyr et.al. | 2212.07890 | :mortar_board: | None |
2022-12-15 | Detecting Bone Lesions in X-Ray Under Diverse Acquisition Conditions | Tal Zimbalist et.al. | 2212.07792 | :mortar_board: | None |
2022-12-13 | GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation | Chenhongyi Yang et.al. | 2212.06795 | :mortar_board: | Code |
2022-12-13 | What do Vision Transformers Learn? A Visual Exploration | Amin Ghiasi et.al. | 2212.06727 | :mortar_board: | Code |
2022-12-13 | OAMixer: Object-aware Mixing Layer for Vision Transformers | Hyunwoo Kang et.al. | 2212.06595 | :mortar_board: | Code |
2022-12-12 | You Only Need a Good Embeddings Extractor to Fix Spurious Correlations | Raghav Mehta et.al. | 2212.06254 | :mortar_board: | None |
2022-12-12 | Masked autoencoders are effective solution to transformer data-hungry | Jiawei Mao et.al. | 2212.05677 | :mortar_board: | Code |
2022-12-11 | Recurrent Vision Transformers for Object Detection with Event Cameras | Mathias Gehrig et.al. | 2212.05598 | :mortar_board: | None |
2022-12-11 | PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery | Sheng Zhang et.al. | 2212.05590 | :mortar_board: | Code |
2022-12-11 | Vision Transformer with Attentive Pooling for Robust Facial Expression Recognition | Fanglei Xue et.al. | 2212.05463 | :mortar_board: | None |
2022-12-10 | Position Embedding Needs an Independent Layer Normalization | Runyi Yu et.al. | 2212.05262 | :mortar_board: | None |
2022-12-09 | Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | Aran Komatsuzaki et.al. | 2212.05055 | :mortar_board: | Code |
2022-12-09 | AugNet: Dynamic Test-Time Augmentation via Differentiable Functions | Shohei Enomoto et.al. | 2212.04681 | :mortar_board: | None |
2022-12-09 | Mitigation of Spatial Nonstationarity with Vision Transformers | Lei Liu et.al. | 2212.04633 | :mortar_board: | None |
2022-12-07 | ViTPose+: Vision Transformer Foundation Model for Generic Body Pose Estimation | Yufei Xu et.al. | 2212.04246 | :mortar_board: | Code |
2022-12-08 | Group Generalized Mean Pooling for Vision Transformer | Byungsoo Ko et.al. | 2212.04114 | :mortar_board: | None |
2022-12-07 | Multimodal Vision Transformers with Forced Attention for Behavior Analysis | Tanay Agrawal et.al. | 2212.03968 | :mortar_board: | None |
2022-12-07 | Teaching Matters: Investigating the Role of Supervision in Vision Transformers | Matthew Walmer et.al. | 2212.03862 | :mortar_board: | Code |
2022-12-06 | Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning | Cheng-Hao Tu et.al. | 2212.03220 | :mortar_board: | None |
2022-12-06 | FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer | Shibo Jie et.al. | 2212.03145 | :mortar_board: | Code |
2022-12-06 | Event-based Monocular Dense Depth Estimation with Recurrent Transformers | Xu Liu et.al. | 2212.02791 | :mortar_board: | None |
2022-12-06 | Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation | Xin Li et.al. | 2212.02739 | :mortar_board: | Code |
2022-12-06 | Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications | Kavya Sreedhar et.al. | 2212.02687 | :mortar_board: | None |
2022-12-05 | 3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes | Alara Dirik et.al. | 2212.02184 | :mortar_board: | None |
2022-12-05 | Learning Imbalanced Data with Vision Transformers | Zhengzhuo Xu et.al. | 2212.02015 | :mortar_board: | Code |
2022-12-03 | Exploring Stochastic Autoregressive Image Modeling for Visual Representation | Yu Qi et.al. | 2212.01610 | :mortar_board: | Code |
2022-12-01 | ResFormer: Scaling ViTs with Multi-Resolution Training | Rui Tian et.al. | 2212.00776 | :mortar_board: | None |
2022-11-29 | Transformer-based Hand Gesture Recognition via High-Density EMG Signals: From Instantaneous Recognition to Fusion of Motor Unit Spike Trains | Mansooreh Montazerin et.al. | 2212.00743 | :mortar_board: | None |
2022-11-30 | Part-based Face Recognition with Vision Transformers | Zhonglin Sun et.al. | 2212.00057 | :mortar_board: | None |
2022-11-29 | Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing | Nataniel Ruiz et.al. | 2211.16499 | :mortar_board: | None |
2022-11-29 | RGB no more: Minimally-decoded JPEG Vision Transformers | Jeongsoo Park et.al. | 2211.16421 | :mortar_board: | None |
2022-11-29 | Lightweight Structure-Aware Attention for Visual Understanding | Heeseung Kwon et.al. | 2211.16289 | :mortar_board: | None |
2022-11-29 | Metal-conscious Embedding for CBCT Projection Inpainting | Fuxin Fan et.al. | 2211.16219 | :mortar_board: | None |
2022-11-29 | NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers | Yijiang Liu et.al. | 2211.16056 | :mortar_board: | None |
2022-11-29 | LUMix: Improving Mixup by Better Modelling Label Uncertainty | Shuyang Sun et.al. | 2211.15846 | :mortar_board: | None |
2022-11-28 | Good helper is around you: Attention-driven Masked Image Modeling | Zhengqi Liu et.al. | 2211.15362 | :mortar_board: | Code |
2022-11-27 | Semantic-Aware Local-Global Vision Transformer | Jiatong Zhang et.al. | 2211.14705 | :mortar_board: | None |
2022-11-26 | Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning | Ethan Rathbun et.al. | 2211.14669 | :mortar_board: | None |
2022-11-26 | Towards Better Input Masking for Convolutional Neural Networks | Sriram Balasubramanian et.al. | 2211.14646 | :mortar_board: | None |
2022-11-26 | PatchGT: Transformer over Non-trainable Clusters for Learning Graph Representations | Han Gao et.al. | 2211.14425 | :mortar_board: | Code |
2022-11-25 | Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations | Tan Yu et.al. | 2211.14255 | :mortar_board: | None |
2022-11-25 | MPCViT: Searching for MPC-friendly Vision Transformer with Heterogeneous Attention | Wenxuan Zeng et.al. | 2211.13955 | :mortar_board: | None |
2022-11-25 | Spatial-Temporal Attention Network for Open-Set Fine-Grained Image Recognition | Jiayin Sun et.al. | 2211.13940 | :mortar_board: | None |
2022-11-25 | TAOTF: A Two-stage Approximately Orthogonal Training Framework in Deep Neural Networks | Taoyong Cui et.al. | 2211.13902 | :mortar_board: | None |
2022-11-25 | AFR-Net: Attention-Driven Fingerprint Recognition Network | Steven A. Grosz et.al. | 2211.13897 | :mortar_board: | None |
2022-11-25 | Adaptive Attention Link-based Regularization for Vision Transformers | Heegon Jin et.al. | 2211.13852 | :mortar_board: | None |
2022-11-24 | Efficient Zero-shot Visual Search via Target and Context-aware Transformer | Zhiwei Ding et.al. | 2211.13470 | :mortar_board: | None |
2022-11-23 | SVFormer: Semi-supervised Video Transformer for Action Recognition | Zhen Xing et.al. | 2211.13222 | :mortar_board: | Code |
2022-11-23 | CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning | James Seale Smith et.al. | 2211.13218 | :mortar_board: | None |
2022-11-23 | Indian Commercial Truck License Plate Detection and Recognition for Weighbridge Automation | Siddharth Agrawal et.al. | 2211.13194 | :mortar_board: | None |
2022-11-23 | ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation | Sara Atito et.al. | 2211.13189 | :mortar_board: | None |
2022-11-23 | Data Augmentation Vision Transformer for Fine-grained Image Classification | Chao Hu et.al. | 2211.12879 | :mortar_board: | None |
2022-11-22 | Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization | Zifan Wang et.al. | 2211.12624 | :mortar_board: | None |
2022-11-22 | MagicPony: Learning Articulated 3D Animals in the Wild | Shangzhe Wu et.al. | 2211.12497 | :mortar_board: | None |
2022-11-22 | TranViT: An Integrated Vision Transformer Framework for Discrete Transit Travel Time Range Prediction | Awad Abdelhalim et.al. | 2211.12322 | :mortar_board: | None |
2022-11-22 | Generalizable Industrial Visual Anomaly Detection with Self-Induction Vision Transformer | Haiming Yao et.al. | 2211.12311 | :mortar_board: | None |
2022-11-22 | Gated Class-Attention with Cascaded Feature Drift Compensation for Exemplar-free Continual Learning of Vision Transformers | Marco Cotogni et.al. | 2211.12292 | :mortar_board: | Code |
2022-11-22 | Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification | Jiachen Li et.al. | 2211.12280 | :mortar_board: | None |
2022-11-22 | Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition | Qibin Hou et.al. | 2211.11943 | :mortar_board: | Code |
2022-11-21 | Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | Sifan Long et.al. | 2211.11315 | :mortar_board: | None |
2022-11-21 | On the Robustness, Generalization, and Forgetting of Shape-Texture Debiased Continual Learning | Zenglin Shi et.al. | 2211.11174 | :mortar_board: | None |
2022-11-21 | Vision Transformer with Super Token Sampling | Huaibo Huang et.al. | 2211.11167 | :mortar_board: | None |
2022-11-20 | Overfreezing Meets Overparameterization: A Double Descent Perspective on Transfer Learning of Deep Neural Networks | Yehuda Dar et.al. | 2211.11074 | :mortar_board: | None |
2022-11-20 | Hybrid Transformer Based Feature Fusion for Self-Supervised Monocular Depth Estimation | Snehal Singh Tomar et.al. | 2211.11066 | :mortar_board: | None |
2022-11-19 | Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training | Zhenglun Kong et.al. | 2211.10801 | :mortar_board: | None |
2022-11-18 | Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference | Haoran You et.al. | 2211.10526 | :mortar_board: | None |
2022-11-18 | Improved Cross-view Completion Pre-training for Stereo Matching | Philippe Weinzaepfel et.al. | 2211.10408 | :mortar_board: | None |
2022-11-18 | Vision Transformers in Medical Imaging: A Review | Emerald U. Henry et.al. | 2211.10043 | :mortar_board: | None |
2022-11-17 | EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones | Yulin Wang et.al. | 2211.09703 | :mortar_board: | Code |
2022-11-17 | CPT-V: A Contrastive Approach to Post-Training Quantization of Vision Transformers | Natalia Frumkin et.al. | 2211.09643 | :mortar_board: | None |
2022-11-17 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Kunchang Li et.al. | 2211.09552 | :mortar_board: | Code |
2022-11-17 | Detecting Arbitrary Keypoints on Limbs and Skis with Sparse Partly Correct Segmentation Masks | Katja Ludwig et.al. | 2211.09446 | :mortar_board: | Code |
2022-11-17 | How to Fine-Tune Vision Models with SGD | Ananya Kumar et.al. | 2211.09359 | :mortar_board: | None |
2022-11-16 | Differentially Private Optimizers Can Learn Adversarially Robust Models | Yuan Zhang et.al. | 2211.08942 | :mortar_board: | None |
2022-11-13 | Demystify Self-Attention in Vision Transformers from a Semantic Perspective: Analysis and Application | Leijie Wu et.al. | 2211.08543 | :mortar_board: | None |
2022-11-15 | HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers | Peiyan Dong et.al. | 2211.08110 | :mortar_board: | None |
2022-11-15 | ShadowDiffusion: Diffusion-based Shadow Removal using Classifier-driven Attention and Structure Preservation | Yeying Jin et.al. | 2211.08089 | :mortar_board: | None |
2022-11-15 | Using Human Perception to Regularize Transfer Learning | Justin Dulay et.al. | 2211.07885 | :mortar_board: | None |
2022-11-14 | CabViT: Cross Attention among Blocks for Vision Transformer | Haokui Zhang et.al. | 2211.07198 | :mortar_board: | Code |
2022-11-14 | Unsupervised Galaxy Morphological Visual Representation with Deep Contrastive Learning | Shoulin Wei et.al. | 2211.07168 | :mortar_board: | Code |
2022-11-14 | BiViT: Extremely Compressed Binary Vision Transformer | Yefei He et.al. | 2211.07091 | :mortar_board: | None |
2022-11-12 | MultiCrossViT: Multimodal Vision Transformer for Schizophrenia Prediction using Structural MRI and Functional Network Connectivity Data | Yuda Bi et.al. | 2211.06726 | :mortar_board: | None |
2022-11-12 | AU-Aware Vision Transformers for Biased Facial Expression Recognition | Shuyi Mao et.al. | 2211.06609 | :mortar_board: | None |
2022-11-12 | End-to-End Machine Learning Framework for Facial AU Detection in Intensive Care Units | Subhash Nerella et.al. | 2211.06570 | :mortar_board: | None |
2022-11-11 | A Comprehensive Survey of Transformers for Computer Vision | Sonain Jamil et.al. | 2211.06004 | :mortar_board: | None |
2022-11-10 | Demystify Transformers & Convolutions in Modern Image Deep Networks | Jifeng Dai et.al. | 2211.05781 | :mortar_board: | Code |
2022-11-10 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | Wenhai Wang et.al. | 2211.05778 | :mortar_board: | Code |
2022-11-09 | Training a Vision Transformer from scratch in less than 24 hours with 1 GPU | Saghar Irandoust et.al. | 2211.05187 | :mortar_board: | None |
2022-11-09 | ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention | Jyotikrishna Dass et.al. | 2211.05109 | :mortar_board: | None |
2022-11-09 | Pure Transformer with Integrated Experts for Scene Text Recognition | Yew Lee Tan et.al. | 2211.04963 | :mortar_board: | None |
2022-11-09 | Masked Vision-Language Transformers for Scene Text Recognition | Jie Wu et.al. | 2211.04785 | :mortar_board: | Code |
2022-11-08 | Splitting expands the application range of Vision Transformer – variable Vision Transformer (vViT) | Takuma Usuzaki et.al. | 2211.03992 | :mortar_board: | None |
2022-11-07 | CoNMix for Source-free Single and Multi-target Domain Adaptation | Vikash Kumar et.al. | 2211.03876 | :mortar_board: | None |
2022-11-07 | Novel Muscle Monitoring by Radiomyography(RMG) and Application to Hand Gesture Recognition | Zijing Zhang et.al. | 2211.03767 | :mortar_board: | None |
2022-11-07 | Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining | Qiang Chen et.al. | 2211.03594 | :mortar_board: | None |
2022-11-07 | Efficient Multi-order Gated Aggregation Network | Siyuan Li et.al. | 2211.03295 | :mortar_board: | Code |
2022-11-06 | ViT-CX: Causal Explanation of Vision Transformers | Weiyan Xie et.al. | 2211.03064 | :mortar_board: | None |
2022-11-04 | RCDPT: Radar-Camera fusion Dense Prediction Transformer | Chen-Chou Lo et.al. | 2211.02432 | :mortar_board: | None |
2022-11-04 | SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision Transformers | A. Arezzo et.al. | 2211.02366 | :mortar_board: | Code |
2022-11-04 | Boosting Binary Neural Networks via Dynamic Thresholds Learning | Jiehua Zhang et.al. | 2211.02292 | :mortar_board: | None |
2022-11-03 | Rethinking Hierarchies in Pre-trained Plain Vision Transformer | Yufei Xu et.al. | 2211.01785 | :mortar_board: | None |
2022-11-03 | Evaluating a Synthetic Image Dataset Generated with Stable Diffusion | Andreas Stöckl et.al. | 2211.01777 | :mortar_board: | None |
2022-11-02 | The Lottery Ticket Hypothesis for Vision Transformers | Xuan Shen et.al. | 2211.01484 | :mortar_board: | None |
2022-11-02 | Attention-based Neural Cellular Automata | Mattie Tesfaldet et.al. | 2211.01233 | :mortar_board: | None |
2022-11-02 | RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild | Weiyao Wang et.al. | 2211.01165 | :mortar_board: | None |
2022-11-02 | WITT: A Wireless Image Transmission Transformer for Semantic Communications | Ke Yang et.al. | 2211.00937 | :mortar_board: | Code |
2022-11-01 | ViT-DeiT: An Ensemble Model for Breast Cancer Histopathological Images Classification | Amira Alotaibi et.al. | 2211.00749 | :mortar_board: | None |
2022-10-31 | Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation | Simone Rossetti et.al. | 2210.17400 | :mortar_board: | Code |
2022-10-31 | ViT-LSLA: Vision Transformer with Light Self-Limited-Attention | Zhenzhe Hechen et.al. | 2210.17115 | :mortar_board: | None |
2022-10-30 | ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial Diagnosis | Xu Cao et.al. | 2210.16943 | :mortar_board: | Code |
2022-10-30 | Foreign Object Debris Detection for Airport Pavement Images based on Self-supervised Localization and Vision Transformer | Travis Munyer et.al. | 2210.16901 | :mortar_board: | None |
2022-10-30 | Exemplar Guided Deep Neural Network for Spatial Transcriptomics Analysis of Gene Expression Prediction | Yan Yang et.al. | 2210.16721 | :mortar_board: | Code |
2022-10-29 | ImplantFormer: Vision Transformer based Implant Position Regression Using Dental CBCT Data | Xinquan Yang et.al. | 2210.16467 | :mortar_board: | None |
2022-10-28 | Multimodal Transformer for Parallel Concatenated Variational Autoencoders | Stephen D. Liang et.al. | 2210.16174 | :mortar_board: | None |
2022-10-28 | Federated Learning for Chronic Obstructive Pulmonary Disease Classification with Partial Personalized Attention Mechanism | Yiqing Shen et.al. | 2210.16142 | :mortar_board: | None |
2022-10-28 | Differentially Private CutMix for Split Learning with Vision Transformer | Seungeun Oh et.al. | 2210.15986 | :mortar_board: | None |
2022-10-28 | Grafting Vision Transformers | Jongwoo Park et.al. | 2210.15943 | :mortar_board: | None |
2022-10-27 | Fully-attentive and interpretable: vision and video vision transformers for pain detection | Giacomo Fiorentini et.al. | 2210.15769 | :mortar_board: | Code |
2022-10-27 | PatchRot: A Self-Supervised Technique for Training Vision Transformers | Sachin Chhabra et.al. | 2210.15722 | :mortar_board: | Code |
2022-10-27 | Masked Transformer for image Anomaly Localization | Axel De Nardin et.al. | 2210.15540 | :mortar_board: | None |
2022-10-27 | Li3DeTr: A LiDAR based 3D Detection Transformer | Gopi Krishna Erabati et.al. | 2210.15365 | :mortar_board: | None |
2022-10-27 | Vision Transformer for Adaptive Image Transmission over MIMO Channels | Haotian Wu et.al. | 2210.15347 | :mortar_board: | None |
2022-10-27 | MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving | Gopi Krishna Erabati et.al. | 2210.15316 | :mortar_board: | None |
2022-10-27 | Spatio-Temporal Hybrid Fusion of CAE and SWIn Transformers for Lung Cancer Malignancy Prediction | Sadaf Khademi et.al. | 2210.15297 | :mortar_board: | None |
2022-10-27 | ViT-CAT: Parallel Vision Transformers with Cross Attention Fusion for Popularity Prediction in MEC Networks | Zohreh HajiAkhondi-Meybodi et.al. | 2210.15125 | :mortar_board: | None |
2022-10-27 | Masked Vision-Language Transformer in Fashion | Ge-Peng Ji et.al. | 2210.15110 | :mortar_board: | Code |
2022-10-26 | MViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Hanxue Liang et.al. | 2210.14793 | :mortar_board: | Code |
2022-10-26 | TPFNet: A Novel Text In-painting Transformer for Text Removal | Onkar Susladkar et.al. | 2210.14461 | :mortar_board: | Code |
2022-10-25 | Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets | Xiangyu Chen et.al. | 2210.14319 | :mortar_board: | None |
2022-10-25 | Learning Explicit Object-Centric Representations with Vision Transformers | Oscar Vikström et.al. | 2210.14139 | :mortar_board: | None |
2022-10-25 | Minutiae-Guided Fingerprint Embeddings via Vision Transformers | Steven A. Grosz et.al. | 2210.13994 | :mortar_board: | None |
2022-10-24 | The Robustness Limits of SoTA Vision Models to Natural Variation | Mark Ibrahim et.al. | 2210.13604 | :mortar_board: | None |
2022-10-23 | Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future | Guo-Jun Qi et.al. | 2210.13463 | :mortar_board: | None |
2022-10-23 | Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification | Junfei Xiao et.al. | 2210.12843 | :mortar_board: | Code |
2022-10-23 | UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection | Wanyi Zhuang et.al. | 2210.12752 | :mortar_board: | None |
2022-10-23 | Accelerated Linearized Laplace Approximation for Bayesian Deep Learning | Zhijie Deng et.al. | 2210.12642 | :mortar_board: | Code |
2022-10-22 | S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention | Chiyu Zhang et.al. | 2210.12381 | :mortar_board: | None |
2022-10-22 | Accumulated Trivial Attention Matters in Vision Transformers on Small Datasets | Xiangyu Chen et.al. | 2210.12333 | :mortar_board: | Code |
2022-10-21 | High-Fidelity Visual Structural Inspections through Transformers and Learnable Resizers | Kareem Eltouny et.al. | 2210.12175 | :mortar_board: | None |
2022-10-21 | Face Pyramid Vision Transformer | Khawar Islam et.al. | 2210.11974 | :mortar_board: | Code |
2022-10-21 | Boosting vision transformers for image retrieval | Chull Hwan Song et.al. | 2210.11909 | :mortar_board: | Code |
2022-10-20 | GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network | Jheng-Wei Su et.al. | 2210.11419 | :mortar_board: | None |
2022-10-20 | General Image Descriptors for Open World Image Retrieval using ViT CLIP | Marcos V. Conde et.al. | 2210.11141 | :mortar_board: | Code |
2022-10-20 | SimpleClick: Interactive Image Segmentation with Simple Vision Transformers | Qin Liu et.al. | 2210.11006 | :mortar_board: | Code |
2022-10-19 | A Unified View of Masked Image Modeling | Zhiliang Peng et.al. | 2210.10615 | :mortar_board: | None |
2022-10-19 | Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval | Abhra Chaudhuri et.al. | 2210.10486 | :mortar_board: | None |
2022-10-19 | Multi-view Gait Recognition based on Siamese Vision Transformer | Yanchen Yang et.al. | 2210.10421 | :mortar_board: | None |
2022-10-18 | Number-Adaptive Prototype Learning for 3D Point Cloud Semantic Segmentation | Yangheng Zhao et.al. | 2210.09948 | :mortar_board: | None |
2022-10-18 | Sequence and Circle: Exploring the Relationship Between Patches | Zhengyang Yu et.al. | 2210.09871 | :mortar_board: | None |
2022-10-18 | Decoupling Features in Hierarchical Propagation for Video Object Segmentation | Zongxin Yang et.al. | 2210.09782 | :mortar_board: | Code |
2022-10-18 | ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design | Haoran You et.al. | 2210.09573 | :mortar_board: | None |
2022-10-18 | Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation | Ruijun Li et.al. | 2210.09549 | :mortar_board: | None |
2022-10-14 | oViT: An Accurate Second-Order Pruning Framework for Vision Transformers | Denis Kuznedelev et.al. | 2210.09223 | :mortar_board: | None |
2022-10-17 | Histopathological Image Classification based on Self-Supervised Vision Transformer and Weak Labels | Ahmet Gokberk Gul et.al. | 2210.09021 | :mortar_board: | None |
2022-10-16 | Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers | Tao Tang et.al. | 2210.08458 | :mortar_board: | Code |
2022-10-16 | Scratching Visual Transformer’s Back with Uniform Attention | Nam Hyeon-Woo et.al. | 2210.08457 | :mortar_board: | None |
2022-10-15 | Transformer-based dimensionality reduction | Ruisheng Ran et.al. | 2210.08288 | :mortar_board: | None |
2022-10-15 | Distributionally Robust Multiclass Classification and Applications in Deep Image Classifiers | Ruidi Chen et.al. | 2210.08198 | :mortar_board: | None |
2022-10-15 | Linear Video Transformer with Feature Fixation | Kaiyue Lu et.al. | 2210.08164 | :mortar_board: | None |
2022-10-14 | Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation | Qianying Liu et.al. | 2210.08066 | :mortar_board: | None |
2022-10-14 | Vision Transformer Visualization: What Neurons Tell and How Neurons Behave? | Van-Anh Nguyen et.al. | 2210.07646 | :mortar_board: | Code |
2022-10-14 | When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture | Yichuan Mo et.al. | 2210.07540 | :mortar_board: | Code |
2022-10-13 | How to Train Vision Transformer on Small-scale Datasets? | Hanan Gani et.al. | 2210.07240 | :mortar_board: | Code |
2022-10-13 | Feature-Proxy Transformer for Few-Shot Segmentation | Jian-Wei Zhang et.al. | 2210.06908 | :mortar_board: | Code |
2022-10-13 | Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer | Yanjing Li et.al. | 2210.06707 | :mortar_board: | Code |
2022-10-12 | S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces | Eric Nguyen et.al. | 2210.06583 | :mortar_board: | None |
2022-10-12 | Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers | Jochem Loedeman et.al. | 2210.06466 | :mortar_board: | Code |
2022-10-12 | Token-Label Alignment for Vision Transformers | Han Xiao et.al. | 2210.06455 | :mortar_board: | Code |
2022-10-12 | Foundation Transformers | Hongyu Wang et.al. | 2210.06423 | :mortar_board: | None |
2022-10-12 | Distilling Knowledge from Language Models for Video-based Action Anticipation | Sayontan Ghosh et.al. | 2210.05991 | :mortar_board: | None |
2022-10-12 | GGViT:Multistream Vision Transformer Network in Face2Face Facial Reenactment Detection | Haotian Wu et.al. | 2210.05990 | :mortar_board: | None |
2022-10-12 | Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets | Zhiying Lu et.al. | 2210.05958 | :mortar_board: | Code |
2022-10-12 | Towards Theoretically Inspired Neural Initialization Optimization | Yibo Yang et.al. | 2210.05956 | :mortar_board: | Code |
2022-10-12 | Dynamic Clustering Network for Unsupervised Semantic Segmentation | Kehan Li et.al. | 2210.05944 | :mortar_board: | None |
2022-10-12 | SegViT: Semantic Segmentation with Plain Vision Transformers | Bowen Zhang et.al. | 2210.05844 | :mortar_board: | None |
2022-10-11 | SaiT: Sparse Vision Transformers through Adaptive Token Pruning | Ling Li et.al. | 2210.05832 | :mortar_board: | None |
2022-10-11 | OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions | Chengkun Wang et.al. | 2210.05557 | :mortar_board: | Code |
2022-10-11 | What does a deep neural network confidently perceive? The effective dimension of high certainty class manifolds and their low confidence boundaries | Stanislav Fort et.al. | 2210.05546 | :mortar_board: | Code |
2022-10-11 | UGformer for Robust Left Atrium and Scar Segmentation Across Scanners | Tianyi Liu et.al. | 2210.05151 | :mortar_board: | None |
2022-10-10 | Revisiting adapters with adversarial training | Sylvestre-Alvise Rebuffi et.al. | 2210.04886 | :mortar_board: | None |
2022-10-10 | Visual Prompt Tuning for Test-time Domain Adaptation | Yunhe Gao et.al. | 2210.04831 | :mortar_board: | None |
2022-10-09 | Students taught by multimodal teachers are superior action recognizers | Gorjan Radevski et.al. | 2210.04331 | :mortar_board: | None |
2022-10-09 | Strong Gravitational Lensing Parameter Estimation with Vision Transformer | Kuan-Wei Huang et.al. | 2210.04143 | :mortar_board: | Code |
2022-10-08 | Fast-ParC: Position Aware Global Kernel for ConvNets and ViTs | Tao Yang et.al. | 2210.04020 | :mortar_board: | None |
2022-10-07 | Game-Theoretic Understanding of Misclassification | Kosuke Sumiyasu et.al. | 2210.03349 | :mortar_board: | None |
2022-10-07 | Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks | Yen-Cheng Liu et.al. | 2210.03265 | :mortar_board: | None |
2022-10-06 | Gastrointestinal Disorder Detection with a Transformer Based Approach | A. K. M. Salman Hosain et.al. | 2210.03168 | :mortar_board: | None |
2022-10-06 | Real-World Robot Learning with Masked Visual Pre-training | Ilija Radosavovic et.al. | 2210.03109 | :mortar_board: | None |
2022-10-06 | Structure Representation Network and Uncertainty Feedback Learning for Dense Non-Uniform Fog Removal | Yeying Jin et.al. | 2210.03061 | :mortar_board: | Code |
2022-10-06 | SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data | Ching-Yun Ko et.al. | 2210.02989 | :mortar_board: | None |
2022-10-06 | The Lie Derivative for Measuring Learned Equivariance | Nate Gruver et.al. | 2210.02984 | :mortar_board: | Code |
2022-10-06 | Vision Transformer Based Model for Describing a Set of Images as a Story | Zainy M. Malakan et.al. | 2210.02762 | :mortar_board: | None |
2022-10-05 | Centralized Feature Pyramid for Object Detection | Yu Quan et.al. | 2210.02093 | :mortar_board: | Code |
2022-10-05 | Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders | Youngwan Lee et.al. | 2210.02077 | :mortar_board: | None |
2022-10-04 | Multi-view Human Body Mesh Translator | Xiangjian Jiang et.al. | 2210.01886 | :mortar_board: | None |
2022-10-04 | Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling | Yunsung Lee et.al. | 2210.01370 | :mortar_board: | None |
2022-10-03 | Introducing Vision Transformer for Alzheimer’s Disease classification task with 3D input | Zilun Zhang et.al. | 2210.01177 | :mortar_board: | None |
2022-10-03 | Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning | Weicong Liang et.al. | 2210.01035 | :mortar_board: | None |
2022-10-03 | Visual Prompt Tuning for Generative Transfer Learning | Kihyuk Sohn et.al. | 2210.00990 | :mortar_board: | None |
2022-10-03 | Attention Distillation: self-supervised vision transformer students need more guidance | Kai Wang et.al. | 2210.00944 | :mortar_board: | None |
2022-10-03 | A Strong Transfer Baseline for RGB-D Fusion in Vision Transformers | Georgios Tziafas et.al. | 2210.00843 | :mortar_board: | None |
2022-10-02 | Deep-OCTA: Ensemble Deep Learning Approaches for Diabetic Retinopathy Analysis on OCTA Images | Junlin Hou et.al. | 2210.00515 | :mortar_board: | Code |
2022-10-01 | CAST: Concurrent Recognition and Segmentation with Adaptive Segment Tokens | Tsung-Wei Ke et.al. | 2210.00314 | :mortar_board: | None |
2022-10-01 | EAPruning: Evolutionary Pruning for Vision Transformers and CNNs | Qingyuan Li et.al. | 2210.00181 | :mortar_board: | None |
2022-09-30 | Impact of Face Image Quality Estimation on Presentation Attack Detection | Carlos Aravena et.al. | 2209.15489 | :mortar_board: | None |
2022-09-30 | Diffusion-based Image Translation using Disentangled Style and Content Representation | Gihyun Kwon et.al. | 2209.15264 | :mortar_board: | Code |
2022-09-30 | Dual Progressive Transformations for Weakly Supervised Semantic Segmentation | Dongjian Huo et.al. | 2209.15211 | :mortar_board: | Code |
2022-09-30 | MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | Shakti N. Wadekar et.al. | 2209.15159 | :mortar_board: | Code |
2022-09-29 | 3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation | Ho Hin Lee et.al. | 2209.15076 | :mortar_board: | Code |
2022-09-29 | Effective Vision Transformer Training: A Data-Centric Perspective | Benjia Zhou et.al. | 2209.15006 | :mortar_board: | None |
2022-09-29 | Dilated Neighborhood Attention Transformer | Ali Hassani et.al. | 2209.15001 | :mortar_board: | Code |
2022-09-28 | UNesT: Local Spatial Representation Learning with Hierarchical Transformer for Efficient Medical Segmentation | Xin Yu et.al. | 2209.14378 | :mortar_board: | Code |
2022-09-28 | 360FusionNeRF: Panoramic Neural Radiance Fields with Joint Guidance | Shreyas Kulkarni et.al. | 2209.14265 | :mortar_board: | Code |
2022-09-28 | Exploring the Relationship between Architecture and Adversarially Robust Generalization | Shiyu Tang et.al. | 2209.14105 | :mortar_board: | None |
2022-09-28 | Motion Transformer for Unsupervised Image Animation | Jiale Tao et.al. | 2209.14024 | :mortar_board: | Code |
2022-09-28 | DeViT: Deformed Vision Transformers in Video Inpainting | Jiayin Cai et.al. | 2209.13925 | :mortar_board: | None |
2022-09-28 | Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention | Xiangcheng Liu et.al. | 2209.13802 | :mortar_board: | None |
2022-09-28 | Attacking Compressed Vision Transformers | Swapnil Parekh et.al. | 2209.13785 | :mortar_board: | None |
2022-09-28 | MTU-Net: Multi-level TransUNet for Space-based Infrared Tiny Ship Detection | Tianhao Wu et.al. | 2209.13756 | :mortar_board: | Code |
2022-09-27 | FG-UAP: Feature-Gathering Universal Adversarial Perturbation | Zhixing Ye et.al. | 2209.13113 | :mortar_board: | None |
2022-09-26 | Generalized Parametric Contrastive Learning | Jiequan Cui et.al. | 2209.12400 | :mortar_board: | Code |
2022-09-25 | All are Worth Words: a ViT Backbone for Score-based Diffusion Models | Fan Bao et.al. | 2209.12152 | :mortar_board: | None |
2022-09-23 | Wide-Area Geolocalization with a Limited Field of View Camera | Lena M. Downes et.al. | 2209.11854 | :mortar_board: | None |
2022-09-23 | NasHD: Efficient ViT Architecture Performance Ranking using Hyperdimensional Computing | Dongning Ma et.al. | 2209.11356 | :mortar_board: | None |
2022-09-22 | Colonoscopy Landmark Detection using Vision Transformers | Aniruddha Tamhane et.al. | 2209.11304 | :mortar_board: | None |
2022-09-20 | Traffic Accident Risk Forecasting using Contextual Vision Transformers | Khaled Saleh et.al. | 2209.11180 | :mortar_board: | None |
2022-09-22 | Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning | Manuel Goulão et.al. | 2209.10901 | :mortar_board: | Code |
2022-09-21 | PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification | Wenhao Tang et.al. | 2209.10074 | :mortar_board: | None |
2022-09-19 | Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection | Yunsheng Ma et.al. | 2209.09178 | :mortar_board: | Code |
2022-09-19 | Panoramic Vision Transformer for Saliency Detection in 360° Videos | Heeseung Yun et.al. | 2209.08956 | :mortar_board: | None |
2022-09-19 | Estimating Brain Age with Global and Local Dependencies | Yanwu Yang et.al. | 2209.08933 | :mortar_board: | None |
2022-09-19 | HiMFR: A Hybrid Masked Face Recognition Through Face Inpainting | Md Imran Hosen et.al. | 2209.08930 | :mortar_board: | Code |
2022-09-19 | Attentive Symmetric Autoencoder for Brain MRI Segmentation | Junjia Huang et.al. | 2209.08887 | :mortar_board: | Code |
2022-09-19 | Axially Expanded Windows for Local-Global Interaction in Vision Transformers | Zhemin Zhang et.al. | 2209.08726 | :mortar_board: | None |
2022-09-19 | Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification | Syeda Nyma Ferdous et.al. | 2209.08686 | :mortar_board: | None |
2022-09-16 | PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation | Haoyu Ma et.al. | 2209.08194 | :mortar_board: | Code |
2022-09-16 | Quantum Vision Transformers | El Amine Cherrat et.al. | 2209.08167 | :mortar_board: | None |
2022-09-16 | Self-Supervised Learning of Phenotypic Representations from Cell Images with Weak Labels | Jan Oscar Cross-Zamirski et.al. | 2209.07819 | :mortar_board: | None |
2022-09-16 | ConvFormer: Closing the Gap Between CNN and Vision Transformers | Zimian Wei et.al. | 2209.07738 | :mortar_board: | None |
2022-09-16 | A Mosquito is Worth 16x16 Larvae: Evaluation of Deep Learning Architectures for Mosquito Larvae Classification | Aswin Surya et.al. | 2209.07718 | :mortar_board: | Code |
2022-09-16 | Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation | Himashi Peiris et.al. | 2209.07704 | :mortar_board: | Code |
2022-09-15 | Medical Image Segmentation using LeViT-UNet++: A Case Study on GI Tract Data | Praneeth Nemani et.al. | 2209.07515 | :mortar_board: | None |
2022-09-15 | Hydra Attention: Efficient Attention with Many Heads | Daniel Bolya et.al. | 2209.07484 | :mortar_board: | None |
2022-09-15 | On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition | Farrukh Rahman et.al. | 2209.07474 | :mortar_board: | None |
2022-09-15 | A Light Recipe to Train Robust Vision Transformers | Edoardo Debenedetti et.al. | 2209.07399 | :mortar_board: | Code |
2022-09-15 | Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer? | Yi Wang et.al. | 2209.07026 | :mortar_board: | Code |
2022-09-15 | PriorLane: A Prior Knowledge Enhanced Lane Detection Approach Based on Transformer | Qibo Qiu et.al. | 2209.06994 | :mortar_board: | Code |
2022-09-14 | On the interplay of adversarial robustness and architecture components: patches, convolution and attention | Francesco Croce et.al. | 2209.06953 | :mortar_board: | None |
2022-09-14 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | Xi Chen et.al. | 2209.06794 | :mortar_board: | None |
2022-09-14 | Transformers and CNNs both Beat Humans on SBIR | Omar Seddati et.al. | 2209.06629 | :mortar_board: | None |
2022-09-13 | DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer | Dafeng Zhang et.al. | 2209.06040 | :mortar_board: | None |
2022-09-13 | A lightweight Transformer-based model for fish landmark detection | Alzayat Saleh et.al. | 2209.05777 | :mortar_board: | None |
2022-09-13 | Vision Transformers for Action Recognition: A Survey | Anwaar Ulhaq et.al. | 2209.05700 | :mortar_board: | None |
2022-09-13 | PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers | Zhikai Li et.al. | 2209.05687 | :mortar_board: | Code |
2022-09-13 | ComplETR: Reducing the cost of annotations for object detection in dense scenes with vision transformers | Achin Jain et.al. | 2209.05654 | :mortar_board: | None |
2022-09-07 | Transfer Learning and Vision Transformer based State-of-Health prediction of Lithium-Ion Batteries | Pengyu Fu et.al. | 2209.05253 | :mortar_board: | None |
2022-09-12 | Vision Transformer with Convolutional Encoder-Decoder for Hand Gesture Recognition using 24 GHz Doppler Radar | Kavinda Kehelella et.al. | 2209.05032 | :mortar_board: | None |
2022-09-09 | EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography | Rand Muhtaseb et.al. | 2209.04242 | :mortar_board: | None |
2022-09-07 | Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers | Kevin Miao et.al. | 2209.03745 | :mortar_board: | None |
2022-09-08 | Multi-Granularity Prediction for Scene Text Recognition | Peng Wang et.al. | 2209.03592 | :mortar_board: | None |
2022-09-08 | Video Vision Transformers for Violence Detection | Sanskar Singh et.al. | 2209.03561 | :mortar_board: | None |
2022-09-07 | Securing the Spike: On the Transferabilty and Security of Spiking Neural Networks to Adversarial Examples | Nuo Xu et.al. | 2209.03358 | :mortar_board: | None |
2022-09-06 | Fusion of Satellite Images and Weather Data with Transformer Networks for Downy Mildew Disease Detection | William Maillet et.al. | 2209.02797 | :mortar_board: | None |
2022-09-06 | ViTKD: Practical Guidelines for ViT feature knowledge distillation | Zhendong Yang et.al. | 2209.02432 | :mortar_board: | Code |
2022-09-06 | Transformer-CNN Cohort: Semi-supervised Semantic Segmentation by the Best of Both Students | Xu Zheng et.al. | 2209.02178 | :mortar_board: | None |
2022-09-04 | Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography | Thomas Z. Li et.al. | 2209.01676 | :mortar_board: | Code |
2022-08-31 | MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition | Yunhao Wang et.al. | 2209.01620 | :mortar_board: | None |
2022-09-03 | Vision Transformers and YoloV5 based Driver Drowsiness Detection Framework | Ghanta Sai Krishna et.al. | 2209.01401 | :mortar_board: | None |
2022-09-02 | Transformers in Remote Sensing: A Survey | Abdulaziz Amer Aleissaee et.al. | 2209.01206 | :mortar_board: | None |
2022-08-31 | EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing | Qihua Feng et.al. | 2208.14657 | :mortar_board: | Code |
2022-08-31 | SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization | Hongbo Sun et.al. | 2208.14607 | :mortar_board: | Code |
2022-08-29 | Open-Set Semi-Supervised Object Detection | Yen-Cheng Liu et.al. | 2208.13722 | :mortar_board: | None |
2022-08-28 | An Unsupervised Learning-based Framework for Effective Representation Extraction of Reactor Accidents | Chengyuan Li et.al. | 2208.13147 | :mortar_board: | None |
2022-08-28 | ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers | Yutong Xie et.al. | 2208.13138 | :mortar_board: | None |
2022-08-28 | An Access Control Method with Secret Key for Semantic Segmentation Models | Teru Nagamori et.al. | 2208.13135 | :mortar_board: | None |
2022-08-27 | TrojViT: Trojan Insertion in Vision Transformers | Mengxin Zheng et.al. | 2208.13049 | :mortar_board: | None |
2022-08-26 | VMFormer: End-to-End Video Matting with Transformer | Jiachen Li et.al. | 2208.12801 | :mortar_board: | None |
2022-08-24 | On a Built-in Conflict between Deep Learning and Systematic Generalization | Yuanpeng Li et.al. | 2208.11633 | :mortar_board: | Code |
2022-08-20 | An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics | Aly Mostafa et.al. | 2208.11484 | :mortar_board: | None |
2022-08-24 | A Deep Learning Approach Using Masked Image Modeling for Reconstruction of Undersampled K-spaces | Kyler Larsen et.al. | 2208.11472 | :mortar_board: | Code |
2022-08-24 | Federated Self-Supervised Contrastive Learning and Masked Autoencoder for Dermatological Disease Diagnosis | Yawen Wu et.al. | 2208.11278 | :mortar_board: | None |
2022-08-23 | FocusFormer: Focusing on What We Need via Architecture Sampler | Jing Liu et.al. | 2208.10861 | :mortar_board: | None |
2022-08-22 | Predicting microsatellite instability and key biomarkers in colorectal cancer from H&E-stained images: Achieving SOTA with Less Data using Swin Transformer | Bangwei Guo et.al. | 2208.10495 | :mortar_board: | None |
2022-08-22 | ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition | Mengqi Xue et.al. | 2208.10431 | :mortar_board: | Code |
2022-08-20 | Analyzing Adversarial Robustness of Vision Transformers against Spatial and Spectral Attacks | Gihyun Kim et.al. | 2208.09602 | :mortar_board: | None |
2022-08-19 | A Dual Modality Approach For (Zero-Shot) Multi-Label Classification | Shichao Xu et.al. | 2208.09562 | :mortar_board: | None |
2022-08-19 | Accelerating Vision Transformer Training via a Patch Sampling Schedule | Bradley McDanel et.al. | 2208.09520 | :mortar_board: | Code |
2022-08-18 | The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs | Chris Rockwell et.al. | 2208.08988 | :mortar_board: | None |
2022-08-18 | Prompt Vision Transformer for Domain Generalization | Zangwei Zheng et.al. | 2208.08914 | :mortar_board: | None |
2022-08-17 | Conviformers: Convolutionally guided Vision Transformer | Mohit Vaishnav et.al. | 2208.08900 | :mortar_board: | None |
2022-08-17 | Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation | Chengxi Zeng et.al. | 2208.08315 | :mortar_board: | Code |
2022-08-17 | Transformer Vs. MLP-Mixer Exponential Expressive Gap For NLP Problems | Dan Navon et.al. | 2208.08191 | :mortar_board: | None |
2022-08-17 | Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs | Finn Behrendt et.al. | 2208.08166 | :mortar_board: | None |
2022-08-16 | ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos | James Wensel et.al. | 2208.07929 | :mortar_board: | None |
2022-08-16 | Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model | Xiulong Yang et.al. | 2208.07791 | :mortar_board: | None |
2022-08-10 | PatchDropout: Economizing Vision Transformers Using Patch Dropout | Yue Liu et.al. | 2208.07220 | :mortar_board: | None |
2022-08-15 | A Vision Transformer-Based Approach to Bearing Fault Classification via Vibration Signals | Abid Hasan Zim et.al. | 2208.07070 | :mortar_board: | None |
2022-08-15 | Self-Supervised Vision Transformers for Malware Detection | Sachith Seneviratne et.al. | 2208.07049 | :mortar_board: | Code |
2022-08-14 | Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification | Tianyi Zhang et.al. | 2208.06833 | :mortar_board: | Code |
2022-08-13 | Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models | Xingyu Xie et.al. | 2208.06677 | :mortar_board: | None |
2022-08-12 | When CNN Meet with ViT: Towards Semi-Supervised Learning for Multi-Class Medical Image Semantic Segmentation | Ziyang Wang et.al. | 2208.06449 | :mortar_board: | Code |
2022-08-12 | BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers | Zhiliang Peng et.al. | 2208.06366 | :mortar_board: | None |
2022-08-11 | Shifted Windows Transformers for Medical Image Quality Assessment | Caner Ozer et.al. | 2208.06034 | :mortar_board: | None |
2022-08-11 | Semi-supervised Vision Transformers at Scale | Zhaowei Cai et.al. | 2208.05688 | :mortar_board: | None |
2022-08-10 | Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization | Zhengang Li et.al. | 2208.05163 | :mortar_board: | None |
2022-08-10 | Ghost-free High Dynamic Range Imaging with Context-aware Transformer | Zhen Liu et.al. | 2208.05114 | :mortar_board: | Code |
2022-08-09 | CoViT: Real-time phylogenetics for the SARS-CoV-2 pandemic using Vision Transformers | Zuher Jahshan et.al. | 2208.05004 | :mortar_board: | Code |
2022-08-07 | U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration? | Xi Jia et.al. | 2208.04939 | :mortar_board: | None |
2022-08-09 | How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image Domain? An Empirical Study Involving Art Classification | Vincent Tonkes et.al. | 2208.04693 | :mortar_board: | Code |
2022-08-08 | Occlusion-Aware Instance Segmentation via BiLayer Network Architectures | Lei Ke et.al. | 2208.04438 | :mortar_board: | Code |
2022-08-08 | 3D Vision with Transformers: A Survey | Jean Lahoud et.al. | 2208.04309 | :mortar_board: | Code |
2022-08-08 | Understanding Masked Image Modeling via Learning Occlusion Invariant Feature | Xiangwen Kong et.al. | 2208.04164 | :mortar_board: | None |
2022-08-08 | Efficient Neural Net Approaches in Metal Casting Defect Detection | Rohit Lal et.al. | 2208.04150 | :mortar_board: | None |
2022-08-08 | Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model | Di Wang et.al. | 2208.03987 | :mortar_board: | Code |
2022-08-06 | MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer | Chaoqiang Zhao et.al. | 2208.03543 | :mortar_board: | Code |
2022-08-06 | Analysing the Memorability of a Procedural Crime-Drama TV Series, CSI | Sean Cummins et.al. | 2208.03479 | :mortar_board: | None |
2022-08-04 | Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification | Faris Almalik et.al. | 2208.02851 | :mortar_board: | Code |
2022-08-04 | DropKey | Bonan Li et.al. | 2208.02646 | :mortar_board: | None |
2022-08-04 | MVSFormer: Multi-View Stereo with Pre-trained Vision Transformers and Temperature-based Depth | Chenjie Cao et.al. | 2208.02541 | :mortar_board: | None |
2022-08-03 | GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning | Benyuan Sun et.al. | 2208.02148 | :mortar_board: | None |
2022-08-03 | SSformer: A Lightweight Transformer for Semantic Segmentation | Wentao Shi et.al. | 2208.02034 | :mortar_board: | Code |
2022-08-03 | Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis | Xiao Qi et.al. | 2208.01843 | :mortar_board: | Code |
2022-08-03 | Learning Prior Feature and Attention Enhanced Image Inpainting | Chenjie Cao et.al. | 2208.01837 | :mortar_board: | Code |
2022-08-02 | Two-Stream Transformer Architecture for Long Video Understanding | Edward Fish et.al. | 2208.01753 | :mortar_board: | None |
2022-08-02 | A Novel Transformer Network with Shifted Window Cross-Attention for Spatiotemporal Weather Forecasting | Alabi Bojesomo et.al. | 2208.01252 | :mortar_board: | None |
2022-08-01 | Understanding Adversarial Robustness of Vision Transformers via Cauchy Problem | Zheng Wang et.al. | 2208.00906 | :mortar_board: | Code |
2022-07-25 | D3Former: Debiased Dual Distilled Transformer for Incremental Learning | Abdelrahman Mohamed et.al. | 2208.00777 | :mortar_board: | None |
2022-08-01 | TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation | Reza Azad et.al. | 2208.00713 | :mortar_board: | Code |
2022-07-29 | Restoring Vision in Adverse Weather Conditions with Patch-Based Denoising Diffusion Models | Ozan Özdenizci et.al. | 2207.14626 | :mortar_board: | Code |
2022-07-29 | ScaleFormer: Revisiting the Transformer-based Backbones from a Scale-wise Perspective for Medical Image Segmentation | Huimin Huang et.al. | 2207.14552 | :mortar_board: | None |
2022-07-28 | HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions | Yongming Rao et.al. | 2207.14284 | :mortar_board: | Code |
2022-07-28 | DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer | Hao Li et.al. | 2207.13861 | :mortar_board: | None |
2022-07-24 | Online Continual Learning with Contrastive Vision Transformer | Zhen Wang et.al. | 2207.13516 | :mortar_board: | None |
2022-07-27 | Deep Clustering with Features from Self-Supervised Pretraining | Xingzhi Zhou et.al. | 2207.13364 | :mortar_board: | None |
2022-07-27 | Convolutional Embedding Makes Hierarchical Vision Transformer Stronger | Cong Wang et.al. | 2207.13317 | :mortar_board: | None |
2022-07-25 | Self-Distilled Vision Transformer for Domain Generalization | Maryam Sultana et.al. | 2207.12392 | :mortar_board: | Code |
2022-07-22 | Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers | Samay Lakhani et.al. | 2207.12148 | :mortar_board: | None |
2022-07-25 | Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer | Yingyi Chen et.al. | 2207.11971 | :mortar_board: | None |
2022-07-25 | Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation | Jiaming Zhang et.al. | 2207.11860 | :mortar_board: | Code |
2022-07-24 | Improved Super Resolution of MR Images Using CNNs and Vision Transformers | Dwarikanath Mahapatra et.al. | 2207.11748 | :mortar_board: | None |
2022-07-24 | Affective Behaviour Analysis Using Pretrained Model with Facial Priori | Yifan Li et.al. | 2207.11679 | :mortar_board: | None |
2022-07-24 | MAR: Masked Autoencoders for Efficient Action Recognition | Zhiwu Qing et.al. | 2207.11660 | :mortar_board: | None |
2022-07-22 | Facial Expression Recognition using Vanilla ViT backbones with MAE Pretraining | Jia Li et.al. | 2207.11081 | :mortar_board: | None |
2022-07-21 | Focused Decoding Enables 3D Anatomical Detection by Transformers | Bastian Wittmann et.al. | 2207.10774 | :mortar_board: | Code |
2022-07-21 | TinyViT: Fast Pretraining Distillation for Small Vision Transformers | Kan Wu et.al. | 2207.10666 | :mortar_board: | Code |
2022-07-21 | Towards Efficient Adversarial Training on Vision Transformers | Boxi Wu et.al. | 2207.10498 | :mortar_board: | None |
2022-07-21 | An Efficient Spatio-Temporal Pyramid Transformer for Action Detection | Yuetian Weng et.al. | 2207.10448 | :mortar_board: | None |
2022-07-21 | A Wavelet Transform and self-supervised learning-based framework for bearing fault diagnosis with limited labeled data | Yuhong Jin et.al. | 2207.10432 | :mortar_board: | None |
2022-07-21 | SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks | Chien-Yu Lin et.al. | 2207.10237 | :mortar_board: | Code |
2022-07-20 | MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis | Yaqian Liang et.al. | 2207.10228 | :mortar_board: | None |
2022-07-20 | Locality Guidance for Improving Vision Transformers on Tiny Datasets | Kehan Li et.al. | 2207.10026 | :mortar_board: | Code |
2022-07-20 | ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network | Nikolaos Gkalelis et.al. | 2207.09927 | :mortar_board: | None |
2022-07-20 | Unsupervised Industrial Anomaly Detection via Pattern Generative and Contrastive Networks | Jianfeng Huang et.al. | 2207.09792 | :mortar_board: | None |
2022-07-20 | AU-Supervised Convolutional Vision Transformers for Synthetic Facial Expression Recognition | Shuyi Mao et.al. | 2207.09777 | :mortar_board: | Code |
2022-07-20 | On the Versatile Uses of Partial Distance Correlation in Deep Learning | Xingjian Zhen et.al. | 2207.09684 | :mortar_board: | Code |
2022-07-19 | Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography | Kai Ma et.al. | 2207.09312 | :mortar_board: | None |
2022-07-18 | Is Integer Arithmetic Enough for Deep Learning Training? | Alireza Ghaffari et.al. | 2207.08822 | :mortar_board: | None |
2022-07-18 | Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations | Hashmat Shadab Malik et.al. | 2207.08803 | :mortar_board: | Code |
2022-07-18 | Multi-manifold Attention for Vision Transformers | Dimitrios Konstantinidis et.al. | 2207.08569 | :mortar_board: | None |
2022-07-18 | TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers | Jihao Liu et.al. | 2207.08409 | :mortar_board: | Code |
2022-07-17 | Security Evaluation of Compressible Image Encryption for Privacy-Preserving Image Classification against Ciphertext-only Attacks | Tatsuya Chuman et.al. | 2207.08109 | :mortar_board: | None |
2022-07-16 | SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection | Antonio Barbalau et.al. | 2207.08003 | :mortar_board: | None |
2022-07-16 | Explainable vision transformer enabled convolutional neural network for plant disease identification: PlantXViT | Poornima Singh Thakur et.al. | 2207.07919 | :mortar_board: | None |
2022-07-15 | Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP | Zhicai Wang et.al. | 2207.07284 | :mortar_board: | Code |
2022-07-15 | Lightweight Vision Transformer with Cross Feature Attention | Youpeng Zhao et.al. | 2207.07268 | :mortar_board: | None |
2022-07-14 | Convolutional Bypasses Are Better Vision Transformer Adapters | Shibo Jie et.al. | 2207.07039 | :mortar_board: | Code |
2022-07-14 | iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer | Sanghyeon Lee et.al. | 2207.06831 | :mortar_board: | None |
2022-07-14 | Deepfake Video Detection with Spatiotemporal Dropout Transformer | Daichi Zhang et.al. | 2207.06612 | :mortar_board: | None |
2022-07-13 | Trans4Map: Revisiting Holistic Top-down Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers | Chang Chen et.al. | 2207.06205 | :mortar_board: | Code |
2022-07-12 | Vision Transformer for NeRF-Based View Synthesis from a Single Input Image | Kai-En Lin et.al. | 2207.05736 | :mortar_board: | None |
2022-07-12 | MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing | Sixiang Chen et.al. | 2207.05621 | :mortar_board: | None |
2022-07-12 | LightViT: Towards Light-Weight Convolution-Free Vision Transformers | Tao Huang et.al. | 2207.05557 | :mortar_board: | Code |
2022-07-12 | Long-term Leap Attention, Short-term Periodic Shift for Video Classification | Hao Zhang et.al. | 2207.05526 | :mortar_board: | None |
2022-07-12 | Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios | Jiashi Li et.al. | 2207.05501 | :mortar_board: | None |
2022-07-12 | Image and Model Transformation with Secret Key for Vision Transformer | Hitoshi Kiya et.al. | 2207.05366 | :mortar_board: | None |
2022-07-12 | eX-ViT: A Novel eXplainable Vision Transformer for Weakly Supervised Semantic Segmentation | Lu Yu et.al. | 2207.05358 | :mortar_board: | None |
2022-07-12 | Outpainting by Queries | Kai Yao et.al. | 2207.05312 | :mortar_board: | Code |
2022-07-12 | Trusted Multi-Scale Classification Framework for Whole Slide Image | Ming Feng et.al. | 2207.05290 | :mortar_board: | None |
2022-07-11 | Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning | Ting Yao et.al. | 2207.04978 | :mortar_board: | Code |
2022-07-11 | Dual Vision Transformer | Ting Yao et.al. | 2207.04976 | :mortar_board: | Code |
2022-07-11 | TNT: Vision Transformer for Turbulence Simulations | Yuchen Dang et.al. | 2207.04616 | :mortar_board: | None |
2022-07-10 | Depthformer : Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion | Ashutosh Agarwal et.al. | 2207.04535 | :mortar_board: | Code |
2022-07-10 | Facilitated machine learning for image-based fruit quality assessment in developing countries | Manuel Knott et.al. | 2207.04523 | :mortar_board: | None |
2022-07-08 | Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain | Tong Zhang et.al. | 2207.03860 | :mortar_board: | None |
2022-07-08 | VidConv: A modernized 2D ConvNet for Efficient Video Recognition | Chuong H. Nguyen et.al. | 2207.03782 | :mortar_board: | None |
2022-07-07 | More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity | Shiwei Liu et.al. | 2207.03620 | :mortar_board: | Code |
2022-07-05 | Softmax-free Linear Transformers | Jiachen Lu et.al. | 2207.03341 | :mortar_board: | Code |
2022-07-07 | Vision Transformers: State of the Art and Research Challenges | Bo-Kai Ruan et.al. | 2207.03041 | :mortar_board: | None |
2022-07-05 | Generalization to translation shifts: a study in architectures and augmentations | Suriya Gunasekar et.al. | 2207.02349 | :mortar_board: | None |
2022-07-05 | TractoFormer: A Novel Fiber-level Whole Brain Tractography Analysis Framework Using Spectral Embedding and Vision Transformers | Fan Zhang et.al. | 2207.02327 | :mortar_board: | None |
2022-07-05 | Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention | Gary Leung et.al. | 2207.02126 | :mortar_board: | None |
2022-07-05 | Transformer based Models for Unsupervised Anomaly Segmentation in Brain MR Images | Ahmed Ghorbel et.al. | 2207.02059 | :mortar_board: | Code |
2022-07-05 | CNN-based Local Vision Transformer for COVID-19 Diagnosis | Hongyan Xu et.al. | 2207.02027 | :mortar_board: | None |
2022-07-04 | Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks | Yongming Rao et.al. | 2207.01580 | :mortar_board: | Code |
2022-07-04 | I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference | Zhikai Li et.al. | 2207.01405 | :mortar_board: | None |
2022-07-03 | You Only Need One Detector: Unified Object Detector for Different Modalities based on Vision Transformers | Xiaoke Shen et.al. | 2207.01071 | :mortar_board: | None |
2022-07-01 | Polarized Color Image Denoising using Pocoformer | Zhuoxiao Li et.al. | 2207.00215 | :mortar_board: | None |
2022-07-01 | Rethinking Query-Key Pairwise Interactions in Vision Transformers | Cheng Li et.al. | 2207.00188 | :mortar_board: | None |
2022-06-30 | PVT-COV19D: Pyramid Vision Transformer for COVID-19 Diagnosis | Lilang Zheng et.al. | 2206.15069 | :mortar_board: | None |
2022-06-29 | LViT: Language meets Vision Transformer in Medical Image Segmentation | Zihan Li et.al. | 2206.14718 | :mortar_board: | Code |
2022-06-29 | The Lighter The Better: Rethinking Transformers in Medical Image Segmentation Through Adaptive Pruning | Xian Lin et.al. | 2206.14413 | :mortar_board: | Code |
2022-06-28 | Masked World Models for Visual Control | Younggyo Seo et.al. | 2206.14244 | :mortar_board: | None |
2022-06-28 | Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment | Takeshi Kojima et.al. | 2206.13951 | :mortar_board: | Code |
2022-06-28 | Cross-Forgery Analysis of Vision Transformers and CNNs for Deepfake Image Detection | Davide Alessandro Coccomini et.al. | 2206.13829 | :mortar_board: | None |
2022-06-23 | QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer | Jinmiao Huang et.al. | 2206.13231 | :mortar_board: | None |
2022-06-27 | Video2StyleGAN: Encoding Video in Latent Space for Manipulation | Jiyang Yu et.al. | 2206.13078 | :mortar_board: | None |
2022-06-26 | Vision Transformer for Contrastive Clustering | Hua-Bao Ling et.al. | 2206.12925 | :mortar_board: | None |
2022-06-24 | Defending Backdoor Attacks on Vision Transformer via Patch Processing | Khoa D. Doan et.al. | 2206.12381 | :mortar_board: | None |
2022-06-22 | Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation | Ming Li et.al. | 2206.10845 | :mortar_board: | None |
2022-06-21 | Scaling up Kernels in 3D CNNs | Yukang Chen et.al. | 2206.10555 | :mortar_board: | Code |
2022-06-21 | Vicinity Vision Transformer | Weixuan Sun et.al. | 2206.10552 | :mortar_board: | Code |
2022-06-21 | Faster Diffusion Cardiac MRI with Deep Learning-based breath hold reduction | Michael Tanzer et.al. | 2206.10543 | :mortar_board: | None |
2022-06-21 | Transformers Improve Breast Cancer Diagnosis from Unregistered Multi-View Mammograms | Xuxin Chen et.al. | 2206.10096 | :mortar_board: | None |
2022-06-20 | Global Context Vision Transformers | Ali Hatamizadeh et.al. | 2206.09959 | :mortar_board: | Code |
2022-06-19 | EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm | Jiangning Zhang et.al. | 2206.09325 | :mortar_board: | Code |
2022-06-18 | Replacing Labeled Real-image Datasets with Auto-generated Contours | Hirokatsu Kataoka et.al. | 2206.09132 | :mortar_board: | None |
2022-06-17 | SimA: Simple Softmax-free Attention for Vision Transformers | Soroush Abbasi Koohpayegani et.al. | 2206.08898 | :mortar_board: | Code |
2022-06-17 | Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection | Joo-Yeon Lee et.al. | 2206.08568 | :mortar_board: | None |
2022-06-17 | Rectify ViT Shortcut Learning by Visual Saliency | Chong Ma et.al. | 2206.08567 | :mortar_board: | None |
2022-06-16 | Backdoor Attacks on Vision Transformers | Akshayvarun Subramanya et.al. | 2206.08477 | :mortar_board: | Code |
2022-06-16 | IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes | Rui Zhu et.al. | 2206.08423 | :mortar_board: | None |
2022-06-16 | OmniMAE: Single Model Masked Pretraining on Images and Videos | Rohit Girdhar et.al. | 2206.08356 | :mortar_board: | None |
2022-06-16 | Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency | Viraj Prabhu et.al. | 2206.08222 | :mortar_board: | Code |
2022-06-16 | Patch-level Representation Learning for Self-supervised Vision Transformers | Sukmin Yun et.al. | 2206.07990 | :mortar_board: | Code |
2022-06-15 | What makes domain generalization hard? | Spandan Madan et.al. | 2206.07802 | :mortar_board: | None |
2022-06-15 | Masked Siamese ConvNets | Li Jing et.al. | 2206.07700 | :mortar_board: | None |
2022-06-15 | A Simple Data Mixing Prior for Improving Self-Supervised Learning | Sucheng Ren et.al. | 2206.07692 | :mortar_board: | Code |
2022-06-15 | SP-ViT: Learning 2D Spatial Priors for Vision Transformers | Yuxuan Zhou et.al. | 2206.07662 | :mortar_board: | None |
2022-06-15 | Rethinking Generalization in Few-Shot Classification | Markus Hiller et.al. | 2206.07267 | :mortar_board: | Code |
2022-06-14 | Stand-Alone Inter-Frame Attention in Video Models | Fuchen Long et.al. | 2206.06931 | :mortar_board: | Code |
2022-06-14 | Efficient Decoder-free Object Detection with Transformers | Peixian Chen et.al. | 2206.06829 | :mortar_board: | Code |
2022-06-14 | Peripheral Vision Transformer | Juhong Min et.al. | 2206.06801 | :mortar_board: | None |
2022-06-14 | Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO | Javier Rando et.al. | 2206.06761 | :mortar_board: | Code |
2022-06-14 | TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer | Jiajun Deng et.al. | 2206.06619 | :mortar_board: | Code |
2022-06-13 | Multimodal Learning with Transformers: A Survey | Peng Xu et.al. | 2206.06488 | :mortar_board: | None |
3D Representations
Publish Date | Title | Authors | arxiv | Code | |
---|---|---|---|---|---|
2023-10-23 | Ghost on the Shell: An Expressive Representation of General 3D Shapes | Zhen Liu et.al. | 2310.15168 | :mortar_board: | None |
2023-10-22 | Learning Generalizable Manipulation Policies with Object-Centric 3D Representations | Yifeng Zhu et.al. | 2310.14386 | :mortar_board: | None |
2023-10-18 | Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts | Xinhua Cheng et.al. | 2310.11784 | :mortar_board: | None |
2023-10-14 | JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues | Jiayi Ji et.al. | 2310.09503 | :mortar_board: | Code |
2023-10-12 | PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm | Haoyi Zhu et.al. | 2310.08586 | :mortar_board: | Code |
2023-10-11 | Orbital Polarimetric Tomography of a Flare Near the Sagittarius A* Supermassive Black Hole | Aviad Levis et.al. | 2310.07687 | :mortar_board: | None |
2023-10-10 | Uni3D: Exploring Unified 3D Representation at Scale | Junsheng Zhou et.al. | 2310.06773 | :mortar_board: | Code |
2023-09-29 | TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields | Tianyu Huang et.al. | 2309.17175 | :mortar_board: | None |
2023-09-29 | HAvatar: High-fidelity Head Avatar via Facial Model Conditioned Neural Radiance Field | Xiaochen Zhao et.al. | 2309.17128 | :mortar_board: | None |
2023-09-28 | ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning | Qiao Gu et.al. | 2309.16650 | :mortar_board: | None |
2023-09-26 | ITEM3D: Illumination-Aware Directional Texture Editing for 3D Models | Shengqi Liu et.al. | 2309.14872 | :mortar_board: | None |
2023-09-24 | MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field | Zijiang Yang et.al. | 2309.13607 | :mortar_board: | None |
2023-09-19 | SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving | Xiangchao Yan et.al. | 2309.10527 | :mortar_board: | Code |
2023-09-14 | Large-Vocabulary 3D Diffusion Model with Transformer | Ziang Cao et.al. | 2309.07920 | :mortar_board: | None |
2023-09-14 | CoRF : Colorizing Radiance Fields using Knowledge Distillation | Ankit Dhiman et.al. | 2309.07668 | :mortar_board: | None |
2023-09-12 | Learning Disentangled Avatars with Hybrid 3D Representations | Yao Feng et.al. | 2309.06441 | :mortar_board: | None |
2023-09-11 | Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips | Yufei Ye et.al. | 2309.05663 | :mortar_board: | None |
2023-09-11 | PAg-NeRF: Towards fast and efficient end-to-end panoptic 3D representations for agricultural robotics | Claus Smitt et.al. | 2309.05339 | :mortar_board: | None |
2023-09-10 | 3D Implicit Transporter for Temporally Consistent Keypoint Discovery | Chengliang Zhong et.al. | 2309.05098 | :mortar_board: | Code |
2023-09-09 | Graph Vertex Model | Tanmoy Sarkar et.al. | 2309.04818 | :mortar_board: | Code |
2023-09-04 | Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization | Xianghui Yang et.al. | 2309.01512 | :mortar_board: | None |
2023-08-14 | OccNet: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions | Miao Fan et.al. | 2308.16160 | :mortar_board: | None |
2023-08-28 | HoloFusion: Towards Photo-realistic 3D Generative Modeling | Animesh Karnewar et.al. | 2308.14244 | :mortar_board: | None |
2023-08-27 | Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views | Zi-Xin Zou et.al. | 2308.14078 | :mortar_board: | None |
2023-08-21 | UniMAE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving | Jian Zou et.al. | 2308.10421 | :mortar_board: | Code |
2023-08-20 | Strata-NeRF : Neural Radiance Fields for Stratified Scenes | Ankit Dhiman et.al. | 2308.10337 | :mortar_board: | None |
2023-08-18 | Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training | Xiaoyang Wu et.al. | 2308.09718 | :mortar_board: | Code |
2023-08-18 | Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition | Xuanyu Yi et.al. | 2308.09694 | :mortar_board: | None |
2023-08-18 | MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection | Junkai Xu et.al. | 2308.09421 | :mortar_board: | Code |
2023-08-17 | Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Zehan Wang et.al. | 2308.08769 | :mortar_board: | None |
2023-08-14 | 3D Analytics: Opportunities and Guidelines for Information Systems Research | Gunther Gust et.al. | 2308.08560 | :mortar_board: | None |
2023-08-16 | TeCH: Text-guided Reconstruction of Lifelike Clothed Humans | Yangyi Huang et.al. | 2308.08545 | :mortar_board: | Code |
2023-08-14 | Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases | Eugen Šlapak et.al. | 2308.07118 | :mortar_board: | Code |
2023-08-10 | FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models | Guangkai Xu et.al. | 2308.05733 | :mortar_board: | None |
2023-08-06 | Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation | Haowei Wang et.al. | 2308.02982 | :mortar_board: | Code |
2023-08-05 | Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis | Yuxin Wang et.al. | 2308.02840 | :mortar_board: | None |
2023-08-05 | NeRFs: The Search for the Best 3D Representation | Ravi Ramamoorthi et.al. | 2308.02751 | :mortar_board: | None |
2023-07-28 | VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation | Zekun Qi et.al. | 2307.16605 | :mortar_board: | Code |
2023-07-31 | JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery | Jiahao Li et.al. | 2307.16377 | :mortar_board: | Code |
2023-07-27 | Learning Full-Head 3D GANs from a Single-View Portrait Dataset | Yiqian Wu et.al. | 2307.14770 | :mortar_board: | None |
2023-07-20 | PAPR: Proximity Attention Point Rendering | Yanshu Zhang et.al. | 2307.11086 | :mortar_board: | None |
2023-07-18 | Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells | Xinyi Ye et.al. | 2307.09160 | :mortar_board: | Code |
2023-07-18 | NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF | Stefan Lionar et.al. | 2307.09112 | :mortar_board: | None |
2023-07-12 | Semantic Communications System with Model Division Multiple Access and Controllable Coding Rate for Point Cloud | Xiaoyi Liu et.al. | 2307.06027 | :mortar_board: | None |
2023-07-11 | Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives | Tom Monnier et.al. | 2307.05473 | :mortar_board: | None |
2023-06-28 | Points for Energy Renovation (PointER): A LiDAR-Derived Point Cloud Dataset of One Million English Buildings Linked to Energy Characteristics | Sebastian Krapf et.al. | 2306.16020 | :mortar_board: | Code |
2023-06-27 | Meshes Meet Voxels: Abdominal Organ Segmentation via Diffeomorphic Deformations | Fabian Bongratz et.al. | 2306.15515 | :mortar_board: | None |
2023-06-26 | RVT: Robotic View Transformer for 3D Object Manipulation | Ankit Goyal et.al. | 2306.14896 | :mortar_board: | Code |
2023-06-19 | UniG3D: A Unified 3D Object Generation Dataset | Qinghong Sun et.al. | 2306.10730 | :mortar_board: | None |
2023-06-15 | CAD-Estate: Large-scale CAD Model Annotation in RGB Videos | Kevis-Kokitsi Maninis et.al. | 2306.09011 | :mortar_board: | None |
2023-06-11 | On the Efficacy of 3D Point Cloud Reinforcement Learning | Zhan Ling et.al. | 2306.06799 | :mortar_board: | Code |
2023-06-09 | GANeRF: Leveraging Discriminators to Optimize Neural Radiance Fields | Barbara Roessle et.al. | 2306.06044 | :mortar_board: | None |
2023-06-08 | Tracking Objects with 3D Representation from Videos | Jiawei He et.al. | 2306.05416 | :mortar_board: | None |
2023-06-05 | ZIGNeRF: Zero-shot 3D Scene Representation with Invertible Generative Neural Radiance Fields | Kanghyeok Ko et.al. | 2306.02741 | :mortar_board: | None |
2023-06-05 | Learning from Multi-View Representation for Point-Cloud Pre-Training | Siming Yan et.al. | 2306.02558 | :mortar_board: | None |
2023-06-03 | Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution | Yiji Cheng et.al. | 2306.02083 | :mortar_board: | None |
2023-05-26 | BEV-IO: Enhancing Bird’s-Eye-View 3D Detection with Instance Occupancy | Zaibin Zhang et.al. | 2305.16829 | :mortar_board: | None |
2023-05-19 | Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields | Jingbo Zhang et.al. | 2305.11588 | :mortar_board: | None |
2023-05-18 | OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding | Minghua Liu et.al. | 2305.10764 | :mortar_board: | None |
2023-05-15 | Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | Zhimin Chen et.al. | 2305.08776 | :mortar_board: | None |
2023-05-14 | ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding | Le Xue et.al. | 2305.08275 | :mortar_board: | Code |
2023-05-09 | DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | Chen Bao et.al. | 2305.05706 | :mortar_board: | None |
2023-05-03 | Real-Time Radiance Fields for Single-Image Portrait View Synthesis | Alex Trevithick et.al. | 2305.02310 | :mortar_board: | None |
2023-04-27 | Learning a Diffusion Prior for NeRFs | Guandao Yang et.al. | 2304.14473 | :mortar_board: | None |
2023-04-26 | Ray Conditioning: Trading Photo-consistency for Photo-realism in Multi-view Image Generation | Eric Ming Chen et.al. | 2304.13681 | :mortar_board: | None |
2023-04-25 | PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling | Zhe Li et.al. | 2304.13006 | :mortar_board: | Code |
2023-04-25 | Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur | Peng Dai et.al. | 2304.12652 | :mortar_board: | None |
2023-04-22 | 3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes | Haotian Xue et.al. | 2304.11470 | :mortar_board: | None |
2023-04-22 | NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation | Baao Xie et.al. | 2304.11342 | :mortar_board: | None |
2023-04-19 | Single-View View Synthesis with Self-Rectified Pseudo-Stereo | Yang Zhou et.al. | 2304.09527 | :mortar_board: | None |
2023-04-16 | Likelihood-Based Generative Radiance Field with Latent Space Energy-Based Model for 3D-Aware Disentangled Image Representation | Yaxuan Zhu et.al. | 2304.07918 | :mortar_board: | None |
2023-04-14 | UVA: Towards Unified Volumetric Avatar for View Synthesis, Pose rendering, Geometry and Texture Editing | Jinlong Fan et.al. | 2304.06969 | :mortar_board: | None |
2023-04-13 | Learning Controllable 3D Diffusion Models from Single-view Images | Jiatao Gu et.al. | 2304.06700 | :mortar_board: | None |
2023-04-13 | Survey on LiDAR Perception in Adverse Weather Conditions | Mariella Dreissig et.al. | 2304.06312 | :mortar_board: | None |
2023-04-11 | TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain | Alexey I. Boyko et.al. | 2304.05342 | :mortar_board: | None |
2023-04-11 | MRVM-NeRF: Mask-Based Pretraining for Neural Radiance Fields | Ganlin Yang et.al. | 2304.04962 | :mortar_board: | None |
2023-03-31 | Exploiting synchrotron X-ray tomography for a novel insight into flax-fibre defects ultrastructure | Delphine Quereilhac et.al. | 2303.18127 | :mortar_board: | None |
2023-03-29 | TriVol: Point Cloud Rendering via Triple Volumes | Tao Hu et.al. | 2303.16485 | :mortar_board: | Code |
2023-03-24 | Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning | Xiaoyang Wu et.al. | 2303.14191 | :mortar_board: | Code |
2023-03-24 | BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects | Bowen Wen et.al. | 2303.14158 | :mortar_board: | None |
2023-03-24 | NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images | Mingwu Zheng et.al. | 2303.14092 | :mortar_board: | Code |
2023-03-24 | SPONGE: Sequence Planning with Deformable-ON-Rigid Contact Prediction from Geometric Features | Tran Nguyen Le et.al. | 2303.14012 | :mortar_board: | None |
2023-03-24 | TEGLO: High Fidelity Canonical Texture Mapping from Single-View Images | Vishal Vinod et.al. | 2303.13743 | :mortar_board: | None |
2023-03-23 | NEWTON: Neural View-Centric Mapping for On-the-Fly Large-Scale SLAM | Hidenobu Matsuki et.al. | 2303.13654 | :mortar_board: | None |
2023-03-22 | NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions | Mohamad Shahbazi et.al. | 2303.12865 | :mortar_board: | Code |
2023-03-22 | CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data | Yihan Zeng et.al. | 2303.12417 | :mortar_board: | None |
2023-03-21 | SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation | Juil Koo et.al. | 2303.12236 | :mortar_board: | None |
2023-03-21 | Vox-E: Text-guided Voxel Editing of 3D Objects | Etai Sella et.al. | 2303.12048 | :mortar_board: | None |
2023-03-20 | 3D Concept Learning and Reasoning from Multi-View Images | Yining Hong et.al. | 2303.11327 | :mortar_board: | None |
2023-03-20 | Learning to Generate 3D Representations of Building Roofs Using Single-View Aerial Imagery | Maxim Khomiakov et.al. | 2303.11215 | :mortar_board: | None |
2023-03-16 | NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes | Marie-Julie Rakotosaona et.al. | 2303.09431 | :mortar_board: | None |
2023-03-16 | Mimic3D: Thriving 3D-Aware GANs via 3D-to-2D Imitation | Xingyu Chen et.al. | 2303.09036 | :mortar_board: | None |
2023-03-14 | MeshDiffusion: Score-based Generative 3D Mesh Modeling | Zhen Liu et.al. | 2303.08133 | :mortar_board: | None |
2023-03-12 | StereoTac: a Novel Visuotactile Sensor that Combines Tactile Sensing with 3D Vision | Etienne Roberge et.al. | 2303.06542 | :mortar_board: | None |
2023-03-11 | FAC: 3D Representation Learning via Foreground Aware Feature Contrast | Kangcheng Liu et.al. | 2303.06388 | :mortar_board: | Code |
2023-03-09 | 3D Video Loops from Asynchronous Input | Li Ma et.al. | 2303.05312 | :mortar_board: | None |
2023-03-08 | Neural Vector Fields: Implicit Representation by Explicit Learning | Xianghui Yang et.al. | 2303.04341 | :mortar_board: | None |
2023-03-03 | Multi-Plane Neural Radiance Fields for Novel View Synthesis | Youssef Abdelkareem et.al. | 2303.01736 | :mortar_board: | None |
2023-02-28 | CLR-GAM: Contrastive Point Cloud Learning with Guided Augmentation and Feature Mapping | Srikanth Malla et.al. | 2302.14306 | :mortar_board: | None |
2023-02-27 | Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training | Ziyu Guo et.al. | 2302.14007 | :mortar_board: | None |
2023-02-26 | Makeup Extraction of 3D Representation via Illumination-Aware Image Decomposition | Xingchao Yang et.al. | 2302.13279 | :mortar_board: | None |
2023-02-16 | Spectral 3D Computer Vision – A Review | Yajie Sun et.al. | 2302.08054 | :mortar_board: | None |
2023-02-05 | Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining | Zekun Qi et.al. | 2302.02318 | :mortar_board: | Code |
2023-01-27 | HyperNeRFGAN: Hypernetwork approach to 3D NeRF GAN | Adam Kania et.al. | 2301.11631 | :mortar_board: | Code |
2023-01-20 | Semi-analytical computation of heteroclinic connections between center manifolds with the parameterization method | Miquel Barcelona et.al. | 2301.08526 | :mortar_board: | None |
2023-01-18 | Joint Representation Learning for Text and 3D Point Cloud | Rui Huang et.al. | 2301.07584 | :mortar_board: | None |
2023-01-18 | OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation | Tong Wu et.al. | 2301.07525 | :mortar_board: | None |
2023-01-18 | Three-dimensional reconstruction and characterization of bladder deformations | Augustin C. Ogier et.al. | 2301.07385 | :mortar_board: | None |
2023-01-12 | Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss | Anas Mahmoud et.al. | 2301.05709 | :mortar_board: | None |
2023-01-10 | Neural Radiance Field Codebooks | Matthew Wallingford et.al. | 2301.04101 | :mortar_board: | None |
2023-01-06 | Object as Query: Equipping Any 2D Object Detector with 3D Detection Ability | Zitian Wang et.al. | 2301.02364 | :mortar_board: | None |
2022-12-18 | SPARF: Large-Scale Learning of 3D Sparse Radiance Fields from Few Input Images | Abdullah Hamdi et.al. | 2212.09100 | :mortar_board: | Code |
2022-12-16 | Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning? | Runpei Dong et.al. | 2212.08320 | :mortar_board: | Code |
2022-12-13 | Structured 3D Features for Reconstructing Relightable and Animatable Avatars | Enric Corona et.al. | 2212.06820 | :mortar_board: | None |
2022-12-13 | Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders | Renrui Zhang et.al. | 2212.06785 | :mortar_board: | Code |
2022-12-10 | ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding | Le Xue et.al. | 2212.05171 | :mortar_board: | Code |
2022-12-09 | LoopDraw: a Loop-Based Autoregressive Model for Shape Synthesis and Editing | Nam Anh Dinh et.al. | 2212.04981 | :mortar_board: | Code |
2022-12-09 | Neural Volume Super-Resolution | Yuval Bahat et.al. | 2212.04666 | :mortar_board: | None |
2022-12-02 | 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation | Zutao Jiang et.al. | 2212.01103 | :mortar_board: | None |
2022-12-01 | SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction | Zhizhuo Zhou et.al. | 2212.00792 | :mortar_board: | None |
2022-11-24 | DiffusionSDF: Conditional Generative Modeling of Signed Distance Functions | Gene Chou et.al. | 2211.13757 | :mortar_board: | None |
2022-11-23 | Tetrahedral Diffusion Models for 3D Shape Generation | Nikolai Kalischek et.al. | 2211.13220 | :mortar_board: | None |
2022-11-21 | Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields | Yue Chen et.al. | 2211.11505 | :mortar_board: | None |
2022-11-21 | Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars | Jingxiang Sun et.al. | 2211.11208 | :mortar_board: | Code |
2022-11-20 | IC3D: Image-Conditioned 3D Diffusion for Shape Generation | Cristian Sbrolli et.al. | 2211.10865 | :mortar_board: | None |
2022-11-17 | RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation | Titas Anciukevičius et.al. | 2211.09869 | :mortar_board: | Code |
2022-11-15 | ParticleGrid: Enabling Deep Learning using 3D Representation of Materials | Shehtab Zaman et.al. | 2211.08506 | :mortar_board: | Code |
2022-11-11 | Shock-accelerated electrons during the fast expansion of a coronal mass ejection | D. E. Morosan et.al. | 2211.06049 | :mortar_board: | None |
2022-11-09 | ChromoSkein: Untangling Three-Dimensional Chromatin Fiber With a Web-Based Visualization Framework | Matúš Talčík et.al. | 2211.05125 | :mortar_board: | None |
2022-11-03 | Semantic 3D Grid Maps for Autonomous Driving | Ajinkya Khoche et.al. | 2211.01700 | :mortar_board: | Code |
2022-10-31 | gCoRF: Generative Compositional Radiance Fields | Mallikarjun BR et.al. | 2210.17344 | :mortar_board: | None |
2022-10-27 | Deep Generative Models on 3D Representations: A Survey | Zifan Shi et.al. | 2210.15663 | :mortar_board: | None |
2022-10-26 | Analyzing Deep Learning Representations of Point Clouds for Real-Time In-Vehicle LiDAR Perception | Marc Uecker et.al. | 2210.14612 | :mortar_board: | None |
2022-10-25 | MICP-L: Fast parallel simulative Range Sensor to Mesh registration for Robot Localization | Alexander Mock et.al. | 2210.13904 | :mortar_board: | Code |
2022-10-24 | Learning Neural Radiance Fields from Multi-View Geometry | Marco Orsingher et.al. | 2210.13041 | :mortar_board: | None |
2022-10-22 | NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos | Yi-Ling Qiao et.al. | 2210.12352 | :mortar_board: | None |
2022-10-20 | Coordinates Are NOT Lonely – Codebook Prior Helps Implicit Neural 3D Representations | Fukun Yin et.al. | 2210.11170 | :mortar_board: | Code |
2022-10-14 | Reference Based Color Transfer for Medical Volume Rendering | Sudarshan Devkota et.al. | 2210.08083 | :mortar_board: | None |
2022-10-13 | Visual Reinforcement Learning with Self-Supervised 3D Representations | Yanjie Ze et.al. | 2210.07241 | :mortar_board: | None |
2022-10-12 | AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars | Yue Wu et.al. | 2210.06465 | :mortar_board: | None |
2022-10-06 | XDGAN: Multi-Modal 3D Shape Generation in 2D Space | Hassan Abu Alhaija et.al. | 2210.03007 | :mortar_board: | None |
2022-10-05 | Water Simulation and Rendering from a Still Photograph | Ryusuke Sugimoto et.al. | 2210.02553 | :mortar_board: | None |
2022-10-04 | Bridged Transformer for Vision and Point Cloud 3D Object Detection | Yikai Wang et.al. | 2210.01391 | :mortar_board: | None |
2022-08-14 | Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based Reinforcement Learning Model | Yaqin Li et.al. | 2209.07405 | :mortar_board: | None |
2022-09-07 | Multi-NeuS: 3D Head Portraits from Single Image with Neural Implicit Functions | Egor Burkov et.al. | 2209.04436 | :mortar_board: | None |
2022-09-09 | Towards Confidence-guided Shape Completion for Robotic Applications | Andrea Rosasco et.al. | 2209.04300 | :mortar_board: | Code |
2022-08-30 | Inferring Implicit 3D Representations from Human Figures on Pictorial Maps | Raimund Schnürer et.al. | 2209.02385 | :mortar_board: | None |
2022-08-24 | PeRFception: Perception using Radiance Fields | Yoonwoo Jeong et.al. | 2208.11537 | :mortar_board: | Code |
2022-08-23 | Spiral Contrastive Learning: An Efficient 3D Representation Learning Method for Unannotated CT Lesions | Penghua Zhai et.al. | 2208.10694 | :mortar_board: | None |
2022-08-19 | Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images | Nagabhushan Somraj et.al. | 2208.09463 | :mortar_board: | None |
2022-08-08 | 3D Vision with Transformers: A Survey | Jean Lahoud et.al. | 2208.04309 | :mortar_board: | Code |
2022-08-03 | Vision-Based Safety System for Barrierless Human-Robot Collaboration | Lina María Amaya-Mejía et.al. | 2208.02010 | :mortar_board: | None |
2022-08-02 | Self-Supervised Traversability Prediction by Learning to Reconstruct Safe Terrain | Robin Schmid et.al. | 2208.01329 | :mortar_board: | None |
2022-07-29 | Neural Density-Distance Fields | Itsuki Ueda et.al. | 2207.14455 | :mortar_board: | Code |
2022-07-26 | ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection | Junbo Yin et.al. | 2207.12654 | :mortar_board: | Code |
2022-07-24 | Cross-Modal 3D Shape Generation and Manipulation | Zezhou Cheng et.al. | 2207.11795 | :mortar_board: | None |
2022-07-22 | Seeing 3D Objects in a Single Image via Self-Supervised Static-Dynamic Disentanglement | Prafull Sharma et.al. | 2207.11232 | :mortar_board: | None |
2022-07-21 | Approximate Differentiable Rendering with Algebraic Surfaces | Leonid Keselman et.al. | 2207.10606 | :mortar_board: | None |
2022-07-18 | Latent Partition Implicit with Surface Codes for 3D Representation | Chao Chen et.al. | 2207.08631 | :mortar_board: | Code |
2022-07-16 | Consistency of Implicit and Explicit Features Matters for Monocular 3D Object Detection | Qian Ye et.al. | 2207.07933 | :mortar_board: | None |
2022-07-13 | 3D Concept Grounding on Neural Fields | Yining Hong et.al. | 2207.06403 | :mortar_board: | None |
2022-07-12 | Vision Transformer for NeRF-Based View Synthesis from a Single Input Image | Kai-En Lin et.al. | 2207.05736 | :mortar_board: | None |
2022-07-06 | VMRF: View Matching Neural Radiance Fields | Jiahui Zhang et.al. | 2207.02621 | :mortar_board: | None |
2022-06-23 | EventNeRF: Neural Radiance Fields from a Single Colour Event Camera | Viktor Rudnev et.al. | 2206.11896 | :mortar_board: | None |
2022-06-22 | KiloNeuS: Implicit Neural Representations with Real-Time Global Illumination | Stefano Esposito et.al. | 2206.10885 | :mortar_board: | None |
2022-06-20 | WiFi-based Spatiotemporal Human Action Perception | Yanling Hao et.al. | 2206.09867 | :mortar_board: | None |
2022-06-13 | AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields | Takuhiro Kaneko et.al. | 2206.06100 | :mortar_board: | None |
2022-06-12 | NeuralODF: Learning Omnidirectional Distance Fields for 3D Shape Representation | Trevor Houchens et.al. | 2206.05837 | :mortar_board: | None |
2022-06-09 | Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields | Mingtong Zhang et.al. | 2206.04669 | :mortar_board: | None |
2022-06-08 | Learning Ego 3D Representation as Ray Tracing | Jiachen Lu et.al. | 2206.04042 | :mortar_board: | Code |
2022-06-08 | CO^3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving | Runjian Chen et.al. | 2206.04028 | :mortar_board: | None |
2022-06-02 | Machine Learning for Detection of 3D Features using sparse X-ray data | Bradley T. Wolfe et.al. | 2206.02564 | :mortar_board: | None |
2022-06-05 | FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction | Qiao Feng et.al. | 2206.02194 | :mortar_board: | None |
2022-05-30 | Neural Volumetric Object Selection | Zhongzheng Ren et.al. | 2205.14929 | :mortar_board: | None |
2022-05-28 | Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training | Renrui Zhang et.al. | 2205.14401 | :mortar_board: | Code |
2022-05-25 | sat2pc: Estimating Point Cloud of Building Roofs from 2D Satellite Images | Yoones Rezaei et.al. | 2205.12464 | :mortar_board: | None |