All categories (141)
-
Paper review: YOLOv12 (arXiv, technical report)
YOLOv12: Attention-Centric Real-Time Object Detectors. Motivation: The YOLO framework has focused on CNN-based improvements despite the proven superiority of attention mechanisms, because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, using Area Attention (A2) and Residual Efficient Layer Aggregation Netw..
2025.03.06 -
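To make the area attention idea in the YOLOv12 entry above concrete, here is a minimal PyTorch sketch, assuming the simplest variant: the feature map is split into a few equal areas along one spatial axis and standard multi-head self-attention runs inside each area. The class and parameter names (AreaAttention, num_areas) are illustrative, not taken from the YOLOv12 code.

```python
import torch
import torch.nn as nn

class AreaAttention(nn.Module):
    """Sketch: split the feature map into a few areas along H and run
    standard self-attention inside each area (assumed simplification)."""
    def __init__(self, dim: int, num_heads: int = 8, num_areas: int = 4):
        super().__init__()
        self.num_areas = num_areas
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W]; H is assumed divisible by num_areas.
        b, c, h, w = x.shape
        area_h = h // self.num_areas
        # Split along H into areas, flatten each area into a token sequence.
        tokens = (x.view(b, c, self.num_areas, area_h, w)
                   .permute(0, 2, 3, 4, 1)                 # [B, areas, h', W, C]
                   .reshape(b * self.num_areas, area_h * w, c))
        out, _ = self.attn(tokens, tokens, tokens)         # attention within each area only
        return (out.reshape(b, self.num_areas, area_h, w, c)
                   .permute(0, 4, 1, 2, 3)
                   .reshape(b, c, h, w))

# x = torch.randn(2, 64, 32, 32); AreaAttention(64)(x).shape == (2, 64, 32, 32)
```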
Paper review: Generalized Focal Loss (IEEE Transactions 2023)
Generalized Focal Loss: Towards Efficient Representation Learning for Dense Object Detection. This paper consolidates the v1 (NeurIPS 2020) and v2 (CVPR 2021) papers. Motivation: In object detection, classification is usually optimized with Focal Loss, while box locations are commonly learned under a Dirac delta distribution. But three problems are found in existing practice. 1) t..
2025.02.28 -
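As a reference for the classification branch described in the Generalized Focal Loss entry above, below is a minimal sketch of Quality Focal Loss, which replaces the hard one-hot focal target with the continuous IoU quality y in [0, 1] so that classification score and localization quality share one representation. The function name and the sum reduction are my own choices, not the released code.

```python
import torch
import torch.nn.functional as F

def quality_focal_loss(pred_logits: torch.Tensor,
                       quality_targets: torch.Tensor,
                       beta: float = 2.0) -> torch.Tensor:
    """Sketch of Quality Focal Loss (QFL).

    pred_logits:     raw class logits, same shape as quality_targets
    quality_targets: 0 for negatives, the predicted box's IoU with GT for positives
    """
    sigma = pred_logits.sigmoid()
    # Binary cross-entropy against the soft quality target y ...
    bce = F.binary_cross_entropy_with_logits(pred_logits, quality_targets,
                                             reduction="none")
    # ... modulated by |y - sigma|^beta, the generalized focal term.
    return ((quality_targets - sigma).abs().pow(beta) * bce).sum()
```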
Paper review: Rewrite the Stars (CVPR 2024)
Rewrite the Stars. Motivation: Since AlexNet, a myriad of deep networks have emerged, each building on the last. Despite their distinctive insights and contributions, this line of models is mostly built from blocks that blend linear projections with non-linear activations. As for self-attention, its most distinctive feature is mapping features to different spaces and then con..
2025.01.13 -
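The core of the paper above is the star operation, an element-wise multiplication of two linearly transformed branches that implicitly maps the input to a much higher-dimensional feature space. A small PyTorch sketch follows; the block layout (expansion factor, ReLU6, residual) is an assumption for illustration, not the exact StarNet block.

```python
import torch
import torch.nn as nn

class StarBlock(nn.Module):
    """Sketch of the 'star' operation: act(W1 x) * (W2 x)."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.f1 = nn.Linear(dim, hidden)   # branch 1
        self.f2 = nn.Linear(dim, hidden)   # branch 2
        self.g = nn.Linear(hidden, dim)    # project back to the input width
        self.act = nn.ReLU6()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise multiplication of the two branches, then projection and residual.
        return x + self.g(self.act(self.f1(x)) * self.f2(x))
```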
Paper review: UniTR (ICCV 2023)
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation. Motivation: Previous works handle multi-modal data with modality-specific encoders in a sequential manner and then fuse the features via late fusion. This slows down inference and limits real-world applications. To tackle these problems, the authors propose to process intra-modal representation learn..
2024.11.11 -
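As a rough conceptual sketch of the contrast drawn in the UniTR entry above, the snippet below projects camera and LiDAR tokens to a common width and runs both through one weight-shared transformer in a single pass, instead of sequential modality-specific encoders plus late fusion. It omits UniTR's actual set partitioning and cross-modal blocks, and all names here are hypothetical.

```python
import torch
import torch.nn as nn

class SharedMultiModalBackbone(nn.Module):
    """Conceptual sketch: one weight-shared transformer over both modalities."""
    def __init__(self, img_dim: int, pts_dim: int, dim: int = 256, depth: int = 4):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, dim)   # image patch tokens -> shared width
        self.pts_proj = nn.Linear(pts_dim, dim)   # voxel/pillar tokens -> shared width
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, img_tokens: torch.Tensor, pts_tokens: torch.Tensor):
        # Process both modalities together in one forward pass (no sequential encoders).
        tokens = torch.cat([self.img_proj(img_tokens), self.pts_proj(pts_tokens)], dim=1)
        fused = self.blocks(tokens)
        n_img = img_tokens.shape[1]
        return fused[:, :n_img], fused[:, n_img:]  # per-modality outputs for later heads
```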
Paper review: DSVT (CVPR 2023)
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets. Motivation: A 3D backbone that handles sparse point clouds is a fundamental problem in 3D perception. Compared with sparse convolution, the attention mechanism in Transformers is more appropriate and easier to deploy in real-world applications. However, due to the sparse characteristics of point clouds, it is non-trivial to apply a stand..
2024.10.18 -
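A simplified sketch of the rotated-set idea from the DSVT entry above, assuming voxels from a single window: order the sparse voxels along one axis, chunk them into equal-size sets, and run standard self-attention within each set; the next block would repeat this along the other axis ("rotated"). The padding scheme and the set size are illustrative assumptions, not the exact DSVT implementation.

```python
import torch
import torch.nn as nn

def rotated_set_attention(voxel_feats: torch.Tensor,    # [N, C] features of non-empty voxels
                          voxel_xyz: torch.Tensor,      # [N, 3] voxel coordinates (for ordering only)
                          attn: nn.MultiheadAttention,  # built with batch_first=True
                          set_size: int = 36,
                          axis: int = 0) -> torch.Tensor:
    n, c = voxel_feats.shape
    order = torch.argsort(voxel_xyz[:, axis])        # sort along x (axis=0) or y (axis=1)
    feats = voxel_feats[order]
    pad = (-n) % set_size                            # pad so N is a multiple of set_size
    if pad:
        feats = torch.cat([feats, feats.new_zeros(pad, c)], dim=0)
    sets = feats.view(-1, set_size, c)               # [num_sets, set_size, C]
    out, _ = attn(sets, sets, sets)                  # self-attention within each set only
    out = out.reshape(-1, c)[:n]                     # drop the padded voxels
    restored = torch.empty_like(out)
    restored[order] = out                            # scatter back to the original voxel order
    return restored

# usage (one window with 100 voxels):
# attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
# out = rotated_set_attention(torch.randn(100, 64), torch.rand(100, 3), attn, axis=0)
```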
Code review: YOLOv11 (2024)
YOLOv11. YOLOv11 is released by Ultralytics. I check the improvements of YOLOv11 by comparing YOLOv8 and YOLOv11. Common: Both versions are based on the Ultralytics YOLOv8 code and share a rough architecture derived from YOLOv8. Difference: The versions differ in detailed post-processing and architecture. Post-processing: YOLOv11n (mAP 39.5, latency similar to YOLOv10 without E2E ..
2024.10.08
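For context on the conventional (non end-to-end) post-processing that both YOLOv8- and YOLOv11-style heads rely on, here is a minimal class-agnostic sketch using torchvision's NMS. The thresholds are assumed defaults, and the actual Ultralytics pipeline additionally handles classes, multi-label outputs, and detection limits that are skipped here.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes: torch.Tensor, scores: torch.Tensor,
                conf_thres: float = 0.25, iou_thres: float = 0.7):
    """Sketch of confidence filtering + NMS for one image.

    boxes:  [N, 4] boxes in xyxy format
    scores: [N]    max class confidence per box
    """
    keep = scores > conf_thres                 # confidence threshold
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thres)        # non-maximum suppression
    return boxes[idx], scores[idx]
```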