References:
[1] Girshick R. Fast R-CNN[C]. In Proceedings of the IEEE International Conference on Computer Vision, 2015:1440–1448.
[2] He K M, Gkioxari G, Dollar P, et al. Mask R-CNN[C]. In Proceedings of the IEEE International Conference on Computer Vision, 2017: 2961–2969.
[3] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]. In Advances in Neural Information Processing Systems, 2015, 28.
[4] Zhang S F, Chi C, Yao Y Q, et al. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection[C]. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 9759–9768.
[5] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]. In Proceedings of the IEEE International Conference on Computer Vision, 2017: 2980–2988.
[6] Tian Z, Shen C H, Chen H, et al. FCOS: Fully convolutional one-stage object detection[C]. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 9627–9636.
[7] Kim K and Lee H S. Probabilistic anchor assignment with IoU prediction for object detection[C]. In European Conference on Computer Vision, Springer, 2020: 355–371.
[8] Song G L, Liu Y, and Wang X G. Revisiting the sibling head in object detector[C]. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 11563–11572.
[9] Xue Z Y, Liang J M, Song G L, et al. Large-batch optimization for dense visual predictions[C]. In Advances in Neural Information Processing Systems, 2022.
[10] Zong Z F, Cao Q G, and Leng B. RCNet: Reverse feature pyramid and cross-scale shift network for object detection[C]. In Proceedings of the 29th ACM International Conference on Multimedia, 2021: 5637–5645.
[11] Hosang J, Benenson R, and Schiele B. Learning non-maximum suppression[C]. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4507–4515.
[12] Redmon J, Farhadi A. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[13] Carion N, Massa F, Synnaeve G, et al. End-to-end object detection with transformers[C]. In European Conference on Computer Vision, Springer, 2020: 213–229.
[14] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]. In Advances in Neural Information Processing Systems, 2017: 5998–6008.
[15] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]. In European Conference on Computer Vision, Springer, 2014: 740–755.
[16] Zhu X Z, Su W J, Lu L W, et al. Deformable DETR: Deformable transformers for end-to-end object detection[C]. In International Conference on Learning Representations, 2021.
[17] Meng D P, Chen X K, Fan Z J, et al. Conditional DETR for fast training convergence. arXiv preprint arXiv:2108.06152, 2021.
[18] Yao Z Y, Ai J B, Li B X, et al. Efficient DETR: Improving end-to-end object detector with dense prior. arXiv preprint arXiv:2104.01318, 2021.
[19] Wang Y M, Zhang X Y, Yang T, et al. Anchor DETR: Query design for transformer-based detector. arXiv preprint arXiv:2109.07107, 2021.