I have followed this competition for a long time and wrote recaps of the 2021 and 2020 editions; see the workshop homepage 1 and the competition page 2 for details.
What is the biggest difference this year? That’s right: Kaggle! The competition was actually supposed to move to Kaggle last year, but the pandemic and the time needed to prepare the data pushed that plan to this year.
This year, 3,846 people took part in 642 teams, 128 of them first-time participants (18 newcomers in the Top 20). They came from 60 countries and made 14,170 submissions in total.
That is more than 25 times the participants and 150 times the submissions of previous years.
Fun fact: the winning solution was completed just 48 hours before the deadline.
Differences
Compared with previous years, this year’s competition differs in the following respects:
- Contestants must submit notebooks that process the competition data offline.
- Contestants cannot see the test set, which makes cheating difficult.
- The setup allows algorithms to be iterated quickly.
Apart from that, there are a few more differences:
- The multiview track (multi-view matching) was cut, and the competition focuses only on the stereo track. There are several reasons for this, the main one being the “technical problem” that it is hard to run and evaluate multi-view matching within a limited, reasonable amount of time.
- New datasets and evaluation criteria. In previous years the ground-truth translations had no scale, so only rotation accuracy could be evaluated; this year’s translations carry scale information, so rotation and translation can be evaluated together (a sketch of such a pose-error computation follows this list). In addition, this year used a non-public dataset from Google that is not available online.
- Time limit. Total compute time is capped at 9 hours on a Kaggle GPU instance, with no overruns allowed! This forces contestants to weigh which algorithms they can afford. A simple example: a semantic segmentation mask might help the metric, but it demands too much compute, so it cannot be used.
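Since the ground truth now carries scale, both components of the relative pose can be scored. Below is a minimal sketch of how such errors might be computed; the competition’s actual metric (thresholding and aggregation into a final score) is not reproduced here, and the function names are my own.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angular distance in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error(t_est, t_gt):
    """Euclidean distance between translation vectors; meaningful only
    because this year's ground-truth translations carry scale."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))
```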
Top solutions
Unlike in past years, almost all of the top-ranked teams used a similar pipeline this year; although the implementation details differed, the core ideas were alike.
The second-place solution (hkchkc) took a different route: it used no pre- or post-processing, and it held the top spot for a long stretch of the competition.
A general feature-matching pipeline, used by many contestants, runs as follows:
- First, use an off-the-shelf model to obtain initial matches. Such models include LoFTR 5, SuperGlue 7, and QuadTree Attention LoFTR 6. Some teams also applied scale augmentation (multi-scale feature matching).
- Next, estimate the co-visible area of the two images. A common method is to cluster the matches with K-means or DBSCAN (Top1 scheme) and then find the bounding box (bbox) on the image that contains the largest number of matching pairs; this may yield several potential co-visible regions. The fundamental matrix is then computed from the matches of each co-visible region. An alternative that avoids clustering is to run MAGSAC++ for a few rounds at a low threshold to discard some false matches, and then derive the bbox from the surviving matches.
- Crop each bbox region and resize it to a preset image size. Match these “zoomed-in” images again with a matching algorithm, project the new matches back onto the original images, and concatenate them with the original matches. Note that the zoomed-in matches are added to the original matches, not substituted for them.
- (Optional) Filter overly concentrated matching pairs with non-maximum suppression, such as ANMS 9 or a radius-based NMS.
- Solve the fundamental matrix with OpenCV’s MAGSAC++ (a code sketch of this pipeline follows the list).
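Below is a minimal sketch of the crop-and-rematch pipeline, assuming a generic placeholder `match_fn(img0, img1) -> (pts0, pts1)` standing in for LoFTR/SuperGlue; the DBSCAN parameters, crop handling, and MAGSAC++ settings are illustrative values, not any contestant’s actual configuration.

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def covisible_bbox(pts, eps=50.0, min_samples=5):
    """Cluster matched keypoints with DBSCAN and return the bounding box
    of the largest cluster, a rough estimate of the co-visible region."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return None
    cluster = pts[labels == np.bincount(valid).argmax()]
    x0, y0 = cluster.min(axis=0)
    x1, y1 = cluster.max(axis=0)
    return int(x0), int(y0), int(x1), int(y1)

def crop_and_rematch(img0, img1, match_fn, size=840):
    # Stage 1: initial matches on the full images.
    pts0, pts1 = match_fn(img0, img1)

    out0, out1 = [pts0], [pts1]
    box0, box1 = covisible_bbox(pts0), covisible_bbox(pts1)
    if box0 is not None and box1 is not None:
        # Stage 2: crop the co-visible regions, resize, and re-match.
        crops = []
        for img, (x0, y0, x1, y1) in ((img0, box0), (img1, box1)):
            crop = img[y0:y1, x0:x1]
            s = size / max(crop.shape[:2])
            crops.append((cv2.resize(crop, None, fx=s, fy=s), s, (x0, y0)))
        (c0, s0, o0), (c1, s1, o1) = crops
        q0, q1 = match_fn(c0, c1)
        # Project zoomed-in matches back to the original coordinates and
        # ADD them to the stage-1 matches (do not replace them).
        out0.append(q0 / s0 + o0)
        out1.append(q1 / s1 + o1)
    pts0, pts1 = np.vstack(out0), np.vstack(out1)

    # Solve the fundamental matrix with OpenCV's MAGSAC++
    # (positional args: method, reproj. threshold, confidence, maxIters).
    F, inlier_mask = cv2.findFundamentalMat(
        pts0, pts1, cv2.USAC_MAGSAC, 0.25, 0.9999, 10000)
    return F, inlier_mask
```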
The Top1 and Top2 solutions are described in more detail below.
Top1 ideas
Achieving this ranking entirely with open-source matchers (LoFTR 5, SuperGlue 7, DKM 8) is remarkable! Another key to the result is the “2-stage” strategy of multi-scale image cropping and matching.
The specific steps are as follows:
- Initial image matching: LoFTR 5 extracts features at a resolution of 840; SuperPoint+SuperGlue 7 extracts multi-scale features at resolutions of 840, 1024, and 1280; the matches from the two matchers are then concatenated. This is the stage-1 matching (a sketch of the multi-resolution concatenation follows this list).
- Cluster the matches with DBSCAN, keeping the best 80-90% of matching pairs; then take the bbox of the corresponding region and crop it. The author calls this method mkpt_crop; it effectively filters outliers and extracts the co-visible area.
- Re-match the cropped co-visible area with LoFTR 5, SuperGlue 7, and DKM 8; this is the stage-2 matching of the scheme.
- Concatenate the matched pairs from the two stages and solve the fundamental matrix with MAGSAC.
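A minimal sketch of the multi-resolution stage-1 matching, again with a placeholder `match_fn`; the key detail is rescaling keypoints back to the original image coordinates before concatenating matches from different resolutions.

```python
import cv2
import numpy as np

def multires_matches(img0, img1, match_fn, sizes=(840, 1024, 1280)):
    """Run a matcher at several resolutions and concatenate the matches,
    mapping all keypoints back to original image coordinates."""
    all0, all1 = [], []
    for size in sizes:
        s0 = size / max(img0.shape[:2])
        s1 = size / max(img1.shape[:2])
        r0 = cv2.resize(img0, None, fx=s0, fy=s0)
        r1 = cv2.resize(img1, None, fx=s1, fy=s1)
        pts0, pts1 = match_fn(r0, r1)
        all0.append(pts0 / s0)  # undo the resize for img0's keypoints
        all1.append(pts1 / s1)  # undo the resize for img1's keypoints
    return np.vstack(all0), np.vstack(all1)
```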
Top2 ideas
A few key points:
- A single model (the baseline) already reaches 0.838 (public) / 0.833 (private). The model is similar in structure to LoFTR 5: a transformer-based, detector-free matcher that matches images directly without keypoints. The author notes that the paper is still under blind review, so little can be disclosed for now.
- Ensembling other matchers (baseline + QuadTree 6 + LoFTR 5 + SuperGlue 7) further improves accuracy.
- Normalized positional encoding is used (a sketch follows this list).
- No pre- or post-processing is used.
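The exact formulation is unpublished, but the idea behind normalizing a LoFTR-style 2D sinusoidal positional encoding can be sketched: scale the position grid relative to a reference (training) feature-map size so the encoding does not drift when inference images are larger. This is my own illustrative reconstruction, not the Top2 author’s code.

```python
import numpy as np

def normalized_sine_pe(h, w, dim, train_hw=(60, 60)):
    """2D sinusoidal positional encoding with positions normalized by a
    reference feature-map size; at train_hw it reduces to the usual grid."""
    assert dim % 4 == 0
    ys = np.arange(h)[:, None] * (train_hw[0] / h)  # rescaled row coords
    xs = np.arange(w)[None, :] * (train_hw[1] / w)  # rescaled col coords
    freqs = 10000.0 ** (np.arange(dim // 4) / (dim // 4))
    pe = np.zeros((h, w, dim), dtype=np.float32)
    for i, f in enumerate(freqs):
        pe[..., 4 * i + 0] = np.sin(xs / f)
        pe[..., 4 * i + 1] = np.cos(xs / f)
        pe[..., 4 * i + 2] = np.sin(ys / f)
        pe[..., 4 * i + 3] = np.cos(ys / f)
    return pe
```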
Useful tricks
- Swapping the order of the input images improves the accuracy of LoFTR-like matchers (a sketch follows this list).
- Normalizing the positional encoding of LoFTR-like matchers helps (Top2 scheme).
- Different image-resize strategies make little difference.
- Refining match coordinates with ECO-TR is effective (not open source).
- Adding matches from local descriptors with non-learned matchers, such as DISK 11 or ALIKE 12, does not help.
- Semantic segmentation masks (sky/people) did not help either.
Summary
- The “2-stage” approach is quite effective for image matching tasks: first find the co-visible area, then zoom in and match again.
- It is better to solve the “recall” problem first, i.e. find as many matches as possible, possibly with several different matchers; modern RANSACs can be trusted to recover the pose even from fewer inliers.
- LoFTR 5 is very sensitive to the input image size, which deserves further study.
References
1. Image Matching: Local Features & Beyond, homepage: https://image-matching-workshop.github.io ↩
2. Image Matching Challenge 2022, homepage: https://www.kaggle.com/competitions/image-matching-challenge-2022 ↩
3. Image Matching Challenge 2022 Recap, Dmytro Mishkin, https://ducha-aiki.github.io/wide-baseline-stereo-blog/2022/07/05/IMC2022-Recap.html, homepage: http://dmytro.ai ↩
4. Competition is Finalized: Congrats to our Winners, Recap, https://www.kaggle.com/competitions/image-matching-challenge-2022/discussion/329650 ↩
5. LoFTR: Detector-Free Local Feature Matching with Transformers, CVPR 2021, code: https://github.com/zju3dv/LoFTR, pdf: https://arxiv.org/abs/2104.00680 ↩
6. QuadTree Attention for Vision Transformers, ICLR 2022, code: https://github.com/Tangshitao/QuadTreeAttention, pdf: https://arxiv.org/abs/2201.02767 ↩
7. SuperGlue: Learning Feature Matching with Graph Neural Networks, CVPR 2020, code: https://github.com/magicleap/SuperGluePretrainedNetwork, pdf: https://arxiv.org/abs/1911.11763 ↩
8. DKM: Deep Kernelized Dense Geometric Matching, arXiv 2022, code: https://github.com/Parskatt/DKM, pdf: https://arxiv.org/abs/2202.00667 ↩
9. ANMS: Efficient Adaptive Non-Maximal Suppression Algorithms for Homogeneous Spatial Keypoint Distribution, code: https://github.com/BAILOOL/ANMS-Codes, pdf: https://www.researchgate.net/publication/323388062_Efficient_adaptive_non-maximal_suppression_algorithms_for_homogeneous_spatial_keypoint_distribution ↩
10. OANet: Learning Two-View Correspondences and Geometry Using Order-Aware Network, code: https://github.com/zjhthu/OANet, pdf: https://arxiv.org/abs/1908.04964 ↩
11. DISK: Learning Local Features with Policy Gradient, NeurIPS 2020, code: https://github.com/cvlab-epfl/disk, pdf: https://arxiv.org/abs/2006.13566 ↩
12. ALIKE: Accurate and Lightweight Keypoint Detection and Descriptor Extraction, IEEE Transactions on Multimedia 2022, code: https://github.com/Shiaoming/ALIKE, pdf: https://arxiv.org/abs/2112.02906 ↩
13. ASLFeat: Learning Local Features of Accurate Shape and Localization, CVPR 2020, code: https://github.com/lzx551402/ASLFeat, pdf: https://arxiv.org/abs/2003.10071 ↩
This article is reprinted from https://www.vincentqin.tech/posts/imc2022/