One line of preprocessing code makes your CV model stronger! Google Research shows you how to use a learnable resizer
2021-08-27 09:27:33 【Xinzhiyuan】
Xinzhiyuan Report
【Xinzhiyuan Introduction】 Resizing is an important image preprocessing operation: it scales images of different sizes to the same size. But the resizing techniques in common use are old and cannot be learned from data. Google Research proposes a learnable resizer: a minor change to the preprocessing stage can improve the performance of a CV model!
Neural networks require the input data in each mini-batch to have a uniform size, so in vision tasks an important preprocessing step is image resizing: the images are adjusted to one common size for training.
Images are usually down-scaled to a fairly small size, because a resolution that is too high sharply increases the memory the model occupies during training and also slows down both training and inference. Although GPU performance has steadily improved in recent years, the standard input size is still 224 × 224.
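As a rough illustration of why resolution drives memory use, the sketch below counts only the bytes of the input batch itself. The batch size of 32, the 3 channels and the float32 storage are assumptions made for the example; the activations inside a convolutional network also scale with spatial size but are not modeled here.

```python
def input_batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of images as float32 (input tensor only)."""
    return batch_size * channels * height * width * bytes_per_value


# Hypothetical batch size of 32, chosen only for illustration.
small = input_batch_bytes(32, 224, 224)
large = input_batch_bytes(32, 448, 448)

print(f"224 x 224 batch: {small / 2**20:.1f} MiB")
print(f"448 x 448 batch: {large / 2**20:.1f} MiB")
print(f"ratio: {large / small:.0f}x")
```

Doubling each side of the image quadruples the pixel count, so the input batch becomes four times larger.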
In most cases the processed image ends up very small. For example, the pictures generated by early deepfakes had a resolution of only 80 × 80.
In face datasets, faces are rarely square, so more of the pixels in each picture are wasted and even less usable image data remains.
The most commonly used image resizing methods today are nearest neighbor, bilinear, and bicubic interpolation. These methods are fast and can be flexibly integrated into training and testing frameworks.
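Two of these classical methods can be sketched in a few lines of plain Python for a single-channel image stored as a list of rows. The half-pixel coordinate mapping used here is one common convention and is an assumption; real libraries differ in such details.

```python
def nearest_resize(image, out_h, out_w):
    """Nearest-neighbor resize: copy the source pixel under each output pixel centre."""
    in_h, in_w = len(image), len(image[0])
    rows = [min(int((i + 0.5) * in_h / out_h), in_h - 1) for i in range(out_h)]
    cols = [min(int((j + 0.5) * in_w / out_w), in_w - 1) for j in range(out_w)]
    return [[image[r][c] for c in cols] for r in rows]


def bilinear_resize(image, out_h, out_w):
    """Bilinear resize: blend the four source pixels around each output pixel centre."""
    in_h, in_w = len(image), len(image[0])

    def taps(index, in_size, out_size):
        # Two neighbouring source indices and the weight of the second one.
        coord = (index + 0.5) * in_size / out_size - 0.5
        coord = min(max(coord, 0.0), in_size - 1.0)
        lo = int(coord)
        return lo, min(lo + 1, in_size - 1), coord - lo

    out = []
    for i in range(out_h):
        y0, y1, wy = taps(i, in_h, out_h)
        row = []
        for j in range(out_w):
            x0, x1, wx = taps(j, in_w, out_w)
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Toy 4 x 4 image with pixel values 0..15, down-scaled to 2 x 2.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(nearest_resize(image, 2, 2))
print(bilinear_resize(image, 2, 2))
```

Both functions are fixed formulas with no trainable parameters, which is the limitation the learnable resizer is meant to address.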
But these traditional methods were developed decades before deep learning became the mainstream solution for visual recognition tasks, so they are not particularly well suited to today's deep learning models.
Google Research has proposed a new method that improves the efficiency and accuracy of image-based computer vision training by changing how the images in a dataset are scaled during preprocessing.
The effect of image size on task accuracy has received little attention in model training. For efficiency, CV researchers usually resize the input images to a relatively small spatial resolution (for example, 224 × 224) and run both training and inference at that resolution.
The researchers asked: do these resizers limit the task performance of the trained network?
A simple experiment shows that replacing these traditional resizers with a learnable resizer can significantly improve performance.
Traditional resizers usually produce scaled images that look better to the human eye, while the output of a learnable resizer may not look particularly good to people.
The architecture of the resizer model proposed in the paper is shown in the figure below:
It has two important features: (1) bilinear feature resizing, and (2) a skip connection, which combines the bilinearly resized image with the CNN features.
The first feature lets features computed at the original resolution be used consistently with the model. The skip connection simplifies learning, because the resizer model can pass the bilinearly resized image directly to the baseline task.
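The two features can be sketched as a skip path plus a residual path. This is only a structural illustration under stated assumptions, not the paper's implementation: `feature_fn` and `refine_fn` are hypothetical stand-ins for the learned CNN layers before and after the feature resizing, and the image is a single-channel list of rows.

```python
def bilinear_resize(image, out_h, out_w):
    """Bilinear resize of a single-channel image (half-pixel convention, an assumption)."""
    in_h, in_w = len(image), len(image[0])

    def taps(index, in_size, out_size):
        # Two neighbouring source indices and the weight of the second one.
        coord = (index + 0.5) * in_size / out_size - 0.5
        coord = min(max(coord, 0.0), in_size - 1.0)
        lo = int(coord)
        return lo, min(lo + 1, in_size - 1), coord - lo

    out = []
    for i in range(out_h):
        y0, y1, wy = taps(i, in_h, out_h)
        row = []
        for j in range(out_w):
            x0, x1, wx = taps(j, in_w, out_w)
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def learnable_resize(image, out_h, out_w, feature_fn, refine_fn):
    """Skip connection plus bilinear feature resizing (structure only)."""
    skip = bilinear_resize(image, out_h, out_w)            # plain resized image
    features = feature_fn(image)                           # computed at original resolution
    resized = bilinear_resize(features, out_h, out_w)      # bilinear feature resizing
    residual = refine_fn(resized)                          # correction learned for the task
    return [[s + r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(skip, residual)]


# Hypothetical stand-ins: identity "features" and an all-zero residual.
identity = lambda img: img
zero_residual = lambda feats: [[0.0] * len(feats[0]) for _ in feats]

image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(learnable_resize(image, 2, 3, identity, zero_residual))
```

With an all-zero residual the output is exactly the bilinearly resized image, which illustrates why the skip connection makes learning easier: the CNN only has to learn a correction on top of the classical result. The 2 × 3 target also shows that the output size and aspect ratio are free parameters.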
Unlike typical encoder-decoder architectures, the architecture proposed in the paper allows an image to be resized to any target size and aspect ratio. The performance of the learnable resizer also hardly depends on the choice of the bilinear resizer, which means the bilinear resizer can be swapped directly for other off-the-shelf methods.
The resizer model is also relatively lightweight: it does not add a large number of trainable parameters to the baseline task, and its CNN is significantly smaller than the baseline models.
The experiments in the paper fall into three parts.
1、 Classification performance
Models trained with a bilinear resizer and an output resolution of 224 × 224 are called the default baseline. The results show that, among the models at 224 × 224 resolution, the networks trained with the proposed resizer perform best and improve on the default baseline.
Compared with the default baseline, DenseNet-121 and MobileNet-v2 show the largest and smallest gains, respectively. For Inception-v2, DenseNet-121, and ResNet-50, the proposed resizer outperforms a comparable bilinear resizer.
2、 Image quality assessment (IQA)
The researchers trained three different baseline models on the AVA dataset. The baseline models are initialized with weights pre-trained on ImageNet and fine-tuned on the AVA dataset. The resizer weights are initialized randomly. In this set of experiments the bicubic resizer is the baseline method. Performance is measured by the correlation between the mean ground-truth scores and the mean predicted scores, evaluated with the Pearson linear correlation coefficient (PLCC) and the Spearman rank correlation coefficient (SRCC).
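For reference, the two correlation measures can be computed as follows. This is a minimal standard-library sketch; the score lists are made-up illustrative values, not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC): Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up mean ground-truth and mean predicted scores for five images.
ground_truth = [5.1, 6.3, 4.2, 7.0, 5.8]
predicted = [5.0, 6.0, 4.5, 6.4, 6.1]
print("PLCC:", pearson(ground_truth, predicted))
print("SRCC:", spearman(ground_truth, predicted))
```

PLCC rewards predictions that are linearly related to the ground-truth scores, while SRCC only checks whether the images are ranked in the same order.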
Compared with the baseline models, there is a definite improvement. In addition, for the Inception-v2 and DenseNet-121 models, the proposed resizer outperforms the bicubic resizer. At higher computational cost (FLOPs), EfficientNet appears to be a harder baseline model for the learned resizer to improve.
3、 Generalization
First, a learned resizer is jointly fine-tuned with a target baseline that differs from the resizer's default baseline. Then the performance of the target baseline is measured on the underlying task. Fine-tuning on about 4 epochs of training data turns out to be enough for the resizer to adapt to the target model. This check is a reasonable indicator of how well trained resizers generalize across architectures.
In the classification and IQA results, each column shows the initialization checkpoint of the resizer model and each row represents a target baseline. These results suggest that, with minimal fine-tuning, a resizer trained for one baseline can be used effectively to develop a resizer for another baseline.
In some cases, such as the DenseNet and MobileNet models, the fine-tuned resizer actually exceeds the classification performance obtained with random initialization. The same observation holds for the EfficientNet model on IQA.
Finally, the researchers point out that these experiments were optimized specifically for image recognition tasks, and that in their tests the CNN-driven learnable resizer reduced the error rate on such tasks.
In the future, they may consider training image resizers for other image tasks.
Author: Xinzhiyuan. Please include the original link when reprinting, thank you.