One line of preprocessing code makes your CV model stronger! Google research teaches you to use a learnable Resizer

2021-08-27 09:27:33 Xinzhiyuan

Xinzhiyuan Report

Editor: LRS

[Xinzhiyuan Introduction] Resizing, which scales images of different sizes to a common size, is an essential image preprocessing step. Yet the resize techniques in common use are decades old and cannot learn from data. Google Research proposes a learnable resizer: with only a minor change to the preprocessing stage, it can improve CV model performance!

Neural networks require every input in a mini-batch to have the same size, so an important preprocessing step in vision tasks is image resizing: adjusting all images to a uniform size for training.

Images are usually down-scaled, and the target size is kept small, because at high resolution the memory used by the model rises sharply during training, and both training and inference become too slow. Although GPU performance has steadily improved in recent years, the standard input size is still 224 × 224.
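To make the cost concrete, here is a rough back-of-the-envelope sketch rather than a measurement. It only counts the bytes needed to hold one mini-batch of input images; the function name, the batch size of 32, and the float32 storage are assumptions chosen for illustration. Intermediate activations in a convolutional network tend to scale with pixel count in a similar way, so doubling the side length roughly quadruples the footprint.

```python
def input_tensor_bytes(height, width, channels=3, batch_size=32, bytes_per_value=4):
    """Bytes needed to store one mini-batch of images (float32 assumed)."""
    return batch_size * channels * height * width * bytes_per_value


base = input_tensor_bytes(224, 224)      # the usual 224 x 224 input
double = input_tensor_bytes(448, 448)    # twice the side length
ratio = double / base                    # grows with the number of pixels

print(f"224x224 batch: {base / 2**20:.1f} MiB")
print(f"448x448 batch: {double / 2**20:.1f} MiB")
print(f"ratio: {ratio:.1f}x")
```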

In most cases the processed image ends up very small. For example, the images produced by early deepfakes had a resolution of only 80 × 80.

In face datasets, because faces are rarely square, part of each image is wasted on pixels outside the face, leaving even less usable image data.

The most commonly used image resizing methods today are nearest neighbor, bilinear, and bicubic interpolation. These methods are fast and can be flexibly integrated into training and testing frameworks.
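For readers who have not looked inside these methods, the following is a minimal pure-Python sketch of the first two, operating on a single-channel image stored as a list of rows. It assumes the pixel-center alignment convention, which is only one of several conventions used by real libraries, and the function names are illustrative. Bicubic follows the same idea with a 4 × 4 neighborhood and is omitted for brevity.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize: copy the source pixel under each output center."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        src_y = min(int((y + 0.5) * in_h / out_h), in_h - 1)
        row = []
        for x in range(out_w):
            src_x = min(int((x + 0.5) * in_w / out_w), in_w - 1)
            row.append(img[src_y][src_x])
        out.append(row)
    return out


def resize_bilinear(img, out_h, out_w):
    """Bilinear resize: blend the four source pixels around each output center."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output pixel center back to source coordinates, clamped to the image.
        fy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# A 4 x 4 test image whose pixel values are 0.0 .. 15.0 in reading order.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
small_nn = resize_nearest(image, 2, 2)    # picks one source pixel per output pixel
small_bl = resize_bilinear(image, 2, 2)   # averages each 2 x 2 block
wide = resize_bilinear(image, 3, 5)       # target size need not be square
print(small_nn)
print(small_bl)
```

Both functions are fixed formulas with no trainable parameters, which is the limitation the article goes on to discuss.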

But these traditional methods were developed decades before deep learning became the mainstream solution for visual recognition tasks, so they are not particularly well suited to modern deep learning models.

Google Research proposes a new method that improves how the images in a dataset are scaled during preprocessing, with the aim of improving the efficiency and accuracy of image-based computer vision training.

The effect of image size on task accuracy has received little attention in model training. For efficiency, CV researchers usually resize input images to a relatively small spatial resolution (for example, 224 × 224) and run both training and inference at that resolution.

The researchers asked: do these resizers limit the task performance of the trained network?

A simple experiment shows that replacing these traditional resizers with a learnable resizer can significantly improve performance.

Conventional resizers usually produce scaled images that look better to the human eye, while the output of a learnable resizer may not look particularly natural to people.

The architecture of the resizer model proposed in the paper is as follows:

It has two main features: (1) bilinear feature resizing, and (2) a skip connection, which combines the bilinearly resized image with the CNN features.

The first feature lets features computed at the original resolution be carried into the model at the target resolution. The skip connection simplifies learning, because the resizer model can pass the bilinearly resized image directly to the baseline task.
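The two ideas can be sketched with a toy example. This is a heavily simplified illustration under stated assumptions, not the architecture from the paper: it uses a single channel, two fixed 3 × 3 kernels in place of trained convolution layers, and hypothetical names throughout (toy_resizer, k_in, k_out). What it does show is the data flow: features are computed at the original resolution, resized bilinearly, and added to the bilinearly resized image through a skip connection.

```python
def _coords(n_in, n_out):
    """Source index pair and blend weight for each output position on one axis."""
    coords = []
    for i in range(n_out):
        f = min(max((i + 0.5) * n_in / n_out - 0.5, 0.0), n_in - 1)
        i0 = int(f)
        coords.append((i0, min(i0 + 1, n_in - 1), f - i0))
    return coords


def bilinear_resize(img, out_h, out_w):
    """Bilinear resize of a single-channel image (pixel-center convention assumed)."""
    ys = _coords(len(img), out_h)
    xs = _coords(len(img[0]), out_w)
    return [[(img[y0][x0] * (1 - wx) + img[y0][x1] * wx) * (1 - wy)
             + (img[y1][x0] * (1 - wx) + img[y1][x1] * wx) * wy
             for x0, x1, wx in xs]
            for y0, y1, wy in ys]


def conv3x3(img, kernel):
    """3 x 3 cross-correlation with zero padding, single channel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def leaky_relu(img, slope=0.2):
    return [[v if v > 0 else slope * v for v in row] for row in img]


def toy_resizer(img, out_h, out_w, k_in, k_out):
    """Toy resizer: skip(bilinear image) + conv(bilinear(features))."""
    features = leaky_relu(conv3x3(img, k_in))            # computed at original resolution
    resized = bilinear_resize(features, out_h, out_w)    # bilinear feature resizing
    residual = conv3x3(resized, k_out)                   # would be learned in practice
    skip = bilinear_resize(img, out_h, out_w)            # skip connection
    return [[s + r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(skip, residual)]


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
identity_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zero_kernel = [[0.0] * 3 for _ in range(3)]

# With a zero output kernel the residual branch contributes nothing,
# so the toy resizer reduces to plain bilinear resizing.
output = toy_resizer(image, 2, 3, identity_kernel, zero_kernel)
print(output == bilinear_resize(image, 2, 3))
```

The last lines hint at why a skip connection can ease learning: the model starts from the bilinear result and only has to learn a correction on top of it.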

Unlike a typical encoder-decoder architecture, the proposed architecture can resize an image to any target size and aspect ratio. The performance of the learned resizer also barely depends on the choice of bilinear resizer, which means the bilinear resizer can be swapped for other off-the-shelf methods.

The resizer model is also relatively lightweight: it does not add a large number of trainable parameters to the baseline task, and its CNN is significantly smaller than the baseline models.

The experiments in the paper fall into three parts.

1. Classification performance

A model trained with the bilinear resizer at an output resolution of 224 × 224 is called the default baseline. The results show that, among models at 224 × 224 resolution, networks trained with the proposed resizer perform best and improve on the baseline.

Compared with the default baseline, DenseNet-121 and MobileNet-v2 show the largest and smallest gains, respectively. For Inception-v2, DenseNet-121, and ResNet-50, the proposed resizer outperforms the comparable bilinear resizer.

2. Image quality assessment (IQA)

The researchers trained three different baseline models on the AVA dataset. The baseline models are initialized from weights pre-trained on ImageNet and then fine-tuned on AVA, while the resizer weights are initialized randomly. In this set of experiments the bicubic resizer serves as the baseline method. Performance is measured by the correlation between the mean ground-truth scores and the mean predicted scores, using the Pearson linear correlation coefficient (PLCC) and the Spearman rank correlation coefficient (SRCC).
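As a reminder of what the two metrics measure, here is a small pure-Python sketch. PLCC is the ordinary Pearson correlation of the raw scores, and SRCC is the Pearson correlation of their ranks. The score lists below are made-up numbers for illustration only, not results from the paper, and the function names are likewise illustrative.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation coefficient (SRCC)."""
    return pearson(ranks(x), ranks(y))


# Hypothetical mean ground-truth and predicted quality scores for five images.
truth = [5.1, 6.3, 4.2, 7.0, 5.8]
predicted = [5.0, 6.0, 4.5, 6.8, 6.1]

plcc = pearson(truth, predicted)
srcc = spearman(truth, predicted)
print(f"PLCC = {plcc:.3f}, SRCC = {srcc:.3f}")
```

SRCC only cares about ordering, so a model that ranks images correctly but compresses the score range still scores well on it, whereas PLCC also rewards getting the linear relationship right.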

Compared with the baseline models, there are definite improvements. In addition, for the Inception-v2 and DenseNet-121 models, the proposed resizer outperforms the bicubic resizer. At a higher failure rate, EfficientNet appears to be a harder baseline model for the learned resizer to improve.

3. Generalization

The learned resizer is first fine-tuned jointly with a target baseline that differs from the default baseline it was trained with. The performance of the target baseline is then measured on the underlying task. About 4 epochs of fine-tuning on the training data turn out to be enough for the resizer to adapt to the target model. This validation is a reasonable indicator of how well a trained resizer generalizes across architectures.

In the classification and IQA results, each column shows the initialization checkpoint of the resizer model and each row represents a target baseline. These results suggest that, with minimal fine-tuning, a resizer trained for one baseline can be used effectively to develop a resizer for another baseline.

In some cases, such as the DenseNet and MobileNet models, the fine-tuned resizer actually exceeds the classification performance obtained with random initialization. The same observation holds for the EfficientNet model on IQA.

Finally, the researchers point out that these experiments were optimized specifically for image recognition tasks, and that in testing, their CNN-driven learnable resizer reduced the error rate on such tasks.

In the future, they may consider training image resizers for other image tasks.

Copyright notice
Author: Xinzhiyuan. Please include the original link when reprinting, thank you.
