"Dog" generates everything? An Israeli team proposes a zero-shot training method that turns a dog into Nicolas Cage
2021-08-26 09:01:55 【Xinzhiyuan】
Xinzhiyuan Report
Editors: Priscilla, Haokun
[Xinzhiyuan introduction] A research team from Tel Aviv University and NVIDIA used the semantic power of the CLIP model to propose a text-driven method, StyleGAN-NADA. It needs no images from the new domain: a text prompt alone is enough to quickly generate domain-specific images.
Anything can be GAN-ed, and the family dog can even become Nicolas Cage!
GANs have many applications, covering image enhancement, editing, and even classification and regression tasks, but a GAN first needs a large collection of images to train on.
In practice, for subjects such as a specific artist's paintings or fictional scenes, there may not be enough data to train a GAN, or there may be no data at all.
A generator trained zero-shot directly produces artistic images of New York City
To solve this problem, the research team from Tel Aviv University and NVIDIA used the semantic power of large-scale Contrastive Language-Image Pre-training (CLIP) models to propose a text-driven method: StyleGAN-NADA.
No images from the new domain need to be collected; a text prompt alone is enough to generate domain-specific images.
Paper: https://arxiv.org/abs/2108.00946
Given a text prompt and a short training run, the generator can be adapted to many domains with different styles and shapes.
Without editing a single image, the generator can be trained using only the signal from OpenAI's CLIP.
It also lets you say "bye-bye" to training data, and it runs fast!
(Warning: intense content ahead) Enter Human and Zombie in the boxes and the corresponding images are generated right away
The team's goal is to shift a pre-trained generator from a given source domain to a new target domain using only text prompts, with no images from the target domain.
As the source of supervision for the target domain, the authors use only a pre-trained CLIP model.
This raises two key questions:
(1) How can the semantic information encapsulated in CLIP best be extracted?
(2) How should the optimization be regularized to avoid adversarial solutions or mode collapse?
Overview of network structure
At the core of the method are two paired generators, both using the StyleGAN2 architecture.
The two generators share a mapping network, so the same latent code initially produces the same image in both.
Training setup
The two intertwined generators, G_frozen and G_train, are both initialized with the weights of a generator pre-trained on source-domain images.
The weights of G_frozen stay fixed throughout training, while the weights of G_train are modified through optimization and an iterative layer-freezing scheme.
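The setup above can be illustrated with a toy sketch in plain Python. Everything here is a hypothetical stand-in rather than the authors' code: `mapping`, `synthesize`, the three-layer weight list and the placeholder gradients are invented for illustration, and a real implementation would use StyleGAN2 networks and gradients from a CLIP-based loss.

```python
import copy

def mapping(z):
    """Shared toy mapping network: both generators receive the same latent code w."""
    return [2.0 * v for v in z]

def synthesize(layers, w):
    """Toy synthesis network: one output value per layer (dot product with w)."""
    return [sum(a * b for a, b in zip(weights, w)) for weights in layers]

# Hypothetical "pretrained" source-domain weights, one list per synthesis layer.
pretrained_layers = [[1.0, 0.5], [0.2, -0.3], [0.7, 0.1]]

g_frozen = copy.deepcopy(pretrained_layers)  # stays fixed for the whole run
g_train = copy.deepcopy(pretrained_layers)   # adapted toward the target domain

def train_step(layers, grads, trainable, lr=0.1):
    """Update only the layers whose index is in `trainable`; the rest stay frozen."""
    for i in trainable:
        layers[i] = [p - lr * g for p, g in zip(layers[i], grads[i])]

w = mapping([0.3, -1.0])
# Same weights and same latent code: both generators start with identical outputs.
same_at_start = synthesize(g_frozen, w) == synthesize(g_train, w)

# One step that unfreezes only layer 2 (placeholder gradients, not a CLIP loss).
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
train_step(g_train, grads, trainable={2})
```

After the step, only layer 2 of `g_train` has moved away from the pretrained weights, while `g_frozen` is untouched and can keep serving as the source-domain reference.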
This process shifts the domain of G_train along the text direction supplied by the user, while keeping the latent space shared.
CLIP-based guidance
The authors rely on a pre-trained CLIP model as the sole source of supervision for the target domain.
Global CLIP loss
This loss minimizes the CLIP-space cosine distance between the generated image and a given target text.
However, this loss has drawbacks:
(1) It does nothing to preserve diversity
(2) It is highly susceptible to adversarial solutions
Without sufficient regularization, the model adds pixel-level perturbations to the image that fool CLIP.
These drawbacks make the global loss unsuitable for training the generator.
Nevertheless, the authors still use it to adaptively decide which subset of layers to train at each iteration.
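As a rough illustration of the global loss described above, the sketch below computes a cosine distance between two embedding vectors. The three-dimensional vectors are made-up stand-ins for CLIP embeddings, and `global_clip_loss` is a hypothetical helper name; a real pipeline would obtain the embeddings from CLIP's image and text encoders.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def global_clip_loss(image_embedding, target_text_embedding):
    """Global loss: distance between one generated image and the target text."""
    return cosine_distance(image_embedding, target_text_embedding)

# Made-up embeddings standing in for CLIP outputs.
target_text = [0.0, 1.0, 0.0]
aligned_image = [0.0, 2.0, 0.0]      # same direction as the text embedding
orthogonal_image = [1.0, 0.0, 0.0]   # unrelated to the text embedding

loss_aligned = global_clip_loss(aligned_image, target_text)
loss_orthogonal = global_clip_loss(orthogonal_image, target_text)
```

Because the loss looks only at the generated image and the target text, a generator that maps every latent code to the same well-aligned image would also drive it to its minimum, which is one way to see why it does not protect diversity.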
Directional CLIP loss
The second loss is therefore designed to preserve diversity and prevent extensive image corruption.
It works on pairs of images: one produced by the reference (frozen) generator, the other produced by the modified (trainable) generator from the same latent code.
The authors require the CLIP-space direction between the embeddings of the reference (source) and modified (target) images to align with the CLIP-space direction between the embeddings of a pair of source and target texts.
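Written out, the requirement above takes roughly the following form. This is a reconstruction from the description in this article, where w (a symbol introduced here) denotes the latent code fed to both generators and the remaining symbols are defined immediately below:

```latex
\Delta T = E_T(t_{target}) - E_T(t_{source}), \qquad
\Delta I = E_I\big(G_{train}(w)\big) - E_I\big(G_{frozen}(w)\big), \qquad
L_{direction} = 1 - \frac{\Delta I \cdot \Delta T}{\lvert \Delta I \rvert \, \lvert \Delta T \rvert}
```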
In this loss, E_I and E_T are CLIP's image and text encoders, G_frozen and G_train are the frozen source generator and the modified trainable generator, and t_source and t_target are the source and target class texts.
This overcomes the shortcomings of the global loss:
(1) It penalizes loss of diversity
The directional loss is hurt by mode-collapsed outputs. If the target generator produces only a single image, the CLIP-space directions from the different source images to that one target image will all differ.
As a result, they cannot all align with the text direction.
(2) It is harder for the network to converge to adversarial solutions
To fool CLIP, the network would have to devise perturbations that work across an unbounded set of different instances.
The images generated by the two generators are embedded into CLIP space, and the vector connecting them, ΔI, is required to be collinear with the direction ΔT specified by the source and target texts, which is done by maximizing their normalized inner product.
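The mode-collapse argument above can be checked numerically with a small sketch. The two-dimensional vectors are invented stand-ins for CLIP embeddings, the function names are hypothetical, and the `eps` term is an addition of this sketch to avoid division by zero when the two generators still produce identical images.

```python
import math

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def cosine_similarity(u, v, eps=1e-8):
    """Normalized inner product; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def directional_loss(img_frozen, img_train, text_source, text_target):
    """1 - cos(delta_I, delta_T), with both directions taken in embedding space."""
    delta_i = subtract(img_train, img_frozen)
    delta_t = subtract(text_target, text_source)
    return 1.0 - cosine_similarity(delta_i, delta_t)

# Made-up embeddings: the text direction delta_T is [0, 1].
text_source, text_target = [1.0, 0.0], [1.0, 1.0]
frozen_images = [[0.0, 0.0], [2.0, 0.0]]      # two different source images

# Diverse target generator: each image is shifted along the text direction.
diverse_images = [[0.0, 1.0], [2.0, 1.0]]
# Collapsed target generator: every latent code yields the same image.
collapsed_images = [[1.0, 1.0], [1.0, 1.0]]

diverse_losses = [directional_loss(f, t, text_source, text_target)
                  for f, t in zip(frozen_images, diverse_images)]
collapsed_losses = [directional_loss(f, t, text_source, text_target)
                    for f, t in zip(frozen_images, collapsed_images)]
```

In this toy case the diverse generator reaches a loss of essentially zero for both samples, while the collapsed generator cannot: the directions from the two distinct source images to the single target image point different ways, so neither is parallel to the text direction.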
Embedding-norm loss
In some cases, StyleCLIP's latent mapper can be used to better identify the regions of latent space that match the target domain.
However, the mapper introduces unwanted semantic artifacts into the image, and these correlate with an increase in the norm of the generated images' CLIP-space embeddings.
These norms are therefore constrained by an additional loss during mapper training, which keeps the mapper from introducing such artifacts.
Here, M denotes the latent mapper.
The proposed method also applies to a wide range of out-of-domain image generation, from style and texture changes to shape modifications, and from the realistic to the fantastical, all through a text-based interface.
Even the most extreme shape changes take only a few minutes of training.
The images above are randomly sampled from the adapted generators; the domains covered range from faces and churches to dogs and cars.
For purely style-based changes, the authors allow all layers to be trained.
For minor shape modifications, the authors found that training about two-thirds of the model's layers (that is, 12 layers for the 1024×1024 model) offers a good compromise between stability and training time.
The source domain text is "Dog"
Whereas the changes in the previous two figures were mostly in style or minor shape adjustments, here the model has to make significant shape changes.
Existing pre-trained generators can only be edited within their domain; they cannot generate images outside the domain they were trained on.
The figure above shows the results of editing latent codes toward out-of-domain text directions with the three StyleCLIP methods.
As can be seen, only the method proposed in the paper generates the corresponding images successfully.
Author: Xinzhiyuan. Please include the original link when reprinting.