# DeepMind's reinforcement learning lead introduces PonderNet, a neural network that can "think" like people

2021-08-26 09:01:57

New Zhiyuan Report

Source: MarkTechPost

Editor: LRS

[New Zhiyuan Introduction] Do machines need time to think? When building a neural network model, it is easy to overlook that a machine needs different amounts of computation to solve problems of different difficulty. DeepMind recently introduced PonderNet, a model that can adaptively adjust the amount of computation according to the difficulty of the problem.

When humans answer a question, a harder problem obviously takes more time to think through.

In a standard artificial neural network, however, the amount of computation grows with the size of the input and has nothing to do with the complexity of the problem being learned.

Yet problems usually also have an inherent complexity that is independent of input size. For example, adding two numbers is faster than dividing them.

Most machine learning algorithms do not adjust their computational budget to the complexity of the task they are learning to solve. Put another way, that adjustment is made manually by the creator of the AI model.

When people adapt their time in this way, it is called thinking.

Previous work such as Adaptive Computation Time (ACT) automatically learns to scale the required computation time through a scalar halting probability.

This halting probability adjusts the number of computation steps required for each input, which is called the "thinking time". But ACT is very unstable, and it is highly sensitive to the choice of a hyperparameter that trades off accuracy against computational cost.

To overcome this limitation, DeepMind proposed a new model, PonderNet, which can adjust the amount of computation to the complexity of the input problem.

PonderNet learns the number of computation steps end-to-end, achieving an effective trade-off between training prediction accuracy, computational cost, and generalization.

It is built around a step function whose outputs are the network's prediction and the probability of halting at step n. The step function can be any neural network, such as an MLP, an LSTM, or an encoder-decoder architecture such as a Transformer. The step function is applied repeatedly, at most N times.
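The repeated application of the step function can be sketched as follows. This is a minimal illustration, not DeepMind's implementation. It assumes that the per-step output `lam` is the probability of halting at the current step given that the network has not halted earlier, so the unconditional probability of halting at step n is `lam` multiplied by the probability of not having halted before. The names `unroll` and `toy_step` and the step-function signature are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (input, state) -> (prediction, new_state, lam),
# where lam is ASSUMED to be the conditional halting probability at this step.
StepFn = Callable[[float, float], Tuple[float, float, float]]


def unroll(step_fn: StepFn, x: float, init_state: float,
           max_steps: int) -> Tuple[List[float], List[float]]:
    """Apply step_fn at most max_steps times.

    Returns the per-step predictions and the unconditional probability
    of halting at each step.
    """
    predictions: List[float] = []
    halting_probs: List[float] = []
    not_halted = 1.0  # probability that no earlier step has halted
    state = init_state
    for _ in range(max_steps):
        y, state, lam = step_fn(x, state)
        predictions.append(y)
        halting_probs.append(not_halted * lam)
        not_halted *= 1.0 - lam
    return predictions, halting_probs


def toy_step(x: float, state: float) -> Tuple[float, float, float]:
    """Stand-in for a real network: accumulate x, always halt with lam = 0.5."""
    new_state = state + x
    return new_state, new_state, 0.5


preds, p = unroll(toy_step, x=1.0, init_state=0.0, max_steps=4)
print(preds)  # [1.0, 2.0, 3.0, 4.0]
print(p)      # [0.5, 0.25, 0.125, 0.0625]
```

Note that the halting probabilities in this toy run sum to 0.9375 rather than 1, because the unrolling is cut off after a finite number of steps.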

In practice, every problem requires a finite number of thinking steps, so the step function can only be unrolled for a finite number of iterations, and the halting probabilities must then be normalized so that they sum to 1.

This can be done in two ways:

1. Normalize the probabilities so that they sum to 1 (this is equivalent to conditioning the halting probability on knowing that the number of thinking steps is at most N).

2. Assign all the remaining halting probability to the last thinking step.
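The two options can be compared on a small example. The sketch below is illustrative only; the function names `renormalize` and `assign_remainder_to_last` are hypothetical, and the input is an assumed list of halting probabilities from a run truncated after four steps.

```python
from typing import List


def renormalize(p: List[float]) -> List[float]:
    """Option 1: divide every probability by the total so the list sums to 1."""
    total = sum(p)
    return [pi / total for pi in p]


def assign_remainder_to_last(p: List[float]) -> List[float]:
    """Option 2: give all probability mass not yet used to the last step."""
    remainder = 1.0 - sum(p[:-1])
    return p[:-1] + [remainder]


# Assumed halting probabilities from a run truncated after four steps.
truncated = [0.5, 0.25, 0.125, 0.0625]

print(sum(truncated))                       # 0.9375, not a valid distribution yet
print(assign_remainder_to_last(truncated))  # [0.5, 0.25, 0.125, 0.125]
print(renormalize(truncated))
```

The first option rescales every step proportionally, while the second leaves the early steps untouched and concentrates the correction on the final step.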

The regularization in PonderNet's loss function serves two purposes. First, it biases the network towards the expected prior number of steps. Second, it provides an incentive to give every possible number of steps a non-zero probability, which further promotes exploration.
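One way to picture such a loss is sketched below. It assumes, beyond what the article states, that the loss is the expected per-step prediction loss under the halting distribution plus a weighted KL divergence between the halting distribution and a truncated geometric prior with parameter `lambda_p`. The weight `beta` and all function names are hypothetical, and the per-step losses are made-up numbers.

```python
import math
from typing import List


def truncated_geometric(lambda_p: float, max_steps: int) -> List[float]:
    """Geometric prior over the halting step, cut off and renormalized."""
    raw = [lambda_p * (1.0 - lambda_p) ** n for n in range(max_steps)]
    total = sum(raw)
    return [r / total for r in raw]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def ponder_style_loss(step_losses: List[float], halting_probs: List[float],
                      lambda_p: float, beta: float) -> float:
    """Expected prediction loss plus beta times the KL regularizer (assumed form)."""
    expected_loss = sum(p * l for p, l in zip(halting_probs, step_losses))
    prior = truncated_geometric(lambda_p, len(halting_probs))
    return expected_loss + beta * kl_divergence(halting_probs, prior)


# Made-up example: losses shrink with more steps, halting mass sits on later steps.
step_losses = [1.0, 0.6, 0.3, 0.2]
halting_probs = [0.1, 0.2, 0.3, 0.4]
print(ponder_style_loss(step_losses, halting_probs, lambda_p=0.5, beta=0.01))
```

Under this assumed form, the KL term grows without bound as the prior's mass at any step is abandoned, which is one way a regularizer can keep every step explored.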

On a complex synthetic problem, PonderNet substantially outperforms previous adaptive computation methods. As the accompanying figure shows, PonderNet achieves higher accuracy than ACT on the parity task and makes more effective use of thinking time. Moreover, when the total computation spent during training is taken into account, PonderNet uses less computation than ACT and achieves higher scores.

Another analysis looks at how the prior probability affects performance on the parity task. The only setting in which PonderNet fails to solve the task is when the prior (λp) is set to 0.9, that is, when the average number of thinking steps is about 1 (1/0.9).

Interestingly, when the prior (λp) is set to 0.1, so that the prior average thinking time starts at 10 steps (1/0.1), the network can overcome this handicap and settle at a more efficient average thinking time of about 3 steps. These results suggest that PonderNet is more stable than previous methods and a clear improvement over ACT, whose τ parameter is difficult to set and is a source of training instability.

Finally, one advantage of setting a prior probability is that the parameter can easily be interpreted as the reciprocal of the number of "thinking steps", whereas the τ parameter in the ACT model has no direct interpretation, which makes it harder to set a priori.
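The reciprocal reading can be checked numerically. The sketch below assumes the prior is a geometric distribution over the halting step with success probability `lambda_p`, whose mean is 1/`lambda_p`. The function name is hypothetical, and the infinite sum is cut off at a large number of terms.

```python
def mean_thinking_steps(lambda_p: float, max_terms: int = 10_000) -> float:
    """Mean of a geometric distribution over steps 1, 2, 3, ... (truncated sum)."""
    return sum(n * lambda_p * (1.0 - lambda_p) ** (n - 1)
               for n in range(1, max_terms + 1))


for lambda_p in (0.9, 0.5, 0.1):
    print(lambda_p, round(mean_thinking_steps(lambda_p), 3), round(1.0 / lambda_p, 3))
```

For the two settings discussed above, this gives roughly 1.1 steps for a prior of 0.9 and 10 steps for a prior of 0.1.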

PonderNet's ability to extrapolate was also tested. A network with input vectors of 96 elements was trained on inputs with between 1 and 48 non-zero elements and then evaluated on inputs with between 49 and 96. The results show that PonderNet achieves almost perfect accuracy on this extrapolation task, while ACT stays at chance level.
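A data generator for this kind of split might look like the sketch below. The article does not spell out the task encoding, so the details here are assumptions: each input has a chosen number of entries set to 1 or -1 and the rest set to 0, and the label is 1 when the number of entries equal to 1 is odd. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def make_parity_example(num_nonzero: int, size: int = 96,
                        rng=random) -> Tuple[List[int], int]:
    """Build one input with num_nonzero entries in {-1, 1} and its parity label."""
    values = [rng.choice((-1, 1)) for _ in range(num_nonzero)]
    values += [0] * (size - num_nonzero)
    rng.shuffle(values)
    label = values.count(1) % 2  # ASSUMED label: 1 if the count of ones is odd
    return values, label


def sample_split(split: str, size: int = 96, rng=random) -> Tuple[List[int], int]:
    """Training uses 1..size/2 non-zero entries, evaluation uses the rest."""
    if split == "train":
        low, high = 1, size // 2
    else:
        low, high = size // 2 + 1, size
    return make_parity_example(rng.randint(low, high), size, rng)


rng = random.Random(0)  # seeded generator for reproducibility
x_train, y_train = sample_split("train", rng=rng)
x_eval, y_eval = sample_split("eval", rng=rng)
print(sum(v != 0 for v in x_train), y_train)
print(sum(v != 0 for v in x_eval), y_eval)
```

Because the evaluation inputs are denser than anything seen in training, a model can only do well here if the procedure it learned carries over to longer computations.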

In addition, DeepMind's method matches state-of-the-art results on a real-world question answering dataset while using less computation. The experiments used the bAbI question answering dataset, which comprises 20 tasks and is hard to learn for standard neural network architectures without adaptive computation.

PonderNet matches state-of-the-art results while running faster and with a lower average error. It was compared with the Universal Transformer, which uses the same Transformer architecture as PonderNet but optimizes computation time with ACT.

To solve the 20 tasks, the Universal Transformer needs 10,161 steps, whereas PonderNet needs only 1,658, confirming that this method uses less computation than ACT.

PonderNet also achieved state-of-the-art results on a complex task designed to test the reasoning ability of neural networks: the paired associative inference (PAI) task. This task is thought to capture the essence of reasoning, namely grasping distant relationships among elements distributed across multiple facts or memories, and it has been shown to benefit from adaptive computation.

PonderNet is able to match the results of MEMO. Although it uses the same architecture as the Universal Transformer (UT), it achieves higher accuracy.

PonderNet adapts the computational complexity of neural networks. It optimizes a new objective function that combines prediction accuracy with a regularization term, and this term encourages exploration over thinking time.

This should be an advance over the earlier ACT method.
