It takes so much time to define an Adam optimizer in TensorFlow
I use the Adam optimizer to train a network, but I don't know why it takes so much time just to define the trainer. In TensorFlow, what does the Adam optimizer do when we define it?
Here is how I define the trainer.
mse_loss = tf.reduce_sum(tf.squared_difference(generated_images, FD_placeholder)) / (batch_size * width * height)
print(gen_variables)
print(mse_loss)
g_trainer = tf.train.AdamOptimizer(learning_rate=lr)
print("aaaaa")
g_trainer = g_trainer.minimize(mse_loss, var_list=gen_variables)
print("aaaaa")
lr is a placeholder for the learning rate; it has dtype tf.float32 and shape=[].
generated_images and FD_placeholder both have shape (batch_size, 64, 64, 1). I use batch_size = 2, and width and height are both 64.
gen_variables is a list containing a single variable of shape (91, 1, 1) and dtype tf.float32.
Here is the output for these few lines of code.
[<tf.Variable 'generator_model/g_w1:0' shape=(91, 1, 1) dtype=float32_ref>]
Tensor("truediv:0", shape=(), dtype=float32)
aaaaa
It takes hours for the second print("aaaaa") to execute, even after I reduce the batch size to 2.
No error message is shown.
Could anyone suggest some possible reasons?
And what does the Adam optimizer do when we define it?
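To separate the two steps, here is a minimal, self-contained sketch of the same setup (TensorFlow 1.x; the way generated_images depends on g_w1 is invented here purely so that the gradient is defined). Constructing tf.train.AdamOptimizer only records hyperparameters such as the learning rate and beta values; the gradient ops and Adam's extra variables are only created inside minimize():

import time
import tensorflow as tf  # assumes TensorFlow 1.x, as in the question

batch_size, width, height = 2, 64, 64
g_w1 = tf.Variable(tf.zeros([91, 1, 1]), name='g_w1')
# The dependence of generated_images on g_w1 is made up for illustration only.
generated_images = tf.zeros([batch_size, width, height, 1]) + tf.reduce_sum(g_w1)
FD_placeholder = tf.placeholder(tf.float32, [batch_size, width, height, 1])
lr = tf.placeholder(tf.float32, shape=[])

mse_loss = tf.reduce_sum(tf.squared_difference(generated_images, FD_placeholder)) / (batch_size * width * height)

start = time.time()
g_trainer = tf.train.AdamOptimizer(learning_rate=lr)  # only stores hyperparameters
print('construct:', time.time() - start)

start = time.time()
train_op = g_trainer.minimize(mse_loss, var_list=[g_w1])  # builds gradient ops and slot variables
print('minimize:', time.time() - start)

If this sketch returns almost instantly while the real model does not, the time is most likely spent in compute_gradients traversing whatever (possibly very large) graph produces generated_images, not in the optimizer itself.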
How big is your model? Usually the Adam optimizer defines two additional variables for each variable in your model, to take care of the Adam update, which relies on the first and second moments of the gradients. Could you please check whether the behavior remains the same when you replace the Adam optimizer with the GradientDescentOptimizer?
– Abhishek Mishra
Jun 30 at 1:23
@AbhishekMishra Thank you. I am running my model with GradientDescentOptimizer, but the behavior remains the same. I think it might take less time. What do you mean by 'how big' the model is? The only trainable variable in my model is the variable with shape (91, 1, 1) shown above.
– Huidong Xie
Jun 30 at 1:39
When you say the behavior remains the same, do you mean it is still taking hours to define the gradient descent optimizer? And what do you mean when you say it might take less time? Please explain.
– Abhishek Mishra
Jun 30 at 1:41
The code still stops at the same place, and no error message is shown. I don't know whether it still needs many hours, because I am running the model right now. But, as you explained, GradientDescentOptimizer does not define the two additional variables, so I think it might take less time.
– Huidong Xie
Jun 30 at 1:45
What is the model you are using? How big is your network?
– Abhishek Mishra
Jun 30 at 1:47
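The two additional variables mentioned above are Adam's per-variable slot variables, named "m" and "v" in TensorFlow 1.x, which hold the running first- and second-moment estimates. A small sketch of how to inspect them (note that the question overwrites g_trainer with the op returned by minimize, so here the op is stored separately):

adam = tf.train.AdamOptimizer(learning_rate=lr)
train_op = adam.minimize(mse_loss, var_list=gen_variables)
# After minimize(), each trainable variable has two Adam slot variables,
# 'm' and 'v', each with the same shape as the variable itself.
for var in gen_variables:
    print(adam.get_slot(var, 'm'))
    print(adam.get_slot(var, 'v'))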