

Colab GPU: Resource exhausted

2020/04/13 06:53 AM
Computer Vision Deep Learning Discussion Board
WP
Views: 18
Answers: 2
Bookmarks: 0
Colab

I added allow_growth but the training still could not run to completion. How can I solve this?

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))  # the config takes effect only once applied to the session Keras uses


Epoch 50/50

5/5 [==============================] - 7s 1s/step - loss: 72.5174 - val_loss: 70.2399

Unfreeze all of the layers.

Train on 90 samples, val on 10 samples, with batch size 16.

Epoch 51/100

---------------------------------------------------------------------------

ResourceExhaustedError                    Traceback (most recent call last)

in ()

    70         epochs=100,

    71         initial_epoch=50,

---> 72         callbacks=[logging, checkpoint, reduce_lr, early_stopping])

    73     model.save_weights(log_dir + 'trained_weights_final.h5')


6 frames

/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)

  1470         ret = tf_session.TF_SessionRunCallable(self._session._session,

  1471                                                self._handle, args,

-> 1472                                                run_metadata_ptr)

  1473         if run_metadata:

  1474           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)


ResourceExhaustedError: 2 root error(s) found.

 (0) Resource exhausted: OOM when allocating tensor with shape[16,105,105,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

[[{{node zero_padding2d_3/Pad}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


 (1) Resource exhausted: OOM when allocating tensor with shape[16,105,105,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

[[{{node zero_padding2d_3/Pad}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


[[loss_1/add_74/_5299]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


0 successful operations.

0 derived errors ignored.

Answers

  • 2020/04/13 05:32 PM
    胡連福
    Upvotes: 0
    Downvotes: 0
    Comments: 0

    I have run into this problem before. The resource exhaustion starts right at epoch 51 because all of the layers have been unfrozen at that point, which consumes far more GPU memory. I suggest you check:

    whether earlier training runs in the same Colab session are still holding on to memory. You can shut the runtime down, reconnect to Colab, and train again (a sketch of freeing memory without a full restart is given after the answers).

  • 2020/10/01 07:41 PM
    Patrick Ruan
    Upvotes: 0
    Downvotes: 0
    Comments: 0

    You can try reducing the batch size to resolve the OOM (out of memory) error.

    Going from SGD to mini-batch training, besides seeing how effective mini-batches are for gradient descent, it is also worth learning that when memory is limited, reducing the mini-batch size helps a great deal. The trade-off, of course, is longer training time (a toy example follows after the answers).
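
A minimal sketch of freeing GPU memory between runs without fully restarting the runtime, assuming TensorFlow 1.15 with the tf.keras backend (the traceback above shows tensorflow-1.15.2; if the script uses standalone keras, use keras.backend instead). Restarting the Colab runtime remains the most thorough reset.

import gc
import tensorflow as tf

# Drop the current graph and the GPU allocations left over from earlier runs.
tf.keras.backend.clear_session()
gc.collect()

# Re-create the session with allow_growth so TensorFlow claims GPU memory
# only as it is needed instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))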
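
A self-contained toy example of the batch-size trade-off from the second answer, assuming TensorFlow 1.15 with tf.keras; the model below is a stand-in, not the detector from the thread. Activation tensors such as the [16, 105, 105, 128] one in the OOM message scale linearly with the batch dimension, so cutting the batch from 16 to 4 cuts that tensor's memory to a quarter at the cost of more steps per epoch.

import numpy as np
import tensorflow as tf

# Toy convolutional model; only its memory behaviour matters here.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(128, 3, padding='same', activation='relu',
                           input_shape=(416, 416, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# 90 random training samples, mirroring the "Train on 90 samples" log line above.
x = np.random.rand(90, 416, 416, 3).astype('float32')
y = np.random.rand(90, 1).astype('float32')

# batch_size=4 instead of 16: smaller per-step activations and lower peak
# GPU memory, but more steps per epoch and therefore longer wall-clock time.
model.fit(x, y, batch_size=4, epochs=1)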