non_blocking=True indicates that the copy to the GPU is asynchronous with respect to the CPU: the call returns right away and the transfer is queued on the current CUDA stream (it is not a background thread). For the copy to really be asynchronous, the source CPU tensor should be in pinned memory. So,
right after executing the statement, the transfer may still be in flight. If you need to use the data in the very next statement, then using
non_blocking=True won't really help, because that operation will wait until the data has been moved to the GPU.
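To make this concrete, here is a minimal sketch of a single asynchronous copy. The helper name to_device_async is made up for illustration, and the code falls back to the CPU when no GPU is available, in which case non_blocking has no effect.

```python
import torch


def to_device_async(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a CPU tensor to `device`, asynchronously when that is possible."""
    if device.type == "cuda":
        # Pinned (page-locked) host memory lets the copy overlap with CPU work.
        t = t.pin_memory()
    return t.to(device, non_blocking=True)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x_cpu = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = to_device_async(x_cpu, device)

# This op is queued behind the copy on the same stream, so it sees the full data.
y = x * 2

# .cpu() is a blocking copy, so the values are safe to read afterwards.
result = y.cpu()
```

Since y is computed right after the copy, nothing is gained from non_blocking=True here; the benefit only shows up when the CPU has other work to do in between.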
On the other hand, if you need to move several objects to the GPU, you can use
non_blocking=True to issue all the copies without the CPU waiting for each one to finish, so the transfers can overlap with other CPU work.
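A rough sketch of the several-objects case is below. The batch dictionary, its keys, and its shapes are hypothetical, and the explicit torch.cuda.synchronize() is only there to show where the CPU would wait if it needs all copies to be finished.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical batch; the names and shapes are only for illustration.
batch = {
    "inputs": torch.randn(8, 4),
    "targets": torch.randint(0, 3, (8,)),
    "mask": torch.ones(8, dtype=torch.bool),
}

if device.type == "cuda":
    # Pin the host tensors so the copies below can run asynchronously.
    batch = {name: t.pin_memory() for name, t in batch.items()}

# Each .to() call returns as soon as its copy is queued,
# so the loop does not wait for one transfer before issuing the next.
moved = {name: t.to(device, non_blocking=True) for name, t in batch.items()}

# ... other CPU work could go here while the copies are in flight ...

if device.type == "cuda":
    # Block the CPU until every queued copy has completed.
    torch.cuda.synchronize()
```

In a training loop you usually would not need the explicit synchronize, since the kernels that consume these tensors are queued on the same stream after the copies.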
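In general, you can use
non_blocking=True for CPU-to-GPU copies. The only risk is that there may be some edge cases which are not handled properly in the internal implementation (since parallel programming is hard), which I suspect is the reason why the default value of
non_blocking is False. One case to watch out for is copying from the GPU back to the CPU: there the CPU tensor may be read before the data has arrived unless you synchronize first.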
See this thread for a more detailed discussion: https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/9