2229 Commits

Author SHA1 Message Date
Glenn Jocher 9e9a6a1425 updates 2019-11-27 15:50:29 -10:00
Glenn Jocher 82b62c9855 updates 2019-11-27 15:50:00 -10:00
Glenn Jocher 4b251406e2 updates 2019-11-27 15:04:05 -10:00
Glenn Jocher 91fca0e17d updates 2019-11-27 15:03:05 -10:00
Glenn Jocher 9319ae8ff9 updates 2019-11-27 15:00:41 -10:00
Glenn Jocher 413afab11c updates 2019-11-27 14:59:46 -10:00
Glenn Jocher 9c1d7d5248 updates 2019-11-27 14:52:33 -10:00
Glenn Jocher ea19c33a87 updates 2019-11-27 14:35:18 -10:00
Glenn Jocher 3dec99b16c updates 2019-11-26 16:03:45 -10:00
Glenn Jocher 0417b3a527 updates 2019-11-26 13:53:05 -10:00
Glenn Jocher 78a2de52b5 updates 2019-11-26 13:23:47 -10:00
Glenn Jocher b04392e298 updates 2019-11-26 12:59:13 -10:00
Glenn Jocher 40ae87cb46 updates 2019-11-26 12:36:21 -10:00
Glenn Jocher 0fe40cb687 updates 2019-11-26 12:34:47 -10:00
Glenn Jocher 92f742618c updates 2019-11-26 10:26:14 -10:00
Glenn Jocher b269ed7b29 updates 2019-11-25 18:42:48 -10:00
Glenn Jocher 3c57ff7b1b updates 2019-11-25 17:24:05 -10:00
Glenn Jocher 90cfb91858 updates 2019-11-25 17:13:10 -10:00
Glenn Jocher 75e8ec323f updates 2019-11-25 11:45:28 -10:00
Glenn Jocher 0245ff9133 updates 2019-11-25 08:26:41 -10:00
Francisco Reveriano 26e3a28bee Update train.py for distributive programming (#655)
When attempting to run this function in a multi-GPU environment, I kept getting a runtime error. I was able to solve the problem by passing the keyword argument find_unused_parameters=True. I first found the solution here:
https://github.com/pytorch/pytorch/issues/22436
and in the PyTorch tutorial

'RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). '
2019-11-24 22:21:36 -10:00
Glenn Jocher a0ef217842 updates 2019-11-24 20:10:39 -10:00
Glenn Jocher 9b55bbf9e2 updates 2019-11-24 20:08:24 -10:00
Glenn Jocher 7773651e8e updates 2019-11-24 18:38:30 -10:00
Glenn Jocher 2f1c9a3f6f updates 2019-11-24 18:31:06 -10:00
Glenn Jocher f12a2a513a updates 2019-11-24 18:29:29 -10:00
Glenn Jocher 5f00d7419e updates 2019-11-23 19:27:33 -10:00
Glenn Jocher 4aff400777 updates 2019-11-23 19:23:31 -10:00
Glenn Jocher b027c66048 updates 2019-11-23 13:34:37 -10:00
Glenn Jocher 6c6aa483d7 updates 2019-11-23 13:23:38 -10:00
Glenn Jocher 46161ed94d updates 2019-11-23 12:09:46 -10:00
Glenn Jocher 55a6b05228 updates 2019-11-23 09:35:11 -10:00
Glenn Jocher bdf11ffdf1 updates 2019-11-23 09:25:21 -10:00
Glenn Jocher d623a425d9 updates 2019-11-22 16:20:11 -10:00
Glenn Jocher f1e8d23d39 updates 2019-11-22 14:36:49 -10:00
Glenn Jocher 4c61611ce0 updates 2019-11-22 14:20:35 -10:00
Glenn Jocher a137c21dc0 updates 2019-11-22 14:06:16 -10:00
Glenn Jocher 54d907d8c8 updates 2019-11-22 14:03:46 -10:00
Glenn Jocher 46da9fd26c updates 2019-11-22 13:38:28 -10:00
Glenn Jocher bbd6c884e6 updates 2019-11-22 13:27:23 -10:00
Glenn Jocher e701979862 updates 2019-11-22 13:03:29 -10:00
Glenn Jocher 3834b77961 updates 2019-11-21 11:52:48 -08:00
Glenn Jocher 7c59715fda updates 2019-11-21 00:00:17 -08:00
Glenn Jocher f38723c0bd updates 2019-11-20 19:34:22 -08:00
Glenn Jocher a0067ac8fb updates 2019-11-20 19:10:36 -08:00
Glenn Jocher 74b57500c7 updates 2019-11-20 16:02:57 -08:00
Glenn Jocher 3a4ed8b3ab updates 2019-11-20 13:40:24 -08:00
Glenn Jocher bb209111c4 updates 2019-11-20 13:36:15 -08:00
Glenn Jocher 8e327e3bd0 updates 2019-11-20 13:33:25 -08:00
Glenn Jocher 2950f4c816 updates 2019-11-20 13:26:50 -08:00
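
Note on commit 26e3a28bee: the commit message describes working around the DistributedDataParallel reduction error by passing find_unused_parameters=True. The diff itself is not shown in this log, so the following is only a minimal sketch of that keyword in use, not the actual change to train.py. The TwoBranch module, the train_steps helper, the single-process CPU "gloo" setup and the port selection are all assumptions made for illustration.

```python
import socket

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TwoBranch(nn.Module):
    """Hypothetical model with one layer that never contributes to the loss."""

    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)  # parameters never touched in forward()

    def forward(self, x):
        return self.used(x)


def _free_port():
    # Ask the OS for an unused TCP port for the rendezvous (illustrative only).
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def train_steps(n=2):
    """Run n optimizer steps under DDP and return how many completed."""
    # Single-process group on CPU so the sketch runs without GPUs (assumption).
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://127.0.0.1:{_free_port()}",
        rank=0,
        world_size=1,
    )
    try:
        # The keyword named in the commit message: lets the reducer skip
        # parameters that did not take part in producing the loss.
        model = DDP(TwoBranch(), find_unused_parameters=True)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        done = 0
        for _ in range(n):
            opt.zero_grad()
            loss = model(torch.randn(8, 4)).sum()
            loss.backward()
            opt.step()
            done += 1
        return done
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print("completed steps:", train_steps())
```

As the quoted error message points out, the alternative is to make sure every forward output participates in the loss; the keyword is the less invasive option when some parameters are legitimately unused.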