OP:
magines (magines)
2019-01-18 00:50:00
First of all, thanks to everyone reading this; the post may be a bit long.
Also, I'm a complete Python beginner, so some of my wording may not be precise.. sorry in advance if that causes any confusion.
Basically, the problem is:
OverflowError: cannot serialize a bytes object larger than 4 GiB
************* From the GitHub author, explaining why this happens *************
Hi, this is a common problem and stems from some of the patents
having a crazily large amount of text in them.
Reduce the size of the sample on which you're running inference.
E.g., instead of 20% (0.2), reduce it to 0.05 to start with and
try ratcheting it up slowly.
********* Conclusion: the patent files are too large
Reference:
https://github.com/google/patents-public-data/issues/16
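Based on that reply, the fix is apparently to shrink the fraction of patents fed into inference and raise it gradually. I don't know the notebook's real function or parameter names, so run_inference and sample_frac below are placeholders; this is only a sketch of the idea (0.2 -> 0.05, then ratchet up):

# Sketch only -- run_inference and sample_frac are placeholder names,
# not the actual API of the landscaping notebook.
for frac in (0.05, 0.10, 0.15, 0.20):
    try:
        results = run_inference(td, sample_frac=frac)   # placeholder call
        print(f"sample fraction {frac} worked")
    except OverflowError:
        print(f"sample fraction {frac} still hits the 4 GiB pickle limit")
        break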
***** How am I supposed to split the files?
The code stores all of the files in something called td (typing td in Python only prints
<train_data.LandscapeTrainingDataUtil at 0x1369595c0>).
I have no idea how to split it, and I don't even know what it looks like inside....
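A generic way to at least see what td contains is plain Python introspection; nothing here is specific to train_data, it's just standard Python:

print(type(td))                                               # which class it is
print([name for name in dir(td) if not name.startswith("_")]) # public attributes/methods
print(vars(td).keys())                                        # instance attributes, if it has a __dict__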