But the paper seems to build the SL policy network only from human game records, even though in principle it could be rebuilt from a trained AlphaGo. Of course, that may just be because AlphaGo wasn't strong enough yet when the paper was written, though the paper seems to say they wanted a kind of noise that had been filtered by humans: "It is worth noting that the SL policy network performed better in AlphaGo than the stronger RL policy network, presumably because humans select a diverse beam of promising moves, whereas RL optimizes for the single best move." Today's pre-match interview sounded to me more like a discussion of AI's prospects and research directions; I don't quite remember whether Go was specifically mentioned @@ Hmm right, I was mistaken, those are actually two different things XD, and at least the paper doesn't explain why
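The quoted explanation, that a diverse prior serves tree search better than a sharper but narrower one, can be made concrete with a toy experiment. Below is a minimal sketch of a one-ply search using the PUCT-style selection rule from the AlphaGo paper (Q plus an exploration bonus proportional to the prior); the move values, the two prior distributions, the noise model, and the c_puct constant are all invented for illustration, not taken from the paper.

import math
import random

def puct_select(priors, q_values, visits, c_puct=5.0):
    # PUCT-style score: Q(a) + c_puct * P(a) * sqrt(sum_b N(b)) / (1 + N(a))
    total = sum(visits)
    scores = [q + c_puct * p * math.sqrt(total + 1) / (1 + n)
              for p, q, n in zip(priors, q_values, visits)]
    return scores.index(max(scores))

def search(priors, true_values, n_sims=400):
    # Toy one-ply search: select a move with PUCT, evaluate it noisily,
    # and accumulate visit counts and value sums per move.
    k = len(priors)
    visits = [0] * k
    value_sum = [0.0] * k
    for _ in range(n_sims):
        q = [value_sum[a] / visits[a] if visits[a] else 0.0 for a in range(k)]
        a = puct_select(priors, q, visits)
        visits[a] += 1
        value_sum[a] += true_values[a] + random.gauss(0.0, 0.1)
    return visits

random.seed(0)
# Hypothetical 5-move position: move 3 is actually best (0.55),
# but the peaked "RL-like" prior barely considers it.
true_values = [0.45, 0.50, 0.40, 0.55, 0.30]
sl_like = [0.30, 0.25, 0.20, 0.15, 0.10]  # diverse beam, SL-like
rl_like = [0.85, 0.05, 0.04, 0.04, 0.02]  # mass on one move, RL-like

print("diverse prior, visits per move:", search(sl_like, true_values))
print("peaked prior,  visits per move:", search(rl_like, true_values))

In runs like this the diverse prior keeps a beam of candidates alive, so an undervalued move such as move 3 still accumulates enough visits for a reliable Q estimate, while the peaked prior sinks almost all simulations into its single favorite, which matches the paper's explanation quoted above.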
Author: lwei781 (nap til morning?) 2016-03-12 17:35:00