Re: [Question] Writing function output to a txt file | ccwang002 | PTT批踢踢实业坊


Poster: ccwang002 (亮) 2014-04-25 19:23:51

※ Quoting harohepowegr (harohepowegr):
: First, read in two txt files:
: b1
: This is a book
: This is a pen
: This is a table
: This is a desk
: b2
: This is a papaya
: This is a pineapple
: This is a banana
: This is a melon
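I take it this is a program that counts how often each word appears (words are split on whitespace).
Below are my own suggestions for your reference; everyone is welcome to join the discussion ~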
: Here is the code:
: # function that reads a file
: def readbook(filename):
:     readin = open(filename)
:     count = []
:     for line in readin:
:         letter = line.split()
:         count = count + letter
:     return count
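First, to add elements to a list, make good use of .append() and .extend(). Both modify the list in place, whereas count = count + letter builds a brand-new list on every iteration.
Here the loop body can be rewritten as count.extend(line.split()), and opening the file with a with statement makes sure it gets closed: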
def readbook(filename):
    count = []
    with open(filename) as f:
        for line in f:
            count.extend(line.split())
    return count
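To make the difference concrete, here is a small sketch with made-up word lists: .append() adds its argument as one single element, .extend() adds each element of an iterable, and + returns a new list while leaving the original untouched.

```python
# Compare three ways of growing a list of words (sample data made up for illustration).
words_a = ["This", "is"]
words_b = ["a", "book"]

appended = list(words_a)
appended.append(words_b)           # adds the whole list as ONE element
extended = list(words_a)
extended.extend(words_b)           # adds each element of words_b, in place
concatenated = words_a + words_b   # builds a brand-new list; words_a is unchanged

print(appended)      # ['This', 'is', ['a', 'book']]
print(extended)      # ['This', 'is', 'a', 'book']
print(concatenated)  # ['This', 'is', 'a', 'book']
```

So for collecting the words of each line, .extend() is the one you want; .append() would give you a list of lists.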
: # count occurrences
: def list2dict(count):
:     ldict = dict()
:     for ch in count:
:         ldict[ch]=ldict.get(ch,0)+1
:     return ldict
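To count word frequencies with a dict, you can use collections.Counter from the Python standard library:
https://docs.python.org/3/library/collections.html#collections.Counter
After from collections import Counter, this whole function can be replaced by return Counter(count).
Counter has lots of handy features: adding two Counters adds up their counts, looking up a missing key returns 0 instead of raising KeyError,
and .most_common() sorts the words by frequency... (see the official documentation).
Here you can use Counter.update() to update the frequencies; it accepts either a list (any iterable) or a dict (any mapping).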
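A quick sketch of the Counter features mentioned above, using made-up word lists (c1, c2 and total are just illustrative names):

```python
from collections import Counter

# Hypothetical word lists, one per "book"
c1 = Counter(["This", "is", "a", "book", "This"])
c2 = Counter(["This", "is", "a", "pen"])

total = c1 + c2                 # adding Counters adds the counts
print(total["This"])            # 3
print(total["melon"])           # 0: missing key, no KeyError
print(total.most_common(1))     # [('This', 3)]

# update() with an iterable counts each item once per occurrence...
c1.update(["pen", "pen"])
# ...and with a mapping it adds the mapped values to the counts
c1.update({"book": 2})
print(c1["pen"], c1["book"])    # 2 3
```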
: book1 = readbook('b1.txt')
: book2 = readbook('b2.txt')
: text = list2dict(book1+book2)
: print('All words that appear and their counts')
: print(list2dict(book1+book2))
So the whole program can be rewritten as:
from collections import Counter
word_freq = Counter(readbook('b1.txt'))
word_freq.update(readbook('b2.txt'))
print(word_freq)
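For anyone who wants to run it end to end, here is a self-contained sketch. It reuses the extend-based readbook from above; the part that writes the two sample files from the question into a temporary directory is my own addition so the snippet runs anywhere.

```python
import os
import tempfile
from collections import Counter

def readbook(filename):
    """Return a list of every whitespace-separated word in the file."""
    count = []
    with open(filename) as f:
        for line in f:
            count.extend(line.split())
    return count

# Setup added for this sketch: recreate the two sample files from the question
# in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmpdir:
    b1 = os.path.join(tmpdir, "b1.txt")
    b2 = os.path.join(tmpdir, "b2.txt")
    with open(b1, "w") as f:
        f.write("This is a book\nThis is a pen\nThis is a table\nThis is a desk\n")
    with open(b2, "w") as f:
        f.write("This is a papaya\nThis is a pineapple\nThis is a banana\nThis is a melon\n")

    # The rewritten program from the post
    word_freq = Counter(readbook(b1))
    word_freq.update(readbook(b2))

print(word_freq["This"], word_freq["melon"])  # 8 1
```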
As for why the content written to the file is different, try the following code:
with open('output.txt', 'w') as f:
    f.write(str(word_freq))
I'll leave the details for you to think through ~
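One detail worth noticing: on a file opened in text mode, f.write() only accepts a str, so passing the Counter itself raises TypeError, while print() converts its argument with str() for you. Below is a sketch that writes one word per line instead of the whole Counter repr; the tab-separated "word count" layout and the sample counts are my own choices for illustration, not something from the original question.

```python
import os
import tempfile
from collections import Counter

# Made-up counts standing in for the word_freq computed above
word_freq = Counter({"This": 8, "is": 8, "book": 1})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "output.txt")
    with open(path, "w") as f:
        try:
            f.write(word_freq)  # a Counter is not a str
        except TypeError as e:
            print("TypeError:", e)
        # One "word<TAB>count" line per word, most frequent first
        for word, n in word_freq.most_common():
            f.write(f"{word}\t{n}\n")
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)  # ['This\t8', 'is\t8', 'book\t1']
```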
Next, the word-splitting part. This code can be made even simpler:
inside readbook, a two-level (nested) list comprehension does the whole job:
def readbook(filename):
    with open(filename) as f:
        return Counter(
            [wd for line in f for wd in line.split()]
        )
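If the two-level comprehension looks confusing, the trick is that its for clauses are read in the same order as ordinary nested loops. A sketch comparing the two, using io.StringIO as a stand-in for an open file (that substitution is mine, just to keep the example self-contained):

```python
import io
from collections import Counter

text = "This is a book\nThis is a pen\n"

# Nested loops, written out
words_loop = []
for line in io.StringIO(text):
    for wd in line.split():
        words_loop.append(wd)

# The same thing as a two-level list comprehension:
# the for clauses appear in the same order as the loops above
words_comp = [wd for line in io.StringIO(text) for wd in line.split()]

print(words_comp == words_loop)     # True
print(Counter(words_comp)["This"])  # 2
```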
Or, it can simply be a generator expression:
def readbook(filename):
    with open(filename) as f:
        return Counter(
            wd for line in f for wd in line.split()
        )
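A sketch of what the generator version buys you, again with io.StringIO standing in for a file and a hypothetical helper name count_words: the words are fed to Counter one at a time without first building a full list, and the result is the same. The flip side is that a generator can only be iterated once.

```python
import io
from collections import Counter

def count_words(f):
    # A generator expression passed as the only argument needs no extra brackets;
    # words are produced one at a time instead of being stored in a list first.
    return Counter(wd for line in f for wd in line.split())

sample = "This is a papaya\nThis is a melon\n"
freq = count_words(io.StringIO(sample))
print(freq["This"], freq["melon"])  # 2 1

# A generator is exhausted after one pass
gen = (wd for line in io.StringIO(sample) for wd in line.split())
first = list(gen)
second = list(gen)
print(len(first), len(second))  # 8 0
```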
That's all ~

Author: phisixersai (AHAMAY) 2014-04-25 20:07:00

The Python god 亮亮 is OP

Poster: ccwang002 (亮) 2014-04-25 20:15:00

You're too kind, master upstairs

Author: jacky7987 (忆) 2014-04-26 00:28:00

This is so convenient, upvote!

Author: KSJ (阿真) 2014-04-26 17:29:00

Upvote ~
