Re: [Question] Is there a more efficient way to write this?

Poster: forloricever (sigh...)   2014-08-30 07:15:35
※ Quoting sariel0322 (sariel):
: I have a CSV file with a very large number of rows,
: and I want to remove the duplicate rows from it.
: The only approach I could come up with is this:
: import csv
: rows = []
: a = 0
: o = open("output.csv","w")
: f = open("input.csv","r")
: for row in csv.reader(f):
:     rows.append(row[0]+","+row[1]+","+row[2]+","+row[3]+","+row[4]+","+row[5]+","+row[6]+","+row[7]+","+row[8]+","+row[9]+","+row[10])
: for i in set(rows):
:     o.write(i+"\n")
: f.close()
: o.close()
: But there are a huge number of rows and the data is large (the CSV file is about 400 MB),
: so running the whole thing would take roughly five days (estimated with a counter, which I left out to save space).
: I'd like to ask whether there is a more efficient way to write this.
Use pandas:
import pandas as pd
# Read the file, drop duplicate rows, and write the result back out.
# index=False keeps pandas from adding an extra index column to the output.
data = pd.read_csv('input.csv')
data.drop_duplicates().to_csv('output.csv', index=False)
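One caveat: pd.read_csv treats the first line as a header and loads the whole 400 MB file into memory. If that is a concern, a single streaming pass that only remembers lines it has already seen also works. A minimal sketch, reusing the filenames from the original post (everything else is illustrative):

seen = set()
# Stream the file line by line; only the unique lines are kept in memory,
# and the original row order is preserved in the output.
with open("input.csv", "r") as f, open("output.csv", "w") as o:
    for line in f:
        if line not in seen:
            seen.add(line)
            o.write(line)

Set membership checks are O(1) on average, so this is a single pass over the file instead of holding every row and writing the set out at the end.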
