Problem description
I want to remove duplicate words from a text file.
I have some text files that contain content like the following:
None_None
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
ConfigHandler_56663624
None_None
ColumnConverter_56963312
ColumnConverter_56963312
PredicatesFactory_56963424
PredicatesFactory_56963424
PredicateConverter_56963648
PredicateConverter_56963648
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
ConfigHandler_80134888
The resulting output needs to be:
None_None
ConfigHandler_56663624
ColumnConverter_56963312
PredicatesFactory_56963424
PredicateConverter_56963648
ConfigHandler_80134888
I have only tried this command: en=set(open('file.txt') but it does not work.
Could anyone help me extract only the unique set of lines from the file?
Thanks
Recommended answer
Here's an option that preserves order (unlike a set) but otherwise has the same behaviour (note that the EOL character is deliberately stripped and blank lines are ignored):
from collections import OrderedDict

with open('/home/jon/testdata.txt') as fin:
    # Strip trailing whitespace (including the EOL character) from each line
    lines = (line.rstrip() for line in fin)
    # Dict keys are unique and OrderedDict keeps first-seen order; blank lines are skipped
    unique_lines = OrderedDict.fromkeys(line for line in lines if line)

print(list(unique_lines))
# ['None_None', 'ConfigHandler_56663624', 'ColumnConverter_56963312', 'PredicatesFactory_56963424', 'PredicateConverter_56963648', 'ConfigHandler_80134888']
Then you just need to write the unique lines to your output file.
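One way that last step might look is sketched below. This is not part of the original answer: the helper names `dedupe_lines` and `dedupe_file` and their parameters are made up for illustration, and the sketch assumes the input fits comfortably in memory.

```python
from collections import OrderedDict


def dedupe_lines(lines):
    """Return unique non-blank lines in first-seen order, trailing whitespace stripped."""
    stripped = (line.rstrip() for line in lines)
    # OrderedDict.fromkeys keeps only the first occurrence of each key, in order
    return list(OrderedDict.fromkeys(line for line in stripped if line))


def dedupe_file(src_path, dst_path):
    """Hypothetical helper: read src_path, drop repeated lines, write them to dst_path."""
    with open(src_path) as fin:
        unique_lines = dedupe_lines(fin)
    with open(dst_path, "w") as fout:
        for line in unique_lines:
            # The EOL was stripped on input, so add it back on output
            fout.write(line + "\n")
    return unique_lines
```

For example, `dedupe_file('file.txt', 'unique.txt')` would read `file.txt` and write the de-duplicated lines to `unique.txt` (both file names are placeholders). Writing to a separate output file, rather than overwriting the input, keeps the original data intact if something goes wrong.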