Answering my own question: by default, CountVectorizer filters out single-character tokens, because its default token_pattern, (?u)\b\w\w+\b, only matches tokens of two or more word characters. Passing a custom token_pattern of (?u)\b\w+\b keeps single-character words:
from sklearn.feature_extraction.text import CountVectorizer

# Pre-tokenized documents; "|" and "," are non-word characters, so they act as separators.
text = ["我|,|爱你|白Z",
        "他|爱狗",
        "猫|爱鼠"]

# token_pattern '(?u)\\b\\w+\\b' matches runs of one or more word characters,
# so single-character words are kept in the vocabulary.
vectorizer = CountVectorizer(min_df=1, token_pattern='(?u)\\b\\w+\\b')
vectorizer.fit(text)
vector = vectorizer.transform(text)

print(vectorizer.vocabulary_)
print(vector.shape)
print(vector.toarray())
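
For contrast, here is a minimal sketch on the same data with the default settings, showing why the custom token_pattern is needed: with the default pattern, the single-character tokens 我, 他, and 猫 never make it into the vocabulary.

from sklearn.feature_extraction.text import CountVectorizer

text = ["我|,|爱你|白Z",
        "他|爱狗",
        "猫|爱鼠"]

# Default token_pattern is r"(?u)\b\w\w+\b" (two or more word characters),
# so single-character words are silently dropped.
default_vectorizer = CountVectorizer(min_df=1)
default_vectorizer.fit(text)
print(default_vectorizer.vocabulary_)  # expected: only multi-character tokens such as 爱你, 爱狗, 爱鼠, 白z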