task-3-2-1-Text-Processing-…/from sklearn.feature_extraction.text imp.ini
2026-04-23 15:53:06 +08:00

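from sklearn.feature_extraction.text import CountVectorizer

# Pre-segmented documents: tokens are already separated by spaces.
docs = [
    "Python 是 编程 语言",  # "Python is a programming language"
    "Java 是 编程 语言",  # "Java is a programming language"
    "Python Python Python",
]

# Split on whitespace so single-character tokens such as "是" are kept.
# token_pattern=None silences the warning that the default pattern is unused.
vectorizer = CountVectorizer(tokenizer=lambda x: x.split(), token_pattern=None)
X = vectorizer.fit_transform(docs)  # sparse document-term count matrix

# lowercase=True is the default, so "Python" is listed as "python".
print("Vocabulary:", vectorizer.get_feature_names_out())
counts = X.toarray()  # convert to a dense array once and reuse it
print("Doc1 vector:", counts[0])
print("Doc2 vector:", counts[1])
print("Doc3 vector:", counts[2])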
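To make it clearer what the count vectors represent, the following is a minimal pure-Python sketch of the same bag-of-words idea. It is not how CountVectorizer is implemented. The helper names `tokenize` and `bag_of_words` are hypothetical, and the sketch assumes lowercasing, whitespace tokenization, and a vocabulary sorted by term, which matches the behavior of the script above for these three documents.

```python
from collections import Counter

# Same pre-segmented documents as in the script above.
docs = [
    "Python 是 编程 语言",
    "Java 是 编程 语言",
    "Python Python Python",
]


def tokenize(text):
    # Hypothetical helper: lowercase, then split on whitespace.
    return text.lower().split()


def bag_of_words(documents):
    # Hypothetical helper: build a sorted vocabulary, then count
    # how often each vocabulary term occurs in each document.
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        term_counts = Counter(toks)  # missing terms count as 0
        vectors.append([term_counts[term] for term in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(docs)
print("Vocabulary:", vocabulary)
for i, vec in enumerate(vectors, start=1):
    print(f"Doc{i} vector:", vec)
```

Each position in a vector corresponds to one vocabulary term, so the third document, which repeats a single word three times, has a count of 3 in that term's column and 0 everywhere else.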