Welcome to another module of the Practical Data Science course. In this module, we will cover the basics of text mining. After completing this module, you will be comfortable with the core concepts of text mining. This module starts with the definition of text mining. After that, you will learn the text mining process. Then we will focus on applications of text mining, followed by its advantages and challenges. Finally, we will implement the basic concepts of text mining in Python.
Text mining is the process of exploring and analyzing large amounts of unstructured text data with the help of software that can find concepts, patterns, topics, keywords, and other attributes in the data.
Text mining is also called text analytics, though some think the two terms are different. In their view, text analytics is the application that sorts through data sets by using text mining techniques. Sometimes you will hear people say ‘text data mining’ or ‘document mining’ instead of text mining. Whichever name is used, they all refer to the same thing: the process of exploring unstructured text data to discover useful information.
Text mining has become more useful for data scientists and other users now that big data platforms and deep learning algorithms capable of analyzing large amounts of unstructured data are available.
Mining and analyzing text help businesses find potentially valuable business insights in corporate documents, customer emails, call center logs, verbatim survey comments, social network posts, medical records, and other text-based data sources. Text mining is also increasingly used in AI chatbots and virtual agents that companies use to respond to customers automatically as part of their marketing, sales, and customer service operations.
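To make the idea of finding keywords in unstructured text concrete, here is a minimal sketch that uses only the Python standard library. The sample comments, the tiny stop-word set, and the function name `top_keywords` are made up for illustration and are not part of the tutorial; the tutorial code itself relies on NLTK for this kind of work.

```python
import re
from collections import Counter

# Hypothetical sample data: three short customer comments.
comments = [
    "The delivery was late and the package was damaged.",
    "Late delivery again, but support was helpful.",
    "Great support, fast delivery this time.",
]

# A deliberately tiny stop-word set, for illustration only.
STOP_WORDS = {"the", "was", "and", "but", "this", "a", "an", "again"}


def top_keywords(texts, n=3):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic runs only, which drops punctuation.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


print(top_keywords(comments))
```

In this toy data, "delivery" appears in every comment and comes out as the most frequent keyword, which is the kind of recurring topic text mining is meant to surface.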
Here is the code used in this tutorial:
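import nltk

# One-time downloads of the NLTK data used below.
nltk.download('punkt')                       # sentence and word tokenizer models
nltk.download('stopwords')                   # stop-word lists
nltk.download('wordnet')                     # WordNet, used by the lemmatizer
nltk.download('omw-1.4')                     # Open Multilingual WordNet
nltk.download('averaged_perceptron_tagger')  # part-of-speech tagger
# Assumption: recent NLTK releases (3.9 and later) look for these renamed
# resources instead, so they are fetched here as well.
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')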

# Sentence tokenization: split the text into sentences.
from nltk.tokenize import sent_tokenize
text = '''Hello Mr. Jones, how are you doing today? The weather is great, and city is awesome.
The sky is bright-blue. You shouldn't call for meeting today'''
tokenized_text = sent_tokenize(text)
print(tokenized_text)

# Word tokenization: split the text into words and punctuation marks.
from nltk.tokenize import word_tokenize
tokenized_word = word_tokenize(text)
print(tokenized_word)
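
# Frequency distribution: count how often each token occurs.
from nltk.probability import FreqDist
frequency = FreqDist(tokenized_word)
print(frequency)
print(frequency.most_common(3))  # the three most frequent tokens

# Plot the counts of up to 30 of the most frequent tokens.
import matplotlib.pyplot as plt
frequency.plot(30, cumulative=False)
plt.show()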
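
# Stop-word removal: drop very common words such as "is" and "the".
from nltk.corpus import stopwords
stop_words = set(stopwords.words("english"))
print(stop_words)
filtered_sent = []
# Filter the word tokens, not the sentence list; the stop-word list is
# lowercase, so compare case-insensitively.
for w in tokenized_word:
    if w.lower() not in stop_words:
        filtered_sent.append(w)
print("Tokenized Sentence: ", tokenized_word)
print("Filtered Sentence: ", filtered_sent)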

# Stemming: reduce each remaining token to its stem.
from nltk.stem import PorterStemmer
ps = PorterStemmer()
stemmed_words = []
for w in filtered_sent:
    stemmed_words.append(ps.stem(w))
print("Filtered Sentence:", filtered_sent)
print("Stemmed Sentence:", stemmed_words)
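
# Lemmatization versus stemming on single words.
from nltk.stem.wordnet import WordNetLemmatizer
lem = WordNetLemmatizer()
stem = PorterStemmer()  # PorterStemmer is already imported above
# The words are lowercase because the WordNet lookup expects lowercase forms;
# a capitalized word such as "Working" would come back unchanged.
word = "working"
print("Lemmatized Word: ", lem.lemmatize(word, "v"))  # "v" = treat as a verb
print("Stemmed Word: ", stem.stem(word))
word = "flying"
print("Lemmatized Word: ", lem.lemmatize(word, "v"))
print("Stemmed Word: ", stem.stem(word))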

# Part-of-speech tagging: label each token with its grammatical category.
sentence = "Albert Einstein was born in Ulm, Germany in 1879"
tokens = nltk.word_tokenize(sentence)
print(tokens)
print(nltk.pos_tag(tokens))
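
The listing prints both a lemma and a stem for the same word. As a rough illustration of why the two can differ, the sketch below contrasts rule-based suffix stripping with a dictionary lookup. The functions `toy_stem` and `toy_lemmatize`, the suffix list, and the small lemma table are hypothetical stand-ins written for this example; they are not the Porter algorithm or WordNet, which are far more thorough.

```python
# Toy suffix rules, tried in order; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "s")

# Toy lemma dictionary; a real lemmatizer looks words up in a lexicon.
LEMMAS = {"flying": "fly", "flew": "fly", "working": "work", "better": "good"}


def toy_stem(word):
    """Strip the first matching suffix, without checking the result is a word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require a few characters to remain so short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["Working", "Flying", "flew", "better"]:
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

Irregular forms such as "flew" are where the two approaches part ways: the suffix rules leave the word untouched, while a lookup can map it to "fly".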

Video: Text Mining Basics in Python, uploaded by Nuruzzaman Faruqui on 24 August 2022.