
What are the steps of text normalization? Explain them in brief.

1 Answer


Text Normalization

In text normalization, the text passes through several steps that reduce it to a simpler, more uniform form.

Sentence Segmentation - In sentence segmentation, the whole corpus is divided into sentences. Each sentence is treated as a separate piece of data, so the corpus is reduced to a list of sentences.
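As a rough illustration of this step, here is a minimal sketch that splits a corpus wherever '.', '!' or '?' is followed by whitespace. The function name `segment_sentences` and the splitting rule are assumptions made for this example; a rule this simple will wrongly split abbreviations such as "Dr. Rao".

```python
import re

def segment_sentences(corpus: str) -> list[str]:
    """Naively split a corpus after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", corpus.strip())
    # Drop empty strings (e.g. when the corpus itself is empty).
    return [p for p in parts if p]

corpus = "Aman went to school. Did he enjoy it? Yes!"
sentences = segment_sentences(corpus)
print(sentences)
```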

Tokenisation - After the corpus is segmented, each sentence is further divided into tokens. A token is any word, number or special character occurring in a sentence. Under tokenisation, every word, number and special character is considered separately, and each becomes a separate token.
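One possible way to sketch tokenisation is with a regular expression that treats each run of letters or digits as one token and each punctuation mark as its own token. The function name `tokenise` and the pattern are assumptions for illustration; note that this simple pattern splits contractions such as "don't" into three tokens.

```python
import re

def tokenise(sentence: str) -> list[str]:
    """Return words and numbers as tokens, and each special character as its own token."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one non-space symbol.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenise("Raj has 2 pens!")
print(tokens)  # ['Raj', 'has', '2', 'pens', '!']
```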

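Removing Stop Words, Special Characters and Numbers - In this step, the tokens that are not necessary are removed from the token list. Stop words are very frequent words such as "a", "the" and "is" that add little meaning on their own.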
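A minimal sketch of this filtering step might look like the following. The `STOP_WORDS` set here is a tiny hand-picked sample chosen for the example, not a standard list, and the function name `remove_unneeded` is hypothetical.

```python
# A small illustrative stop-word set (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "has", "and", "to", "of", "in"}

def remove_unneeded(tokens: list[str]) -> list[str]:
    """Keep only alphabetic tokens that are not stop words."""
    return [
        t for t in tokens
        if t.isalpha()                    # drops numbers and special characters
        and t.lower() not in STOP_WORDS   # drops stop words regardless of case
    ]

print(remove_unneeded(["Raj", "has", "2", "pens", "!"]))  # ['Raj', 'pens']
```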

Converting Text to a Common Case - After the stop words are removed, the whole text is converted to a single case, preferably lower case. This ensures that the machine, which is case-sensitive, does not treat the same word as different words just because of different cases.
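For illustration, the case conversion can be sketched with Python's built-in `str.lower`. The function name `to_common_case` is an assumption for this example.

```python
def to_common_case(tokens: list[str]) -> list[str]:
    """Convert every token to lower case."""
    return [t.lower() for t in tokens]

tokens = ["Hello", "hELLO", "HELLO"]
lowered = to_common_case(tokens)
# All three spellings now collapse into a single distinct word.
print(set(lowered))  # {'hello'}
```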

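Stemming - In this step, the remaining words are reduced to their root words. In other words, stemming is the process in which the affixes of words are removed and the words are converted to their base form. The stem obtained this way is not always a meaningful word; for example, "studies" may become "studi".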
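To make the idea of affix removal concrete, here is a toy suffix-stripping sketch. It is not the Porter stemmer or any other standard algorithm; the suffix list, the minimum stem length of 3 and the name `naive_stem` are all assumptions chosen for the example.

```python
# Suffixes are checked in this order; the first match is stripped.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least 3 characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["healing", "played", "studies", "cats"]:
    print(w, "->", naive_stem(w))
```

Here "studies" is reduced to "studi", which shows how a stem can fail to be a real word.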

Lemmatization - Lemmatization also removes affixes, but the word obtained after affix removal (known as the lemma) is always a meaningful one.
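The contrast with stemming can be sketched with a simple lookup table. Real lemmatizers rely on a full vocabulary and grammatical analysis; the tiny `LEMMAS` dictionary and the function `lemmatize` below are hypothetical and exist only to show that the output is a meaningful word.

```python
# A hand-made lookup table (an assumption; real lemmatizers use a large lexicon).
LEMMAS = {
    "studies": "study",
    "caring": "care",
    "mice": "mouse",
}

def lemmatize(word: str) -> str:
    """Return the lemma if the word is in the table, else the word unchanged."""
    return LEMMAS.get(word, word)

for w in ["studies", "caring", "mice", "school"]:
    print(w, "->", lemmatize(w))
```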

With this, we have normalized our text to tokens, which are the simplest forms of the words present in the corpus. Now it is time to convert the tokens into numbers. For this, we would use the Bag of Words algorithm.
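As a rough sketch of how Bag of Words turns tokens into numbers, the example below builds a vocabulary from already-normalized documents and counts how often each vocabulary word appears in each document. The sample documents and the function name `bag_of_words` are assumptions for illustration.

```python
from collections import Counter

def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({token for doc in docs for token in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # A Counter returns 0 for words that do not occur in the document.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

docs = [
    ["aman", "anil", "stressed"],
    ["aman", "went", "therapist"],
]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['aman', 'anil', 'stressed', 'therapist', 'went']
print(vectors)  # [[1, 1, 1, 0, 0], [1, 0, 0, 1, 1]]
```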
