Hindi is an Indic language written in the Devanagari script. It is the 4th most widely spoken language in the world, and its text is represented in Unicode, typically encoded as UTF-8.
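As a rough illustration of the Unicode point, the Python sketch below inspects the code points of a sample Hindi word and its UTF-8 encoding. The sample word and the helper name `in_devanagari` are chosen here for illustration only; they are not part of any library.

```python
import unicodedata

# Sample word "हिंदी", written as escapes so each code point is explicit.
word = "\u0939\u093f\u0902\u0926\u0940"


def in_devanagari(ch: str) -> bool:
    """Return True if ch lies in the Unicode Devanagari block (U+0900 to U+097F)."""
    return 0x0900 <= ord(ch) <= 0x097F


# Show the code point, official Unicode name, and block membership per character.
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  devanagari={in_devanagari(ch)}")

# Encode to UTF-8 and check that decoding restores the original string.
encoded = word.encode("utf-8")
print(len(word), "code points ->", len(encoded), "bytes in UTF-8")
assert encoded.decode("utf-8") == word
```

Every code point in the Devanagari block takes three bytes in UTF-8, so the byte length of pure Devanagari text is three times its code point count. This matters when setting length limits or reading files: open Hindi text files with an explicit `encoding="utf-8"` rather than relying on the platform default.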
1. Hindi is morphologically rich, which means that each word carries a lot of information compared to English.
2. Hindi is a free word order language, which means the words of a sentence can be arranged fairly freely, unlike English, which follows Subject-Verb-Object order for a sentence to be grammatically valid.
Hence, preprocessing steps such as lemmatization are not a good idea, since they discard the information carried by each word's inflected form.