Abstract:
Since the dawn of civilization, people have used writing to express their thoughts and views, and it remains one of the best ways to convey one's feelings. Social networks and blogs hold a large amount of text, but their users often do not disclose personal data such as age and other demographics. Age group classification from text has therefore become a leading topic for scientific and commercial market research in the field of machine learning. It is currently a prominent research area in English language processing, as there are still few research works on this kind of text analysis for the language. Existing approaches still fail to identify the correct age group reliably because they do not consider the most important parameters that can influence the overall result. The main objective of this research is to develop a system that identifies which words are more frequent in a particular age group. Different machine learning algorithms are used to classify text into teenager and adult groups. Experiments on almost 100k sentences were performed to determine which parameters are relevant. Logistic Regression with TF-IDF performed best, reaching a precision of 0.80 in the validation test. To make the mechanism more efficient and accurate, a unigram method has been implemented. Several techniques have been integrated for data collection and data processing to make the system more reliable and flexible. Adequate instances and experiments are also provided to describe the methodology for both approaches.
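As a rough illustration of the best-performing combination named above (unigram TF-IDF features fed to Logistic Regression), the sketch below builds both pieces from the Python standard library. It is not the thesis implementation: the toy sentences, the whitespace tokenizer, the smoothed IDF formula, and the training settings (`epochs`, `lr`) are all assumptions chosen only to keep the example small and self-contained.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace, giving unigram tokens only."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every unigram in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen words are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(vec, weights, bias):
    """Linear score of a sparse vector under the current model."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Logistic regression fitted with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = sigmoid(score(vec, weights, bias)) - y
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias

def predict(doc, idf, weights, bias):
    """Return 1 (adult) or 0 (teenager) for a raw sentence."""
    return 1 if sigmoid(score(tfidf(doc, idf), weights, bias)) >= 0.5 else 0

# Made-up sentences for illustration only (NOT the thesis data).
# Label 0 = teenager, label 1 = adult.
train_texts = [
    "homework is so boring today",
    "my mom took my phone again",
    "cant wait for school break",
    "we have a math test tomorrow",
    "the mortgage payment is due friday",
    "meeting with my manager went well",
    "my kids start school next week",
    "need to file taxes before the deadline",
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

idf = fit_idf(train_texts)
weights, bias = train_logreg([tfidf(t, idf) for t in train_texts], train_labels)

# Words with the most negative / most positive weights hint at which
# unigrams the model associates with each age group.
ranked = sorted(weights, key=weights.get)
print("teen-leaning:", ranked[:3])
print("adult-leaning:", ranked[-3:])
```

With real data, the learned weights give a direct view of which words are more characteristic of each group, which is the kind of per-group word evidence the abstract describes. A production setup would more likely rely on an established library and a held-out validation split rather than this hand-rolled trainer.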
Description:
This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Information and Communication Engineering at East West University, Dhaka, Bangladesh.