Feature Extraction and Classification of Web Data
Over the last few years, text mining has evolved and grown in importance. The number of text documents in digital form keeps increasing, and they are available to users through a variety of sources such as e-media and digital media. This wide availability of text has produced large collections of unstructured data that must be organized into a defined structure; assigning such documents to predefined categories is known as text classification. The high dimensionality of the feature space is one of the main problems in text classification. Feature selection and feature extraction methods address this problem and improve classification performance. Feature extraction techniques remove irrelevant and useless features from the text documents and thereby reduce the dimensionality of the feature space. This paper proposes a system for feature extraction and classification of text data: features are first extracted from the text and then passed to a classifier. The proposed solution is based on semi-supervised learning. The datasets used for training and testing are obtained from user feedback on different web sites. The results show that the proposed feature extraction and classification approach is simple, computationally tractable, and achieves low error rates.
Keywords - Text Mining, Text Classification, Feature Extraction.