An Efficient Approach to Reduce Text Dimension for Precise Text Classification
Well-known news websites such as Google and Sina deliver information to large numbers of users every day. With the continuous development of information technology, however, the volume of unorganized data keeps growing, and classifying and organizing text has become a challenge. Traditional manual classification of news text consumes considerable human and financial resources and is also slow. This paper studies news text classification and proposes a classification model based on Latent Dirichlet Allocation (LDA) and Domain Word Filtering. Because the feature dimension of raw news text is too high, the model uses a topic model to reduce it and to obtain good features. The model reduces the feature dimension of news text effectively and achieves good classification results.
Keywords - Topic Model, LDA, Domain Word Filtering, News Website, Text Classification