Autonomous Tagging of Stack Overflow Questions using Statistical Methods
Tagging plays a crucial role in indexing information. Stack Overflow is a web portal built on a question-and-answer mechanism, and it holds a large amount of data organized by tags. This research proposes an autonomous tagging system. It uses a document-term matrix to predict the tags associated with a question, choosing every tag whose predicted probability is above a threshold level. The paper showcases the application of machine learning models to this task. It also establishes a statistical relationship between precision and the number of questions per tag, which allows the parameters, i.e. the number of questions per tag and the number of tags, to be optimized.
Keywords- Autonomous tagging; RTextTools; SVM; Random-forest; Stack Overflow; Document-Term-matrix; multilabel-classification.
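
The two ideas named in the abstract, a document-term matrix and threshold-based tag selection, can be illustrated with a minimal sketch. It is written in plain Python rather than with RTextTools, and it is not the authors' pipeline: the function names `build_dtm` and `select_tags`, the tokenization rule, the example questions, the probability values, and the 0.5 threshold are all assumptions made for illustration. In practice the per-tag probabilities would come from a trained classifier such as an SVM or a random forest.

```python
from collections import Counter
import re


def build_dtm(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Assumed tokenization: lowercase runs of letters, digits, '+' and '#'.
    tokenized = [re.findall(r"[a-z0-9+#]+", doc.lower()) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix


def select_tags(tag_probabilities, threshold=0.5):
    """Keep every tag whose probability is strictly above the threshold."""
    return sorted(tag for tag, p in tag_probabilities.items() if p > threshold)


# Hypothetical questions, used only to show the shape of the matrix.
questions = [
    "How to read a csv file in python",
    "Python list sort in place",
]
vocabulary, dtm = build_dtm(questions)

# Hypothetical per-tag probabilities, standing in for classifier output.
example_probabilities = {"python": 0.91, "csv": 0.62, "java": 0.08}
predicted = select_tags(example_probabilities, threshold=0.5)
print(predicted)
```

Because every tag above the threshold is kept, a question can receive several tags or none, which is what makes the task multilabel rather than single-label classification.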