Paper Title
Analyzing Sparsity–Accuracy Trade-offs in Knowledge-Distilled Transformer Models for Efficient Sentiment Classification
Abstract
This work investigates sequential knowledge distillation and global unstructured pruning for compressing a hybrid DistilBERT–BiLSTM sentiment classification model. Although Transformer-based architectures achieve strong performance, their computational cost limits deployment in resource-constrained environments. A DistilBERT–BiLSTM teacher model is first trained and then distilled into a student model using temperature-scaled soft targets. The distilled student is subsequently compressed with global L1-norm (magnitude) pruning at several sparsity levels. The distilled student slightly outperforms the teacher baseline (82.97% vs. 82.50% accuracy). The pruned student preserves this accuracy at 20% sparsity, indicating parameter redundancy, whereas 40% sparsity leads to noticeable degradation. These results suggest that moderate pruning offers an effective balance between efficiency and predictive performance, and that the proposed sequential distillation-then-pruning pipeline yields a compact model suitable for resource-constrained sentiment analysis.
Keywords: Sentiment Analysis, Knowledge Distillation, Global Unstructured Pruning, DistilBERT–BiLSTM, Model Compression, Sparsity–Accuracy Trade-off
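The two compression stages named in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not state the exact distillation objective, temperature, or loss weighting, so the code assumes the common Hinton-style form (a T²-scaled KL term on temperature-softened outputs mixed with hard-label cross-entropy); the names `distillation_loss`, `global_l1_prune`, `temperature`, and `alpha` and their default values are hypothetical. "Global" pruning here means that the smallest-magnitude weights are selected across all layers jointly rather than per layer. PyTorch exposes this as `torch.nn.utils.prune.global_unstructured` with `prune.L1Unstructured`; the pure-Python version below only mirrors the idea.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a larger temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Assumed Hinton-style KD loss for one example (T and alpha are hypothetical).

    loss = alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)
    """
    p_t = softmax(teacher_logits, temperature)  # soft targets from the teacher
    p_s = softmax(student_logits, temperature)  # softened student predictions
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])  # standard CE at T = 1
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce


def global_l1_prune(weights: Dict[str, List[float]], amount: float) -> Dict[str, List[float]]:
    """Zero the `amount` fraction of weights with the smallest |w| across ALL layers."""
    flat = [(abs(w), name, i) for name, ws in weights.items() for i, w in enumerate(ws)]
    k = round(amount * len(flat))  # number of weights to remove globally
    pruned = {name: list(ws) for name, ws in weights.items()}  # copy; leave input intact
    for _, name, i in sorted(flat)[:k]:
        pruned[name][i] = 0.0
    return pruned


def sparsity(weights: Dict[str, List[float]]) -> float:
    """Fraction of weights that are exactly zero."""
    total = sum(len(ws) for ws in weights.values())
    zeros = sum(1 for ws in weights.values() for w in ws if w == 0.0)
    return zeros / total


# Hypothetical toy "model": two layers with made-up weights.
layers = {
    "encoder.w": [0.9, -0.05, 0.03, -0.7, 0.01],
    "classifier.w": [0.2, -0.6, 0.5, 0.8, -0.1],
}
for amount in (0.2, 0.4):
    pruned = global_l1_prune(layers, amount)
    print(amount, sparsity(pruned), pruned)
```

Because the threshold is global, layers with many small weights (here `encoder.w`) absorb more of the pruning: at 20% sparsity only encoder weights are removed, while `classifier.w` is untouched.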