01 Text preprocessing: lowercase normalisation, digit removal, punctuation stripping and stopword filtering to reduce noise
02 TF-IDF vectorisation over unigrams and bigrams, limited to 15,000 features; bigrams capture compound terms such as "space shuttle" and "gun control" that unigrams would split into separate words, which meaningfully improves classification
03 Three-classifier benchmark: Multinomial Naive Bayes, Logistic Regression and Linear SVM
04 Linear SVM selected for its consistent advantage on high-dimensional, sparse TF-IDF feature spaces
05 Confusion-matrix and per-class F1 evaluation reveals which category pairs are most frequently confused
06 Keyword analysis: coefficient extraction shows the 15 most discriminative terms per category
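A minimal sketch of step 01 (text preprocessing) in plain Python follows. The stopword list is a deliberately tiny, hypothetical stand-in, since the source does not say which list the project uses. Punctuation is replaced with spaces rather than deleted, so words joined by a hyphen or slash do not fuse into a single token.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list for illustration only; a real
# pipeline would use a fuller list (e.g. scikit-learn's ENGLISH_STOP_WORDS).
STOPWORDS = {
    "a", "an", "and", "the", "is", "are", "was", "were", "of", "to",
    "in", "on", "for", "with", "it", "this", "that",
}

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(text: str) -> str:
    """Lowercase, drop digits, strip punctuation and filter stopwords."""
    text = text.lower()                    # lowercase normalisation
    text = re.sub(r"\d+", " ", text)       # digit removal
    text = text.translate(_PUNCT_TABLE)    # punctuation stripping
    tokens = [t for t in text.split() if t not in STOPWORDS]  # stopword filtering
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("Gun control: 3 new bills were debated in the Senate today!"))
    # → gun control new bills debated senate today
```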
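The benchmark and selection in steps 03 and 04 can be sketched with scikit-learn as below. It assumes macro-averaged F1 under 3-fold cross-validation as the comparison metric, since the source does not state which metric or split was used. The twelve two-class documents are toy placeholders, so the printed scores say nothing about which model wins on real data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy two-class corpus (six documents per class) standing in for the real data.
docs = [
    "space shuttle launch scheduled next week",
    "nasa shuttle crew trains for space mission",
    "orbit insertion of the space probe succeeded",
    "astronauts repair the space station solar panel",
    "rocket engine test for the shuttle program",
    "telescope images from deep space released",
    "senate debates new gun control legislation",
    "gun owners oppose stricter control laws",
    "firearm background checks and gun control",
    "second amendment rights and handgun ownership",
    "rifle sales rise after gun control vote",
    "police report on firearm permits and gun laws",
]
labels = ["space"] * 6 + ["guns"] * 6

candidates = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
}

scores = {}
for name, clf in candidates.items():
    # Fitting TF-IDF inside the pipeline keeps each fold's vocabulary and IDF
    # weights restricted to that fold's training documents.
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), max_features=15000), clf)
    scores[name] = cross_val_score(pipe, docs, labels, cv=3, scoring="f1_macro").mean()

for name, score in scores.items():
    print(f"{name:<24} macro-F1 = {score:.3f}")

best = max(scores, key=scores.get)  # highest mean cross-validated score
print("selected:", best)
```

Placing the vectoriser inside the pipeline, rather than fitting it once on all documents, prevents information from held-out folds leaking into the features and inflating every model's score.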
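For step 05, the sketch below computes a confusion matrix and per-class F1 with scikit-learn, then sums confusions in both directions to find the most-confused category pair. The class names, gold labels and predictions are hypothetical values chosen only to exercise the code; they are not results from this project.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical gold labels and predictions, for illustration only.
classes = ["autos", "guns", "space"]
y_true = ["space", "space", "space", "guns", "guns", "guns",
          "autos", "autos", "autos", "autos"]
y_pred = ["space", "space", "autos", "guns", "guns", "guns",
          "autos", "space", "space", "autos"]

# Rows are true classes, columns are predicted classes, both in `classes` order.
cm = confusion_matrix(y_true, y_pred, labels=classes)
per_class_f1 = f1_score(y_true, y_pred, labels=classes, average=None)

# Drop correct predictions, then count confusions in either direction per pair.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
pair_counts = np.triu(off_diag + off_diag.T, k=1)  # upper triangle: each pair once
i, j = np.unravel_index(np.argmax(pair_counts), pair_counts.shape)

print(cm)
for name, f1 in zip(classes, per_class_f1):
    print(f"{name:>6}: F1 = {f1:.2f}")
print(f"most confused pair: {classes[i]} / {classes[j]} ({pair_counts[i, j]} errors)")
```

With these made-up labels the most-confused pair is autos/space, with three errors across both directions.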