If you are going to go to the bother of fine-tuning for trivial problems like subject classification, then I think you'll find scikit-learn with an SGDClassifier on 2-grams will probably do just as well, and the trained classifier will be under 1MB.<p>You can train it in under a minute, and it will work perfectly well on embedded devices.<p>Small LLMs are good choices for text classification in two cases:<p>- If you need to provide in-context examples and classify based on them.<p>- Your classification goes beyond simple subject-type classifiers. For example, multiple-choice question answering is classification where a small LLM will work but traditional ML methods won't.
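For anyone who wants to try the scikit-learn route, here is a minimal sketch. The toy texts and labels are made up for illustration, and I'm assuming "2-grams" means word unigrams plus bigrams with TF-IDF weighting; the comment above doesn't specify the exact features or loss, so treat those choices as mine.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical, just to make the sketch runnable).
texts = [
    "the stock market fell sharply today",
    "investors worry about the stock market and interest rates",
    "the team won the football match last night",
    "the coach praised the team after the football match",
]
labels = ["finance", "finance", "sports", "sports"]

# Word unigrams + bigrams feeding a linear model trained with SGD.
# ngram_range=(1, 2) is an assumption about what "2-grams" means here.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    SGDClassifier(random_state=0),  # default hinge loss, i.e. a linear SVM
)
model.fit(texts, labels)

# Predict a subject for unseen text.
predictions = model.predict(["interest rates moved the market"])
print(predictions)

# Serialized size of the whole pipeline, in bytes (tiny for this toy data).
model_bytes = len(pickle.dumps(model))
print(model_bytes)
```

On a real dataset the vocabulary grows quickly with bigrams, so capping it with `max_features` or swapping in `HashingVectorizer` is one way to keep the pickled model small.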
If you want to go deeper on language models, try these project ideas:<p>- Zero-shot encoders like tasksource or GLiNER<p>- Natural language inference: <a href="https://huggingface.co/blog/dleemiller/nli-xenc-ways-to-use" rel="nofollow">https://huggingface.co/blog/dleemiller/nli-xenc-ways-to-use</a><p>- GRPO training<p>- GEPA prompt tuning Qwen 0.6B (or GEPA, then GRPO)<p>- Use an embedding model and train a classifier (MLP, logistic regression, SVM)<p>- Use a larger LLM to generate a synthetic dataset (beware of lack of diversity; mine "seed text" from real sources first)<p>- Synthetically generate "hard examples" where more than one category may be valid, and DPO tune toward your preferred responses
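The "embedding model plus classifier" idea from the list above can be sketched roughly like this. Everything named here is illustrative: `embed` is a stand-in that uses feature hashing so the snippet runs without downloading anything, and it is not a real embedding model. In practice you would pass something like the `encode` method of a sentence-transformers model as `embed_fn`.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder "embedder": hashed bag-of-words, NOT a learned embedding.
# Swap in a real embedding model's encode function for actual use.
_hasher = HashingVectorizer(n_features=256, alternate_sign=False)


def embed(texts):
    """Map a list of strings to a dense (n_texts, 256) float array."""
    return _hasher.transform(texts).toarray()


def train_embedding_classifier(texts, labels, embed_fn=embed):
    """Embed the texts once, then fit a logistic regression on the vectors."""
    X = np.asarray(embed_fn(texts))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf


# Toy data (hypothetical).
texts = [
    "the stock market fell sharply today",
    "investors worry about interest rates",
    "the team won the football match",
    "the coach praised the players",
]
labels = ["finance", "finance", "sports", "sports"]

clf = train_embedding_classifier(texts, labels)
pred = clf.predict(embed(["the market reacted to interest rates"]))
print(pred)
```

The point of the split is that the embedding step is frozen and can be cached, so trying an MLP or SVM instead of logistic regression only means swapping the last estimator.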
If you are interested in a small language model to fine-tune, gemma3:270m is quite interesting for its size.
I think the Qwen 0.6B is so cool. It is super fast and as illustrated here it has a clear niche, esp. when fine-tuned.<p>I'm also interested in it as a student for distillation.