Knowledge distillation from a large language model to a compact model for personalized training program generation for pilots and flight crew members of civil aviation
Authors
*, **Moscow Aviation Institute (National Research University), 4, Volokolamskoe shosse, Moscow, A-80, GSP-3, 125993, Russia
*e-mail: michaeltatarenko@gmail.com
**e-mail: lukinvn@list.ru
Abstract
Modern large language models based on the Transformer architecture are highly effective at personalized content generation. However, deploying them on resource-constrained devices remains challenging, which limits their autonomous use in training systems for the flight and cabin crews of civil aviation. This paper proposes an approach to distilling knowledge from the Qwen 3 4B model into the compact Gemma 3 270M model for generating personalized physical training programs for pilots and flight crew members of civil aviation. The model was trained with the Low-Rank Adaptation (LoRA) method on a synthetic dataset of 67,392 samples generated via the Self-Instruct methodology, using specialized software libraries. A 27-fold model compression was achieved (from 8 GB to 300 MB in quantized format) with a final loss of 0.029, indicating a close approximation of the teacher model's behavior. The resulting model runs on a consumer-grade GPU with 6 GB of memory, making it suitable for autonomous deployment in civil aviation training centers and allowing it to serve multiple users simultaneously.
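For illustration, the LoRA step of the distillation pipeline described above might look like the following minimal sketch, assuming the Hugging Face Transformers and PEFT libraries. Here the student is fine-tuned on teacher-generated (Self-Instruct) samples; the model identifier, rank, and other hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of sequence-level distillation via LoRA fine-tuning.
# Assumptions (not from the paper): the Hugging Face model ID, the LoRA
# rank/alpha/dropout values, and the chosen target modules.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Compact student model (assumed Hugging Face identifier for Gemma 3 270M).
model_id = "google/gemma-3-270m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# LoRA adds trainable low-rank update matrices to the attention projections,
# so only a small fraction of the parameters is optimized during training.
lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (assumption)
    lora_alpha=32,                        # LoRA scaling factor (assumption)
    target_modules=["q_proj", "v_proj"],  # typical attention targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the trainable-parameter fraction
```

Training then proceeds as ordinary causal-language-model fine-tuning on the synthetic prompt/program pairs, after which the adapter weights can be merged into the base model and quantized for deployment.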
Keywords:
knowledge distillation, compact language models, fine-tuning, LoRA, personalization, civil aviation
References
- Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention is all you need. Advances in Neural Information Processing Systems, 2017, vol. 30, pp. 5998–6008.
- Brown T.B., Mann B., Ryder N., Subbiah M., Kaplan J.D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., Agarwal S., Herbert-Voss A., Krueger G., Henighan T., Child R., Ramesh A., Ziegler D., Wu J., Winter C., Hesse C., Chen M., Sigler E., Litwin M., Gray S., Chess B., Clark J., Berner C., McCandlish S., Radford A., Sutskever I., Amodei D. Language models are few-shot learners. Advances in Neural Information Processing Systems, 2020, vol. 33, pp. 1877–1901.
- Touvron H., Lavril T., Izacard G., Martinet X., Lachaux M.-A., Lacroix T., Rozière B., Goyal N., Hambro E., Azhar F., Rodriguez A., Joulin A., Grave E., Lample G. LLaMA: Open and efficient foundation language models. ArXiv : website. Available at: https://arxiv.org/abs/2302.13971 (accessed: 12 December 2025).
- Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Human Language Technologies, Minneapolis, 2019, pp. 4171–4186.
- Hinton G., Vinyals O., Dean J. Distilling the knowledge in a neural network. ArXiv : website. Available at: https://arxiv.org/abs/1503.02531 (accessed: 12 December 2025).
- Sanh V., Debut L., Chaumond J., Wolf T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing at NeurIPS 2019, Vancouver, 2019, 5 p.
- Gou J., Yu B., Maybank S.J., Tao D. Knowledge Distillation: a survey. International Journal of Computer Vision, 2021, vol. 129, pp. 1789–1819.
- Gholami A., Kim S., Dong Z., Yao Z., Mahoney M.W., Keutzer K. A Survey of Quantization methods for efficient neural network inference. ArXiv : website. Available at: https://arxiv.org/abs/2103.13630 (accessed: 12 December 2025).
- Li X.L., Liang P. Prefix-Tuning: Optimizing Continuous Prompts for Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021, pp. 4582–4597.
- Houlsby N., Giurgiu A., Jastrzebski S., Morrone B., De Laroussilhe Q., Gesmundo A., Attariyan M., Gelly S. Parameter-Efficient Transfer Learning for NLP. Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, 2019, pp. 2790–2799.
- Bai J., Bai S., Chu Y., Cui Z., Dang K., Deng X., Fan Y., Ge W., Han Y., Huang F. et al. Qwen Technical Report. ArXiv : website. Available at: https://arxiv.org/abs/2309.16609 (accessed: 12 December 2025).
- Wang Y., Kordi Y., Mishra S., Liu A., Smith N.A., Khashabi D., Hajishirzi H. Self-Instruct: Aligning Language Models with Self-Generated Instructions. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, 2023, pp. 13484–13508.
- Gemma Team, Mesnard T., Hardin C., Dadashi R., Bhupatiraju S., Pathak S., Sifre L., Rivière M., Kale M.S., Love J. et al. Gemma: Open Models Based on Gemini Research and Technology. ArXiv : website. Available at: https://arxiv.org/abs/2403.08295 (accessed: 12 December 2025).
- Hu E.J., Shen Y., Wallis P., Allen-Zhu Z., Li Y., Wang S., Wang L., Chen W. LoRA: Low-Rank Adaptation of Large Language Models. Proceedings of the 10th International Conference on Learning Representations (ICLR 2022), 2022, 13 p.
- Loshchilov I., Hutter F. Decoupled Weight Decay Regularization. Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, 2019, 19 p.
- Dettmers T., Pagnoni A., Holtzman A., Zettlemoyer L. QLoRA: Efficient Finetuning of Quantized LLMs. Advances in Neural Information Processing Systems, 2023, vol. 36, pp. 10088–10115.
- Wei J., Bosma M., Zhao V.Y., Guu K., Yu A.W., Lester B., Du N., Dai A.M., Le Q.V. Finetuned Language Models Are Zero-Shot Learners. Proceedings of the 10th International Conference on Learning Representations (ICLR 2022), 2022, 46 p.
- Wolf T., Debut L., Sanh V., Chaumond J., Delangue C., Moi A., Cistac P., Rault T., Louf R., Funtowicz M., Davison J., Shleifer S., von Platen P., Ma C., Jernite Y., Plu J., Xu C., Le Scao T., Gugger S., Drame M., Lhoest Q., Rush A.M. Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45.
- Unsloth: Fine-tuning & Reinforcement Learning for LLMs. Github.com : website. Available at: https://github.com/unslothai/unsloth (accessed: 12 December 2025).
- Dao T., Fu D.Y., Ermon S., Rudra A., Ré C. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Advances in Neural Information Processing Systems, 2022, vol. 35, pp. 16344–16359.