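In model architecture design, wider and shallower neural networks have better memorization ability, while deeper and thinner networks show stronger reasoning ability.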
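To make the width versus depth trade-off concrete, here is a minimal sketch that compares two layouts at the same parameter budget. It assumes a standard transformer block with four attention projections and a two-layer MLP with a 4x expansion, and it ignores embeddings, biases, and normalization weights. The function name `approx_block_params` and both configurations are hypothetical, chosen only for illustration; they are not taken from the cited reports.

```python
def approx_block_params(num_layers: int, d_model: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for a stack of transformer blocks.

    Assumptions (illustrative only): each layer has 4 * d_model^2 attention
    weights (Q, K, V, and output projections) and 2 * mlp_ratio * d_model^2
    MLP weights. Embeddings, biases, and norm parameters are ignored.
    """
    attention = 4 * d_model * d_model
    mlp = 2 * mlp_ratio * d_model * d_model
    return num_layers * (attention + mlp)


# Hypothetical configurations, picked so that the two budgets match.
wide_shallow = approx_block_params(num_layers=12, d_model=1024)
deep_thin = approx_block_params(num_layers=48, d_model=512)

print(f"wide and shallow (12 layers, d_model=1024): {wide_shallow:,} parameters")
print(f"deep and thin    (48 layers, d_model=512):  {deep_thin:,} parameters")
```

Under these assumptions the count scales as `num_layers * d_model**2`, so halving the width while quadrupling the depth leaves the budget unchanged. The sketch only shows that the two shapes cost the same number of weights; which one memorizes or reasons better is an empirical question.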
[1] Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pages 7–10, 2016.
[2] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. arXiv preprint arXiv:2402.14905, 2024.
[3] Fox-1 Technical Report.