Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

Ji, Yixin; Xiang, Yang; Li, Juntao; Chen, Wei; Liu, Zhongyi; Chen, Kehai; Zhang, Min

arXiv:2405.10616 (cs)

[Submitted on 17 May 2024]

Abstract: In recent years, large language models (LLMs) have driven advances in natural language processing. Their growing scale, however, has increased the computational burden, making it necessary to balance efficiency against performance. Low-rank compression is a promising technique that removes non-essential parameters by decomposing each weight matrix into the product of two low-rank matrices, yet its application to LLMs has not been extensively studied. The key to low-rank compression lies in two steps: low-rank factorization and the allocation of low-rank dimensions. To address the challenges of low-rank compression in LLMs, we conduct an empirical study of the low-rank characteristics of large models and propose a low-rank compression method suited to LLMs. The approach estimates feature distributions precisely through pooled covariance matrices and allocates low-rank dimensions with a Bayesian optimization strategy. Experiments on the LLaMA-2 models demonstrate that, at the same compression ratio, our method maintains model performance better than existing strong structured pruning and low-rank compression techniques.
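
The factorization step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that "feature-based" factorization means projecting a layer's weight matrix onto the top principal directions of that layer's output features, with the feature covariance pooled over several calibration batches. The function names (pooled_covariance, feature_based_factorize) are hypothetical, the feature mean (bias correction) is ignored for simplicity, and the Bayesian optimization of ranks is not shown: the rank is a fixed argument here.

```python
import numpy as np


def pooled_covariance(batches):
    """Pool per-batch feature covariances, weighting each batch by its degrees of freedom.

    batches: list of arrays, each of shape (n_samples, dim) with n_samples >= 2.
    """
    dim = batches[0].shape[1]
    scatter = np.zeros((dim, dim))
    dof = 0
    for x in batches:
        centered = x - x.mean(axis=0, keepdims=True)  # center within the batch
        scatter += centered.T @ centered              # within-batch scatter matrix
        dof += x.shape[0] - 1
    return scatter / dof


def feature_based_factorize(weight, input_batches, rank):
    """Approximate weight (d_out x d_in) by a @ b, with a: (d_out x rank), b: (rank x d_in).

    Assumption: the projection basis is given by the top `rank` eigenvectors of the
    pooled covariance of the output features y = W x.
    """
    output_batches = [x @ weight.T for x in input_batches]
    cov = pooled_covariance(output_batches)
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues come back in ascending order
    top = eigvecs[:, ::-1][:, :rank]       # keep the `rank` leading directions
    a = top
    b = top.T @ weight
    return a, b
```

The dense layer stores d_out * d_in parameters, while the two factors store rank * (d_out + d_in), so the factorization only saves parameters when rank < d_out * d_in / (d_out + d_in).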

Comments:	Accepted by 2024 ACL findings
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2405.10616 [cs.CL]
	(or arXiv:2405.10616v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.10616

Submission history

From: Yixin Ji
[v1] Fri, 17 May 2024 08:27:12 UTC (9,112 KB)
