An empirical analysis of compute-optimal large language model training