This paper introduces CommitPack, a 4TB dataset of Git commits, and HumanEvalPack, a benchmark covering code synthesis, repair, and explanation across 6 languages. The authors use CommitPack to create OctoCoder and OctoGeeX, the best-performing permissively licensed code LLMs on HumanEvalPack. However, both still fall short of closed-source models such as GPT-4, suggesting more progress is n...