MINT-1T: An Open-Source Trillion-Token Multimodal Interleaved Dataset and a Key Component for Training Large Multimodal Models (LMMs)