Train BPE tokenizer. train extracted from open source … Hi @dszhengyu,