Chaoya Jiang is a Research Fellow at the School of Control Science and Engineering, Shandong University. He received his B.S. from the Department of Computer Science and Technology at Nanjing University, his M.S. from the Wangxuan Institute of Computer Technology at Peking University, and his Ph.D. from the Knowledge Computing Lab of the National Engineering Research Center for Software Engineering at Peking University, advised by Prof. Shikun Zhang. He has led several research projects, including a 2023 National Natural Science Foundation of China (NSFC) Basic Research Project for Young Students (Ph.D. track), a 2024 Chinese Institute of Electronics–Tencent Doctoral Research Incentive Program project, and a 2025 Shandong Provincial Key R&D Program project. His honors include the 2023 Beijing Science and Technology Progress Award (First Prize) and the Peking University President's Award. He collaborates closely with leading research institutions such as Peking University, Alibaba's Tongyi Lab, and Tencent. His research focuses on multimodal large language models (MLLMs) and their applications in the new energy domain, specifically: (1) enhancing the reasoning capabilities of general-purpose multimodal large models; (2) agents built on multimodal large models; and (3) developing and applying domain-specific multimodal large models for new energy. As first or co-first author, he has published dozens of papers at top CCF-A conferences and journals in artificial intelligence (NeurIPS, CVPR, ICCV, ACL, AAAI, etc.), and his work is highly regarded by peers and industry.
Recruiting Interns and Research Assistants:
Students interested in artificial intelligence, large models, and AI for new energy, and who have their own ideas, are welcome to apply. I will guide you from the ground up, providing detailed research mentorship and ample compute resources so that we can publish at top conferences and journals together. I believe in discussion among equals and respect everyone's ideas. For outstanding students, I will do my best to recommend you for further study at top universities such as Peking University. I look forward to exploring research with you!
General Multimodal Large Models:
1. R1-style multimodal large models with strong reasoning, including the Thinking-with-Image reasoning paradigm, reinforcement-learning alignment algorithms, and evaluation of reasoning ability in specific scenarios.
2. Complex multimodal agents, including GUI-oriented multimodal agent construction and the design and optimization of agents for controlling wind, solar, storage, and hydrogen new-energy systems.
Multimodal Large Models for New Energy:
1. Intelligent operation-and-maintenance large models for complex equipment
2. Large models for integrated energy management and control
[1] Jiang, C.*, Jia, H.* (equal contribution), Ye, W., Xu, H., Yan, M., et al. (2024). MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model. *Proceedings of the Conference on Neural Information Processing Systems (NeurIPS 2024)*. (CCF-A)
[2] Jia, H.*, Jiang, C.* (equal contribution), Ye, W., et al. (2025). SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization. *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)*. (CCF-A)
[3] Jiang, C., Ye, W., Dong, M., Jia, H., et al. (2024). Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models. *Proceedings of the ACM International Conference on Multimedia (MM 2024)*. (CCF-A)
[4] Jiang, C., Xu, H., Dong, M., Chen, J., Ye, W., et al. (2024). Hallucination Augmented Contrastive Learning for Multimodal Large Language Model. *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)*. (CCF-A)
[5] Jiang, C., Ye, W., Xu, H., Ye, Q., Yan, M., Zhang, J., & Zhang, S. (2024). TiMix: Text-aware Image Mixing for Effective Vision-Language Pretraining. *Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2024)*. (CCF-A)
[6] Jiang, C., Xu, H., Ye, W., Ye, Q., et al. (2023). BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization. *Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023)*. (CCF-A)
[7] Jiang, C., Xu, H., Ye, W., Ye, Q., et al. (2023). COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment. *Proceedings of the ACM International Conference on Multimedia (MM 2023)*. (CCF-A)
[8] Jiang, C., Ye, W., Xu, H., Yan, M., et al. (2023). Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation. *Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)*. (CCF-A)
[9] Jiang, C., Xie, R., Ye, W., Sun, J., & Zhang, S. (2023). Exploiting Pseudo Image Captions for Multimodal Summarization. *Findings of the Association for Computational Linguistics: ACL 2023*.
[10] Jiang, C., Xu, H., Li, C., Yan, M., et al. (2022). TRIPS: Efficient Vision-and-Language Pre-training with Text-relevant Patch Selection. *Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)*. (CCF-B)
[11] Jiang, C., Yang, D., & Chen, X. (2020). Similarity Learning For Cover Song Identification Using Cross-Similarity Matrices of Multi-Level Deep Sequences. *Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)*. (CCF-B)
[12] Jiang, C., Yang, D., & Chen, X. (2020). Learn A Robust Representation For Cover Song Identification Via Aggregating Local And Global Music Temporal Context. *Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2020)*. (CCF-B)