Journal of Visual Communication and Image Representation Special Issue – Call for Papers
Special Issue on Multimodal Learning for Visual Intelligence: From Emerging Techniques to Real-World Applications
In recent years, the integration of vision with complementary modalities such as language, audio, and sensor signals has become a key enabler for intelligent systems that operate in unstructured environments. The rise of foundation models and cross-modal pretraining has brought a paradigm shift to the field, making it timely to revisit the core challenges and innovative techniques in multimodal visual understanding.
This Special Issue aims to collect cutting-edge research and engineering practices that advance the understanding and development of visual intelligence systems through multimodal learning. The focus is on the deep integration of visual information with these complementary modalities to enable more comprehensive perception and reasoning in real-world environments. We encourage contributions from both academia and industry that address current challenges and propose novel methodologies for multimodal visual understanding.
Topics of interest include, but are not limited to:
- Multimodal data alignment and fusion strategies with a focus on visual-centric modalities
- Foundation models for multimodal visual representation learning
- Generation and reconstruction techniques in visually grounded multimodal scenarios
- Spatiotemporal modeling and relational reasoning of visual-centric multimodal data
- Lightweight multimodal visual models for resource-constrained environments
- Key technologies for visual-language retrieval and dialogue systems
- Applications of multimodal visual computing in healthcare, transportation, robotics, and surveillance
Guest editors:
Guanghui Yue, PhD
Shenzhen University, Shenzhen, China
Email: yueguanghui@szu.edu.cn
Weide Liu, PhD
Harvard University, Cambridge, Massachusetts, USA
Email: weide001@e.ntu.edu.sg
Ziyang Wang, PhD
The Alan Turing Institute, London, UK
Email: zwang@turing.ac.uk
Hadi Amirpour, PhD
Alpen-Adria University, Klagenfurt, Austria
Email: hadi.amirpour@aau.at
Zhedong Zheng, PhD
University of Macau, Macau, China
Email: zhedongzheng@um.edu.mo
Wei Zhou, PhD
Cardiff University, Cardiff, UK
Email: zhouw26@cardiff.ac.uk
Timeline:
Submission Open Date: 30/05/2025
Final Manuscript Submission Deadline: 30/11/2025
Editorial Acceptance Deadline: 30/05/2026
Keywords: Multimodal Learning, Visual-Language Models, Cross-Modal Pretraining, Multimodal Fusion and Alignment, Spatiotemporal Reasoning, Lightweight Multimodal Models, Applications in Healthcare and Robotics