3D Multimedia Analytics, Search and Generation

In Conjunction with ICME 2022

July 22, 2022, Taipei, Taiwan

News

  • May 17, 2022: Eight papers have been accepted. Congratulations to the authors.

  • July 6, 2022: We are honored to announce that Prof. Cewu Lu, Prof. Feng Xu and Prof. Minhyuk Sung will give keynote talks.

  • July 7, 2022: The topic of Prof. Cewu Lu's talk is "3D Semantics in Points".

  • July 7, 2022: The topic of Prof. Feng Xu's talk is "Interaction Motion Reconstruction Based on Deep Learning".

  • July 19, 2022: The topic of Prof. Minhyuk Sung's talk is "Language-Driven Shape Analysis and Manipulation".

  • July 19, 2022: Details of the three keynotes can be found here.


   Today, ubiquitous multimedia sensors and large-scale computing infrastructures are rapidly producing 3D multi-modality data, such as 3D point clouds acquired with LiDAR sensors, RGB-D videos recorded by Kinect cameras, meshes of varying topology, and volumetric data. 3D multimedia combines different content forms, such as text, audio, images, and video, with 3D information, which enables a better perception of the world, since the real world is three-dimensional rather than two-dimensional. For example, robots can manipulate objects successfully by recognizing an object from RGB frames and perceiving its size from point clouds. Researchers have strived to push the limits of 3D multimedia search and generation in various applications, such as autonomous driving, robotic visual navigation, smart industrial manufacturing, logistics distribution, and logistics picking. In logistics picking systems, for instance, 3D multimedia (e.g., videos and point clouds) can help agents grasp, move and place packages automatically. Therefore, 3D multimedia analytics is one of the fundamental problems in multimedia understanding. Unlike 3D vision, 3D multimedia analytics mainly concentrates on fusing 3D content with other media. It is a very challenging problem that involves multiple tasks, such as human 3D mesh recovery and analysis, 3D shape and scene generation from real-world data, 3D virtual talking heads, 3D multimedia classification and retrieval, 3D semantic segmentation, 3D object detection and tracking, and 3D multimedia scene understanding.
The purpose of this workshop is to: 1) bring together state-of-the-art research on 3D multimedia analysis; 2) call for a coordinated effort to understand the opportunities and challenges emerging in 3D multimedia analysis; 3) identify key tasks and evaluate state-of-the-art methods; 4) showcase innovative methodologies and ideas; 5) introduce interesting real-world 3D multimedia analysis systems and applications; and 6) propose new real-world or simulated datasets and discuss future directions. We solicit original contributions in all fields of 3D multimedia analysis that exploit multi-modality data to build strong 3D data representations. We believe this workshop will offer a timely collection of research updates that benefits researchers and practitioners across the broad multimedia community.

Call for papers

   We invite submissions to the ICME 2022 Workshop on 3D Multimedia Analytics, Search and Generation (3DMM2022), which brings researchers together to discuss robust, interpretable, and responsible technologies for 3D multimedia analysis. We solicit original research and survey papers of no more than 6 pages (including all text, figures, and references). Each submitted paper will be peer-reviewed by at least three reviewers. All accepted papers will be presented as either oral or poster presentations, and a best paper award will be given. Papers that violate anonymity or do not use the ICME submission template will be rejected without review. By submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another workshop or conference during the review period. Authors should prepare their manuscripts according to the ICME Guide for Authors, available in the Author Guidelines. The paper submission website is available here. Please make sure your paper is submitted to the correct track. The LaTeX template is available here and the Word template is available here.
  The scope of this workshop includes, but is not limited to, the following topics:

  • Generative Models for 3D Multimedia and 3D Multimedia Synthesis
  • Generating 3D Multimedia from Real-world Data
  • 3D Multimodal Analysis and Description
  • Multimedia Virtual/Augmented Reality
  • 3D Multimedia Systems
  • 3D Multimedia Transport and Delivery
  • 3D Multimedia Search and Recommendation
  • 3D Multimedia Art, Entertainment and Culture
  • Mobile 3D Multimedia
  • 3D Shape Estimation and Reconstruction
  • 3D Scene Understanding
  • 3D Semantic Segmentation
  • 3D Object Detection and Tracking
  • 3D Multimedia Data Understanding for Robotics
  • High-level Representation of 3D Multimedia Data
  • 3D Multimedia Application in Industry

  Fast Review for Rejected Regular Submissions of ICME 2022
  We have set up a Fast Review mechanism for regular submissions rejected by the ICME main conference, and we strongly encourage authors of such papers to submit them to this workshop. To submit through Fast Review, authors must write a one-page cover letter describing the revisions made to the paper and attach all previous reviews. All papers submitted through Fast Review will be reviewed directly by meta-reviewers, who will make the decisions.

Important Dates

All dates are in UTC+8.
Paper Submission Deadline: March 20, 2022
Notification of Acceptance: April 25, 2022
Camera-Ready Due Date: May 2, 2022
Workshop Date: July 22, 2022

Workshop Agenda

Time (UTC+8) Session
13:00 - 13:10 Opening
13:10 - 13:50 Keynote 1: 3D Semantics in Points
13:50 - 14:30 Keynote 2: Interaction Motion Reconstruction Based on Deep Learning
14:30 - 15:10 Keynote 3: Language-Driven Shape Analysis and Manipulation
15:10 - 15:15 Tea Break
15:15 - 16:20 8 Oral Presentations (~8 min each)
16:20 - 16:30 Discussion and Closing

Invited speakers

 Cewu Lu
Shanghai Jiao Tong University, China
Title: 3D Semantics in Points
Abstract: Point-level semantic understanding is the fundamental means of transferring object manipulation knowledge. However, the current literature lacks fine-grained semantics and cannot understand objects in the wild. To solve these problems, we propose: (1) a generalized framework for sparse keypoint detection, together with a dense semantics learning algorithm; (2) a semantics-rich, rotation-invariant point descriptor, which can be used for dense semantic matching and retrieval; and (3) a novel voting scheme that detects object poses in the wild through the interaction between individual points.
Biography: Cewu Lu is a professor at Shanghai Jiao Tong University. His research interests lie mainly in computer vision and intelligent robotics. He has published more than 100 papers in top conferences and journals, such as Nature, Nature Machine Intelligence, TPAMI, CVPR and ICCV. He has served as Senior Area Chair of NeurIPS 2022, Associate Editor of IROS 2021/2022, Area Chair of CVPR 2020, ICCV 2021 and ECCV 2022, Senior Program Committee Member of AAAI 2020/2021, and reviewer for the journal Science. In 2016, he was selected for the National "Oversea Youth Talent" program. In 2018, he was named one of MIT Technology Review's 35 Innovators Under 35 (MIT TR35). In 2019, he received the Qiu Shi Outstanding Young Scholar award. In 2020, he received the Special Prize of the Shanghai Science and Technology Progress Award (ranked third).

 Feng Xu
Tsinghua University, China
Title: Interaction Motion Reconstruction Based on Deep Learning
Abstract: Human motion reconstruction is a hot topic in computer vision and graphics, and it is very useful in movies, games, VR/AR, and other applications. Interaction motion is an important kind of motion, as humans constantly interact with their environment in daily life, but it is also very challenging to reconstruct because of the severe occlusions between humans and the objects they interact with. In this talk, we will introduce our methods for reconstructing interaction motions, which use physics and motion priors to better resolve the ambiguities of this problem.
Biography: Feng Xu is an associate professor in the School of Software, Tsinghua University. He has authored papers in top conferences and journals in computer vision, graphics, and interdisciplinary science, including The Lancet Digital Health, Cell Patterns, Physical Review Letters, SIGGRAPH, SIGGRAPH Asia, ICCV, CVPR, IEEE VR, TOG, TVCG, and TIP. He has served as a TPC member for SIGGRAPH, SIGGRAPH Asia, SCA, and Pacific Graphics, and as a reviewer for Science Advances, TOG, TPAMI, TIP, CVPR, ICCV, and other venues. His research interests include performance capture, 3D reconstruction, virtual reality, and AI for medicine.

 Minhyuk Sung
KAIST, Korea
Title: Language-Driven Shape Analysis and Manipulation
Abstract: Research connecting images and natural language has recently received huge attention thanks to the emergence of large vision-language models, while research relating 3D shapes and natural language has been much less explored. In this talk, I will present our recent work on analyzing and manipulating 3D shapes using natural language. I will first introduce our method for segmenting 3D shapes into parts using language descriptions. 3D annotation is a much more laborious and time-consuming task than 2D annotation, and its cost has been a bottleneck in creating large-scale datasets and improving segmentation accuracy. I will describe how 3D segmentation can be achieved with only weak, natural-language-based supervision and an attention module in a neural network. Second, I will introduce our method of using the pretrained CLIP model for language-guided shape editing. A language command for 3D shape editing describes a "change" to the input shape, whereas the CLIP embedding, which maps a text to a single point, cannot encode the meaning of change in the text. I will present our approach of fine-tuning the CLIP model while mapping texts to regions in the embedding space, so that a shape can be changed (deformed) properly according to a language command. I will conclude my talk with potential research directions on language and 3D.
Biography: Minhyuk Sung is an assistant professor in the School of Computing at KAIST, affiliated with the Graduate School of AI and the Graduate School of Metaverse. Before joining KAIST, he was a Research Scientist at Adobe Research. He received his Ph.D. from Stanford University under the supervision of Professor Leonidas J. Guibas. His research interests lie in vision, graphics, and machine learning, with a focus on 3D geometric data processing. His academic service includes serving as a program committee member for Eurographics 2022, SIGGRAPH Asia 2022, and AAAI 2023.

8 Oral Presentations

Time Paper Title
15:15-15:23 Dual-Neighborhood Deep Fusion Network for Point Cloud Analysis
15:23-15:31 FoldingNet-based Geometry Compression of Point Cloud with Multi Descriptions
15:31-15:39 Multi-attribute Joint Point Cloud Super-Resolution with Adversarial Feature Graph Networks
15:39-15:47 Pyramid-Context Guided Feature Fusion for RGB-D Semantic Segmentation
15:47-15:55 Local to Global Transformer for Video based 3D Human Pose Estimation
15:55-16:03 Unsupervised Severely Deformed Mesh Reconstruction (DMR) from a Single-View Image for Longline Fishing
16:03-16:11 3DSTNet: Neural 3D Shape Style Transfer
16:11-16:19 3D-DSPNet: Product Disassembly Sequence Planning

Organizers

Wu Liu
Explore Academy of JD.com, China
Hao Su
University of California San Diego, USA
Yang Cong
Shenyang Institute of Automation of CAS, China
Tao Mei
Explore Academy of JD.com, China

Committee Chairs

Xinchen Liu
Explore Academy of JD.com, China
Kun Liu
Explore Academy of JD.com, China
Cheng Zhang
Ohio State University, USA

If you have any questions, please feel free to contact liukun167@jd.com.