Publications

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

arXiv

EMNLP 2025 Main

TLDR; We present MAVL, a multimodal benchmark for singable lyrics translation, and SylAVL-CoT, a model that uses audio-video cues and syllable constraints to produce natural, accurate translations.

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues

arXiv

ACL 2025 Main

TLDR; We introduce VENUS, a large-scale video dataset for generating and understanding nonverbal expressions, along with MARS, a model designed to leverage it.

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

arXiv

MICCAI 2025

TLDR; We introduce ScalpVision, an AI system for comprehensive scalp disease and alopecia diagnosis. It combines label-free hair segmentation with DiffuseIT-M, a generative model used for dataset augmentation, to improve severity assessment and prediction accuracy.

SlumpGuard: An AI-Powered Real-Time System for Automated Concrete Slump Prediction via Video Analysis

arXiv

Under Review

TLDR; We introduce SlumpGuard, an AI-powered, real-time video analysis system for fully automated concrete slump prediction at construction sites, supported by a large-scale dataset of over 6,000 real-world videos of concrete discharge.

Preprocessing for Keypoint based Sign Language Translation without Glosses

arXiv

Sensors (IF: 3.847)

TLDR; We introduce an effective preprocessing pipeline for sign language translation without glosses, combining skeleton-based motion features, keypoint normalization, and stochastic frame selection to enhance model performance.

A 2-Stage Model for Vehicle Class and Orientation Detection with Photo-Realistic Image Generation

arXiv

IEEE BigData 2022

TLDR; We introduce a two-stage vehicle class and orientation detection model using synthetic-to-real image translation and meta-table fusion to improve real-world prediction accuracy.

A Study of Tram-Pedestrian Collision Prediction Method Using YOLOv5 and Motion Vector

KCI Article

Korea Information Processing Society (KIPS)

TLDR; (Korean) We introduce a real-time tram collision prediction system that combines fast YOLOv5 object detection with a modified local dense optical flow method to estimate object speed and predict collision time and probability using a single camera image.

Pedestrian Accident Prevention Model Using Deep Learning and Optical Flow

KCI Article

Korea Computer Congress 2021 (🥇Best Paper Award)

TLDR; (Korean) We introduce a real-time pedestrian collision prediction system that uses YOLOv5 for fast object detection and a Local Dense Optical Flow method to quickly estimate pedestrian direction and speed, enabling accurate prediction of collision time and location.

Optical Flow Estimation Techniques and Recent Research Trends Survey

KCI Article

Korea Information Processing Society (KIPS) Special Session

TLDR; (Korean) We survey recent advances in optical flow estimation, comparing traditional and deep learning-based methods, and highlight their applications in autonomous driving, medical imaging, and surveillance systems.