Fashion Landmark Detection in the Wild

Ziwei Liu*     Sijie Yan*     Ping Luo     Xiaogang Wang     Xiaoou Tang
European Conference on Computer Vision (ECCV) 2016


Visual fashion analysis has attracted much attention in recent years. Previous work represented clothing regions by either bounding boxes or human joints. This work presents fashion landmark detection, or fashion alignment, which predicts the positions of functional key points defined on fashion items, such as the corners of the neckline, hemline, and cuff. To encourage future studies, we introduce a fashion landmark dataset with over 120K images, where each image is labeled with eight landmarks. With this dataset, we study fashion alignment by cascading multiple convolutional neural networks in three stages. These stages gradually improve the accuracy of the landmark predictions. Extensive experiments demonstrate the effectiveness of the proposed method, as well as its ability to generalize to pose estimation. Fashion landmarks are also compared to clothing bounding boxes and human joints in two applications, fashion attribute prediction and clothes retrieval, showing that fashion landmarks are a more discriminative representation for understanding fashion images.
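The cascading idea in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: `run_cascade` and `make_toy_stage` are hypothetical names, and each "stage" here is a toy stand-in for a trained network that, given the current landmark estimate, returns a per-landmark offset. The only point is to show how successive stages can refine one shared estimate.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Assumption: a stage maps (image, current estimate) to per-landmark offsets.
Stage = Callable[[object, List[Point]], List[Point]]


def run_cascade(image: object, initial: List[Point], stages: List[Stage]) -> List[Point]:
    """Apply each stage in order, adding its predicted offsets to the estimate."""
    estimate = list(initial)
    for stage in stages:
        offsets = stage(image, estimate)
        estimate = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(estimate, offsets)]
    return estimate


def make_toy_stage(target: List[Point], step: float) -> Stage:
    """Toy stand-in for a trained CNN: moves each point a fraction `step`
    of the way toward a known target. A real stage would infer offsets
    from image content instead of from the ground truth."""
    def stage(image: object, estimate: List[Point]) -> List[Point]:
        return [((tx - x) * step, (ty - y) * step)
                for (x, y), (tx, ty) in zip(estimate, target)]
    return stage


if __name__ == "__main__":
    target = [(0.5, 0.5)]
    stages = [make_toy_stage(target, 0.5) for _ in range(3)]
    print(run_cascade(None, [(0.0, 0.0)], stages))
```

With the toy stages, each pass halves the remaining distance to the target, which mirrors the "gradually improve" behaviour described above without claiming anything about the actual networks.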



Code and Models


The Fashion Landmark Detection Benchmark evaluates the performance of fashion landmark detection. It is a large subset of DeepFashion, with large and diverse pose and zoom-in variations. It contains:

  • 123,016 clothing images;

  • 8 fashion landmarks (both location and visibility) for each image;

  • a bounding box, clothing type, and variation type for each image.
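The annotations listed above could be held in a record like the following. This is a sketch under stated assumptions: the field names, the `(x1, y1, x2, y2)` box layout, and the string-valued type fields are hypothetical and do not reflect the benchmark's actual file format. The `normalized_error` helper is likewise an assumed metric (mean distance over visible landmarks, with coordinates scaled by image size), included only to show how the visibility flags might be used during evaluation.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class Landmark:
    x: float        # pixel column
    y: float        # pixel row
    visible: bool   # visibility flag from the annotation


@dataclass
class Annotation:
    image_path: str
    bbox: Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2) in pixels
    clothing_type: str               # hypothetical encoding
    variation_type: str              # hypothetical encoding
    landmarks: List[Landmark]        # eight per image, per the list above


def normalized_error(pred: List[Tuple[float, float]], ann: Annotation,
                     width: int, height: int) -> float:
    """Mean Euclidean distance over visible landmarks, after dividing
    x by the image width and y by the image height (assumed metric)."""
    dists = [
        hypot((px - lm.x) / width, (py - lm.y) / height)
        for (px, py), lm in zip(pred, ann.landmarks)
        if lm.visible
    ]
    if not dists:
        raise ValueError("no visible landmarks to evaluate")
    return sum(dists) / len(dists)
```

Skipping occluded landmarks keeps the score from being dominated by points whose ground-truth location is itself uncertain; whether the benchmark's official protocol does the same should be checked against its documentation.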

Fashion Landmark Detection Benchmark


 author = {Ziwei Liu and Sijie Yan and Ping Luo and Xiaogang Wang and Xiaoou Tang},
 title = {Fashion Landmark Detection in the Wild},
 booktitle = {European Conference on Computer Vision (ECCV)},
 month = {October},
 year = {2016}