Data Challenge 1: Predict Key Attributes from Product Images

What's the Challenge?

Have you ever encountered a product listing on an e-commerce platform where the image showed a short-sleeve shirt, but the description claimed it was long-sleeved? Such discrepancies are not just frustrating for customers; they're a significant challenge for e-commerce platforms striving to maintain accurate product catalogs at scale.

Meesho is sponsoring this data challenge, which addresses this critical issue in the e-commerce industry. Participants will develop models that automatically predict key product attributes from images, with the aim of improving how products are cataloged and listed online.

Task: Develop a robust machine learning model that can accurately predict various product attributes (such as color, pattern, and sleeve length) solely from product images uploaded by suppliers.

Why It Matters:

  • Efficiency: Reduce the time-consuming and error-prone process of manual attribute entry by suppliers.
  • Accuracy: Minimize discrepancies between product images and descriptions.
  • Cost-effectiveness: Decrease reliance on human agents for verification, cutting operational costs.
  • User Experience: Ensure customers receive accurate product information, enhancing trust and satisfaction.

Dataset Information

At Meesho, suppliers list millions of products. Suppliers enter the product details, which our agents then verify and correct before the final listing. The competition data is drawn from these listings on the Meesho platform. For this competition, the scope is limited to 5 categories: Sarees, Women Kurties, Men Tshirts, Women Tshirts, and Women Tops and Tunics. In each category, we have 10k products for training and 3k for testing. There will also be a hidden test set of 3k products.

The dataset contains the following fields:
  • Product ID: Unique identifier for each product.
  • Product Image: URL of a publicly available image, or the file productid.jpg in the image folder, where productid is the Product ID.
  • Category: The product category (one of the 5 categories listed above).
  • Attribute Keys: The attribute names specified for the product (e.g. sleeve length, neck type).
  • Attribute Values: The value specified for the product for the corresponding attribute key (e.g. for the attribute key sleeve length: full sleeve, short sleeve).
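
As an illustration of how these fields could be handled, here is a minimal sketch in Python. The exact file layout is not specified in this announcement, so the long-format CSV (one row per product and attribute key), the column names, the sample rows, and the helper names load_labels and image_path are all assumptions made for this example. Only the productid.jpg naming comes from the field description.

```python
import csv
import io

# Hypothetical layout: one row per (product, attribute key) pair.
# The column names and sample rows are assumptions, not the official schema.
SAMPLE_CSV = """product_id,category,attribute_key,attribute_value
101,Men Tshirts,sleeve length,short sleeve
101,Men Tshirts,color,black
102,Sarees,pattern,printed
"""


def load_labels(csv_text):
    """Group rows into {product_id: {"category": str, "attributes": {key: value}}}."""
    products = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = products.setdefault(
            row["product_id"],
            {"category": row["category"], "attributes": {}},
        )
        # Each attribute key holds exactly one value per product.
        entry["attributes"][row["attribute_key"]] = row["attribute_value"]
    return products


def image_path(product_id, image_dir="images"):
    """Build the local image path from the product ID (productid.jpg)."""
    return f"{image_dir}/{product_id}.jpg"


products = load_labels(SAMPLE_CSV)
for product_id, entry in products.items():
    print(image_path(product_id), entry["category"], entry["attributes"])
```

Grouping by product keeps all attributes of one image together, which is convenient when a single model predicts several attributes per image.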

Evaluation Criteria

Model Assessment: We will evaluate the attribute classification model using a product dataset encompassing the same categories as the training data. The attribute identification task is structured as a multi-class classification problem at the attribute level. It's important to note that in our framework, each attribute can only be assigned a single value, making this a standard multi-class problem rather than a multi-label one.
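
To make the single-value constraint concrete, the sketch below picks exactly one value per attribute by taking the highest-scoring candidate. The score dictionary and the function name predict_attributes are hypothetical and stand in for whatever per-attribute output a model produces; this is one possible decoding step, not a prescribed approach.

```python
def predict_attributes(scores_by_attribute):
    """Return one value per attribute: the candidate with the highest score."""
    return {
        attribute: max(scores, key=scores.get)
        for attribute, scores in scores_by_attribute.items()
    }


# Hypothetical model scores for one product image.
example_scores = {
    "sleeve length": {"short sleeve": 0.7, "full sleeve": 0.2, "sleeveless": 0.1},
    "color": {"black": 0.55, "navy": 0.40, "white": 0.05},
}

print(predict_attributes(example_scores))
```

Because each attribute is decoded independently, a product never receives two values for the same attribute, which matches the multi-class (not multi-label) framing.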

Metrics: To ensure a fair and comprehensive evaluation, we will employ both micro and macro F1-scores at the attribute level.

Scoring Process:
  • Attribute-Level Evaluation:
    • Calculate micro and macro F1-scores for each individual attribute.
  • Category-Level Aggregation:
    • For each product category, we'll compute an unweighted average of all attribute F1-scores within that category.
  • Final Score Calculation:
    • The overall model performance will be determined by taking the unweighted average of the category-level F1-scores.

This multi-tiered evaluation approach allows us to assess model performance at various levels of granularity, from individual attributes to overall accuracy across all product categories. It provides a balanced view of the model's effectiveness in handling diverse product attributes and categories.
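
The scoring process can be sketched in plain Python as follows. This is an unofficial approximation: the announcement does not say how the micro and macro F1-scores of an attribute are combined into a single attribute score, so the simple mean used as the default combine function is an assumption, as are the function names and the toy labels.

```python
from collections import defaultdict


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for one attribute with single-label predictions."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted value was wrong
            fn[truth] += 1  # true value was missed

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    labels = set(y_true) | set(y_pred)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[label], fp[label], fn[label]) for label in labels) / len(labels)
    return micro, macro


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def final_score(results, combine=lambda micro, macro: (micro + macro) / 2):
    """results: {category: {attribute: (y_true, y_pred)}}.

    `combine` merges micro and macro F1 into one attribute score.
    The default (simple mean) is an assumption, not the official rule.
    """
    category_scores = []
    for attributes in results.values():
        # Unweighted average of attribute scores within the category.
        category_scores.append(
            mean(combine(*f1_scores(t, p)) for t, p in attributes.values())
        )
    # Unweighted average of the category-level scores.
    return mean(category_scores)


# Toy labels for two categories with one attribute each.
toy_results = {
    "Men Tshirts": {
        "sleeve length": (
            ["short", "short", "full", "full"],
            ["short", "full", "full", "full"],
        ),
    },
    "Sarees": {"pattern": (["printed", "solid"], ["printed", "solid"])},
}

print(final_score(toy_results))
```

Because both averages are unweighted, an attribute or category with few products counts as much as one with many, so rare attributes deserve as much attention as common ones.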

Timelines

  • User registration opens on September 23, 2024.
  • The competition officially launches on September 26, 2024.
  • The last date to submit for the leaderboard is November 7, 2024.
  • Participants have until November 10, 2024, to submit their final code and reports.
  • Winners will be declared by November 20, 2024.

Final Submission Requirements:
  • Code: The complete codebase used for the final submission.
  • Instructions: A PDF document detailing how to run the code and the approach used to solve the problem.

Please ensure all materials are submitted by the specified deadlines. Late submissions will not be accepted.

Competition Rules & Regulations

Submissions must meet the following criteria to be considered valid. Please ensure your solution complies with these guidelines to avoid disqualification.

  • API Restrictions:
    • Use of proprietary APIs is strictly prohibited.
  • Performance Constraint:
    • Model inference time must not exceed 500 milliseconds per image.
  • Operational Requirements:
    • Ensure your solution functions entirely offline.
    • Include all necessary resources and models in your final code submission.
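
One way to check a solution against the inference-time limit before submitting is sketched below. The rules do not state the hardware, batch size, or whether image loading counts toward the 500 milliseconds, so this per-image average over a local loop is only an assumption about how the limit might be measured. dummy_predict is a placeholder for a real model call, and the file names are made up.

```python
import time

LIMIT_MS = 500  # per-image limit stated in the rules


def mean_latency_ms(predict, images, warmup=1):
    """Average wall-clock time per image, in milliseconds."""
    # Warm-up calls absorb one-time costs such as lazy initialization.
    for image in images[:warmup]:
        predict(image)
    start = time.perf_counter()
    for image in images:
        predict(image)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / len(images)


def dummy_predict(image):
    """Placeholder for a real model; returns a fixed attribute prediction."""
    return {"color": "black"}


latency = mean_latency_ms(dummy_predict, ["101.jpg", "102.jpg", "103.jpg"])
print(f"{latency:.3f} ms per image, within limit: {latency <= LIMIT_MS}")
```

Running the check offline, with networking disabled, also helps confirm that the solution does not depend on downloads at inference time.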

Dataset License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Registration/Website Link

Please register for this data challenge using the link below. We will then send out the dataset and the submission link.
Registration form: https://forms.gle/XQvU9uWfUP7EcLAw5
Website link: https://www.meesho.io/ai/data-challenge

Competition Sponsors

The competition is sponsored by Meesho.

  • 1st Prize: 2,00,000 INR
  • 2nd Prize: 1,50,000 INR
  • 3rd Prize: 50,000 INR