STARC-9

Large-scale multi-class tissue classification dataset for colorectal cancer histopathology

STARC-9 is a large-scale dataset for multi-class tissue classification in colorectal cancer (CRC) histopathology. It provides morphologically diverse, class-balanced image tiles intended to improve the robustness, reproducibility, and generalization of tissue-type classification models in computational pathology.

The dataset was developed to support foundation-model-based histopathology analysis and to improve downstream tissue map generation for colorectal cancer whole-slide images.

Figure: Overview of the STARC-9 dataset construction and multi-class tissue classification workflow.

Role

  • Lead / equal-contribution researcher

Key Contributions

  • Curated a large-scale tile-level tissue classification dataset for colorectal cancer histopathology.
  • Designed sample collection strategies to improve morphological diversity and class balance.
  • Supported benchmarking and reproducible research for pathology foundation models.
  • Built dataset preparation workflows for whole-slide image tiles and tissue-type classification.

Methods / Tools

  • Whole-slide image analysis
  • Tile-level tissue classification
  • Dataset curation
  • Class-balanced sampling
  • Foundation-model-based histopathology
  • Computational pathology
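
As an illustration of the class-balanced sampling listed above, here is a minimal sketch that caps the number of tiles drawn per tissue class. It is not the STARC-9 pipeline: the function name `class_balanced_sample`, the `(tile_id, label)` input format, and the example labels are all assumptions made for this example.

```python
import random
from collections import defaultdict


def class_balanced_sample(tiles, per_class, seed=0):
    """Draw at most `per_class` tiles from each label.

    `tiles` is an iterable of (tile_id, label) pairs. This input format is
    hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible

    # Group tile ids by their tissue label.
    by_label = defaultdict(list)
    for tile_id, label in tiles:
        by_label[label].append(tile_id)

    sample = []
    for label in sorted(by_label):  # sorted for a deterministic order
        ids = by_label[label]
        k = min(per_class, len(ids))  # rare classes keep all their tiles
        for tile_id in rng.sample(ids, k):
            sample.append((tile_id, label))
    return sample


if __name__ == "__main__":
    # Toy, made-up labels and counts; not the actual STARC-9 classes.
    toy = (
        [(f"t{i}", "tumor") for i in range(50)]
        + [(f"s{i}", "stroma") for i in range(10)]
        + [(f"m{i}", "mucus") for i in range(3)]
    )
    picked = class_balanced_sample(toy, per_class=5, seed=1)
    print(len(picked))
```

Capping each class at a fixed count is only one way to balance classes. It does nothing for morphological diversity within a class, which would need a separate selection step.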

Publication

Barathi Subramanian, Rathinaraja Jeyaraj, Mitchell Nevin Peterson, Terry Guo, Nigam Shah, Curtis Langlotz, Andrew Y. Ng, Jeanne Shen, “STARC-9: A Large-scale Dataset for Multi-Class Tissue Classification for CRC Histopathology,” NeurIPS 2025 Datasets and Benchmarks Track.

*Equal contribution.