STARC-9
Large-scale multi-class tissue classification dataset for colorectal cancer histopathology
STARC-9 is a large-scale dataset of image tiles for multi-class tissue classification in colorectal cancer (CRC) histopathology. Its tiles are collected to be morphologically diverse and class-balanced, with the aim of improving the robustness, reproducibility, and generalization of tissue-type classification models in computational pathology.
The dataset was developed to support foundation-model-based histopathology analysis and to improve downstream tissue map generation for colorectal cancer whole-slide images.
Role
- Lead / equal-contribution researcher
Key Contributions
- Curated a large-scale tile-level tissue classification dataset for colorectal cancer histopathology.
- Designed sample collection strategies to improve morphological diversity and class balance.
- Supported benchmarking and reproducible research for pathology foundation models.
- Built dataset preparation workflows for whole-slide image tiles and tissue-type classification.
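As a rough illustration of the tile-preparation step mentioned above, the sketch below enumerates the top-left coordinates of non-overlapping tiles over a slide region. It is a generic example and not the STARC-9 pipeline: the function name `tile_grid`, the 256-pixel tile size, and the region dimensions are all assumptions chosen for illustration.

```python
def tile_grid(width, height, tile_size, stride=None):
    """Return top-left (x, y) coordinates of full tiles inside a region.

    Hypothetical helper for illustration; tile size and stride are
    assumptions, not values taken from the STARC-9 dataset.
    """
    if stride is None:
        stride = tile_size  # non-overlapping tiles by default
    if tile_size <= 0 or stride <= 0:
        raise ValueError("tile_size and stride must be positive")

    coords = []
    # Only keep tiles that fit entirely inside the region.
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            coords.append((x, y))
    return coords


# Example region of 1000 x 600 pixels with 256-pixel tiles (assumed values).
coords = tile_grid(1000, 600, 256)
```

In practice each coordinate would be passed to a whole-slide image reader to extract the tile, and tiles that are mostly background would typically be filtered out before labeling.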
Methods / Tools
- Whole-slide image analysis
- Tile-level tissue classification
- Dataset curation
- Class-balanced sampling
- Foundation-model-based histopathology
- Computational pathology
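One simple way to realize class-balanced sampling is to group tile indices by label and draw the same number from each group. The sketch below shows that idea using only the Python standard library. It is a minimal example under stated assumptions, not the sampling strategy used for STARC-9: the function `balanced_sample` and the class names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(labels, per_class, seed=0):
    """Select up to `per_class` tile indices from every class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    # Group tile indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    selected = []
    for label in sorted(by_class):
        indices = by_class[label]
        k = min(per_class, len(indices))  # small classes contribute all they have
        selected.extend(rng.sample(indices, k))
    return sorted(selected)


# Placeholder class names, not the dataset's actual label set.
labels = ["tumor"] * 5 + ["stroma"] * 2 + ["mucus"] * 3
picked = balanced_sample(labels, per_class=2)
counts = Counter(labels[i] for i in picked)
```

Capping each class at the same count trades dataset size for balance; classes with fewer than `per_class` tiles stay underrepresented unless more tiles are collected for them.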
Related Publication
Barathi Subramanian, Rathinaraja Jeyaraj, Mitchell Nevin Peterson, Terry Guo, Nigam Shah, Curtis Langlotz, Andrew Y. Ng, Jeanne Shen, “STARC-9: A Large-scale Dataset for Multi-Class Tissue Classification for CRC Histopathology,” NeurIPS 2025 Datasets and Benchmarks Track.
*Equal contribution.