Label-free Classification of MCF7 and HeLa Cells using High-content Imaging and Deep Learning

Shagun Sharma; Kalpna Guleria; Ayush Dogra; Monika Sharma; Satyam Kumar Agrawal

Sharma S, Guleria K, Dogra A, Sharma M, Agrawal S. K. Label-free Classification of MCF7 and HeLa Cells using High-content Imaging and Deep Learning. Biomed Pharmacol J 2026;19(1).

Manuscript received on: 02-06-2025
Manuscript accepted on: 13-01-2026
Published online on: 26-02-2026

Plagiarism Check: Yes
Reviewed by: Dr. Ananya Naha
Second Review by: Dr. Tolmas Hamroyev
Final Approval by: Dr. Anton R Keslav


Shagun Sharma¹,², Kalpna Guleria²*, Ayush Dogra³, Monika Sharma⁴ and Satyam Kumar Agrawal⁴

¹School of Computing Science and Engineering, VIT Bhopal University, Sehore, India

²Department of Computer Science and Engineering, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India

³Department of Electronics and Communication Engineering, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India

⁴Centre for in Vitro Studies and Translational Research, Chitkara College of Pharmacy, Chitkara University, Rajpura, Punjab, India

Corresponding Author: guleria.kalpna@gmail.com

DOI: https://dx.doi.org/10.13005/bpj/3348

Abstract

Manual cell classification is labour-intensive and time-consuming because biological image data are complex and multidimensional. Deep learning models can extract relevant features from images, process diverse data types, and model nonlinear relationships in the data. In the proposed work, two cell lines, MCF7 and HeLa, were cultured under standard conditions, and images were captured using a phase contrast microscope. The captured images were grouped by confluency and used to train the deep learning-based Xception, InceptionV3, and MobileNetV2 models for multiclass classification. The dataset was augmented with multiple transformations, including rotation, shift, shear, and flipping, which produced 4,375 cancer cell line images. The models were trained on both the original and the augmented datasets, and MobileNetV2 performed best, achieving 94.3% accuracy on the augmented dataset and 81.5% accuracy on the original dataset. MobileNetV2 was also the most efficient and lightweight of the tested architectures, offering faster processing in embedded and mobile vision applications. These findings pave the way for extending the proposed model to the real-time classification and identification of cancer cell lines in in vitro laboratories.

Keywords

Cell Classification; Deep Learning; HeLa; High-content Imaging; MCF7


Introduction

Researchers have widely employed cancer cell lines to study anticancer activity and to evaluate potential leads, including natural and synthetic molecules.¹ The usefulness of these cell lines hinges on their accurate classification, because cell lines are available from many sources, both authenticated and unauthenticated. Although cancer cell lines are primarily categorized by the source tissue or the histopathology of the original tumor, further subtyping often requires extensive molecular and genetic analysis.² The emergence of cutting-edge high-throughput sequencing technologies has enabled comprehensive molecular profiling of cancer cell lines, paving the way for more sophisticated classification systems that better capture the diversity of cancer cells.³ Nevertheless, integrating and interpreting the extensive molecular data generated remains difficult and calls for advanced bioinformatics tools and algorithms for precise cell line categorization and identification. Such refined classification systems could improve the selection of suitable cell lines for specific research objectives and drug screening.⁴ By better aligning cell lines with the molecular characteristics of patient tumors, scientists can increase the translational value of preclinical investigations.⁵ Furthermore, improved classification methods may uncover previously unidentified subtypes or molecular features of cancer cell lines, opening new opportunities for targeted therapy development.

In many of these cases, it is crucial to assess whether cell lines can be characterized phenotypically from microscopy images without specific staining, i.e., by label-free imaging.⁶ Traditional image-based cell phenotyping often depends on manual feature extraction, which involves measuring specific visual characteristics of each cell, such as its dimensions, shape, membrane continuity, intercellular processes, and the structural details of the nucleus and cytoplasm.⁷ Classification of cells with machine learning techniques such as support vector machines or neural networks relies on these manually designed features.⁸ This approach is limited in identifying specific phenotypes because the handcrafted features have restricted complexity, which hinders discrimination between similar phenotypes. Moreover, feature extraction from microscopy images is highly variable and depends on numerous manually tuned parameters. In culture, cells grow independently: they share the same genetic content yet differ visibly in appearance. A cell population nevertheless shows certain characteristics that are shared or more pronounced across its members, which make the population distinctive and which can be characterized reliably only with advanced machine learning techniques.⁹,¹⁰ To overcome these challenges, researchers have used machine learning methods to phenotype cells directly from microscopy images. However, previous studies have focused mainly on broad cell categories with easily distinguishable morphologies, such as lymphocytes, granulocytes, and erythrocytes. For cells with more subtle morphological differences, phenotyping has largely been limited to binary classification to detect specific disease-related alterations.¹¹ An alternative approach uses brightfield microscopy images to predict the location of immunostains on subcellular structures for organelle identification. Nevertheless, further research is needed to interpret these predicted stains and to establish their relationship to cell phenotypes.
One significant obstacle in developing machine learning algorithms for cell classification from unstained microscopy images is segmenting larger microscopy fields into individual cell images. This is particularly problematic for adherent cells, which grow in close proximity to one another, making their boundaries difficult to discern. The problem can be avoided by imaging cultures seeded at low density in culture flasks, where cells grow separately and at a distance from one another. Segmentation can also be simplified by using enzymes to separate the cells before imaging, but it remains uncertain whether the cells retain their phenotypic characteristics after this disaggregation. Preserving these traits is crucial for accurate classification and analysis.

The proposed work contributes a lightweight artificial intelligence-based model that efficiently classifies MCF7 and HeLa cells, which supports the automated analysis of medical images. The dataset was created in the Centre for In Vitro Studies and Translational Research laboratory, where the MCF7 and HeLa cells were cultured and the microscope-based image acquisition was performed. The dataset was divided into eight categories, one for each combination of cell line and growth stage: MCF7_SCA, MCF7_LC, MCF7_SC, MCF7_FC, HeLa_SCA, HeLa_LC, HeLa_SC, and HeLa_FC, where SCA, LC, SC, and FC denote the single-cell-attached, fewer-cells, sub-confluent, and fully confluent stages, respectively. These images were used to train the InceptionV3 and Xception architectures and the lightweight MobileNetV2 architecture, the latter chosen to ensure computational efficiency while maintaining high classification accuracy.¹² This AI-based classification model is suitable for real-time and resource-constrained applications in biomedical research and diagnostics.
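The eight categories form the cross product of two cell lines and four growth stages. The following is a small sketch of one possible label encoding in Python; the constant and function names are hypothetical, and the index order simply follows the order in which the categories are listed above.

```python
from itertools import product

# Cell lines and growth stages as named in the text:
# SCA = single cell attached, LC = fewer cells, SC = sub-confluent, FC = full confluent.
CELL_LINES = ("MCF7", "HeLa")
STAGES = ("SCA", "LC", "SC", "FC")

# Hypothetical ordering: all MCF7 stages first, then all HeLa stages,
# matching the order in which the eight categories are listed.
CLASS_NAMES = [f"{line}_{stage}" for line, stage in product(CELL_LINES, STAGES)]
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASS_NAMES)}


def one_hot(label: str) -> list[int]:
    """Return a one-hot vector of length 8 for a class name such as 'HeLa_SC'."""
    vector = [0] * len(CLASS_NAMES)
    vector[CLASS_TO_INDEX[label]] = 1
    return vector
```

If the images are stored in one folder per class and loaded with `tf.keras.utils.image_dataset_from_directory`, the class order defaults to alphanumeric folder order; passing `class_names=CLASS_NAMES` keeps the integer labels consistent with this list.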

The rest of this article is organized as follows. Section 2 reviews cancer cell classification techniques, Section 3 describes the dataset creation and model implementation, Section 4 presents the results of the proposed model, and Section 5 concludes the work.

Literature review on cancer cell classification techniques

One study developed an optical biosensor model for classifying cancerous and healthy cells.¹³ The cells were detected with a label-free setup, in which no external labels, markers, or tags are used to identify them. Multi-layer perceptron (MLP), convolutional neural network (CNN), fully convolutional network (FCN), ResNet, Inception, and multi-channel deep CNN (MCDCNN) models were applied to classify the HeLa, MDA-MB-231, LCLC103H, and MCF-7 cell lines, and each model achieved an accuracy of 70-80%. Another study developed a hybrid model for classifying HeLa cell images into three classes, C1, C2, and C3, corresponding to background, unclear cells, and sharp cells.¹⁴ Three hybrid models, Inception-U-Net, ResNet34-U-Net, and VGG19-U-Net, were trained on 650 cell line images. In this hybrid technique, U-Net performs the segmentation, and its encoder is replaced with VGG19, ResNet34, or Inception. The models were trained for 200 epochs with a learning rate of 0.001 and a batch size of 8. An AI-based Cellpose technique has also been developed for HeLa-FUCCI cell classification.¹⁵ Its dataset contains 37,000 red/green and contrast images of cell lines, and segmentation is performed with a U-Net architecture to extract saturated cells, contaminated cells, and edges from the images. A further cell classification model was built on a deep neural network (DNN).¹⁶ Its dataset has 10,696 images of the HAPI, HeLa, K562, and GM12878 classes, and the model achieved an accuracy of 96.9% and an AUC of 0.95. A random forest (RF) model has been used for the prediction of 3D MCF7 microtissues.¹⁷ This model was developed using 955 samples, from which 450 essential features were extracted and then regrouped.
The work focused mainly on image processing: contrast enhancement was performed first, followed by binarization and morphological closing, and finally the microtissues were classified. Table 1 summarizes the existing cancer cell classification models.

Table 1: Summary of existing cancer cell classification techniques

Ref.	Model	Dataset	Accuracy	Limitations / Future Scope
¹³	CNN, FCN, ResNet, Inception, MCDCNN	>4500 single-cell images	70-80%	Although the authors provided a multiclass dataset, classification was performed only for binary classes.
¹⁴	U-Net, Inception-U-Net, ResNet34-U-Net, and VGG19-U-Net	650 images	U-Net: 98.69%, Inception-U-Net: 99.04%, ResNet34-U-Net: 99.09%, VGG19-U-Net: 98.65%	The models were trained for a large number of epochs (200) with a small batch size (8), which may require long training times.
¹⁸	AI-based Cellpose model	37,000 images	True positive rate: Cellpose 1.0: 1245/1275, Cellpose 2.0: 1255/1275	The authors did not report the hyperparameter configurations used for the Cellpose 1.0 and Cellpose 2.0 models.
¹⁶	DNN-SCANN	10,696 images	96.9%	The authors used a DNN-based SCANN model but did not provide detailed information about it.
¹⁹	RF	955 samples	Not reported	The authors did not describe the model architecture in detail; the focus was on the image processing step.

Materials and Methods

This section discusses the dataset and the methodology used for the development of the proposed multiclass cancer cell line classification model.

Cell lines and culture

Two cell lines, MCF7, a human breast adenocarcinoma cell line, and HeLa, a human cervical adenocarcinoma cell line, were used in this study. The cells were gifted by Dr. Deepak Kumar at IMTECH, Chandigarh. Both cell lines were cultured in MEM (E) with NEAA (HiMedia) supplemented with 10% Fetal Bovine Serum (HiMedia) and 1% antibiotic/antimycotic solution (HiMedia). They were seeded separately in T-75 tissue culture flasks (Corning) and maintained under standard culture conditions at 37°C with 5% CO₂ using a previously described protocol.²⁰–²² After seeding, the cells were allowed to attach, and images were taken at regular intervals.

Microscopy

Cell images were captured using an Olympus CKX53 inverted phase contrast microscope equipped with a 4000 K colour temperature LED light source and a MagCam DC 5MP camera. Once the cells had attached, images of each cell line were acquired at regular intervals to create a primary dataset, spanning the single-cell-attached (SCA) stage and increasing stages of confluency: fewer cells (LC), sub-confluent (SC), and fully confluent (FC). A magnification of 10× was used, and each 24-bit image had dimensions of 2592 × 1944 pixels. At each stage, 50-100 images were captured. Areas with floating cells or overcrowded, overlapping cells were excluded. Fig. 1 shows the image preparation phase.
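Each image is a 2592 × 1944 pixel, 24-bit RGB frame, whereas ImageNet-pretrained Keras models take much smaller fixed-size inputs. The text does not state how the images were resized, so the following sketch rests on assumptions: it uses the Keras Applications default input sizes (224 × 224 for MobileNetV2, 299 × 299 for Xception and InceptionV3), a centred square crop before resizing, and the [-1, 1] pixel scaling that the `preprocess_input` functions of these three models apply. The helper names are hypothetical.

```python
# Assumed input sizes: Keras Applications defaults, not values stated in the article.
DEFAULT_INPUT_SIZE = {"MobileNetV2": 224, "Xception": 299, "InceptionV3": 299}


def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centred square as (left, upper, right, lower), Pillow's crop convention."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def scale_to_unit_range(pixel: int) -> float:
    """Map an 8-bit channel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0
```

With Pillow, the box can be passed to `Image.crop(box)` and the result resized with `.resize((224, 224))`. Centre-cropping discards 324 pixel columns on each side of a 2592 × 1944 frame, but it avoids the aspect-ratio distortion that comes from resizing the 4:3 frame directly to a square.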

Figure 1: Workflow of the image preparation
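The abstract reports that rotation, shift, shear, and flipping were used to augment the dataset to 4,375 images, but the transformation ranges are not stated here. The following is a minimal plain-Python sketch of how these four transforms compose into one random affine map acting on pixel coordinates measured from the image centre. The ranges (±20° rotation, ±10% shift, ±10° shear, 50% flip probability per axis) and all helper names are hypothetical.

```python
import math
import random

# 3x3 homogeneous matrices acting on (x, y, 1), with coordinates measured from the
# image centre so that rotation, shear, and flips all pivot about the centre.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b for 3x3 matrices (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation(deg: float) -> Matrix:
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0], [math.sin(t), math.cos(t), 0.0], [0.0, 0.0, 1.0]]


def shift(tx: float, ty: float) -> Matrix:
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def shear(deg: float) -> Matrix:
    """Horizontal shear: x' = x + tan(angle) * y."""
    return [[1.0, math.tan(math.radians(deg)), 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def flip(horizontal: bool, vertical: bool) -> Matrix:
    return [[-1.0 if horizontal else 1.0, 0.0, 0.0],
            [0.0, -1.0 if vertical else 1.0, 0.0],
            [0.0, 0.0, 1.0]]


def random_affine(rng: random.Random, width: int, height: int,
                  max_rot: float = 20.0, max_shift: float = 0.1,
                  max_shear: float = 10.0) -> Matrix:
    """Rotate, then shear, then flip, then shift, with hypothetical ranges."""
    m = rotation(rng.uniform(-max_rot, max_rot))
    m = matmul(shear(rng.uniform(-max_shear, max_shear)), m)
    m = matmul(flip(rng.random() < 0.5, rng.random() < 0.5), m)
    m = matmul(shift(rng.uniform(-max_shift, max_shift) * width,
                     rng.uniform(-max_shift, max_shift) * height), m)
    return m


def apply(m: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map one centred pixel coordinate through the affine matrix."""
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])
```

Applying such a matrix to whole images also requires inverse mapping and interpolation. The now-legacy Keras `ImageDataGenerator` exposes `rotation_range`, `width_shift_range`, `height_shift_range`, `shear_range`, `horizontal_flip`, and `vertical_flip` for the same family of transforms.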

Hyperparameter	Value
Batch size	16
Epochs	50
Learning rate	0.001
Optimizer	Adam
Activation function	ReLU
Train-test split	80:20
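The hyperparameters above are enough to sketch a transfer-learning setup for MobileNetV2 in TensorFlow/Keras. The text does not specify the classification head, input size, loss, whether the backbone was frozen, or whether the 80:20 split was applied to the 4,375 augmented images. The choices below are therefore assumptions: a 224 × 224 input, a frozen ImageNet backbone, global average pooling, a hypothetical 128-unit ReLU layer, a softmax over eight classes, and sparse categorical cross-entropy. `TrainConfig`, `split_counts`, `steps_per_epoch`, and `build_mobilenetv2_classifier` are hypothetical names. TensorFlow is imported inside the builder so that the bookkeeping helpers run without it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values from the hyperparameter table; image_size is an assumption."""
    batch_size: int = 16
    epochs: int = 50
    learning_rate: float = 0.001
    train_fraction: float = 0.8   # 80:20 train-test split
    num_classes: int = 8          # {MCF7, HeLa} x {SCA, LC, SC, FC}
    image_size: int = 224         # assumption: Keras default for MobileNetV2


def split_counts(n_images: int, train_fraction: float) -> tuple[int, int]:
    """Number of training and test images for a given split fraction."""
    n_train = round(n_images * train_fraction)
    return n_train, n_images - n_train


def steps_per_epoch(n_train: int, batch_size: int) -> int:
    """Mini-batches per epoch; the last batch may be smaller."""
    return math.ceil(n_train / batch_size)


def build_mobilenetv2_classifier(cfg: TrainConfig):
    """Frozen ImageNet MobileNetV2 backbone with a small hypothetical head."""
    import tensorflow as tf  # imported lazily so the helpers above need no TensorFlow

    shape = (cfg.image_size, cfg.image_size, 3)
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=shape, include_top=False, weights="imagenet")
    backbone.trainable = False  # assumption: feature extraction, no fine-tuning

    inputs = tf.keras.Input(shape=shape)
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)  # [0, 255] -> [-1, 1]
    x = backbone(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)  # hypothetical head width
    outputs = tf.keras.layers.Dense(cfg.num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=cfg.learning_rate),
        loss="sparse_categorical_crossentropy",  # integer class labels 0..7
        metrics=["accuracy"],
    )
    return model


# Typical call (requires TensorFlow and labelled tf.data datasets, not shown):
# model = build_mobilenetv2_classifier(TrainConfig())
# model.fit(train_ds, validation_data=test_ds, epochs=TrainConfig().epochs)
```

Assuming the split is applied to all 4,375 images, `split_counts(4375, 0.8)` gives 3,500 training and 875 test images, and `steps_per_epoch(3500, 16)` gives 219 batches per epoch. Splitting before augmenting keeps augmented copies of the same source image from appearing in both subsets. Swapping `MobileNetV2` for `Xception` or `InceptionV3` with a 299 × 299 input gives the other two baselines; both also expect inputs scaled to [-1, 1].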