CNN-based Lung Segmentation
Algorithm – COVID-19



Project Statement

COVID-19, the disease caused by the SARS-CoV-2 virus, has severely impacted people's lives all over the world, and continuous research effort is going into improving methods for detecting it. During treatment, COVID-infected regions of the lungs need to be identified and segmented. An example of a COVID-infected lung CT slice is shown in Figure 1.

To automate this task, an AI-powered segmentation model is trained using the large amount of unlabeled data available. In this work, we used 800 CT volumes of COVID-infected patients, which yield around 25K CT slices. Training AI models on such large amounts of data requires parallel computing hardware such as GPUs. Zeblok's platform provides a seamless way to train such models.

This work improves upon the segmentation performance of previous models by around 2-4 Dice percentage points and achieves state-of-the-art COVID segmentation results.
Automating such time-consuming segmentation tasks can reduce the workload of clinicians who are already working under tremendous pressure.

Data Used


Figure 1. An example of COVID-19 infected lung CT slice

For this work, we used the largest publicly available dataset of infected lung CT scans, the MOSMEDDATA dataset. It comprises around 1100 CT volumes: 800 infected and 300 non-infected. Using a CNN-based lung segmentation algorithm, we first retain only the slices that contain lung regions. This filtering process yields around 25K CT slices from the COVID-infected lung volumes. Using these slices, we train a COVID lesion segmentation model.
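The slice-filtering step can be sketched as follows. This is only an illustration, assuming a binary lung mask has already been produced for each volume by some lung segmentation model; the function name `select_lung_slices` and the `min_lung_fraction` threshold are hypothetical and not taken from the original work.

```python
import numpy as np

def select_lung_slices(volume, lung_mask, min_lung_fraction=0.05):
    """Keep the axial slices whose lung mask covers enough of the image.

    volume, lung_mask: arrays of shape (num_slices, height, width).
    min_lung_fraction: hypothetical cut-off, chosen here for illustration.
    Returns the kept slices and their indices in the original volume.
    """
    volume = np.asarray(volume)
    lung_mask = np.asarray(lung_mask, dtype=bool)
    if volume.shape != lung_mask.shape:
        raise ValueError("volume and lung_mask must have the same shape")

    # Fraction of pixels labeled as lung in each slice.
    lung_fraction = lung_mask.reshape(lung_mask.shape[0], -1).mean(axis=1)
    keep = np.flatnonzero(lung_fraction >= min_lung_fraction)
    return volume[keep], keep
```

Applied to every infected volume, a filter of this kind would produce the pool of lung-containing slices used for training.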


We propose a novel segmentation approach that uses a convolutional neural network (CNN) derived from U-Net and follows a semi-supervised training strategy. We first train the CNN using a small amount of labeled data. The trained CNN is then used to generate pseudo masks for the 25K CT slices. In the second step, we train the proposed CNN again from scratch using the pseudo masks of the 25K CT slices.
Training with such a large amount of data improves the segmentation performance of the model.
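The two-step strategy described above can be outlined in a framework-agnostic way. The sketch below is not the authors' implementation: `pseudo_label_training`, `train_fn`, and the 0.5 binarization threshold are assumptions made for illustration, and `toy_train_fn` is a trivial intensity-threshold stand-in for the U-Net-derived CNN so that the example runs without a deep learning library.

```python
import numpy as np

def pseudo_label_training(train_fn, labeled_images, labeled_masks,
                          unlabeled_images, threshold=0.5):
    """Two-step semi-supervised training with pseudo masks.

    train_fn(images, masks) must return a callable that maps an image
    to a per-pixel foreground score in [0, 1].
    """
    # Step 1: train a first model on the small labeled set.
    teacher = train_fn(labeled_images, labeled_masks)

    # Use it to generate pseudo masks for the unlabeled slices.
    pseudo_masks = [(teacher(img) >= threshold).astype(np.uint8)
                    for img in unlabeled_images]

    # Step 2: train a fresh model from scratch on the pseudo masks.
    student = train_fn(unlabeled_images, pseudo_masks)
    return student, pseudo_masks

def toy_train_fn(images, masks):
    """Stand-in 'model': learns a single intensity cut-off (illustration only)."""
    pixels = np.concatenate([np.ravel(i) for i in images]).astype(float)
    labels = np.concatenate([np.ravel(m) for m in masks]).astype(bool)
    cut = 0.5 * (pixels[labels].mean() + pixels[~labels].mean())
    return lambda img: (np.asarray(img, dtype=float) > cut).astype(float)

labeled_images = [np.array([[0.1, 0.9], [0.2, 0.8]])]
labeled_masks = [np.array([[0, 1], [0, 1]])]
unlabeled_images = [np.array([[0.0, 1.0], [0.7, 0.3]])]

student, pseudo_masks = pseudo_label_training(
    toy_train_fn, labeled_images, labeled_masks, unlabeled_images)
```

In a real setting `train_fn` would wrap the full CNN training loop, and the unlabeled pool would be the 25K CT slices rather than a single toy image.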


We observe that using the large amount of unlabeled data improves segmentation performance by about 2 to 4 Dice points.

With the proposed training approach, our model reaches a segmentation Dice score of 0.66, while the current state-of-the-art COVID lesion segmentation model obtains a Dice score of 0.61.
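For reference, the Dice score of a predicted mask $P$ against a ground-truth mask $G$ is $2|P \cap G| / (|P| + |G|)$. A minimal NumPy sketch is given below; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are choices made here, not details from the original work.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks are empty: treated as perfect agreement (a convention).
        return 1.0
    return float(2.0 * intersection / total)
```

A score of 1 means the masks overlap perfectly and 0 means they do not overlap at all, so one "Dice point" in the text corresponds to a difference of 0.01 on this scale.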

This is a significant improvement and shows the utility of our semi-supervised training strategy. A qualitative comparison of the output of the proposed method with U-Net is shown in Figure 2.


Figure 2. Qualitative comparison of the proposed method with U-Net.

Zeblok AI-Platform resources used:


  • Multiple containers to support multi-GPU, multi-CPU compute engines

  • 2 RTX 6000 GPUs

  • 4 vCPUs

  • 70GB RAM

  • 50GB Block Store

  • 500GB of Parallel File System