Skin Cancer Classification
Image classification is one of the oldest and most popular problem categories in computer vision. Stacks of convolutional blocks let a neural network learn the different features present in an image.
Skin cancer is the most prevalent type of cancer. Melanoma, specifically, is responsible for 75% of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates over 100,000 new melanoma cases will be diagnosed in 2020. It's also expected that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection—potentially aided by data science—can make treatment more effective.
Currently, dermatologists evaluate every one of a patient's moles to identify outlier lesions or “ugly ducklings” that are most likely to be melanoma. Existing AI approaches have not adequately considered this clinical frame of reference. Dermatologists could enhance their diagnostic accuracy if detection algorithms took into account “contextual” images from the same patient when deciding which images represent a melanoma. If successful, such classifiers would be more accurate and could better support dermatological clinic work.
Melanoma is a deadly disease, but if caught early, most melanomas can be cured with minor surgery. Image analysis tools that automate the diagnosis of melanoma will improve dermatologists' diagnostic accuracy. Better detection of melanoma has the opportunity to positively impact millions of people.
This work was part of the “SIIM-ISIC Melanoma Classification Challenge” and used the ISIC 2020 Challenge Dataset. The dataset contains 33,126 dermoscopic training images of unique benign and malignant skin lesions from over 2,000 patients. All malignant diagnoses were confirmed via histopathology, and benign diagnoses were confirmed by expert agreement, longitudinal follow-up, or histopathology.
We used deep CNN classification models for melanoma classification. We experimented with several architectures: VGG19, DenseNet, ResNeXt and the EfficientNet family. For the train/validation split, we used a simple 80/20 split derived from the triple stratified K-Fold split.
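The fold assignment itself is not shown in this write-up, so the following is only a minimal sketch of the idea behind a patient-aware, stratified 80/20 split. It assumes each record is an (image_id, patient_id, target) tuple; the function name patient_level_split and the toy data are hypothetical, and the sketch covers only two of the properties such a split usually aims for: no patient appears on both sides, and patients with malignant lesions are spread proportionally.

```python
import random
from collections import defaultdict


def patient_level_split(records, val_fraction=0.2, seed=42):
    """Split (image_id, patient_id, target) records into train/val lists.

    All images of a patient land on the same side, and patients with at
    least one malignant image are divided in the same proportion as the rest.
    """
    # Collect the images that belong to each patient.
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec[1]].append(rec)

    # Stratum 1: patients with any malignant image; stratum 0: all others.
    strata = defaultdict(list)
    for patient_id, recs in by_patient.items():
        has_malignant = int(any(r[2] == 1 for r in recs))
        strata[has_malignant].append(patient_id)

    rng = random.Random(seed)
    val_patients = set()
    for patient_ids in strata.values():
        patient_ids.sort()  # fixed order so the seeded shuffle is reproducible
        rng.shuffle(patient_ids)
        n_val = round(len(patient_ids) * val_fraction)
        val_patients.update(patient_ids[:n_val])

    train = [r for r in records if r[1] not in val_patients]
    val = [r for r in records if r[1] in val_patients]
    return train, val


# Toy data (hypothetical): 50 patients with 4 images each;
# every 10th patient has exactly one malignant image.
records = [
    (f"img_{p}_{i}", f"patient_{p}", int(p % 10 == 0 and i == 0))
    for p in range(50)
    for i in range(4)
]
train, val = patient_level_split(records)
```

Keeping each patient on one side matters here because lesions from the same patient tend to look alike, so a split made per image could leak information from training into validation and inflate the validation score.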
Using a deeper CNN architecture gives better results. ResNeXt gave much better results than a simple VGG19, but deeper architectures are also more prone to overfitting. The EfficientNet family of models helps here because it balances depth against model width. We achieved a validation AUC ROC of 0.974 and a test score of 0.894 using EfficientNet-B4.
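For readers unfamiliar with the metric, AUC ROC is the probability that a randomly chosen malignant image receives a higher score than a randomly chosen benign one, which makes it independent of any decision threshold. The snippet below is a small stdlib-only sketch of that definition using the rank-sum formulation; the function name roc_auc and the example values are illustrative and are not the evaluation code used in this project.

```python
def roc_auc(labels, scores):
    """AUC ROC from the rank-sum (Mann-Whitney U) statistic.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores, higher meaning more likely positive.
    Tied scores share the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging over groups of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: one benign image (0.4) outranks one malignant image (0.35),
# so 3 of the 4 malignant/benign pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ordering, which is why the metric suits datasets where malignant cases are a small minority and plain accuracy would be misleading.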
Zeblok AI-Platform resources used:
Zeblok AI-Rover WorkStation
1 RTX6000 GPU
50GB Block Store
100GB Object Store