“We are exploring to see if an AI-Analyst could identify novel endpoints using natural history data to shrink study duration”

Head of Clinical Development, a Publicly Traded Pharma


Zeblok conducted an analysis, on behalf of and in collaboration with PTC Therapeutics, to support study design, with the goal of efficiently developing safe and effective therapies for Friedreich's Ataxia (FA), a rare genetic form of spinocerebellar degeneration that causes difficulty walking, loss of sensation in the arms and legs, and impaired speech. The aim was to use xxx TB of natural history data from the Friedreich's Ataxia Clinical Outcome Measure Study (FA-COMS) to identify novel endpoints, biomarkers, and baseline characteristics that may optimize clinical study design. Using machine learning and artificial intelligence-based algorithms, PTC was able to identify correlations and interdependencies among a large number of variables in the dataset more efficiently. This approach helped uncover notable correlations with the onset and progression of the disease. Target attributes of prime importance included genetic data, baseline characteristics, and the relationship between cardiac decline and neurological symptoms. The study focused on natural history data rather than on placebo-arm and treatment-arm data.


Drug trials are time-consuming and expensive. For a disease like Friedreich's Ataxia, which progresses over years or decades, the success of a drug trial depends largely on how quickly subjects progress. If subjects in both the treatment and control groups progress very little over a 5-year trial, the drug's efficacy cannot be determined, and another 5-year study must be performed. Conversely, any effect a drug has will be much easier to detect in subjects whose disease progresses faster. There is therefore a tremendous benefit in identifying subjects who are likely to progress more quickly. The challenge is to use explainable AI to identify, from their natural history data, the subjects who will have a faster rate of disease progression.
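To make the point about progression rate concrete, here is a small arithmetic sketch. It assumes, purely hypothetically, that a drug slows progression by a fixed 25% and that the trial lasts two years; neither figure comes from the study, and the function name is illustrative.

```python
# Toy arithmetic: how the untreated progression rate changes the size of the
# treatment effect a trial has to detect.
# ASSUMPTIONS (hypothetical, not from the study): the drug slows progression by a
# fixed fraction, and both arms start from the same baseline.


def expected_group_difference(rate_per_year: float, years: float, slowing: float) -> float:
    """Expected MFARS gap between control and treatment arms at trial end.

    rate_per_year: untreated progression rate (MFARS points per year)
    years: trial duration in years
    slowing: fraction by which treatment slows progression (0.25 means 25%)
    """
    control_change = rate_per_year * years
    treated_change = rate_per_year * (1.0 - slowing) * years
    return control_change - treated_change


if __name__ == "__main__":
    for rate in (2.0, 4.0):  # typical rate vs. a hypothetical fast progressor
        diff = expected_group_difference(rate, years=2.0, slowing=0.25)
        print(f"rate={rate} pts/yr -> expected gap after 2 years: {diff:.1f} pts")
```

Under these assumptions, doubling the untreated rate from 2 to 4 points per year doubles the expected gap between arms (from 1 to 2 points), which is why the effect becomes easier to detect if measurement variability stays roughly the same.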

Figure: Data Cleaning

About the POC

We obtained anonymized patient data from the Critical Path Institute (C-Path). The analysis focused on the FA-COMS dataset, which contains 1,050 patients with yearly follow-ups over a thirteen-year period. The target variables of interest are the 1-year and 2-year changes in MFARS (the modified Friedreich's Ataxia Rating Scale, which aggregates a number of tests and ranges from 0, i.e., no FA, to 93, i.e., advanced FA). On average, someone with FA progresses by about 2 points per year. Our goal was to identify indicators of subjects who progress significantly faster than 2 points per year. Our approach was to:

  1. Use the pattern mining module to identify subpopulations with an unusually high change in MFARS.

  2. Use the causal analysis module to remove spurious correlations.

  3. Identify subpopulations in which there is a high correlation between the change in MFARS and some independent variable.
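Steps 1 and 3 above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the Akai Kaeru modules themselves: the record layout (`subject`, `year`, `mfars`), the function names, and the single-condition subgroup search are all hypothetical, and step 2 (causal filtering) is not shown.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def mfars_change(visits, horizon=1):
    """Map each subject to the MFARS change from their first visit to `horizon` years later."""
    scores_by_subject = defaultdict(dict)
    for v in visits:
        scores_by_subject[v["subject"]][v["year"]] = v["mfars"]
    changes = {}
    for subject, scores in scores_by_subject.items():
        start = min(scores)  # earliest visit year for this subject
        if start + horizon in scores:  # skip subjects missing that follow-up visit
            changes[subject] = scores[start + horizon] - scores[start]
    return changes


def mine_subgroups(baseline, changes, attributes, min_size=2):
    """Step 1 (simplified): rank `attribute == value` subgroups by excess mean change."""
    overall = mean(changes.values())
    results = []
    for attr in attributes:
        groups = defaultdict(list)
        for subject, delta in changes.items():
            groups[baseline[subject][attr]].append(delta)
        for value, deltas in groups.items():
            if len(deltas) >= min_size:
                results.append((attr, value, len(deltas), mean(deltas) - overall))
    # Subgroups with the largest excess progression come first.
    return sorted(results, key=lambda r: r[3], reverse=True)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def subgroup_correlation(baseline, changes, attr, value, variable):
    """Step 3: correlation between MFARS change and `variable` within one subgroup."""
    members = [s for s in changes if baseline[s][attr] == value]
    return pearson([baseline[s][variable] for s in members], [changes[s] for s in members])


if __name__ == "__main__":
    # Made-up toy records for demonstration only; these are not FA-COMS data.
    visits = [
        {"subject": "S1", "year": 0, "mfars": 30}, {"subject": "S1", "year": 1, "mfars": 32},
        {"subject": "S2", "year": 0, "mfars": 25}, {"subject": "S2", "year": 1, "mfars": 31},
        {"subject": "S3", "year": 0, "mfars": 40}, {"subject": "S3", "year": 1, "mfars": 41},
        {"subject": "S4", "year": 0, "mfars": 20}, {"subject": "S4", "year": 1, "mfars": 25},
    ]
    baseline = {
        "S1": {"cohort": "X", "walk_time": 6.0},
        "S2": {"cohort": "Y", "walk_time": 9.0},
        "S3": {"cohort": "X", "walk_time": 7.0},
        "S4": {"cohort": "Y", "walk_time": 8.0},
    }
    changes = mfars_change(visits, horizon=1)
    print(mine_subgroups(baseline, changes, ["cohort"]))
    print(subgroup_correlation(baseline, changes, "cohort", "Y", "walk_time"))
```

A real pattern miner searches combinations of conditions and numeric ranges and tests them for significance; the single-condition loop here only shows the shape of the computation.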

Figure: Univariate Analysis (Heatmap)

Zeblok Advantage

Zeblok has developed a bioinformatics cloud platform for developing new digital biomarkers, AI-based predictive capabilities for identifying novel endpoints, and real-world data collection for generating new digital health insights. Zeblok works with business partners and academic researchers to develop and deliver best-in-class tools using artificial intelligence, and intends to provide such software to data scientists to improve efficiency in the drug discovery process. While a wide range of machine learning methods can help with prediction tasks, many of them are black boxes that lack accountability and trustworthiness; examples include random forests, neural networks, and deep learning models. Our approach explains the conditions that yield lower or higher values of a target variable and defines causal relationships among salient data regions. This approach belongs to an emerging field in machine learning called Explainable AI, or XAI. We use dimensionality reduction through subspace clustering of relevant attribute sets, together with XAI-based causality prediction.
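The idea of separating genuine relationships from spurious ones (step 2 of the approach) can be illustrated with first-order partial correlation. This is a swapped-in, linear-only technique chosen for brevity, not the method inside Akai Kaeru's Visual Causal Analyst, and the three toy variables are made up so that a hidden variable `z` drives both `x` and `y`.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """Correlation of x and y after linearly removing the influence of z."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up toy data (hypothetical variables, not FA-COMS fields).
z = [1, 2, 3, 4, 5]    # hidden common driver, e.g. a confounder such as disease duration
x = [3, 2, 6, 10, 9]   # x = 2*z + noise that is uncorrelated with z
y = [5, 5, 7, 11, 17]  # y = 3*z + noise uncorrelated with z and with x's noise

print(f"raw correlation x~y:         {pearson(x, y):.3f}")
print(f"partial correlation x~y | z: {partial_correlation(x, y, z):.3f}")
```

Here the raw correlation between `x` and `y` is about 0.83, but once `z` is accounted for, the partial correlation is essentially zero, which is the signature of a spurious association. Real causal analysis must also handle nonlinear effects and unmeasured confounders, which this sketch ignores.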


Zeblok, in collaboration with Akai Kaeru, a leader in visual analytics for complex data, provided the study design team at PTC Therapeutics with a Zeblok AI Workstation (a variant of Jupyter Notebook) to interact with the FA-COMS data.

Figure: Correlation Mining


The analysis identified several subpopulations with a higher rate of progression. One such subpopulation consisted of subjects whose disease had not yet progressed far but whose 25-foot walk test time was long. These subjects were more likely to progress at a higher rate. This criterion can now be used to select subjects for an upcoming drug trial.
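Applied at screening, a criterion like this reduces to a simple filter. The sketch below assumes that "not progressed very far" is measured by baseline MFARS and that walk performance is recorded in seconds; the cut-off values and the names `Candidate`, `is_enriched_fast_progressor`, and `screen` are hypothetical placeholders, since the study's actual thresholds are not stated here.

```python
from dataclasses import dataclass

# HYPOTHETICAL cut-offs for illustration only; not values reported by the study.
MAX_BASELINE_MFARS = 40.0  # stands in for "has not progressed very far"
MIN_WALK_SECONDS = 8.0     # stands in for "25-foot walk time was high"


@dataclass
class Candidate:
    subject_id: str
    baseline_mfars: float     # baseline MFARS score (0 to 93)
    walk_25ft_seconds: float  # time to complete the 25-foot walk test


def is_enriched_fast_progressor(c: Candidate,
                                max_mfars: float = MAX_BASELINE_MFARS,
                                min_walk: float = MIN_WALK_SECONDS) -> bool:
    """True if the candidate is early in the disease but already slow on the 25-foot walk."""
    return c.baseline_mfars <= max_mfars and c.walk_25ft_seconds >= min_walk


def screen(candidates: list[Candidate]) -> list[str]:
    """Return the IDs of candidates who meet the enrichment criterion."""
    return [c.subject_id for c in candidates if is_enriched_fast_progressor(c)]
```

Keeping the thresholds as keyword arguments lets them be varied in sensitivity analyses without changing the screening logic.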

Figure: Sample Correlation Table

Akai Kaeru, LLC creates AI-powered software that helps data scientists solve stubborn problems in real-world applications. Leveraging extensive research expertise in interactive high-dimensional data visualization, its two co-founders transformed acclaimed research products into the practical Salient Pattern Miner, Visual Causal Analyst, and Data Context Map that are at the heart of the Explainable AI data analytics software suite.


Zeblok and Akai Kaeru have come together to offer Akai Kaeru's Explainable AI software as part of Zeblok's ecosystem.


More information on Akai Kaeru is available at https://akaikaeru.com/


Email: zeblok@zeblok.com

Tel: +1 (631) 223-8233

HQ Office:

1500 Stony Brook Road

Stony Brook, NY 11794


51 JFK Parkway

First Floor West

Short Hills, NJ 07078

    © 2020 Zeblok Computational, Inc.