Overview

The ubiquitous collection and availability of big data offers tremendous opportunities for the financial sector. It allows highly personalized assessments of risks and prospects, which companies large and small can then act upon to generate substantial profits.

However, raw financial data are typically noisy, with low veracity. At the same time, overlooking or misinterpreting even small nuances and trends in these data can be extremely costly. Designing sensitive predictive metrics from these data also requires substantial engineering, which raises the stakes even further.

The goal of financial analytics is prediction and, better yet, prescription – recommendations on actions that mitigate risk and maximize profit. Both must be grounded in exquisite, ideally superior, knowledge of the domain at hand. Formalizing this knowledge from data is the job of descriptive analytics.

A key to reliable descriptive data modeling is identifying sub-populations in the data that share certain common features, and doing so in a statistically robust manner. No sound financial strategy should be based on predictive models learned from outlier data points. The challenge is to find these stable data regions.

One problem in this mission is the noisy nature of the data; another is the massive number of data attributes and the predictive metrics derived from them. Both lead to vast and unwieldy feature spaces where data are rarely homogeneous and subpopulations are difficult to separate.  

Furthermore, many of the features and attributes may not matter for a particular task, but this is difficult to assess beforehand. Data analysts are often confronted with the problem of selecting the right features for a specific analysis, and without adequate tools feature selection can degenerate into a guessing game.

These critical challenges arise in many FinTech application areas, as well as in many other business sectors:

  • Stock market prediction: predict stock prices from past data patterns and current observations   

  • Fraud detection and prevention: identify malicious entities that try to hide, but still have common patterns  

  • Credit scoring: assess creditworthiness by estimating default risk based on past data

  • Risk management: use past internal and external data to estimate risk and prevent future losses

  • Personalized marketing: estimate a person’s subpopulation to suggest purchases or investments 

Typical Challenges:

In any of these applications, as well as others, knowing a data point’s membership in a concrete and statistically robust subpopulation is critically important. Determining an individual’s subpopulation membership with high confidence drastically reduces risk and uncertainty in the decision’s outcome. 


Moreover, it is often not enough to know whether a particular data item will yield a certain outcome in a given response variable; it can be just as important whether it allows predictions about that variable’s growth or decline.

For example: 

  • Will a certain stock rise or fall and what are the attributes and their ranges when this occurs?

  • What are suspicious data signatures of possible fraud to inform on-site situational awareness? 

  • Will a loan applicant default and what are the specific attributes/values I should look out for?

  • What are specific customer profiles – attributes and value ranges – to use in targeted marketing?

 

These kinds of predictions are commonly assessed by statistically correlating the target variable with one or more attributes. However, correlation can be sporadic and confined to a certain sub-population only. It may even vanish and remain undetected when aggregated over loosely defined sub-populations.
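To make this pitfall concrete, the following sketch uses purely synthetic data and invented variable names (it does not reflect any real portfolio): within each of two sub-populations an attribute correlates strongly with the target, yet the correlation all but disappears once the two groups are pooled.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500

    # Two synthetic sub-populations: within each, "leverage" correlates
    # positively with "return", but the groups sit at offset levels.
    leverage_a = rng.normal(2.0, 0.5, n)
    return_a   = leverage_a + rng.normal(0.0, 0.3, n)         # group A
    leverage_b = rng.normal(5.0, 0.5, n)
    return_b   = leverage_b - 3.33 + rng.normal(0.0, 0.3, n)  # group B, shifted

    def corr(x, y):
        return np.corrcoef(x, y)[0, 1]

    print("corr within group A:", round(corr(leverage_a, return_a), 2))
    print("corr within group B:", round(corr(leverage_b, return_b), 2))
    print("corr pooled        :", round(corr(np.concatenate([leverage_a, leverage_b]),
                                             np.concatenate([return_a, return_b])), 2))

Running this shows strong positive correlations within each group while the pooled correlation hovers near zero – exactly the kind of signal that is lost when sub-populations are not delineated first.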

Identifying robust and well-defined sub-populations in large datasets is hence mission-critical. However, these sub-populations are typically situated in a dense “data fog” of noise and irrelevant attributes, which poses tremendous challenges. In other words, big data is a blessing and a curse at the same time.

 

Deep neural networks, random forests, and similar models circumvent this problem in an elegant manner, but at the price of low explainability and, consequently, poorly informed decision making. This uncertainty is a fateful adversary in the high-stakes effort of financial modeling.

How Explainable-AI can help

[Figures: Contextual Pattern Visualization, Salient Pattern Mining Output, Correlation Heat Map, Probability Bar Chart]

Akai Kaeru’s advanced pattern mining and discovery engine operates at the root of the problem. It directly decomposes the high-dimensional input data into a reliable set of independent data patterns -- the subpopulations -- using a sophisticated yet efficient suite of proprietary algorithms. 


Each pattern consists of data items that behave similarly in terms of a given target variable and is succinctly defined by just a small set of attributes. This succinct description makes each pattern easy to understand and appeals to the user’s domain knowledge – a hallmark of explainable AI.
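To make the notion of a pattern more tangible, here is a minimal, hypothetical sketch using synthetic loan data and invented attribute names; it illustrates only the idea of a pattern as a small set of attribute ranges and is not Akai Kaeru’s proprietary mining algorithm.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 10_000

    # Synthetic loan data: a few attributes plus a target (default yes/no).
    df = pd.DataFrame({
        "debt_to_income": rng.uniform(0.0, 1.0, n),
        "credit_age_yrs": rng.uniform(0.0, 30.0, n),
        "num_inquiries":  rng.integers(0, 15, n),
    })
    risky = (df["debt_to_income"] > 0.6) & (df["credit_age_yrs"] < 5.0)
    df["default"] = rng.random(n) < (0.05 + 0.40 * risky)

    # A "pattern": a subpopulation defined by a small set of attribute ranges.
    pattern = {"debt_to_income": (0.6, 1.0), "credit_age_yrs": (0.0, 5.0)}

    member = pd.Series(True, index=df.index)
    for attr, (lo, hi) in pattern.items():
        member &= df[attr].between(lo, hi)

    print(f"pattern covers {member.mean():.1%} of the data")
    print(f"default rate inside pattern: {df.loc[member, 'default'].mean():.1%}")
    print(f"default rate overall       : {df['default'].mean():.1%}")

Because the pattern is described by just two attribute ranges, an analyst can read it directly (“high debt-to-income combined with a short credit history”) and judge it against their own domain knowledge.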


Our pattern mining engine feeds its results into a novel, fully interactive visual interface. It alerts the analyst to a set of concrete and concise regions with unusually high or low target variable values and correlations. The patterns are explained by visualizing them in the context of their relevant attributes.


Insights such as news or sentiment extracted from textual data – Twitter feeds, Google Finance, and so on – often form an important ingredient in FinTech analysis. These can be converted into sets of numerical attributes and integrated into the high-dimensional feature space for analysis within our software.
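As a hedged illustration of that conversion (the headlines, word lists, and column names below are invented; a real pipeline would use a proper sentiment model rather than a toy lexicon), a stream of headlines can be reduced to a daily numeric sentiment attribute and joined onto an existing table of numeric features:

    import pandas as pd

    # Toy headline stream (invented examples).
    news = pd.DataFrame({
        "date": pd.to_datetime(["2020-03-02", "2020-03-02", "2020-03-03"]),
        "text": [
            "Regulator probes bank over loan losses",
            "Strong earnings lift outlook",
            "CEO resigns amid fraud allegations",
        ],
    })

    POSITIVE = {"strong", "lift", "growth", "beat", "upgrade"}
    NEGATIVE = {"probes", "losses", "resigns", "fraud", "downgrade"}

    def lexicon_score(text: str) -> int:
        """Crude lexicon score: +1 per positive word, -1 per negative word."""
        words = text.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    news["sentiment"] = news["text"].apply(lexicon_score)

    # Aggregate to one numeric attribute per day and merge with other features.
    daily = news.groupby("date", as_index=False)["sentiment"].mean()
    features = pd.DataFrame({
        "date": pd.to_datetime(["2020-03-02", "2020-03-03"]),
        "volatility": [0.8, 1.3],   # pre-existing numeric attribute (invented)
    }).merge(daily, on="date", how="left")
    print(features)

The resulting sentiment column behaves like any other numeric attribute and can therefore participate in the same pattern mining and visualization described above.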


Also part of our software is a causal inference engine which can establish true causal relations among the patterns. It provides directional relations such as “smoking causes cancer, but not vice versa”. Our visualization interface conveys the intricate web of causal relationships directly on the pattern layout.  
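As a small, purely illustrative sketch of how such directional relations can be represented and queried (the pattern names and edges are made up; this is not the causal inference engine itself), the engine’s output can be treated as a directed graph:

    import networkx as nx

    # Hypothetical directional relations between discovered patterns.
    causal_edges = [
        ("high_leverage_low_cash", "late_payments"),
        ("short_credit_history",   "late_payments"),
        ("late_payments",          "default"),
    ]
    g = nx.DiGraph(causal_edges)

    # Direction matters: patterns upstream of "default" are candidate causes,
    # while nothing in this graph is caused by "default".
    print("possible causes of default:", sorted(nx.ancestors(g, "default")))
    print("effects of default        :", sorted(nx.descendants(g, "default")))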


Via these capabilities, Akai Kaeru’s software can help analysts and decision makers turn troves of data into actionable insight they can trust, justify, and explain to others. There are no magic black boxes.

Akai Kaeru, LLC creates AI-powered software that helps data scientists solve stubborn problems in real-world applications. Leveraging extensive research expertise in interactive high-dimensional data visualization, its two co-founders transformed acclaimed research products into the practical Salient Pattern Miner, Visual Causal Analyst, and Data Context Map that are at the heart of the Explainable AI data analytics software suite.

 

Zeblok and Akai Kaeru have come together to offer the Explainable-AI algorithm as part of Zeblok’s ecosystem.

 

More information on Akai Kaeru at https://akaikaeru.com/
