AI Driven Predictive Insights using Sentiment Analyses

Project Statement

Thomson Reuters MarketPsych Indices (TRMI) analyze news and social media in real time, converting the volume and variety of professional news and internet content into manageable information flows that drive sharper decisions. The indices are delivered as real-time data series that can easily be incorporated into quantitative or qualitative investment and trading decision processes. The goal was to improve the signal-to-noise ratio and pick up predictive signals for decision making from the indicators provided in TRMI.

Data used

The Zeblok team procured TRMI data from Refinitiv and combined it with pricing data in a project that aimed to understand the impact of market sentiment on returns. Applications of this project include various business processes: buy-side asset re-allocation or portfolio rebalancing, algorithmic trading desks, risk management, and quantitative tolerance in Net Asset Value calculations. The analysis drew on the three types of indicators provided in TRMI:
• Emotional indicators such as Anger, Fear and Joy 
• Macroeconomic metrics including Earnings Forecast, Interest Rate Forecast, Long vs. Short 
• Buzz metrics on the asset level, i.e., Buzz, and on market-moving topics for that asset, such as Litigation, Regulatory Crackdown, Mergers and Volatility


Usually such an analysis is done by a data science expert or analyst who combines multiple datasets with pre-existing domain knowledge and performs univariate and multivariate analyses to understand correlations that lead to predictive signals. The analysis very quickly becomes a combinatorial, time-consuming challenge, so domain knowledge plays a key role in identifying which attributes to combine to reach conclusions expediently. Additionally, the regulatory regime requires a quantitative basis for such analysis, which makes explainability mandatory. Deep learning solutions are hard to deploy where explainability is important.
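To give a rough sense of why the search becomes combinatorial, the sketch below counts how many candidate segments exist when a segment fixes k attributes to one level each. It uses the 38 attributes mentioned in this write-up; the two-level (High/Low) bucketing and the function name candidate_segments are illustrative assumptions, not a description of how any particular tool enumerates segments.

```python
from math import comb

N_ATTRIBUTES = 38  # attribute count quoted in this write-up
LEVELS = 2         # assumption: each attribute is bucketed into High / Low


def candidate_segments(n_attributes: int, k: int, levels: int = LEVELS) -> int:
    """Count segments that pin k distinct attributes to one level each."""
    return comb(n_attributes, k) * levels ** k


# The count grows quickly as more attributes are combined.
for k in range(1, 5):
    print(k, candidate_segments(N_ATTRIBUTES, k))
```

Each of these segments would then need its own correlation test against returns, which is why an analyst normally relies on domain knowledge to prune the list.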

The Zeblok team applied explainable-AI software, using its “AI-Rover” workstation, to TRMI and pricing data, using single-day price movement. The explainable-AI algorithm creates a data context map that spontaneously identifies patterns. Additionally, a correlation pattern miner discovers multiple attributes that together influence an outcome positively or negatively within a subpopulation. Both are done without requiring domain knowledge.


Most patterns include both the analystRating and earningsForecast attributes. This is conveyed by these two attributes being in the center. The patterns all convey a signal in which the analyst rating contradicts what the other indicators suggest. One such pattern is shown below.

Pattern based on Low earningsForecast, a High analystRating, and Low optimism

analystRating vs. Returns (stratified on Low earningsForecast)

Optimism vs Returns (stratified on Low earningsForecast and High analystRating)

Conclusion #1: Stock indices that fall within this pattern have a higher next-day return than those that fall outside the pattern 67% of the time.
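One plausible way to compute a hit rate of this kind is sketched below: for each date, compare the mean next-day return of in-pattern rows with that of out-of-pattern rows, then report the fraction of dates on which the in-pattern group wins. This per-date comparison is an assumed reading of the statement above, and the function name pattern_hit_rate, the row layout, and the toy numbers are all hypothetical. The in-pattern flag (Low earningsForecast, High analystRating, Low optimism) is assumed to be computed upstream.

```python
from statistics import mean


def pattern_hit_rate(rows):
    """rows: iterable of (date, in_pattern: bool, next_day_return: float).

    Returns the fraction of dates on which the mean next-day return of
    in-pattern rows exceeds that of out-of-pattern rows. Dates that lack
    either group are skipped.
    """
    by_date = {}
    for date, in_pattern, ret in rows:
        inside, outside = by_date.setdefault(date, ([], []))
        (inside if in_pattern else outside).append(ret)

    wins = comparable = 0
    for inside, outside in by_date.values():
        if inside and outside:
            comparable += 1
            wins += mean(inside) > mean(outside)
    if comparable == 0:
        raise ValueError("no date has both in-pattern and out-of-pattern rows")
    return wins / comparable


# Made-up toy data for illustration only; these are not TRMI values.
toy_rows = [
    ("d1", True, 0.012), ("d1", False, 0.003),
    ("d2", True, -0.004), ("d2", False, 0.001),
    ("d3", True, 0.008), ("d3", False, 0.002), ("d3", False, -0.001),
]
print(pattern_hit_rate(toy_rows))
```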

Correlation Mining

earningsForecast vs. Returns (Univariate)

Consistent with a domain expert’s understanding, the AI-Rover found that there is no univariate correlation between earningsForecast and returns. There are other attributes that influence returns. 

earningsForecast vs. Returns (Subpopulation)

Our AI-Rover segmented the population automatically, picking just a few attributes out of 38 to find statistically significant correlations to returns.

Conclusion #2: If we segment on High laborDispute, High analystRating, and Low loveHate, then earningsForecast becomes significantly correlated with returns.

Volatility vs. Returns (Univariate)

Volatility vs. Returns (Subpopulation)

Conclusion #3: Volatility does not have a significant univariate correlation with returns. If we segment on High managementTrust, High laborDispute, and High violence, then volatility becomes significantly correlated with returns.
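The contrast between a flat univariate correlation and a strong subpopulation correlation can be reproduced on synthetic data. The sketch below is illustrative only: the data are randomly generated, the in_segment flag stands in for a segment such as the ones named above, and the slopes are chosen so that the effect inside the segment is cancelled out in the full population. It is not the AI-Rover's algorithm.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def t_statistic(r, n):
    """t statistic for testing r != 0 with n observations."""
    return r * sqrt((n - 2) / (1 - r * r))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = []
for _ in range(4000):
    volatility = rng.gauss(0, 1)
    in_segment = rng.random() < 0.2  # hypothetical segment membership
    # Synthetic returns: positive slope inside the segment, a small
    # negative slope outside, so the two cancel in the full population.
    slope = 0.8 if in_segment else -0.2
    rows.append((volatility, in_segment, slope * volatility + rng.gauss(0, 1)))

r_all = pearson([v for v, _, _ in rows], [r for _, _, r in rows])
segment = [(v, r) for v, s, r in rows if s]
r_seg = pearson([v for v, _ in segment], [r for _, r in segment])

print("all rows:", round(r_all, 3), "t =", round(t_statistic(r_all, len(rows)), 1))
print("segment: ", round(r_seg, 3), "t =", round(t_statistic(r_seg, len(segment)), 1))
```

On data like this, the full-population correlation sits near zero while the segment shows a clear positive relationship, which is the kind of masked signal the subpopulation plots describe.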



Tel: +1 (631) 223-8233

HQ Office:

1500 Stony Brook Road

Stony Brook, NY 11794


51 JFK Parkway

First Floor West

Short Hills, NJ 07078

    © 2020 Zeblok Computational, Inc.