G-SIB Risk Intelligence with the Bank of England

A detailed technical report from the Cambridge Data Science Career Accelerator Capstone Project.


📊 Project Overview

This project analyses CEO and CFO language from Barclays, UBS, and Morgan Stanley to uncover early risk signals not captured in traditional financial ratios. Using NLP, sentiment analysis, topic modelling, and LLM summarisation, we identify shifts in tone and narrative that can indicate emerging vulnerabilities.
The work was completed as part of a 10-week capstone challenge for the Bank of England’s Prudential Regulation Authority (PRA).


📂 Data Sources

  • Earnings Reports & Transcripts (2023–2025): Barclays, UBS, Morgan Stanley.
  • Financial Benchmarks: Return on Equity (ROE), Net Interest Margin (NIM), CET1 Capital Ratios.
  • Financial Statements: Extracted key metrics using regex automation.
  • Macroeconomic Indicators: Interest rates, GDP growth, regulatory signals.
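
The regex-based extraction of key metrics can be illustrated with a minimal sketch. The patterns, the metric keys (roe_pct, nim_pct, cet1_pct), and the sample sentence below are assumptions made for illustration, not the project's actual patterns or data; real report wording varies by bank and quarter and would need more robust rules.

```python
import re

# Hypothetical patterns: each captures the first percentage that follows
# the metric name within a short window of non-numeric text.
METRIC_PATTERNS = {
    "roe_pct": re.compile(
        r"return on equity[^0-9%]{0,40}?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE
    ),
    "nim_pct": re.compile(
        r"net interest margin[^0-9%]{0,40}?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE
    ),
    "cet1_pct": re.compile(
        r"CET1[^0-9%]{0,40}?(\d+(?:\.\d+)?)\s*%", re.IGNORECASE
    ),
}


def extract_metrics(text):
    """Return the first matched percentage per metric, or None if absent."""
    results = {}
    for name, pattern in METRIC_PATTERNS.items():
        match = pattern.search(text)
        results[name] = float(match.group(1)) if match else None
    return results


# Invented sample sentence, not taken from any real report.
sample = "Our CET1 ratio was 13.8%, and return on equity reached 9.5% in the quarter."
print(extract_metrics(sample))
```

Returning None for a missing metric, rather than raising an error, keeps a batch run going when a report omits or rephrases one of the figures.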

💼 Business Problem

Traditional risk frameworks rely heavily on quantitative indicators such as capital ratios and liquidity buffers. However, early signs of instability often emerge first in the qualitative language CEOs and CFOs use during earnings calls. Regulators can miss these signals when they lack systematic, NLP-driven tools for reviewing that language at scale.

The PRA challenged our team to create an NLP pipeline that extracts, measures, and compares these linguistic cues across major G-SIBs—helping supervisors identify risks sooner.


⚙️ Approach & Methodology

  • Data Extraction: Regex-based capture of key financial metrics.
  • Preprocessing: Text cleaning, chunking, speaker attribution, tokenisation.
  • Sentiment Analysis: FinBERT and VADER for financial tone & emotional signals.
  • Topic Modelling: BERTopic to uncover themes like integration risks, transparency gaps, global tariffs.
  • LLM Summarisation: FinLLaMA vs GPT-2 vs Phi-4 benchmarking.
  • Stress Testing: Combined financial + linguistic signals under shock scenarios.
  • Data Visualisation: Matplotlib, Seaborn, Chart.js dashboards.

The pipeline integrates quantitative and qualitative features to build a holistic early-warning system.
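
As an illustration of the preprocessing step (speaker attribution and chunking), here is a minimal sketch. It assumes a hypothetical transcript layout in which each turn starts with "Name (ROLE):"; the regular expression, the function names, and the sample lines are all invented for the example and are not the project's actual code.

```python
import re

# Assumed line format for a new speaker turn: "Name (CEO|CFO|Analyst): text".
SPEAKER_LINE = re.compile(
    r"^(?P<speaker>[A-Z][\w .'-]+?)\s*\((?P<role>CEO|CFO|Analyst)\):\s*(?P<text>.*)$"
)


def attribute_speakers(lines):
    """Group transcript lines into turns, each tagged with speaker and role."""
    turns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append(
                {
                    "speaker": match.group("speaker"),
                    "role": match.group("role"),
                    "text": match.group("text"),
                }
            )
        elif turns:
            # A continuation line belongs to the most recent speaker.
            turns[-1]["text"] += " " + line
    return turns


def chunk_words(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Invented sample lines with placeholder speaker names.
transcript = [
    "Speaker A (CEO): We remain confident in our capital position.",
    "Integration costs were higher than planned.",
    "",
    "Speaker B (CFO): Net interest margin was stable.",
]
turns = attribute_speakers(transcript)
executive_chunks = [
    (turn["role"], chunk)
    for turn in turns
    if turn["role"] in {"CEO", "CFO"}
    for chunk in chunk_words(turn["text"], max_words=8)
]
print(executive_chunks)
```

Chunking by word count is the simplest option; a model with a token limit would normally be chunked with its own tokenizer instead.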


👩‍💻 My Role

  • Led BERTopic modelling and interpretation of hierarchical themes.
  • Linked sentiment scores with topic clusters to reveal hidden risk signals.
  • Designed sentiment-by-topic risk mapping.
  • Produced documentation, visual storytelling, and dashboards.
  • Communicated insights clearly for regulatory stakeholders.
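
The sentiment-by-topic mapping can be sketched as a simple aggregation: each text chunk carries a topic label and a sentiment score, and topics are ranked by mean score. The topic labels echo the themes named in this report, but the scores and the function name below are hypothetical, and a score range of -1 to 1 is an assumption.

```python
from collections import defaultdict
from statistics import mean


def sentiment_by_topic(records):
    """Return (topic, mean sentiment) pairs, most negative topic first.

    records: iterable of (topic_label, sentiment_score) pairs, with scores
    assumed to lie in [-1, 1].
    """
    grouped = defaultdict(list)
    for topic, score in records:
        grouped[topic].append(score)
    summary = {topic: round(mean(scores), 3) for topic, scores in grouped.items()}
    return sorted(summary.items(), key=lambda item: item[1])


# Invented scores for illustration only.
records = [
    ("integration risk", -0.6),
    ("integration risk", -0.4),
    ("market pressure", -0.1),
    ("transparency", 0.2),
    ("market pressure", 0.1),
]
ranking = sentiment_by_topic(records)
for topic, score in ranking:
    print(f"{topic}: {score}")
```

Sorting with the most negative topic first puts the themes most likely to need supervisory attention at the top of the list.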

📈 Key Findings

The analysis revealed contrasting sentiment profiles across the three banks.

Bank | Sentiment Profile | Risk Insight
UBS | Highest persistent negative sentiment | Post-Credit Suisse integration issues and reputational risks
Barclays | Volatile sentiment spikes in Q2 2023 & 2024 | Requires targeted regulatory investigation
Morgan Stanley | Most stable sentiment; 50% net income growth | Strong performance, stabilising G-SIB

Stress Testing: All three banks maintained CET1 ratios above 13%, indicating quantitative stability, but the qualitative sentiment analysis exposed risks not reflected in those ratios.
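
For readers unfamiliar with the ratio, the CET1 ratio is CET1 capital divided by risk-weighted assets (RWA). The sketch below shows one generic way a shock can be applied: a loss reduces capital and RWA are inflated. This is an assumption for illustration, not the project's stress-testing method, and every figure is invented; only the 13% reference level comes from the finding above.

```python
def cet1_ratio(cet1_capital, rwa):
    """CET1 ratio in percent: CET1 capital over risk-weighted assets."""
    return 100 * cet1_capital / rwa


def stressed_cet1_ratio(cet1_capital, rwa, loss, rwa_inflation=0.0):
    """CET1 ratio in percent after a capital loss and a proportional RWA increase."""
    stressed_capital = cet1_capital - loss
    stressed_rwa = rwa * (1 + rwa_inflation)
    return cet1_ratio(stressed_capital, stressed_rwa)


# Hypothetical balance-sheet figures (same currency unit for capital and RWA).
baseline = cet1_ratio(cet1_capital=50.0, rwa=350.0)
stressed = stressed_cet1_ratio(cet1_capital=50.0, rwa=350.0, loss=2.0, rwa_inflation=0.02)

REFERENCE_LEVEL = 13.0  # the 13% level cited in the stress-testing finding
print(f"baseline: {baseline:.2f}%  stressed: {stressed:.2f}%")
print("above reference level:", stressed > REFERENCE_LEVEL)
```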

[Figure: Financial Metrics]

[Figure: Risk Metrics]

🔍 Analytical Insights

  • NLP Sentiment Trends: UBS was the negative outlier; Barclays showed volatility; Morgan Stanley reflected confidence.
  • Topic Modelling: Key themes included integration risks, transparency concerns, and market pressure.
  • LLM Comparison: FinLLaMA produced more financially coherent summaries than GPT-2 & Phi-4.
  • Hybrid Risk View: Combining financial metrics, sentiment, and topics gives a more complete supervisory perspective.
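
One way to picture the hybrid risk view is a two-way check that keeps the capital signal and the language signal separate instead of averaging them. The sketch below is an assumption, not the project's scoring logic: the category names and the sentiment floor of -0.2 are hypothetical, and only the 13% CET1 reference level comes from this report.

```python
def hybrid_view(cet1_pct, mean_sentiment, cet1_floor=13.0, sentiment_floor=-0.2):
    """Classify a bank-quarter using one quantitative and one qualitative signal."""
    capital_ok = cet1_pct > cet1_floor
    tone_ok = mean_sentiment >= sentiment_floor
    if capital_ok and tone_ok:
        return "stable"
    if capital_ok:
        # Ratios look sound, but the language does not.
        return "qualitative watch"
    if tone_ok:
        return "quantitative watch"
    return "elevated"


# Invented inputs for illustration only.
print(hybrid_view(cet1_pct=14.1, mean_sentiment=0.15))
print(hybrid_view(cet1_pct=14.1, mean_sentiment=-0.45))
```

The "qualitative watch" category is the case this report highlights: ratios above the reference level alongside persistently negative language.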

✅ Recommendations

  • Weekly sentiment and topic shift monitoring (focus: UBS).
  • Real-time automated alerts for sentiment thresholds.
  • Integrate Basel III compliance signals into NLP pipeline.
  • Fine-tune FinLLaMA using PRA filings for regulatory precision.
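
The threshold-based alerts recommended above could take a form like the following minimal sketch. It raises an alert when a period's sentiment falls below a floor or drops sharply from the previous period. The floor, the maximum drop, the weekly labels, and the scores are all hypothetical; real thresholds would need calibrating against historical data.

```python
def sentiment_alerts(series, floor=-0.3, max_drop=0.25):
    """Return (period, reason) alerts for a time-ordered sentiment series.

    series: list of (period_label, score) pairs in chronological order.
    """
    alerts = []
    previous = None
    for period, score in series:
        if score < floor:
            alerts.append((period, "below_floor"))
        if previous is not None and previous - score > max_drop:
            alerts.append((period, "sharp_drop"))
        previous = score
    return alerts


# Invented weekly scores for illustration only.
weekly_scores = [("W1", 0.10), ("W2", 0.05), ("W3", -0.25), ("W4", -0.35)]
alerts = sentiment_alerts(weekly_scores)
print(alerts)
```

Checking both the level and the change lets a sharp deterioration raise an alert before the absolute floor is breached.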

💡 Business & Regulatory Impact

  • 95% reduction in manual review via automated NLP workflows.
  • Early-warning system enables detection before ratio shifts.
  • Supports targeted interventions (e.g., UBS monitoring, Barclays anomaly review).
  • Introduces domain-specific LLMs into stability monitoring.

🔧 Future Work

  • Improve speaker identification for more granular sentiment.
  • Strengthen topic coherence through improved embeddings.
  • Validate early-warning indicators using historical backtesting.
  • Develop interpretability metrics for LLM-driven summaries.
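
A historical backtest of early-warning indicators could start from something as simple as the sketch below. It asks how often an alert was followed by a stress event within a fixed horizon (precision) and how many events were preceded by an alert (recall). The period indices, the horizon, and the sample data are hypothetical, and defining what counts as a stress event is left open.

```python
def backtest_alerts(alert_periods, event_periods, horizon=2):
    """Score alerts against later events.

    Periods are integer indices (for example, quarter numbers). An alert is a
    hit if an event occurs 1 to horizon periods after it.
    """
    alerts = sorted(set(alert_periods))
    events = sorted(set(event_periods))

    def leads(alert, event):
        return 0 < event - alert <= horizon

    true_alerts = [a for a in alerts if any(leads(a, e) for e in events)]
    caught_events = [e for e in events if any(leads(a, e) for a in alerts)]
    return {
        "precision": len(true_alerts) / len(alerts) if alerts else None,
        "recall": len(caught_events) / len(events) if events else None,
    }


# Invented period indices for illustration only.
scores = backtest_alerts(alert_periods=[2, 5, 9], event_periods=[3, 10, 14], horizon=2)
print(scores)
```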

🚀 Deliverables

  • Executive report (PDF) for regulators
  • Interactive dashboards (HTML, Chart.js)
  • Structured outputs (CSV) for supervisory integration

📘 Conclusion

This project demonstrates that combining linguistic signals with financial ratios gives regulators a proactive edge in identifying early signs of stress within Global Systemically Important Banks. UBS showed the most concerning sentiment signals, Barclays displayed volatility, and Morgan Stanley remained comparatively stable.

By integrating NLP, financial modelling, topic analysis, and LLM benchmarking, the analysis provides a richer and more forward-looking understanding of systemic risk.