Financial analysis is often gate-kept by complex spreadsheets and static PDF reports. In this post, we’ll break down the technical architecture of a “Premium Balance Sheet Analyzer” that uses Generative AI to bridge the gap between static financial statements and real-time market data.
1. Unified Data Ingestion Architecture
One of the most challenging aspects of financial automation is handling varied input sources. Our application implements a Dual-Stream Ingestion Path for both local files and remote URLs.
Handling Remote Streams with io.BytesIO
To avoid bloating server storage, we process remote URLs in-memory using the requests library and io.BytesIO.
import requests import io response = requests.get(file_url) response.raise_for_status() source_file = io.BytesIO(response.content) source_file.name = "downloaded_report.pdf" # Metadata normalization
PDF vs. Excel Parsing Strategies
- PDF Extraction: We use
pdfplumberinstead of basicPyPDF2because it handles tabular data with higher fidelity, preserving the vertical alignment of currency columns. - Excel Extraction: We use
pd.read_excel(file, sheet_name=None)to iterate through every sheet in the workbook, converting numerical tables into a sanitized text representation that the LLM can ingest easily.
2. Intelligence Layer: Gemini 2.5 Flash
We selected Gemini 2.5 Flash for its massive context window (over 1M tokens) and multimodal reasoning capabilities.
Complex Multi-Stage Prompting
The core logic resides in a two-stage inference process:
Stage 1: Extraction & Ticker Identification
The AI is instructed to identify the “Probable Ticker Symbol” for the Indian stock exchange. We use a standardized output format: TICKER: <SYMBOL>.NS.
prompt = """ Analyze the data for: 1. Key Ratios (Current Ratio, D/E, ROE) 2. Red Flags ... Identify the NSE/BSE ticker symbol. Output as 'TICKER: SYMBOL.NS' """
Stage 2: Recursive Valuation
We take the output of Stage 1, fetch real-time data from yfinance, and then pass both the original balance sheet and the live market data back to Gemini to determine if the stock price is “justified.”
3. Real-Time Market Integration with yfinance
Once the ticker is identified (e.g., TCS.NS), we hit the Yahoo Finance API.
stock = yf.Ticker("RELIANCE.NS")
info = stock.info
curr_price = info.get('currentPrice')
# Fetching multiples like P/E and Market Cap
market_cap = info.get('marketCap') / 1e7 # Converting to Cr (Standard Indian Finance)
We use this data to populate Plotly Graph Objects, creating a dynamic 30-day price action chart that provides context to the balance sheet health.
4. State Management and Conversations
Streamlit’s st.session_state is used to maintain the chat history and the analysis results, preventing redundant API calls (and costs).
Chat History Management
Gemini’s start_chat method is perfect for keeping financial context. We store the history object in the session state.
# Initializing Chat with context
if "chat_history" not in st.session_state:
st.session_state.chat_history = []
# Appending with history awareness
chat = model.start_chat(history=st.session_state.chat_history)
response = chat.send_message(user_query)
st.session_state.chat_history = chat.history
5. Premium UI Engineering
To achieve a “Bloomberg Terminal” aesthetic, we bypassed standard Streamlit layouts with custom CSS and glassmorphism.
Glassmorphism Card System
.card {
background: rgba(30, 41, 59, 0.7);
backdrop-filter: blur(12px);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 16px;
padding: 24px;
}
Multi-Tab Navigation
We use st.tabs to separate the Qualitative Report, Quantitative Valuation, and the Interactive AI Chat. This keeps the user focused on one dimension of the data at a time while preserving the overall context.
🛡️ Best Practices & Security
- API Key Management: Supports both
.envvariables and runtime user input (stored safely in session). - Graceful Failures: Wrapped extraction logic in
try-exceptblocks to handle malformed PDF tables. - Resource Cleanup: Used
io.BytesIOto ensure no temporary files are left on the execution server.
This implementation demonstrates the power of combining modern LLMs with real-time financial APIs to build institutional-grade tools on top of lightweight frameworks like Streamlit.
Github link – https://github.com/sethlahaul/streamlit-balance-sheet-analyzer
