My Personal Guide to the Applied Use of LLMs for Economic Analysis / Financial Modeling
Hryhorii Kukuruza, economist, co-founder of Ukraine Economic Outlook
Bohdan Kukuruza, Computer Science student, Dublin City University
What makes this text valuable?
- This guide is based on my own experience and on advice from highly experienced colleagues in the financial sector. LLMs used for programming and LLMs used for business tasks and economic analysis are completely different worlds. As a result, standard benchmark tables may not reflect your specific needs (e.g., competitive analysis, rapid market sizing, or fast news analysis to build a timeline of the latest developments around a particular subject).
- You will also find opinionated recommendations on which models to use for different tasks. My day-to-day toolkit includes Claude, ChatGPT, Gemini, Grok, and Perplexity. Each model has its own strengths: working with sources (building a dataset for subsequent analysis), reasoning and output quality, working with files (building the “skeleton” of a model and validating formulas and dependencies), working with text, and more.
- Finally, this document includes observations and practical recommendations on integrating LLMs into workflows. The starting point is an acute shortage of qualified talent, combined with limited discretionary budgets for innovation and productivity improvements. In this context, the optimal approach is not end-to-end LLM process automation, but tailored productivity gains for each individual team member.
Business in Ukraine certainly does not have the same financial cushion for tech experiments as entrepreneurs in Scandinavian, DACH, or Benelux countries. I would caution that 95% of such experiments ended in failure (MIT study, July 2025), and only 5% of integrations actually yield high returns. The realistically achievable goal is to take a mid-skilled employee and "upgrade" them (think cyberpunk).
Of course, the long-term risk is that there will be even fewer people who at least understand how to do this manually. And even more news about consultants being caught using ChatGPT-generated text. The latter is a separate pain point—but it can be avoided if LLM use in specific functions is governed by clear rules, without excessive fanaticism.
Main examples of applying LLM models from my practice:
- Preparation of information memos: chronologies of events (for companies, investment projects, draft laws);
- Search and aggregation of large arrays of open data (for example, investments by industry and region over the last 5 years);
- Formation of consensus forecasts for currency pairs, commodities, and specific economies;
- Rapid hypothesis testing (business ideas, export directions);
- Market analysis / competitive analysis. For Ukrainian queries, I "enrich" the source base with selections from registers in BizFinder (automatic selections of companies by industry, turnover, and geolocation);
- Building basic financial models (Balance Sheet, P&L, Cash Flow); verifying the interconnections and formulas in xlsx files; seasonality analysis.
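To make the last item concrete, here is a minimal sketch (pure Python, hypothetical revenue figures) of the kind of seasonality check I ask a model to perform: multiplicative seasonal indices, i.e. each month's average divided by the overall average.

```python
def seasonal_indices(monthly_values, period=12):
    """Return one index per position in the period (e.g. 12 months).

    An index of 1.10 means that month runs ~10% above the overall average.
    """
    overall = sum(monthly_values) / len(monthly_values)
    indices = []
    for m in range(period):
        month_obs = monthly_values[m::period]  # every observation for month m
        indices.append((sum(month_obs) / len(month_obs)) / overall)
    return indices

# Two years of hypothetical monthly revenue with a visible year-end spike.
revenue = [100, 95, 105, 110, 108, 112, 115, 117, 120, 125, 140, 160,
           104, 99, 109, 114, 112, 116, 119, 121, 124, 129, 145, 166]
idx = seasonal_indices(revenue)
print([round(i, 2) for i in idx])  # December's index comes out the largest
```

The value of running this yourself is not the arithmetic; it is that you can then check whether the LLM's "seasonality" narrative matches the indices the data actually produces.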
Stage 1. Data search, working with sources, and populating the data repository for further analysis
External sources. I do not rely on LLMs as a “source of facts,” to avoid hallucination risk. I use them primarily to extract, structure, and summarize information from external sources.
- ChatGPT – Agent Mode (significantly "smarter" than Web Search).
- Gemini – Deep Research in Pro mode.
- Grok – DeepSearch in Expert mode.
- Claude – Opus 4.5 – Web Search + Extended thinking + Research mode.
You write your search query, detailing the expected result. In practice, I run all four models on the same search, collect the results in Claude, and form the report there. Typically, Agent Mode processes ~10-15 sources, but studies each of them in maximum detail. Gemini's Deep Research, depending on the task, can process anywhere from ≤10 sources to 120-150. Grok is similar: on a broad query it can process ~60-100 sources, and among free tiers it is unequivocally the leader in source processing. Claude, depending on the query, has processed anywhere from 50 to 400 sources.
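The "collect all results in Claude" step can be scripted. Here is a minimal sketch: the per-model report file names are my own convention (not a product feature), and the merge simply stitches the exports into one consolidated brief to paste into Claude.

```python
from pathlib import Path

# Hypothetical export file names, one per research tool.
MODEL_REPORTS = {
    "ChatGPT Agent Mode": "chatgpt_report.md",
    "Gemini Deep Research": "gemini_report.md",
    "Grok DeepSearch": "grok_report.md",
}

def build_consolidated_brief(reports: dict) -> str:
    """Concatenate each model's report under a labeled heading."""
    sections = ["# Consolidated research brief\n"]
    for model, filename in reports.items():
        path = Path(filename)
        body = path.read_text(encoding="utf-8") if path.exists() else "(missing)"
        sections.append(f"## Source: {model}\n\n{body}\n")
    return "\n".join(sections)

if __name__ == "__main__":
    print(build_consolidated_brief(MODEL_REPORTS))
```

Labeling each section with its source model matters: it lets the synthesizing model (and you) see where the four tools agree and where only one of them makes a claim.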
Query language matters. Every model by default tries to minimize its resource consumption. So if you assumed the model searches all languages simultaneously and then translates the result into the query language, that is not the case. Priority in source processing goes to the query language; then, depending on the context, the most common languages for the topic (English, German, French, Chinese) or other local languages may be added. Each model has its own, slightly different approach to language prioritization, so this aspect is critically important to account for. Otherwise, a request for a short report on the coal industry may return a report on how to properly integrate the mines of the so-called DNR and LNR into the Russian energy system (from personal practice).
Internal sources. Formally, all the tools support connecting Google Drive, and Claude desktop in particular has a wide arsenal of integrations. In practice, though, you end up working from different devices with a "zoo" of files in various formats. For "light" files up to 5 MB everything works. At 20+ MB, only Gemini managed it for me. Grok struggles badly with file uploads in general.
Working with audio. Offline meetings and strategic sessions remain in our lives. Gemini Pro is the only tool that allows quickly transcribing a recording and automatically compiling a short structured document with all the information.
A separate point is working with registers. ChatGPT, Gemini, and Grok have surprisingly broad access to registers that are nominally restricted (for example, those containing company financial statements). So, given an EDRPOU code / European Unique Identifier / other legal-entity identifier, one can easily find a company's financial statements.
Stage 2. Data processing and analysis, report generation and conclusion formation.
The main rule: all LLM models resemble a competent but highly specialized professional who, while preparing a report, will inevitably make 1-2 critical errors that make the first iteration of your joint work unusable. The meme line "Yes, indeed, you are right. I will redo the report now" is really a game of roulette: either you find the error, or your client does.
So if you are ready to take on the risk and responsibility of checking the prepared report for errors, the game is worth playing. A year ago (in the days of ChatGPT's o3 model), roughly 70% of the information in a prepared report was correct; today that share has grown to ~90-95%. The question is whether your competence is sufficient to detect the remaining 5-10% of errors in the entire array.
I will simply remind you that a new term has emerged – workslop – AI-generated content that looks ready for use at first glance but actually requires significant refinement.
Among the important pros: the reasoning performance of the best LLM models is already higher than that of a junior analyst.
Specifically regarding models.
For me, the unequivocal leader is Claude Opus 4.5 (Web Search + Extended thinking + Research mode). The model was released only at the end of November. It produces reports of genuinely high quality, exhibits clear, structured thinking, and suggests the output format itself (Word, xlsx). Strengths: business analysis, work with financial statements, economic analysis.
Short overview of functionality.
Second place goes to GPT 5.2 Thinking (+ Deep Research mode), released only on December 11. Overall, it has a right to exist, but strictly as number two. The main downside is instability of results. Situationally, the model can produce quite a decent report (with step-by-step thinking, self-reflection, unique conclusions); other times you regret the time spent waiting (the "reasoning and preparation" can take 10-15 minutes).
Third place can be shared between Gemini 3.0 Pro and Grok Expert. Gemini is the unequivocal leader in document processing, their systematization, and forming short simple conclusions. But nothing more. The unequivocal favorite in Gemini is the Deep Research option, but in all other tasks, I do not share my colleagues' enthusiasm. Quality metrics can look strong, but the final output is often underwhelming.
Grok Expert, considering I had absolutely no positive expectations, is actually quite decent, with a low hallucination rate (the lowest after Claude). Among the minuses: it can "lag" and lose the entire prompt you wrote after an hour of waiting (unpleasant, but it happened to me).
Perplexity Labs / Perplexity Deep Research also has a right to exist. It is unequivocally strong in working with files (creating Excel files, presentations), but feels very much like a Claude clone.
For objectivity, I am adding an overview of the conducted efficiency measurements of different models (although it does not coincide with my observations):
A short reminder: always keep an eye on the model switcher; the LLM's task is to minimize the compute spent on you. It is therefore better not to choose the "Auto" operating mode, but to pick the smartest (and slowest) mode explicitly.
Stage 3. Working with files (xlsx), analysis of interconnections and written formulas, basic financial modeling.
Main functionality (use directly within Excel). In short, the product is not yet ready for mass practical use, but within cautious testing it is quite acceptable. In this round only Claude and ChatGPT remain, built into the Microsoft 365 Frontier lineup (my main interest is Excel): test access to Copilot agents (built on competing LLMs, with an internal model-selection option).
Advantages: transparent code that allows tracking the logic of the LLM model. Truly fast and convenient work with large Excel files (which is much harder when working directly in the LLM window), data analysis functionality, embedding and checking formulas. Claude possesses a "professional background" for working with financial models.
Yes, a mid-level analyst from the Big Four or Big Three will do it better. But in how much time? Isn't it easier to simplify the work and start from an already prepared template? And the main question: what should companies do that have no personnel at that qualification level? In reality, therefore, Agents in Excel can be forgiven a lot.
Limited functionality (working with files via the standard UI window).
One can work with Claude and Perplexity. (I may be mistaken about ChatGPT, but this is the experience formed over recent months: ChatGPT objectively lags significantly when working specifically with xlsx files, and I have had no positive experience with it analyzing all the formulas inside a file, compiling a description of what, why, and how, and checking for possible errors.)
Claude and Perplexity (Labs mode) analyze multi-sheet xlsx files quite decently, checking the model-construction logic and the interconnections between formulas. Given the frequent client pattern of "need it very urgently, then don't need it, then need it again right now, at midnight," this is an ideal tool for refreshing your understanding of a file or onboarding into a project.
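The formula-interconnection check described above can be sketched in pure Python. The data here is hypothetical (a toy P&L fragment): given cell-to-formula pairs extracted from a sheet, build the dependency map and flag formulas that reference cells holding neither a value nor a formula, which is exactly the class of error that slips through a first iteration.

```python
import re

# Matches A1-style cell references such as B2, AC17 (no sheet prefixes).
CELL_REF = re.compile(r"\b([A-Z]{1,3}[1-9][0-9]*)\b")

def dependencies(formulas: dict) -> dict:
    """Map each formula cell to the cells its formula references."""
    return {cell: CELL_REF.findall(f) for cell, f in formulas.items()}

def dangling_refs(formulas: dict, values: set) -> dict:
    """Formula cells that reference cells with no value and no formula."""
    known = values | set(formulas)
    report = {}
    for cell, refs in dependencies(formulas).items():
        missing = [r for r in refs if r not in known]
        if missing:
            report[cell] = missing
    return report

# Hypothetical fragment: B4 references B3, which exists nowhere in the sheet.
formulas = {"B2": "=B1*1.2", "B4": "=B2-B3"}
values = {"B1"}
print(dangling_refs(formulas, values))  # {'B4': ['B3']}
```

This is deliberately simplistic (no sheet-qualified references, ranges, or named cells); the point is that a deterministic pass like this catches broken links, while the LLM is better used to explain what the surviving formulas are supposed to mean.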
Stage 4. Integration and internal personnel resistance.
First of all, a big "thank you" is due to all the infopreneurs who in 2023-2024 formed inadequately inflated expectations for the technology. In the West — to boost capitalization, here — to sell courses / integrations / masterclasses. Now, thanks to them, topics of LLM automation are not discussed in polite society. But we must proceed from existing realities and the necessity to increase personnel productivity at any cost.
- Finances. Identify the employees who carry a significant intellectual load and give them a monthly budget of at least $150-250 for LLM subscriptions. Experience, and a sense of where applied use pays off, is acquired through tests and experiments. Within a few months, employees will come back with ideas and proposals for what can be implemented on a permanent basis; at that point it makes sense to switch to corporate subscriptions.
- The 'Old Guard' employee who claims they can do everything better than AI. There is one in every team. This is why the company needs a CDTO (or someone assigned this function under strict austerity), with the authority of the "third" person after the CEO and CFO.
- Quick results and the reality of goals. You don't need to run faster than the bear, only faster than the other guy. Walmart optimized logistics through an internal AI solution, Route Optimization, and sells it as a SaaS product. The probability that you will build a similar internal solution that can later even be sold to the market is minimal (although anything is possible). Focus on increasing the productivity of individual employees. This is a less ambitious approach, but more realistic.
- Advisability of external expertise. In the short run, it is significantly more profitable to engage an external team to analyze existing processes and prepare proposals for LLM integration than to bet on internal hiring. Genuine "old-timers", professionals with deep practical experience, are very few; they might take it on as a project, but hiring them is usually possible only at an inflated price. "AI automation enthusiasts" with six months of experience, on the other hand, will do more harm than good. If a team has no professional developers with at least 5 years of experience, that is at the very least suspicious.