Python Expands the Toolbox of Finance Professionals More than Any Alternative

 

Companies of all kinds – asset managers most of all – hire finance professionals who can code in Python. They value this skill primarily because Python is uniquely useful for enhancing and automating data analysis. Alternatives like Alteryx and VBA offer some of the same functions, but Python provides the most tools in a single platform at no cost.


 

Introduction

PyFi is a business that aims to teach finance professionals how to code in Python. In the early years, we distributed classes exclusively through partners like Wall Street Oasis, Corporate Finance Institute, and The Marquee Group, but more recently, we have started to develop our own marketing operation.

Good marketing helps prospects to understand and successfully navigate reality. In our case, prospective students want to know, “Is learning Python the best investment of my time and money? If so, is PyFi the best place for me to learn it?” Gill and I are comfortable answering the second question, but we realized a few months ago that we were not able to adequately address the first. I know why I learned Python – I wanted to use machine learning to impress my boss – but many finance professionals find themselves in different circumstances. To serve our prospects, PyFi’s marketing must explain the value of Python, and in order to do that, I needed to understand it better myself.

To gain this understanding, I set my other work aside and embarked on a research project. This essay contains my findings.

The Job Posting Survey

Introduction

In order to succeed, a company must identify (with some degree of accuracy) the tools that its employees need to solve the problems that they face. Companies mention these tools in their job postings because they want to hire candidates who have experience using them. Therefore, to begin my research, I decided to conduct a survey of job postings to understand how often and why successful companies hire finance professionals who can code in Python.

High-Level Statistics

My research assistant and I analyzed 2,718 finance-related[1] job postings from the following 42 companies:

Banks

  • Capital One
  • Goldman Sachs
  • JPMorganChase
  • Royal Bank of Canada
  • Toronto Dominion
  • UBS Group
  • Wells Fargo

 Investment Managers

  • AQR Capital Management
  • BlackRock
  • Fidelity
  • Vanguard

 Private Equity Firms

  • Apollo Global Management
  • Bain Capital
  • Blackstone
  • The Carlyle Group
  • General Atlantic
  • KKR
  • Millennium Management
  • Vista Equity Partners

 Other

  • 3M
  • Amazon
  • American Express
  • BMW Group
  • Bristol-Myers Squibb
  • CVS Health
  • Home Depot
  • IBM
  • Johnson & Johnson
  • Lockheed Martin
  • Merck
  • Microsoft
  • Nationwide
  • Nike
  • Nvidia
  • RTX
  • Salesforce
  • Sony
  • Toyota
  • Uber
  • UnitedHealth Group
  • Visa
  • Walmart

 

A significant minority of job postings (10%) mentioned Python.

 

 

A majority of companies (60%) had at least one finance-related job posting that mentioned Python.

 

Jobs posted by investment management firms mentioned Python far more frequently (39%) than banks (13%), private equity firms (6%), and other companies (3%).
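
Mention rates like the ones above are simple to compute once postings are tagged by company type. The following is a minimal sketch, not a description of the method used in this survey: the postings are made-up placeholders rather than survey data, and the `mentions_python` helper and its whole-word pattern are assumptions about one reasonable way to flag a posting.

```python
import re
from collections import defaultdict

# Hypothetical placeholder postings: (company type, posting text). Not survey data.
postings = [
    ("Investment Manager", "Build factor models in Python or R."),
    ("Investment Manager", "Prepare client reports in Excel."),
    ("Bank", "Automate reconciliations using Python and SQL."),
    ("Bank", "Manage a portfolio of commercial loans."),
    ("Bank", "Support month-end close."),
]

# Match "Python" as a whole word, ignoring case.
PYTHON_PATTERN = re.compile(r"\bpython\b", re.IGNORECASE)


def mentions_python(text):
    """Return True if the posting text contains the word 'Python'."""
    return PYTHON_PATTERN.search(text) is not None


def mention_rates(rows):
    """Return the share of postings that mention Python for each company type."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for company_type, text in rows:
        totals[company_type] += 1
        if mentions_python(text):
            hits[company_type] += 1
    return {k: hits[k] / totals[k] for k in totals}


rates = mention_rates(postings)
overall = sum(mentions_python(text) for _, text in postings) / len(postings)

for company_type, rate in rates.items():
    print(f"{company_type}: {rate:.0%}")
print(f"Overall: {overall:.0%}")
```

A whole-word match avoids counting unrelated strings that merely contain "python", but it cannot tell a requirement from a preference; that distinction still needs a human reader.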

 

These ten companies mentioned Python most frequently in their finance-related job postings:

Note: Wes McKinney was working for AQR when he created pandas, one of the most popular Python packages.

 

Jobs in the categories of investment / portfolio management, risk management, research, and trading mentioned Python most frequently. Jobs in the categories of accounting, investment banking, and private equity did not mention Python at all.[2]

 

Low-Level Statistics

After collecting the job count data above, we selected 100 job postings to analyze in greater detail.

Python experience satisfied a requirement 51% of the time and satisfied a preference 49% of the time.

 

Job postings most frequently indicated that the successful candidate would use Python for data analysis.

Note: n=33. Some postings indicate multiple uses.

 

 

While 20% of job postings mentioned Python exclusively, 80% mentioned Python alongside alternative tools (e.g. “The successful candidate must have experience using a high-level programming language such as Python or R to analyze large datasets.”).

 

The following alternative tools appeared alongside Python most frequently:

 

Different types of job postings mentioned alternative tools with different frequency:

Note: Each value represents the proportion of each job type’s job postings that mention each tool (out of that job type’s job postings that list alternative tools). Most of these postings mention multiple tools.

 

Analysis of Alternative Tools

Mapping Python and Alternatives

After completing my analysis of the job posting data, I became interested in precisely understanding the similarities and differences between Python and the most frequently mentioned alternative tools. As I compared tools,[3] I realized that most of their differences fell into two categories: scope of utility[4] and interface abstraction.[5]

Some tools are designed for a relatively narrow purpose, whereas others are designed to be more generally useful. For example, Tableau is a specialized data visualization tool, whereas Excel is good for visualizing data plus performing statistical analysis, building financial models, and more. In this case, Excel has the wider scope of utility.

Some tools make their functions available to users through relatively simple, higher-level interfaces whereas other tools make their functions available through more complex, lower-level interfaces. The former – tools that have abstracted further away from the underlying computing machine – are more convenient, but the latter are more powerful.[6]

Note: Opaque tools are available for free. Transparent tools are not.

 

As I mapped Python and its alternatives across these two dimensions, I saw that they formed a diagonal line. The special-purpose tools had higher-level interfaces, and the general-purpose tools had lower-level interfaces. Why?

The Tradeoff Between Convenience and Potential

IKEA sells furniture kits. Home Depot sells tools and raw materials. Using products from Home Depot, you could make any furniture that you can buy at IKEA – but why would you? If a cheap dresser is all that you need, you would be wasting your time by building one from scratch. On the other hand, if you want to add a custom deck to your back yard, then you need to go to Home Depot. They beat IKEA not by making more convenient furniture kits but by providing customers with greater potential.

 

Businesses like IKEA and Home Depot differentiate themselves by occupying a unique place along the efficient frontier of the tradeoff between convenience and potential. One business can take sales from another by providing a) greater potential with equal convenience or b) greater convenience with equal potential. Over time, this competitive pressure pushes the frontier outward.
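
For readers who prefer code to prose, the idea of an efficient frontier can be expressed as a dominance check. This is a rough sketch under stated assumptions: the offerings and their convenience and potential scores are invented for illustration, and `dominates` and `efficient_frontier` are hypothetical helper names.

```python
# Hypothetical offerings scored on two axes: (name, convenience, potential).
# The names and scores are invented for illustration.
offerings = [
    ("Furniture kit", 9, 2),
    ("Pre-cut lumber pack", 6, 5),
    ("Raw materials and tools", 2, 9),
    ("Awkward kit", 5, 2),  # less convenient than the furniture kit, no more potential
]


def dominates(a, b):
    """True if a is at least as good as b on both axes and strictly better on one."""
    _, a_conv, a_pot = a
    _, b_conv, b_pot = b
    return a_conv >= b_conv and a_pot >= b_pot and (a_conv > b_conv or a_pot > b_pot)


def efficient_frontier(items):
    """Keep only the offerings that no other offering dominates."""
    return [x for x in items if not any(dominates(y, x) for y in items)]


frontier = [name for name, _, _ in efficient_frontier(offerings)]
print(frontier)
```

In this toy data the "Awkward kit" falls off the frontier because the furniture kit matches its potential with greater convenience, which is case b) in the paragraph above.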

 

Like a business, in order to compete with other software, a computing tool must occupy a unique place along this efficient frontier. Therefore, when we map Python and its most frequently mentioned alternatives, we find them arranged in a diagonal line.

Graphical Analytical Software

 

Since its inception, Tableau has helped nontechnical users to analyze data through a high-level, drag-and-drop, graphical interface. Developing and sharing interactive dashboards with Tableau is simple and intuitive, but the software is costly and its functions are limited.

Power BI is not significantly different from Tableau across the dimensions of interface abstraction or scope of utility. It costs roughly one-tenth as much, though.

Alteryx is another drag-and-drop analytical tool with a higher price than Tableau or Power BI ($4,950 per year at the time of writing). While these three tools provide similar capabilities through similar interfaces, users favor Alteryx for data preparation and predictive analytics and Tableau for data visualization.

Users interface with Excel and SAS largely through high-level programming languages (Excel’s formula language and the SAS language)[7]; therefore, I classify them as hybrid software. Like Tableau, Power BI, and Alteryx, Excel and SAS support accessing, transforming, and analyzing data and communicating insights (even through interactive dashboards). Further, the flexibility provided through their lower-level interfaces enables users to perform some tasks (e.g. financial modeling) that are not possible with higher-level software.

Excel is available for free with limited features or as part of Microsoft 365. SAS does not share prices on its website, but users indicate that the cost is similar to that of Alteryx.

Every tool in this group includes a Python development environment that allows users to write and execute Python code.

Special-Purpose High-Level Programming Languages

 

Just below the graphical software we find two special-purpose high-level programming languages: SQL and VBA.

SQL is for interfacing with relational databases, and VBA is for extending and automating Microsoft Office applications. Both languages are simple and more or less intuitive. For example:

 

The following SQL code would retrieve the “Date” and “Amount” columns from a table named “Transactions” for all transactions with an amount greater than $100…

SQL

SELECT Date, Amount
FROM Transactions
WHERE Amount > 100;

 

…and the following VBA code would add the text “New Text” to cell A1 of Sheet1 in an Excel workbook:

VBA

Sub AddText()
   ' Make Sheet1 the active sheet, then write to its cell A1
   ThisWorkbook.Sheets("Sheet1").Activate
   Range("A1").Value = "New Text"
End Sub

 

Any action that a user can perform in Excel can be represented in (and therefore automated with) VBA, but the same is true for Python.[8] By learning Python, a VBA user would extend his capabilities far beyond automating Excel, but a Python user would gain little from learning VBA.

On the other hand, if a Python user wants to interface programmatically with a relational database (e.g. Oracle), then he will need some understanding of SQL. Python does provide packages like SQLAlchemy for interfacing with relational databases; however, because these databases and SQL are so closely related, SQLAlchemy is like a Pythonic dialect of SQL.
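
To make the relationship concrete, here is a minimal sketch of running the earlier SQL query from Python. It uses the standard library's `sqlite3` module and an in-memory SQLite database as a stand-in for a production system such as Oracle (an assumption made so that the example is self-contained); the sample rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a production system such as Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (Date TEXT, Amount REAL)")

# Hypothetical sample rows for illustration.
conn.executemany(
    "INSERT INTO Transactions (Date, Amount) VALUES (?, ?)",
    [("2024-01-02", 50.0), ("2024-01-03", 150.0), ("2024-01-04", 300.0)],
)

# The same query as the SQL example, with the threshold passed as a parameter.
rows = conn.execute(
    "SELECT Date, Amount FROM Transactions WHERE Amount > ?", (100,)
).fetchall()
conn.close()

for date, amount in rows:
    print(date, amount)
```

Note that the query itself is still SQL. Python supplies the connection, the parameter, and whatever analysis happens to the rows afterward.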

While SQL is a necessary complement to Python in certain circumstances, it is not a general-purpose tool. For this reason, most relational database management systems allow users to embed Python code within SQL.

Low-Level Programming Languages

 

Skipping ahead to the bottom end of the spectrum, we find one relatively low-level[9] programming language: C++. Although the 100 job postings in our detailed analysis listed C++ only twice, I decided to include it in this map to demonstrate that finance professionals sometimes require even lower-level tools than Python. The increased control provided by C++ (e.g. manual memory management) allows users to develop faster programs for demanding applications like high-frequency trading.

C++ is more complex and less intuitive than the other programming languages on the map. For example, in order to calculate 2 + 2, store the result in a variable, and then display the value of the variable, you would need to write the following code:

 

C++

#include <iostream>

int main() {
    int sum = 2 + 2;
    std::cout << sum << std::endl;
    return 0;
}

General-Purpose High-Level Programming Languages

 

Finally, in the middle of the bottom-right quadrant are Python and its closest alternatives, R and MATLAB. The three languages share a similar syntax. Each of the following snippets is functionally equivalent to the C++ example above:

 

Python

result = 2 + 2
print(result)

 

R

result <- 2 + 2
print(result)

 

MATLAB

result = 2 + 2;
disp(result);

 

Despite their similarity, Python, R, and MATLAB differ widely in popularity:

In the third quarter of 2024, there were over 2 million GitHub developers using Python compared to about 100,000 using R and about 50,000 using MATLAB. This spread has increased steadily over time. Since 2020, the number of Python developers has grown an average of 22% per year compared to 5% for R and 3% for MATLAB.[10]
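
The arithmetic behind this widening spread can be sketched in a few lines. The developer counts and growth rates below are the rounded figures quoted above; the assumption that each rate stays constant and the three-year horizon are illustrative choices, not forecasts.

```python
# Approximate Q3 2024 developer counts as quoted above (rounded figures).
developers = {"Python": 2_000_000, "R": 100_000, "MATLAB": 50_000}

# Average annual growth rates since 2020 as quoted above.
growth = {"Python": 0.22, "R": 0.05, "MATLAB": 0.03}


def project(count, rate, years):
    """Compound a starting count forward at a constant annual rate."""
    return count * (1 + rate) ** years


def ratio_to_python(language, years=0):
    """Python developers per developer of another language after `years` years."""
    python = project(developers["Python"], growth["Python"], years)
    other = project(developers[language], growth[language], years)
    return python / other


for language in ("R", "MATLAB"):
    now = ratio_to_python(language)
    later = ratio_to_python(language, years=3)  # the 3-year horizon is arbitrary
    print(f"Python vs {language}: {now:.0f}x today, about {later:.0f}x in three years")
```

Because Python's growth rate is the highest of the three, the ratio compounds upward each year for as long as the rates hold.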

The most obvious reason for this difference that I have found is that Python provides users with the biggest toolbox:[11]

 

Of course, one cause of this difference in the number of packages is the difference in the size of the communities that make them. These two factors reinforce one another – a more popular language will have a larger community that produces more tools, and an abundance of tools causes the language to become even more popular.

One important subset of community-generated tools is APIs, which allow users to programmatically interface with other software. In the domain of APIs, Python is a first-class citizen; R and MATLAB are not. Bloomberg, FactSet, OpenAI, and Grok, for example, all provide an official API for Python but not for R or MATLAB.

Package count is not a perfect proxy for convenience or scope of utility; these three languages share more functionality than the chart above suggests. A small number of packages meet most of the needs of Python users, and most of the time, R and MATLAB offer an equivalent.

 

Conclusion

Companies of all kinds – asset managers most of all – hire finance professionals who can code in Python. They value this skill primarily because Python is uniquely useful for enhancing and automating data analysis. Alternatives like Alteryx and VBA offer some of the same functions, but Python provides the most tools in a single platform at no cost.

My personal experience leads me to believe that job postings are a lagging indicator of the utility of Python in finance. Infrequent mentions of Python in a category of job postings do not indicate that professionals in that category would not benefit from Python. When I went through recruiting to become an investment banking analyst, nobody asked me if I knew Python. However, after I was hired, I used Python to apply machine learning algorithms to improve the advice that my team gave to our clients. The people who hired me were not aware of the opportunity, so they were not looking for candidates with Python experience, but the opportunity was there nonetheless.

Python is 3x more popular than C++, 20x more popular than R, and 40x more popular than MATLAB. If the historical trend continues, this spread will increase over time, causing further growth in the Python developer community and the tools they produce. This productive community provides Python users with assurance that they will retain access to cutting-edge tools as technology and business needs evolve.

Having completed this research project, I now understand what PyFi’s marketing position should be:

Are you a creative, technical finance professional held back by IKEA-grade equipment?

Expand your toolbox. Build new solutions with Python.

Written by Zach Washam


[1] I only included companies in the survey if they classified their job postings with a reasonable degree of reliability (i.e. jobs in their finance-related categories mostly targeted finance professionals). Still, because I chose to tolerate some imprecision for the sake of scale, some jobs in finance-related categories that I counted in the high-level statistics did not target finance professionals. I assume that such irrelevant jobs which did not mention Python (e.g. administrative assistants) were roughly equal in number to those that did (e.g. developers) and that, therefore, the high-level statistics that I calculated with this data provide a meaningful (even if imperfect) view of the finance job market.

[2] Based on their function, I sorted most jobs posted by private equity firms into the finance, investment / portfolio management, and trading categories (e.g. I classified a Blackstone Private Equity Strategies job as Private Equity, but I classified a Blackstone Tactical Opportunities job as Investment / Portfolio Management).

[3] At the time of writing, I have learned about these tools mostly through research rather than firsthand experience.

 

[4] More precisely, I mean scope of potential utility independent of the amount of work required to actualize that potential. For example, because the reference implementation of Python (CPython) is written in C, C must have a greater scope of potential utility.

 

[5] If you are not familiar with the concept of abstraction in computing, take four minutes to watch this video.

 

[6] That is, they offer users greater potential even if actualizing that potential requires more work than actualizing the relatively limited potential of a higher-level interface.

[7] Users occasionally interact with Tableau and Power BI through high-level programming languages (Tableau’s calculation language and DAX), but not as frequently as they do with Excel and SAS.

[8] VBA does provide a small set of features which are not available to Excel or Python users. For example, with VBA, users can cause Excel events (e.g. creating a new worksheet) to trigger automations (e.g. adding a header).

 

[9] Although C++ is lower-level than Python, many people classify it as a high-level programming language and reserve the term “low-level” for assembly language and binary machine code. Some of these people distinguish Python and close alternatives by classifying them as “very high-level” languages.

[10] GitHub’s Innovation Graph repository provided the data for this chart.

[11] The Python Package Index, the MATLAB File Exchange, and CRAN provided the data for this chart.
