How to Perform Sentiment Analysis in Python, 2023

In the world of financial markets, the sentiment of a stock is as important as its quantitative metrics. Being able to understand public perception can provide insight into future price movements, market trends, and investment strategies. Nvidia, being a leading player in the tech industry, particularly in graphics processing units (GPUs) and AI technologies, presents a significant interest for investors, analysts, and traders alike. In this blog, we delve into the detailed process of collecting, processing, and analyzing news data for sentiment analysis regarding Nvidia's stocks.

Table of Contents

  1. Setting Up the Environment
  2. Data Collection
    1. Searching for News Articles
    2.  Extracting and Storing Results
  3. Web Scraping Full Article Text 
    1. Storing and Reloading the Results 
  4. Integrating Stock Data
  5. Performing Sentiment Analysis on News Articles 
    1. Data Prep
  6. Merging Sentiment Analysis with Stock Data 
  7. Visualization
    1. Sentiment Polarity Over Time
    2. Nvidia Closing Price Over Time 
  8. Conclusion 

 

Setting Up the Environment

Before diving into the coding, you need to ensure your environment is properly configured. This involves installing some essential libraries. Open your command prompt or terminal and run the following commands:


!pip install GoogleNews
!pip install fake-useragent
!pip install newspaper3k
  • GoogleNews: Helps fetch news articles related to Nvidia from Google News.
  • fake-useragent: Helps simulate real browser requests to scrape the news content.
  • newspaper3k: A library for extracting articles from various news sources.

 

Data Collection

1.) Searching for News Articles

We begin by defining a keyword to search, and in this case, it's 'Nvidia'. We will fetch news articles related to this keyword using the GoogleNews library.

 


from GoogleNews import GoogleNews

keyword = 'Nvidia'
all_results = []
for period in ['7']:  # Collecting data for the last 7 days
    googlenews = GoogleNews(lang='en', region='US', period=period, encode='utf-8')
    googlenews.search(keyword)
    for page in range(1, 20):  # Iterating through pages 1-19 of the search results
        googlenews.get_page(page)
    all_results.extend(googlenews.results())  # Results accumulate across the fetched pages
    googlenews.clear()

Here, we create a GoogleNews object with specific attributes such as language (English), region (US), and period (last 7 days). We then extend our search through multiple pages to collect a comprehensive set of news articles.

 

2.) Extracting and Storing Results

Next, we'll convert the results into a Pandas DataFrame, allowing for easy manipulation and analysis.


import pandas as pd
news_data_df = pd.DataFrame.from_dict(all_results)

This DataFrame contains valuable information such as title, date, description, link, and image of the news articles.
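Paging through Google News often returns overlapping items, so it is worth dropping duplicates before scraping. A minimal sketch, assuming each result dictionary carries a 'link' field (the same field used in the scraping loop below):


# Drop duplicate articles that share the same link before scraping
news_data_df = news_data_df.drop_duplicates(subset='link').reset_index(drop=True)
print(f"{len(news_data_df)} unique articles collected")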

 

Web Scraping Full Article Text

With the links obtained, we need to scrape the full text of each article. We'll be using the newspaper3k library and simulating browser requests using fake-useragent.


import requests
from fake_useragent import UserAgent
from newspaper import fulltext

ua = UserAgent()
news_data_df_with_text = []
for index, headers in news_data_df.iterrows():
    # Field names below follow the keys returned by GoogleNews.results()
    news_title = headers['title']
    news_media = headers['media']
    news_update = headers['date']
    news_timestamp = headers['datetime']
    news_description = headers['desc']
    news_img = headers['img']
    news_link = str(headers['link'])
    # Fetch the page with a simulated Chrome user-agent and extract the article body
    html = requests.get(news_link, headers={'User-Agent': ua.chrome}, timeout=5).text
    text = fulltext(html)
    news_data_df_with_text.append([news_title, news_media, news_update, news_timestamp, news_description, news_link, news_img, text])

# Print the head of the news data
news_data_df.head()

# Print the tail of the news data
news_data_df.tail()

 

Here, we iterate through each article link, send a GET request with a simulated user-agent, and extract the full text. The timeout ensures that the request doesn't hang indefinitely.
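Real-world scraping is flaky: some sites block automated requests, time out, or return pages newspaper3k cannot parse. Below is a hedged variant of the same loop that wraps the fetch-and-parse step in try/except, so a single bad link does not abort the whole run:


import requests
from fake_useragent import UserAgent
from newspaper import fulltext

ua = UserAgent()
news_data_df_with_text = []
for index, headers in news_data_df.iterrows():
    news_link = str(headers['link'])
    try:
        html = requests.get(news_link, headers={'User-Agent': ua.chrome}, timeout=5).text
        text = fulltext(html)
    except Exception as exc:
        # Skip articles that cannot be fetched or parsed instead of stopping the loop
        print(f"Skipping {news_link}: {exc}")
        continue
    news_data_df_with_text.append([headers['title'], headers['media'], headers['date'],
                                   headers['datetime'], headers['desc'], news_link,
                                   headers['img'], text])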

1.) Storing and Reloading the Results

To facilitate future analysis, we store the collected data in a CSV file.


news_data_with_text_df = pd.DataFrame(news_data_df_with_text, columns=['Title', 'Media', 'Update', 'Timestamp', 'Description', 'Link', 'Image', 'Text'])
news_data_with_text_df.to_csv("./news_data_with_text.csv")
# Reload the saved news data content from a CSV file.
news_data_with_text_df1 = pd.read_csv("./news_data_with_text.csv", index_col=0)

Integrating Stock Data

We'll use the OpenBB library to fetch Nvidia's stock data. It can be installed and used as follows:


!pip install openbb

import pandas as pd
from openbb_terminal.sdk import openbb

# Load daily price data for Nvidia and preview the first and last rows
nvda_df = openbb.stocks.load(symbol='nvda')
nvda_df.head()
nvda_df.tail()

 

Here, nvda_df contains the stock information for Nvidia, including the opening, high, low, close, and volume.
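Because the news data only covers about a week, it can help to narrow the stock history to a comparable window before merging. A minimal sketch using plain pandas label slicing on the DatetimeIndex; the start date here is illustrative:


# Keep only recent rows so the stock history roughly matches the news window
nvda_recent = nvda_df.loc['2023-08-01':]
nvda_recent['Close'].head()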

Performing Sentiment Analysis on News Articles

Sentiment Analysis is the process of determining whether a piece of writing (in our case, news articles) is positive, negative, or neutral. We'll use the TextBlob library in Python, which provides a simple API for diving into common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis.

First, we'll install TextBlob using pip.


!pip install textblob

Then, you can calculate subjectivity and polarity by defining the following functions:


from textblob import TextBlob

def getSubjectivity(text):
    return TextBlob(str(text)).sentiment.subjectivity

def getPolarity(text):
    return TextBlob(str(text)).sentiment.polarity

news_data_with_text_df1['Subjectivity'] = news_data_with_text_df1['Text'].apply(getSubjectivity)
news_data_with_text_df1['Polarity'] = news_data_with_text_df1['Text'].apply(getPolarity)

TextBlob has a built-in sentiment property that returns a tuple representing the polarity and subjectivity of a piece of text.

  • Polarity: This is a float that lies in the range [-1, 1], where -1 indicates a negative sentiment and +1 indicates a positive sentiment.
  • Subjectivity: This is a float that lies in the range of [0,1]. Subjective sentences generally refer to personal opinion, emotion, or judgment, whereas objective refers to factual information.
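
As a quick sanity check, you can run these helper functions on a couple of hand-written sentences (the sentences below are made-up examples, not taken from the scraped data):


# Illustrative sentences to confirm the functions behave as expected
print(getPolarity("Nvidia reported excellent earnings and the outlook is great."))  # should be clearly positive
print(getPolarity("The launch was a terrible disappointment."))                     # should be negative
print(getSubjectivity("Nvidia designs graphics processing units."))                 # should be low (objective statement)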

1.) Data Prep

The next step involves handling the date, ensuring that it is in the correct format, and grouping the sentiment scores by date:


def check_date(date):
    try:
        pd.to_datetime(date)
        return True
    except ValueError:
        return False

# Keep only rows whose timestamp can be parsed, then extract a calendar date
news_data_with_text_df1 = news_data_with_text_df1[news_data_with_text_df1["Timestamp"].apply(check_date)]
news_data_with_text_df1['Timestamp'] = pd.to_datetime(news_data_with_text_df1['Timestamp'])
news_data_with_text_df1['Date'] = news_data_with_text_df1['Timestamp'].dt.date
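
The merge in the next section expects a DataFrame of daily sentiment scores, referred to as News_df_daily. A minimal sketch of that grouping step, averaging polarity and subjectivity per calendar day and converting the index to datetime so it lines up with the stock data's DatetimeIndex:


# Average sentiment per day; News_df_daily is reused in the merge and plots below
News_df_daily = news_data_with_text_df1.groupby('Date')[['Polarity', 'Subjectivity']].mean()
News_df_daily.index = pd.to_datetime(News_df_daily.index)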

Merging Sentiment Analysis with Stock Data

We will combine the sentiment data with Nvidia's stock data, ensuring that the date ranges align:


merged_df = nvda_df.merge(News_df_daily, left_index=True, right_index=True, how='inner')

Visualization

1.) Sentiment Polarity Over Time

We can visualize how the polarity of news sentiment changes over time:

Positive Sentiment: A positive sentiment implies a favorable or optimistic view. Words like "good," "happy," and "excellent" contribute to a positive sentiment. In terms of polarity, positive sentiment is usually represented with values greater than 0, often on a scale from 0 to +1.

Negative Sentiment: Conversely, a negative sentiment indicates an unfavorable or pessimistic view. Words like "bad," "sad," or "terrible" contribute to a negative sentiment. Negative sentiment is usually represented with values less than 0, often on a scale from 0 to -1.

Neutral Sentiment: A neutral sentiment means that the text doesn't convey a particularly positive or negative view. This could include factual statements or content that doesn't express emotion. Neutral sentiment is usually represented with a value of 0.
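
If you prefer explicit labels over raw scores, the polarity column can be bucketed into these three categories. A small illustrative sketch; the Sentiment column it creates is not used elsewhere in this tutorial:


def label_sentiment(polarity):
    # Bucket a polarity score into Positive / Negative / Neutral
    if polarity > 0:
        return 'Positive'
    if polarity < 0:
        return 'Negative'
    return 'Neutral'

news_data_with_text_df1['Sentiment'] = news_data_with_text_df1['Polarity'].apply(label_sentiment)
news_data_with_text_df1['Sentiment'].value_counts()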


import matplotlib.pyplot as plt
 
plt.figure(figsize=(10, 6))
plt.scatter(News_df_daily.index, News_df_daily['Polarity'], label='Polarity')
plt.plot(News_df_daily.index, News_df_daily['Polarity'], color='blue', alpha=0.5)
plt.title('Sentiment Polarity over Time')
plt.xlabel('Date')
plt.ylabel('Polarity')
plt.legend()
plt.grid(True)
plt.show()

 

Since all the polarity values fall within the positive range, the sentiment is consistently positive, but not excessively so. It is likely mildly to moderately positive across the dataset; there may be differences in intensity, but they all point in the positive direction.
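
You can back this reading up with simple summary statistics on the daily polarity values:


# Summary statistics for the daily average polarity
print(News_df_daily['Polarity'].describe())
print("Share of positive days:", (News_df_daily['Polarity'] > 0).mean())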

2.) Nvidia Closing Price Over Time

Lastly, we can plot Nvidia's closing price to observe how it corresponds with the sentiment:


nvda_subset = nvda_df.loc['2023-08-01':]
plt.figure(figsize=[15, 7])
plt.plot(nvda_subset['Close'])
plt.title('NVIDIA Closing Price Over Time (from 2023-08-01)')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.grid(True)
plt.show()

 

Since the sentiment polarity ranges from neutral to positive, there has been a corresponding gradual increase in the stock price over the same period. This correlation may be attributed to various interconnected factors. Positive sentiment, reflecting optimistic views and emotions related to a specific company or market, can boost investor confidence and increase demand for the stock, leading to a rise in price. Positive public perception may also influence consumer behavior, translating into improved sales or other favorable financial outcomes that further bolster the stock price. Additionally, some investment strategies employing algorithmic trading might respond to positive sentiment by triggering buying actions, contributing to the price increase. However, it's essential to recognize that the relationship between positive sentiment and increasing stock price might also be influenced by underlying financial performance, broader market conditions, or other external factors. While the observed correlation between sentiment and price movement can be insightful, a comprehensive analysis is crucial to understand the intricate dynamics fully.
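
One way to put a number on this relationship is to compute the correlation between daily sentiment and the closing price on the merged data. A minimal sketch; with only about a week of overlapping data, the figure is indicative at best:


# Pearson correlation between daily average polarity and the closing price
print(merged_df[['Polarity', 'Close']].corr())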

Conclusion

This comprehensive guide walks you through the process of performing sentiment analysis on Nvidia stocks by scraping news articles, calculating sentiment scores, integrating with stock data, and visualizing the results.

Understanding the relationship between public sentiment and stock price is an evolving field of study, with great potential for investors, researchers, and market analysts. Remember that the sentiment analysis only provides one perspective and should be used in conjunction with other metrics and market analyses to make more accurate predictions and decisions.

 

The complete code for this project can be found on the PyFi GitHub page.

 

Written by Numan Yaqoob, PhD candidate

 

DISCLAIMER
*This information is for educational purposes only, and is not financial advice. Trading securities is risky, and can result in financial losses. Trade at your own risk.