How to Perform Sentiment Analysis in Python, 2023

In the world of financial markets, the sentiment of a stock is as important as its quantitative metrics. Being able to understand public perception can provide insight into future price movements, market trends, and investment strategies. Nvidia, being a leading player in the tech industry, particularly in graphics processing units (GPUs) and AI technologies, presents a significant interest for investors, analysts, and traders alike. In this blog, we delve into the detailed process of collecting, processing, and analyzing news data for sentiment analysis regarding Nvidia's stocks.

Table of Contents

  1. Setting Up the Environment
  2. Data Collection
    1. Searching for News Articles
    2.  Extracting and Storing Results
  3. Web Scraping Full Article Text 
    1. Storing and Reloading the Results 
  4. Integrating Stock Data
  5. Performing Sentiment Analysis on News Articles 
    1. Data Prep
  6. Merging Sentiment Analysis with Stock Data 
  7. Visualization
    1. Sentiment Polarity Over Time
    2. Nvidia Closing Price Over Time 
  8. Conclusion 

 

Setting Up the Environment

Before diving into the coding, you need to ensure your environment is properly configured. This involves installing some essential libraries. Open your command prompt or terminal and run the following commands:


!pip install GoogleNews
!pip install fake-useragent
!pip install newspaper3k
  • GoogleNews: Helps fetch news articles related to Nvidia from Google News.
  • fake-useragent: Helps simulate real browser requests to scrape the news content.
  • newspaper3k: A library for extracting articles from various news sources.

 

Data Collection

1.) Searching for News Articles

We begin by defining a keyword to search, and in this case, it's 'Nvidia'. We will fetch news articles related to this keyword using the GoogleNews library.

 


from GoogleNews import GoogleNews

keyword = 'Nvidia'
all_results = []
for period in ['7']:  # Collecting data for the last 7 days
    googlenews = GoogleNews(lang='en', region='US', period=period, encode='utf-8')
    googlenews.search(keyword)
    for page in range(1, 20):  # Iterating through pages 1-19 of the search results
        googlenews.get_page(page)
    all_results.extend(googlenews.results())  # Results accumulate across the fetched pages
    googlenews.clear()

Here, we create a GoogleNews object with specific attributes such as language (English), region (US), and period (last 7 days). We then extend our search through multiple pages to collect a comprehensive set of news articles.

 

2.) Extracting and Storing Results

Next, we'll convert the results into a Pandas DataFrame, allowing for easy manipulation and analysis.


import pandas as pd
news_data_df = pd.DataFrame.from_dict(all_results)

This DataFrame contains valuable information such as title, date, description, link, and image of the news articles.
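Paging through Google News often returns overlapping items, so it is worth dropping duplicates before scraping. A minimal sketch, assuming each result dictionary carries a 'link' field (the same field used in the scraping loop below):


# Drop duplicate articles that share the same link before scraping
news_data_df = news_data_df.drop_duplicates(subset='link').reset_index(drop=True)
print(f"{len(news_data_df)} unique articles collected")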

 

Web Scraping Full Article Text

With the links obtained, we need to scrape the full text of each article. We'll be using the newspaper3k library and simulating browser requests using fake-useragent.


import requests
from fake_useragent import UserAgent
from newspaper import fulltext

ua = UserAgent()
news_data_df_with_text = []
for index, headers in news_data_df.iterrows():
    # Field names below follow the keys returned by GoogleNews.results()
    news_title = headers['title']
    news_media = headers['media']
    news_update = headers['date']
    news_timestamp = headers['datetime']
    news_description = headers['desc']
    news_img = headers['img']
    news_link = str(headers['link'])
    # Fetch the page with a simulated Chrome user-agent and extract the article body
    html = requests.get(news_link, headers={'User-Agent': ua.chrome}, timeout=5).text
    text = fulltext(html)
    news_data_df_with_text.append([news_title, news_media, news_update, news_timestamp, news_description, news_link, news_img, text])

# Print the head of the news data
news_data_df.head()

# Print the tail of the news data
news_data_df.tail()

 

Here, we iterate through each article link, send a GET request with a simulated user-agent, and extract the full text. The timeout ensures that the request doesn't hang indefinitely.
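Real-world scraping is flaky: some sites block automated requests, time out, or return pages newspaper3k cannot parse. Below is a hedged variant of the same loop that wraps the fetch-and-parse step in try/except, so a single bad link does not abort the whole run:


import requests
from fake_useragent import UserAgent
from newspaper import fulltext

ua = UserAgent()
news_data_df_with_text = []
for index, headers in news_data_df.iterrows():
    news_link = str(headers['link'])
    try:
        html = requests.get(news_link, headers={'User-Agent': ua.chrome}, timeout=5).text
        text = fulltext(html)
    except Exception as exc:
        # Skip articles that cannot be fetched or parsed instead of stopping the loop
        print(f"Skipping {news_link}: {exc}")
        continue
    news_data_df_with_text.append([headers['title'], headers['media'], headers['date'],
                                   headers['datetime'], headers['desc'], news_link,
                                   headers['img'], text])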

1.) Storing and Reloading the Results

To facilitate future analysis, we store the collected data in a CSV file.


news_data_with_text_df = pd.DataFrame(news_data_df_with_text, columns=['Title', 'Media', 'Update', 'Timestamp', 'Description', 'Link', 'Image', 'Text'])
news_data_with_text_df.to_csv("./news_data_with_text.csv")
# Reload the saved news data content from a CSV file.
news_data_with_text_df1 = pd.read_csv("./news_data_with_text.csv", index_col=0)

Integrating Stock Data

We'll use the OpenBB library to fetch Nvidia's stock data. It can be installed and used as follows:


!pip install openbb

import pandas as pd
from openbb_terminal.sdk import openbb

# Load daily price data for Nvidia and preview the first and last rows
nvda_df = openbb.stocks.load(symbol='nvda')
nvda_df.head()
nvda_df.tail()

 

Here, nvda_df contains the stock information for Nvidia, including the opening, high, low, close, and volume.
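Because the news data only covers about a week, it can help to narrow the stock history to a comparable window before merging. A minimal sketch using plain pandas label slicing on the DatetimeIndex; the start date here is illustrative:


# Keep only recent rows so the stock history roughly matches the news window
nvda_recent = nvda_df.loc['2023-08-01':]
nvda_recent['Close'].head()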

Performing Sentiment Analysis on News Articles

Sentiment Analysis is the process of determining whether a piece of writing (in our case, news articles) is positive, negative, or neutral. We'll use the TextBlob library in Python, which provides a simple API for diving into common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis.

First, we'll install TextBlob using pip.


!pip install textblob

Then, you can calculate subjectivity and polarity by defining the following functions:


from textblob import TextBlob

def getSubjectivity(text):
    return TextBlob(str(text)).sentiment.subjectivity

def getPolarity(text):
    return TextBlob(str(text)).sentiment.polarity

news_data_with_text_df1['Subjectivity'] = news_data_with_text_df1['Text'].apply(getSubjectivity)
news_data_with_text_df1['Polarity'] = news_data_with_text_df1['Text'].apply(getPolarity)

TextBlob has a built-in sentiment property that returns a tuple representing the polarity and subjectivity of a piece of text.

  • Polarity: This is a float that lies in the range [-1, 1], where -1 indicates a negative sentiment and +1 indicates a positive sentiment.
  • Subjectivity: This is a float that lies in the range of [0,1]. Subjective sentences generally refer to personal opinion, emotion, or judgment, whereas objective refers to factual information.
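
As a quick sanity check, you can run these helper functions on a couple of hand-written sentences (the sentences below are made-up examples, not taken from the scraped data):


# Illustrative sentences to confirm the functions behave as expected
print(getPolarity("Nvidia reported excellent earnings and the outlook is great."))  # should be clearly positive
print(getPolarity("The launch was a terrible disappointment."))                     # should be negative
print(getSubjectivity("Nvidia designs graphics processing units."))                 # should be low (objective statement)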

1.) Data Prep

The next step involves handling the date, ensuring that it is in the correct format, and grouping the sentiment scores by date:


def check_date(date):
    try:
        pd.to_datetime(date)
        return True
    except ValueError:
        return False

# Keep only rows whose timestamp can be parsed, then extract a calendar date
news_data_with_text_df1 = news_data_with_text_df1[news_data_with_text_df1["Timestamp"].apply(check_date)]
news_data_with_text_df1['Timestamp'] = pd.to_datetime(news_data_with_text_df1['Timestamp'])
news_data_with_text_df1['Date'] = news_data_with_text_df1['Timestamp'].dt.date
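
The merge in the next section expects a DataFrame of daily sentiment scores, referred to as News_df_daily. A minimal sketch of that grouping step, averaging polarity and subjectivity per calendar day and converting the index to datetime so it lines up with the stock data's DatetimeIndex:


# Average sentiment per day; News_df_daily is reused in the merge and plots below
News_df_daily = news_data_with_text_df1.groupby('Date')[['Polarity', 'Subjectivity']].mean()
News_df_daily.index = pd.to_datetime(News_df_daily.index)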

Merging Sentiment Analysis with Stock Data

We will combine the sentiment data with Nvidia's stock data, ensuring that the date ranges align:


merged_df = nvda_df.merge(News_df_daily, left_index=True, right_index=True, how='inner')

Visualization

1.) Sentiment Polarity Over Time

We can visualize how the polarity of news sentiment changes over time:

Positive Sentiment: A positive sentiment implies a favorable or optimistic view. Words like "good," "happy," and "excellent" contribute to a positive sentiment. In terms of polarity, positive sentiment is usually represented with values greater than 0, often on a scale from 0 to +1.

Negative Sentiment: Conversely, a negative sentiment indicates an unfavorable or pessimistic view. Words like "bad," "sad," or "terrible" contribute to a negative sentiment. Negative sentiment is usually represented with values less than 0, often on a scale from 0 to -1.

Neutral Sentiment: A neutral sentiment means that the text doesn't convey a particularly positive or negative view. This could include factual statements or content that doesn't express emotion. Neutral sentiment is usually represented with a value of 0.
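
If you prefer explicit labels over raw scores, the polarity column can be bucketed into these three categories. A small illustrative sketch; the Sentiment column it creates is not used elsewhere in this tutorial:


def label_sentiment(polarity):
    # Bucket a polarity score into Positive / Negative / Neutral
    if polarity > 0:
        return 'Positive'
    if polarity < 0:
        return 'Negative'
    return 'Neutral'

news_data_with_text_df1['Sentiment'] = news_data_with_text_df1['Polarity'].apply(label_sentiment)
news_data_with_text_df1['Sentiment'].value_counts()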


import matplotlib.pyplot as plt
 
plt.figure(figsize=(10, 6))
plt.scatter(News_df_daily.index, News_df_daily['Polarity'], label='Polarity')
plt.plot(News_df_daily.index, News_df_daily['Polarity'], color='blue', alpha=0.5)
plt.title('Sentiment Polarity over Time')
plt.xlabel('Date')
plt.ylabel('Polarity')
plt.legend()
plt.grid(True)
plt.show()

 

Since all the polarity values fall within the positive range, the sentiment is consistently positive, but not excessively so. It is likely mildly to moderately positive across the dataset; there may be differences in intensity, but they all point in the positive direction.
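
You can back this reading up with simple summary statistics on the daily polarity values:


# Summary statistics for the daily average polarity
print(News_df_daily['Polarity'].describe())
print("Share of positive days:", (News_df_daily['Polarity'] > 0).mean())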

2.) Nvidia Closing Price Over Time

Lastly, we can plot Nvidia's closing price to observe how it corresponds with the sentiment:


nvda_subset = nvda_df.loc['2023-08-01':]
plt.figure(figsize=[15, 7])
plt.plot(nvda_subset['Close'])
plt.title('NVIDIA Closing Price Over Time (from 2023-08-01)')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.grid(True)
plt.show()

 

Since the sentiment polarity ranges from neutral to positive, there has been a corresponding gradual increase in the stock price over the same period. This correlation may be attributed to various interconnected factors. Positive sentiment, reflecting optimistic views and emotions related to a specific company or market, can boost investor confidence and increase demand for the stock, leading to a rise in price. Positive public perception may also influence consumer behavior, translating into improved sales or other favorable financial outcomes that further bolster the stock price. Additionally, some investment strategies employing algorithmic trading might respond to positive sentiment by triggering buying actions, contributing to the price increase. However, it's essential to recognize that the relationship between positive sentiment and increasing stock price might also be influenced by underlying financial performance, broader market conditions, or other external factors. While the observed correlation between sentiment and price movement can be insightful, a comprehensive analysis is crucial to understand the intricate dynamics fully.
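
One way to put a number on this relationship is to compute the correlation between daily sentiment and the closing price on the merged data. A minimal sketch; with only about a week of overlapping data, the figure is indicative at best:


# Pearson correlation between daily average polarity and the closing price
print(merged_df[['Polarity', 'Close']].corr())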

Conclusion

This comprehensive guide walks you through the process of performing sentiment analysis on Nvidia stocks by scraping news articles, calculating sentiment scores, integrating with stock data, and visualizing the results.

Understanding the relationship between public sentiment and stock price is an evolving field of study, with great potential for investors, researchers, and market analysts. Remember that the sentiment analysis only provides one perspective and should be used in conjunction with other metrics and market analyses to make more accurate predictions and decisions.

 

The complete code for this project can be found on the PyFi GitHub page.

 

Written by Numan Yaqoob, PhD candidate

 

DISCLAIMER
*This information is for educational purposes only, and is not financial advice. Trading securities is risky, and can result in financial losses. Trade at your own risk.