lmari’s blog

Data analytics, machine learning & front end development

Scraping Financial Data with Selenium


Selenium with Python — Selenium Python Bindings 2 documentation

 

 Install Beautiful Soup 4

  • Followup: the script would not run in Sublime Text, but ran fine in PyCharm
  • Read up on XPath
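A quick way to experiment with the XPath syntax used later (e.g. //td[@class='…']) without launching a browser is the standard library's ElementTree, which supports a limited XPath subset. A small sketch on a made-up fragment (not the real Yahoo page):

```python
import xml.etree.ElementTree as ET

# hypothetical fragment shaped like a quote summary table
xml = ('<table>'
       '<tr><td class="label">Open</td><td class="val">184.84</td></tr>'
       '</table>')
root = ET.fromstring(xml)

# attribute predicates like [@class="val"] work in ElementTree's XPath subset
cells = root.findall('.//td[@class="val"]')
print([c.text for c in cells])  # ['184.84']
```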

Getting FB stock data from Yahoo Finance

from bs4 import BeautifulSoup
import urllib.request

url = 'https://finance.yahoo.com/quote/FB?p=FB'

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36'}
req = urllib.request.Request(url, headers=headers)

resp = urllib.request.urlopen(req)

html = resp.read()

soup = BeautifulSoup(html, 'html.parser')

tagged_values = soup.find_all("td", {'class': 'Ta(end) Fw(600) Lh(14px)'})
print(tagged_values)

values = [x.get_text() for x in tagged_values]
for value in values:
    print(value)

print('\n')

  • The list comprehension values = [x.get_text() for x in tagged_values] can be replaced with a normal for loop:

values = []
for tv in tagged_values:
    values.append(tv.get_text())

Finding the tagged_values

  • Ta(end) Fw(600) Lh(14px)

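For reference, the same class-based lookup can be tried offline on a hand-written fragment shaped like Yahoo's summary table (the HTML below is a made-up stand-in, not the real page). select() with an exact attribute selector is an equivalent alternative to find_all():

```python
from bs4 import BeautifulSoup

# hypothetical snippet mimicking the summary-table markup
html = '''
<table>
  <tr><td class="C(black) W(51%)">Previous Close</td>
      <td class="Ta(end) Fw(600) Lh(14px)">186.99</td></tr>
  <tr><td class="C(black) W(51%)">Open</td>
      <td class="Ta(end) Fw(600) Lh(14px)">184.84</td></tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

# find_all with the exact class string, as in the post
by_find_all = [td.get_text() for td in
               soup.find_all('td', {'class': 'Ta(end) Fw(600) Lh(14px)'})]

# equivalent CSS attribute selector via select()
by_select = [td.get_text() for td in
             soup.select('td[class="Ta(end) Fw(600) Lh(14px)"]')]

print(by_find_all)  # ['186.99', '184.84']
```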

Getting FB, GOOG and AAPL stocks

  • Note that everything after for symbol in symbols: must be indented inside the loop

from bs4 import BeautifulSoup
import urllib.request

symbols = ['FB', 'GOOG', 'AAPL']

for symbol in symbols:
    url = 'https://finance.yahoo.com/quote/' + symbol + '?p=' + symbol

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36'}
    req = urllib.request.Request(url, headers=headers)

    resp = urllib.request.urlopen(req)

    html = resp.read()

    soup = BeautifulSoup(html, 'html.parser')

    tagged_values = soup.find_all("td", {'class': 'Ta(end) Fw(600) Lh(14px)'})
    print(tagged_values)

    values = [x.get_text() for x in tagged_values]

    for value in values:
        print(value)

    print('\n')
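The loop above only prints its results. One way to keep them around is to split the parsing into a function and collect the values per symbol in a dict. A sketch (fetch_quote_page is a hypothetical stand-in for the urllib request code shown above):

```python
from bs4 import BeautifulSoup

def parse_values(html):
    """Extract the summary-table values from a quote page's HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    tds = soup.find_all('td', {'class': 'Ta(end) Fw(600) Lh(14px)'})
    return [td.get_text() for td in tds]

# collecting per symbol would look like this, where fetch_quote_page
# wraps the urllib.request code from the post (hypothetical helper):
# results = {s: parse_values(fetch_quote_page(s)) for s in ['FB', 'GOOG', 'AAPL']}

# offline demo with a tiny hand-written snippet:
sample = '<td class="Ta(end) Fw(600) Lh(14px)">186.99</td>'
print(parse_values(sample))  # ['186.99']
```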

 

Get data with Selenium 

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# example option: add 'incognito' command line arg to options
# option = webdriver.ChromeOptions()
# option.add_argument("--incognito")

# create a new instance of Chrome (pass options=option above to enable incognito)
browser = webdriver.Chrome(executable_path='C:\\Users\\Z\\chromedriver')

# go to website of interest
browser.get("https://finance.yahoo.com/quote/FB?p=FB")

# wait up to 10 seconds for the page to load
# timeout = 10
# try:
#     WebDriverWait(browser, timeout).until(EC.visibility_of_element_located(
#         (By.XPATH, "//a[@class='Fz(15px) D(ib) Td(inh)']")))
# except TimeoutException:
#     print("Timed out waiting for page to load")
#     browser.quit()


# get all of the titles for the financial values
titles_element = browser.find_elements_by_xpath("//td[@class='C(black) W(51%)']")
titles = [x.text for x in titles_element]
'''
WRITTEN AS A NORMAL FOR LOOP:
titles = []
for x in titles_element:
    titles.append(x.text)
'''
print('titles:')
print(titles)


# get all of the financial values themselves
values_element = browser.find_elements_by_xpath("//td[@class='Ta(end) Fw(600) Lh(14px)']")
values = [x.text for x in values_element] # same concept as for-loop/list-comprehension above
print('values:')
print(values, '\n')


# pair each title with its corresponding value using zip function and print each pair
for title, value in zip(titles, values):
    print(title + ': ' + value)

  • zip pairs each 'title' with its 'value', printing them as two columns next to each other

Previous Close: 186.99
Open: 184.84
Bid: 184.01 x 800
Ask: 184.84 x 2900
Day's Range: 184.28 - 187.58
52 Week Range: 123.02 - 218.62
Volume: 10,485,370
Avg. Volume: 17,041,224
Market Cap: 528.937B
Beta (3Y Monthly): 1.30
PE Ratio (TTM): 27.50
EPS (TTM): 6.74
Earnings Date: Jul 23, 2019 - Jul 29, 2019
Forward Dividend & Yield: N/A (N/A)
Ex-Dividend Date: N/A
1y Target Est: 221.50
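Since zip pairs titles and values positionally, the same pairing also turns directly into a dict for lookups by title. A small self-contained example using a few values from the output above:

```python
titles = ['Previous Close', 'Open', 'Bid']
values = ['186.99', '184.84', '184.01 x 800']

# print as two columns, exactly as in the Selenium script
for title, value in zip(titles, values):
    print(title + ': ' + value)

# the same pairing builds a dict keyed by title
quote = dict(zip(titles, values))
print(quote['Open'])  # 184.84
```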