Download csv from finance.yahoo.com/quote/AAPL/history
HK Stocks csv https://hk.finance.yahoo.com/quote/0012.HK/history?p=0012.HK
Financial info https://finance.yahoo.com/quote/AAPL/financials?p=AAPL
Lesson 1, Lesson 2, Lesson 3, Lesson 4, Lesson 5, Lesson 6, Lesson 7
1. Import Data
import pandas as pd
df = pd.read_csv("aapl.csv",parse_dates=["Date"], index_col="Date")
df.head(2)
- Tip: Change dates to date time index
- Verify by: type(df.Date[0]) as TimeStamp or df.index as DatetimeIndex
- Partial dataframe
- Parse: syntax analysis
If no dates imported
rng = pd.data_range(start='6/1/2017', end='6/30/2017', freq='B')
rng
- B means business days
- Verify with calendar to make sure dates are correct, exclude weekends
- US holiday not excluded in B, need to import US calendar
Incorporate day in index in dataframe
df.set_index(rng, inplace=True)
df
Include weekends
- Pad gather data on Fri and extend to Sat and Sunday
- Can do M= month, Q=quarterly, W=weekly, H=hourly, see documentation
df.asfreq('D', method='pad')
2. Price and Charts
Particular date
df['2017-06-30']
Particular month
df['2017-01'].Close
Average stock price for particular month
df['2017-01'].Close.mean()
Range
df['2017-06-30':'2017-07-30'']
Retrieve monthly data (frequency, average price)
df.Close.resample('M').mean()
Plot weekly graph
- M= month, Qquarterly, W=weekly, H=hourly
%matplotlib inline
df.Close.resample('W').mean().plot()
Bar Chart
%matplotlib inline
df['Close'].resample('M').mean().plot(kind='bar')
Daily Chart
df.Close.plot()
3. Holidays
Generate US holidays
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
us_cal = CustomBusinessDay(calendar=USFederalHolidayCalendar())
rng = pd.date_range(start="7/1/2017",end="7/23/2017", freq=us_cal)
rng
Chinese holiday calendar https://github.com/adyliu/chinese-holiday/blob/master/holiday.py
Abstract holiday calendar
- Date defined in 'My Birth Day' will be excluded from index
- observance= holiday rules, see documentation
from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday
class myCalendar(AbstractHolidayCalendar):
rules = [
Holiday('My Birth Day', month=4, day=15),#, observance=nearest_workday),
]
my_bday = CustomBusinessDay(calendar=myCalendar())
pd.date_range('4/1/2017','4/30/2017',freq=my_bday)
For Egypt weekdays
b = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu", holidays=["2017-07-04"])
pd.date_range(start="7/1/2017", end="7/29/2017",freq=b)
Adding more holidays
b = CustomBusinessDay(holidays=['2017-07-04', '2017-07-10'], weekmask=egypt_weekdays)
pd.date_range(start="7/1/2017",periods=20,freq=b)
Generate 72 periods of data
rng= pd.date_range(start='1/1/2017', periods=72, freq='B')
rng
Generate random numbers
import numpy as np
ts = pd.Series(np.random.randint(1,10,len(rng)), index=rng)
ts.head(10)
4. Uniform Date Time Function
Handle invalid dates
- garbage string 'abc' raise errors -----> pd.to_datetime(dates, errors='ignore')
- converted 'abc' to 'NaT' -----> pd.to_datetime(dates, errors='coerce')
- Epoch (unix time) - number of seconds passed since Jan 1 1970
pd.to_datetime(['2017-01-05', 'Jan 6, 2017', 'abc'], errors='coerce')
Convert dates to date time
import pandas as pd
dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05','20170105']
pd.to_datetime(dates)
Convert times to date time
dt = ['2017-01-05 2:30:00 PM', 'Jan 5, 2017 14:30:00', '01/05/2016', '2017.01.05', '2017/01/05','20170105']
pd.to_datetime(dt)
European style dates
pd.to_datetime('5-1-2016', dayfirst=True)
Custom date format
pd.to_datetime('2017$01$05', format='%Y$%m$%d')
5. Period and PeriodIndex
- Timestamp - instant of time, e.g. 2:00PM
- Timespan - duration, e.g. 2016
- dir(y) shows all possible functions -----> y.start_time
- Add 1 month -----> m + 1, add 1 day = d + 1 (automatically add 1 day for leap year)
- Extra day ----> y.is_leap_year
- Add 1 hour -----> h+1
import pandas as pd
y = pd.Period('2016')
y
Monthly period
m = pd.Period('2017-12')
m
Daily period
d = pd.Period('2016-02-28', freq='D')
d
Hourly period
h = pd.Period('2017-08-15 23:00:00',freq='H')
h
Add 5 hours
h+pd.offsets.Hour(5)
Quarterly period
q= pd.Period('2017Q1', freq='Q-JAN')
q
q2=pd.Period('2018Q2', freq='Q-JAN')
q2
q2-q
- For Walmart, fiscal year can be different, use Q-JAN, can verify with functions:
q1.start_time
q1.end_time
Convert quarterly to month
q1.asfreq('M',how='start')
q1.asfreq('M',how='end')
Weekly period
w = pd.Period('2017-07-05',freq='W')
w
Show fiscal quarters of Walmart
r = pd.period_range('2011', '2017', freq='q')
r
r[0].start_time
r[0].end_time
r = pd.PeriodIndex(start='2016-01', freq='3M', periods=10)
r
import numpy as np
ps = pd.Series(np.random.randn(len(idx)), idx)
ps
Date time format
pst=ps.to_timestamp()
pst
pst.to_period()
pst
Walmart's financials
import pandas as pd
df = pd.read_csv("wmt.csv")
df
Columns to rows
df.set_index("Line Item",inplace=True)
df = df.T
df
df.index = pd.PeriodIndex(df.index, freq="Q-JAN")
df
Create additional start date column
df["Start Date"]=df.index.map(lambda x: x.start_time)
df
Create additional end date column
df["End Date"]=df.index.map(lambda x: x.end_time)
df
6. Timezones
- 2 types of time zones in Python: Naive or time zone aware index
- All time zones strings can be found in pytz, e.g. 'Asia/Hong_Kong'
- Dateutil use time zones available on OS, prefer pytz
- tz_localize
from pytz import all_timezones
print (all_timezones)
Intraday stock price
import pandas as pd
df = pd.read_csv("msft.csv", header=1,index_col='Date Time',parse_dates=True)
df
Convert naive date time to time zone aware
- -4 is 4 hours behind UTC time zone
df = df.tz_localize(tz='US/Eastern')
df
df = df.tz_convert('Europe/Berlin')
df
df.index = df.index.tz_convert('Asia/Calcutta')
london = pd.date_range('3/6/2012 00:09:00', periods=10, freq='H',tz='Europe/London')
london
td = pd.date_range('3/6/2012 00:00', periods=10, freq='H',tz='dateutil/Europe/London')
td
Arithmetic between time zones
rng = pd.date_range(start="2017-08-22 09:00:00",periods=10, freq='30min')
s = pd.Series(range(10),index=rng)
s
Convert to Berlin time zone
b = s.tz_localize(tz="Europe/Berlin")
b
Convert to Calcutta time zone
m = s.tz_localize(tz="Asia/Calcutta")
m.index
Python will convert to UTC, then align dates
b + m
7. Shifting and Lagging
- Shift date point to left or right hand side
- Calculate price change in date
import pandas as pd
df = pd.read_csv("fb.csv",parse_dates=['Date'],index_col='Date')
df
df.shift(1)
df.shift(-1)
Price Change
df['Prev Day Price'] = df['Price'].shift(1)
df
One Day Change
df['Price Change'] = df['Price'] - df['Prev Day Price']
df
5 Day Return
df['5 day return'] = (df['Price'] - df['Price'].shift(5))*100/df['Price'].shift(5)
df
Simplify data frame
df = df'Price'
df
Adjust Dates
- Know frequency, e.g. business day -----> freq 'B'
- Check freq has changed from None to 'B' -----> df.index
df.index = pd.date_range(start='2017-08-15',periods=10, freq='B')
df
df.tshift(1)