lmari’s blog

Data analytics, machine learning & front end development

Pandas Time Series

 

Download csv from finance.yahoo.com/quote/AAPL/history

HK Stocks csv https://hk.finance.yahoo.com/quote/0012.HK/history?p=0012.HK

Financial info https://finance.yahoo.com/quote/AAPL/financials?p=AAPL

 

Lesson 1Lesson 2Lesson 3Lesson 4Lesson 5Lesson 6Lesson 7

 

1. Import Data

import pandas as pd

df = pd.read_csv("aapl.csv",parse_dates=["Date"], index_col="Date")

df.head(2)

 

  • Tip: Change dates to date time index
  • Verify by: type(df.Date[0]) as TimeStamp or df.index as DatetimeIndex
  • Partial dataframe
  • Parse: syntax analysis

If no dates imported

 

rng = pd.data_range(start='6/1/2017', end='6/30/2017', freq='B')

rng

  • B means business days
  • Verify with calendar to make sure dates are correct, exclude weekends
  • US holiday not excluded in B, need to import US calendar

Incorporate day in index in dataframe

df.set_index(rng, inplace=True)

df

 

Include weekends

  • Pad gather data on Fri and extend to Sat and Sunday
  • Can do M= month, Q=quarterly, W=weekly, H=hourly, see documentation

df.asfreq('D', method='pad')

 

 

2. Price and Charts

 

Particular date

df['2017-06-30']

 

Particular month

df['2017-01'].Close

 

Average stock price for particular month

df['2017-01'].Close.mean()

 

Range

df['2017-06-30':'2017-07-30'']

 

Retrieve monthly data (frequency, average price)

df.Close.resample('M').mean()

 

Plot weekly graph

  • M= month, Qquarterly, W=weekly, H=hourly

%matplotlib inline

df.Close.resample('W').mean().plot()

 

Bar Chart

%matplotlib inline

df['Close'].resample('M').mean().plot(kind='bar')

 

Daily Chart

df.Close.plot()

 

 

3. Holidays

Generate US holidays

 

from pandas.tseries.holiday import USFederalHolidayCalendar

from pandas.tseries.offsets import CustomBusinessDay

 

us_cal = CustomBusinessDay(calendar=USFederalHolidayCalendar())

 

rng = pd.date_range(start="7/1/2017",end="7/23/2017", freq=us_cal)

rng

 

 

Abstract holiday calendar

  • Date defined in 'My Birth Day' will be excluded from index
  • observance= holiday rules, see documentation

 

 

from pandas.tseries.holiday import AbstractHolidayCalendar, nearest_workday, Holiday

class myCalendar(AbstractHolidayCalendar):

    rules = [

        Holiday('My Birth Day', month=4, day=15),#, observance=nearest_workday),

    ]

    

my_bday = CustomBusinessDay(calendar=myCalendar())

pd.date_range('4/1/2017','4/30/2017',freq=my_bday)

 

 

For Egypt weekdays

 

b = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu", holidays=["2017-07-04"])

pd.date_range(start="7/1/2017", end="7/29/2017",freq=b)

 

Adding more holidays

b = CustomBusinessDay(holidays=['2017-07-04', '2017-07-10'], weekmask=egypt_weekdays)

pd.date_range(start="7/1/2017",periods=20,freq=b)

 

Generate 72 periods of data

rng= pd.date_range(start='1/1/2017', periods=72, freq='B')

rng

 

Generate random numbers

import numpy as np

ts = pd.Series(np.random.randint(1,10,len(rng)), index=rng)

ts.head(10)

 

4. Uniform Date Time Function

Handle invalid dates

  • garbage string 'abc' raise errors -----> pd.to_datetime(dates, errors='ignore')
  • converted 'abc' to 'NaT' -----> pd.to_datetime(dates, errors='coerce')
  • Epoch (unix time) - number of seconds passed since Jan 1 1970

pd.to_datetime(['2017-01-05', 'Jan 6, 2017', 'abc'], errors='coerce')

 

Convert dates to date time

import pandas as pd

dates = ['2017-01-05', 'Jan 5, 2017', '01/05/2017', '2017.01.05', '2017/01/05','20170105']

pd.to_datetime(dates)

 

Convert times to date time

dt = ['2017-01-05 2:30:00 PM', 'Jan 5, 2017 14:30:00', '01/05/2016', '2017.01.05', '2017/01/05','20170105']

pd.to_datetime(dt)

 

European style dates

pd.to_datetime('5-1-2016', dayfirst=True)

 

Custom date format

pd.to_datetime('2017$01$05', format='%Y$%m$%d')

 

5. Period and PeriodIndex

  • Timestamp - instant of time, e.g. 2:00PM
  • Timespan - duration, e.g. 2016
  • dir(y) shows all possible functions -----> y.start_time
  • Add 1 month -----> m + 1, add 1 day = d + 1 (automatically add 1 day for leap year)
  • Extra day ----> y.is_leap_year
  • Add 1 hour -----> h+1

 

import pandas as pd

y = pd.Period('2016')

y

 

Monthly period

m = pd.Period('2017-12')

m

 

Daily period

d = pd.Period('2016-02-28', freq='D')

d

 

Hourly period

h = pd.Period('2017-08-15 23:00:00',freq='H')

h

 

Add 5 hours

h+pd.offsets.Hour(5)

 

Quarterly period

q= pd.Period('2017Q1', freq='Q-JAN')

q

 

q2=pd.Period('2018Q2', freq='Q-JAN')

q2

 

q2-q

 

  • For Walmart, fiscal year can be different, use Q-JAN, can verify with functions:

 

q1.start_time

q1.end_time

 

 

Convert quarterly to month

q1.asfreq('M',how='start')

q1.asfreq('M',how='end')

 

Weekly period

w = pd.Period('2017-07-05',freq='W')

w

 

Show fiscal quarters of Walmart

r = pd.period_range('2011', '2017', freq='q')

r

 

r[0].start_time

r[0].end_time

 

r = pd.PeriodIndex(start='2016-01', freq='3M', periods=10)

r

 

import numpy as np

ps = pd.Series(np.random.randn(len(idx)), idx)

ps

 

Date time format

pst=ps.to_timestamp()

pst

 

pst.to_period()

pst

 

Walmart's financials

import pandas as pd

df = pd.read_csv("wmt.csv")

df

Columns to rows

df.set_index("Line Item",inplace=True)

df = df.T

df

 

df.index = pd.PeriodIndex(df.index, freq="Q-JAN")

df

 

Create additional start date column

df["Start Date"]=df.index.map(lambda x: x.start_time)

df

 

Create additional end date column

df["End Date"]=df.index.map(lambda x: x.end_time)

df

6. Timezones

  • 2 types of time zones in Python: Naive or time zone aware index
  • All time zones strings can be found in pytz, e.g. 'Asia/Hong_Kong'
  • Dateutil use time zones available on OS, prefer pytz
  • tz_localize

from pytz import all_timezones

 

print (all_timezones)


Intraday stock price

import pandas as pd

df = pd.read_csv("msft.csv", header=1,index_col='Date Time',parse_dates=True)

df

 

Convert naive date time to time zone aware

 

  • -4 is 4 hours behind UTC time zone

 

df = df.tz_localize(tz='US/Eastern')

df

 

df = df.tz_convert('Europe/Berlin')

df

 

df.index = df.index.tz_convert('Asia/Calcutta')

 

london = pd.date_range('3/6/2012 00:09:00', periods=10, freq='H',tz='Europe/London')

london

 

td = pd.date_range('3/6/2012 00:00', periods=10, freq='H',tz='dateutil/Europe/London')

td

 

Arithmetic between time zones

rng = pd.date_range(start="2017-08-22 09:00:00",periods=10, freq='30min')

s = pd.Series(range(10),index=rng)

s

 

Convert to Berlin time zone

b = s.tz_localize(tz="Europe/Berlin")

b

 

Convert to Calcutta time zone

m = s.tz_localize(tz="Asia/Calcutta")

m.index

 

Python will convert to UTC, then align dates

b + m

 

7. Shifting and Lagging

  • Shift date point to left or right hand side
  • Calculate price change in date

import pandas as pd

df = pd.read_csv("fb.csv",parse_dates=['Date'],index_col='Date')

df

 

df.shift(1)

 

df.shift(-1)

Price Change

df['Prev Day Price'] = df['Price'].shift(1)

df

 

One Day Change

df['Price Change'] = df['Price'] - df['Prev Day Price']

df

 

 

5 Day Return

df['5 day return'] =  (df['Price'] - df['Price'].shift(5))*100/df['Price'].shift(5)

df

 

 

Simplify data frame

df = df'Price'

df

 

Adjust Dates

 

  • Know frequency, e.g. business day -----> freq 'B'
  • Check freq has changed from None to 'B' -----> df.index

 

df.index = pd.date_range(start='2017-08-15',periods=10, freq='B')

df

 

df.tshift(1)