April 29, 2017, 8:29 p.m.

Python Cheat Sheet for Data Science

Question: what is % used for in Python?
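A short illustration of the two common uses of %: for numbers it is the modulo (remainder) operator, which is how the even-number tests further down work, and with a string on the left it does old-style string formatting.

```python
# % on numbers: the remainder after integer division
print(7 % 2)    # 1
print(10 % 5)   # 0

# Typical use: keep only even numbers (remainder 0 when divided by 2)
evens = [n for n in range(10) if n % 2 == 0]
print(evens)    # [0, 2, 4, 6, 8]

# % with a string on the left: old-style string formatting
msg = "%d items" % 3
print(msg)      # 3 items
```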

 

 

General Python functions

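map(): applies a function to every item of an iterable

filter(): keeps only the items for which a function returns True

list comprehensions

[item**3 for item in x if item%2==0] (cubes of the even items of x)

[item for item in x] (the basic form)

lambda: a small anonymous function, e.g. lambda item: item*2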
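A minimal sketch of map(), filter(), a list comprehension and lambda side by side, using a made-up list x. In Python 3, map() and filter() return iterators, so they are wrapped in list() to see the results.

```python
x = [1, 2, 3, 4, 5, 6]  # example data

# map(): apply a function to every item
doubled = list(map(lambda item: item * 2, x))

# filter(): keep the items for which the function returns True
evens = list(filter(lambda item: item % 2 == 0, x))

# list comprehension: cubes of the even items in one expression
even_cubes = [item**3 for item in x if item % 2 == 0]

# lambda bound to a name (fine for quick experiments)
square = lambda item: item**2

print(doubled)     # [2, 4, 6, 8, 10, 12]
print(evens)       # [2, 4, 6]
print(even_cubes)  # [8, 64, 216]
print(square(5))   # 25
```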

 

NumPy

mean()

sum()

max()

min()

std() [standard deviation]
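The aggregate functions above can be called as array methods (or as np.mean(a) and so on). The sample values are made up so that the results are easy to check by hand. Note that std() computes the population standard deviation (ddof=0) unless you pass ddof=1.

```python
import numpy as np

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # example data

print(a.mean())  # 5.0
print(a.sum())   # 40
print(a.max())   # 9
print(a.min())   # 2
print(a.std())   # 2.0 (population std, ddof=0 by default)
```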

Boolean indexing filters an array without a loop or filter(): x[x%2==0] returns a new array containing the elements of x that are divisible by 2
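A small sketch of boolean indexing: the comparison produces an array of True/False values (a mask), and indexing with that mask keeps the True positions. Multiple conditions are combined with & and |, with each condition in parentheses.

```python
import numpy as np

x = np.arange(10)           # 0, 1, ..., 9
mask = x % 2 == 0           # True where the element is even
print(x[mask])              # [0 2 4 6 8]

# Combine conditions with & (and) / | (or); parentheses are required
between = x[(x > 2) & (x < 7)]
print(between)              # [3 4 5 6]
```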

np.arange(start, stop, step) makes an arithmetic progression; stop is excluded, so np.arange(10,-1,-1) counts down from 10 to 0
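The example from the note, plus an ascending one, to show that the stop value is never included in the result.

```python
import numpy as np

down = np.arange(10, -1, -1)  # start=10, stop=-1 (excluded), step=-1
print(down)                   # [10  9  8  7  6  5  4  3  2  1  0]

up = np.arange(0, 10, 3)      # 0, 3, 6, 9 (10 is excluded)
print(up)
```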

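np.linspace(start, stop, num) returns num evenly spaced values from start to stop (stop included by default)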
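Unlike arange, linspace takes the number of points rather than the step size. The endpoint=False variant is shown for comparison.

```python
import numpy as np

pts = np.linspace(0, 1, 5)                    # 0.0, 0.25, 0.5, 0.75, 1.0
half_open = np.linspace(0, 1, 4, endpoint=False)  # 0.0, 0.25, 0.5, 0.75
print(pts)
print(half_open)
```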

.astype('type') (returns a copy of the array converted to another data type, e.g. .astype('int'))
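A quick sketch of .astype(): converting floats to int truncates toward zero rather than rounding, which is easy to forget.

```python
import numpy as np

f = np.array([1.7, 2.2, -3.9])
i = f.astype('int')   # truncates toward zero: [ 1  2 -3]
s = i.astype(str)     # array of strings: ['1' '2' '-3']
print(i, s)
```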

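np.random.random_integers(0,50,20) [gives 20 integers from 0 to 50 inclusive; this function is deprecated, and np.random.randint(0,51,20) is the equivalent because randint excludes its upper bound]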
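A sketch of the non-deprecated ways to draw 20 integers between 0 and 50 inclusive. The second form uses the newer Generator API (np.random.default_rng), where endpoint=True makes the upper bound inclusive.

```python
import numpy as np

# Legacy API: high is exclusive, so 51 gives values 0..50
nums = np.random.randint(0, 51, 20)

# Newer Generator API: endpoint=True makes 50 inclusive
rng = np.random.default_rng()
nums2 = rng.integers(0, 50, size=20, endpoint=True)

print(nums)
print(nums2)
```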

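array_name*2 (multiplies every element by 2 without a loop, because NumPy operations are vectorized)

np.vectorize(function_name) wraps a function written for single values so it can be applied element-wise to an array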
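A sketch contrasting built-in vectorized arithmetic with np.vectorize. The function clip_negative is a hypothetical example: it uses an if statement, so it only works on one number at a time until it is wrapped. np.vectorize is a convenience and still loops in Python internally, so it is not faster than a plain loop.

```python
import numpy as np

arr = np.array([-2, 3, -1, 5])
print(arr * 2)          # [-4  6 -2 10], no loop needed

def clip_negative(n):
    # hypothetical scalar function: replaces negative numbers with 0
    return n if n > 0 else 0

vclip = np.vectorize(clip_negative)  # now accepts whole arrays
print(vclip(arr))       # [0 3 0 5]
```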

.reshape(no. of rows, no. of columns)

.shape (tuple of the array's dimensions)

.size (total number of elements)

.ndim (number of dimensions)
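A small sketch tying .reshape() to .shape, .size and .ndim. Passing -1 for one dimension lets NumPy work it out from the total size.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns
print(a.shape)   # (3, 4)
print(a.size)    # 12
print(a.ndim)    # 2

b = a.reshape(2, -1)             # -1: NumPy infers 6 columns
print(b.shape)   # (2, 6)
```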

np.random.seed(n) (fixes the random generator's starting point so results are reproducible)
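Seeding twice with the same value (42 here is arbitrary) reproduces the same "random" numbers, which is useful for repeatable experiments.

```python
import numpy as np

np.random.seed(42)
first = np.random.randint(0, 100, 5)

np.random.seed(42)              # reset to the same starting point
second = np.random.randint(0, 100, 5)

print((first == second).all())  # True
```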

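.transpose() (swaps rows and columns; .T is shorthand)

.flatten() (returns a 1-D copy)

.ravel() (returns a 1-D view where possible, so changes can affect the original)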
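The practical difference between .flatten() and .ravel() shows up when you modify the result: for an ordinary (contiguous) array, ravel returns a view that shares memory with the original, while flatten always copies.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])
t = m.transpose()   # shape (3, 2)

f = m.flatten()     # independent 1-D copy
r = m.ravel()       # 1-D view of m (m is contiguous)

r[0] = 100          # writes through to m
print(m[0, 0])      # 100
print(f[0])         # 1, the copy is unaffected
```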

.cumprod() (cumulative product)
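Each output element of .cumprod() is the product of all elements up to and including that position.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(a.cumprod())  # [ 1  2  6 24] -> 1, 1*2, 1*2*3, 1*2*3*4
```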

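np.concatenate((a, b)) (joins arrays along an existing axis, rows by default)

np.dstack((a, b)) (stacks arrays along a third, depth axis)

np.hsplit(array, n) (splits an array column-wise into n equal parts)

np.vsplit(array, n) (splits an array row-wise into n equal parts)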
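A sketch showing that these are module-level functions (np.concatenate, not a.concatenate) and that the two split functions undo the corresponding joins. The arrays a and b are example data.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a, b))          # axis=0: shape (4, 2)
cols = np.concatenate((a, b), axis=1)  # axis=1: shape (2, 4)
depth = np.dstack((a, b))              # third axis: shape (2, 2, 2)

left, right = np.hsplit(cols, 2)       # back into a and b, by columns
top, bottom = np.vsplit(rows, 2)       # back into a and b, by rows
print(left)
print(bottom)
```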

 

 

Pandas functions

 

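.shape (number of rows and columns)

.head() (first 5 rows by default)

.index (the row labels; can be set from an array)

pd.Series({'label': value, ...}) or pd.Series(values, index=labels) builds a Series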
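A minimal sketch of the two usual ways to build a Series, with made-up labels and values: from a dict (keys become the index) or from a list plus an explicit index.

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3})            # dict keys become the index
t = pd.Series([10, 20, 30], index=['x', 'y', 'z'])  # values plus explicit labels

print(s.shape)     # (3,)
print(s.head(2))   # first 2 rows (default is 5)
print(t.index)
```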

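data[(data<10)&(data>5)] (values between 5 and 10; wrap each condition in parentheses and combine with &)

data[data<10][data>5] (chained version of the same filter)

sum(data>5) (counts the values greater than 5, since True counts as 1)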
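A sketch of the filters above on a made-up Series. Python's `and`/`or` do not work element-wise on a Series, which is why & (and | for "or") are used instead.

```python
import pandas as pd

data = pd.Series([3, 6, 8, 12, 7])  # example data

between = data[(data < 10) & (data > 5)]
print(between.tolist())    # [6, 8, 7]

count = (data > 5).sum()   # same idea as sum(data>5)
print(count)               # 4
```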

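mean(), sum() and similar methods ignore NaN values by default (skipna=True)

.ffill() (fills NaN with the previous valid value)

.bfill() (fills NaN with the next valid value)

.fillna(value) (replaces NaN with a given value)

fill_value (parameter of methods such as .add() and .reindex() that substitutes a value for missing entries)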
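A sketch of the missing-value tools on a small made-up Series. The last part shows fill_value with .add(): label 'y' exists only in a, so without fill_value the sum at 'y' would be NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])
print(s.mean())              # 2.0, NaN values are skipped
print(s.sum())               # 4.0
print(s.ffill().tolist())    # [1.0, 1.0, 3.0, 3.0]
print(s.bfill().tolist())    # last value stays NaN: nothing after it
print(s.fillna(0).tolist())  # [1.0, 0.0, 3.0, 0.0]

a = pd.Series([1, 2], index=['x', 'y'])
b = pd.Series([10], index=['x'])
total = a.add(b, fill_value=0)  # missing 'y' in b treated as 0
print(total)                    # x: 11, y: 2
```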

To create a DataFrame, all arrays must be of the same length

.sort_index() (sorts rows by their index labels)

Create a DataFrame: pd.DataFrame({'1stColumnName':array1,'2ndColumnName':array2,...}) Note: all arrays should be of the same length

.sort_values('columnName') (sorts rows by a column; the older .sort('columnName') has been removed from pandas)

.describe() (summary statistics: count, mean, std, min, quartiles, max)
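A sketch of creating a DataFrame from equal-length lists and sorting it; the column names and values are made up. Sorting keeps the original index labels, so .sort_index() restores the original row order.

```python
import pandas as pd

df = pd.DataFrame({'name': ['c', 'a', 'b'], 'score': [70, 90, 80]})

by_score = df.sort_values('score')                   # ascending
top_first = df.sort_values('score', ascending=False)
restored = by_score.sort_index()                     # back to index order 0, 1, 2

print(by_score)
print(df.describe())  # statistics for the numeric 'score' column
```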

pd.read_csv('file name') (loads a CSV file into a DataFrame)

.set_index('columnName') (makes a column the index)

inplace=True (changes the DataFrame itself instead of returning a new one)
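A sketch of .set_index() with and without inplace=True, on made-up data. Without inplace the original is untouched and a new DataFrame is returned; with inplace=True the call modifies df and returns None.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b'], 'value': [1, 2]})

indexed = df.set_index('key')        # new DataFrame; df unchanged
print(indexed.loc['b', 'value'])     # 2

result = df.set_index('key', inplace=True)  # modifies df itself
print(result)                        # None
print(df.index.tolist())             # ['a', 'b']
```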

letters.index = pd.date_range('9/1/2012',periods=26) (replaces the index with 26 consecutive dates)

letters = pd.DataFrame({'lowercase':lcase, 'uppercase':upcase}) (defines the column names while creating a DataFrame)
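The two lines above in runnable order. The note does not say where lcase and upcase come from; here they are assumed to be the 26 lowercase and uppercase letters from Python's string module.

```python
import string
import pandas as pd

lcase = list(string.ascii_lowercase)   # assumption: the 26 letters a-z
upcase = list(string.ascii_uppercase)  # assumption: the 26 letters A-Z

letters = pd.DataFrame({'lowercase': lcase, 'uppercase': upcase})
letters.index = pd.date_range('9/1/2012', periods=26)  # one day per row

print(letters.head(3))
```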

.pop('columnName') (removes a column from the DataFrame and returns it)
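A quick sketch of .pop(): the column is returned as a Series and is no longer in the DataFrame afterwards.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
col_b = df.pop('b')

print(df.columns.tolist())  # ['a']
print(col_b.tolist())       # [3, 4]
```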

df = pd.read_csv('../data/raw_running_data.csv', parse_dates=['Date']) (parses the 'Date' column as datetimes instead of plain strings)
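To try parse_dates without the original file, this sketch reads a small made-up CSV from a string via io.StringIO; the real raw_running_data.csv is assumed to have a Date column like this one.

```python
import io
import pandas as pd

# Made-up sample standing in for ../data/raw_running_data.csv
csv_text = "Date,Miles\n2014-08-02,1.7\n2014-08-03,1.4\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(df.dtypes)                    # Date is a datetime64 column
print(df['Date'].dt.day_name())     # date methods now work
```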

http://i.stack.imgur.com/GbJ7N.png (data frame joins)
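In pandas the join types from that diagram map onto the how= argument of pd.merge. A minimal sketch with two made-up tables sharing a 'key' column:

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

inner = pd.merge(left, right, on='key', how='inner')   # keys in both: b, c
left_j = pd.merge(left, right, on='key', how='left')   # all of left; y is NaN for 'a'
outer = pd.merge(left, right, on='key', how='outer')   # union: a, b, c, d

print(inner)
print(left_j)
print(outer)
```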

_________________________________

cols = ['Date', 'Miles', 'Time']

df.columns = cols

This renames all columns of df at once; the list needs exactly one name per column.
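A sketch of both renaming styles on a made-up one-row DataFrame: assigning a full list to df.columns, and .rename(columns={...}) for changing only some names.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['c0', 'c1', 'c2'])

df.columns = ['Date', 'Miles', 'Time']          # replace every name
df = df.rename(columns={'Miles': 'Distance'})   # rename selected columns only

print(df.columns.tolist())  # ['Date', 'Distance', 'Time']
```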

______________________________________

strftime('%A') (formats a date; '%A' gives the day of the week)
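A sketch using this post's own date: strftime('%A') gives the weekday name (in the default C locale), .weekday() gives it as a number, and pandas exposes the same formatting on date columns through .dt.strftime.

```python
from datetime import date

import pandas as pd

d = date(2017, 4, 29)
print(d.strftime('%A'))  # Saturday
print(d.weekday())       # 5 (Monday is 0)

dates = pd.Series(pd.date_range('2017-04-29', periods=3))
print(dates.dt.strftime('%A').tolist())  # ['Saturday', 'Sunday', 'Monday']
```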

 

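Note: if the index is a series of numbers, plain [] lookups use the index labels, not positions, so we can't do position-based lookups that way. Use .iloc[] (square brackets) for position-based lookups; the older .iget() is no longer available in pandas.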
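A sketch of the pitfall with an integer index (labels 10, 20, 30 are made up): s[10] finds the label 10, while s[0] fails because there is no label 0. .loc[] is always label-based and .iloc[] always position-based.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

print(s.loc[10])   # 'a' (label-based)
print(s.iloc[0])   # 'a' (position-based)
print(s[10])       # 'a': with an integer index, [] looks up labels

try:
    s[0]           # no label 0 exists
except KeyError:
    print("KeyError: use s.iloc[0] for the first element")
```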