April 29, 2017, 8:32 p.m.

Multiple ways of generating data for data science

Using numpy:

One way to generate 10000 normally distributed data points with mean 27000 and standard deviation 15000:

import numpy as np

incomes = np.random.normal(27000, 15000, 10000)
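As a quick sanity check, the sample mean and standard deviation of the generated array should land close to the requested parameters. The sketch below is one way to confirm that; the fixed seed and the use of the newer `np.random.default_rng` generator (instead of the legacy `np.random.normal` call above) are assumptions added here for reproducibility, not part of the original snippet.

```python
import numpy as np

# Seed 0 is an arbitrary choice so the run is repeatable (assumption)
rng = np.random.default_rng(0)

# Same parameters as above: mean 27000, standard deviation 15000, 10000 points
incomes = rng.normal(loc=27000, scale=15000, size=10000)

# Both values should be near the requested parameters, not exactly equal
print(incomes.mean())
print(incomes.std())
```

With 10000 samples the estimates typically differ from the targets by a few hundred at most, since the error shrinks with the square root of the sample size.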

Random numbers drawn uniformly from [0, 1):

uniformSkewed = np.random.rand(100)
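Since `np.random.rand` only covers [0, 1), a common trick is to scale and shift its output to reach another range. A minimal sketch, where the bounds `low` and `high` and the seed are hypothetical values chosen for illustration:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability (assumption)

low, high = -100.0, 100.0  # hypothetical bounds, not from the original notes

# rand(100) is uniform on [0, 1); multiplying and adding maps it to [low, high)
scaled = low + (high - low) * np.random.rand(100)

print(scaled.min(), scaled.max())
```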

Uniform distribution over a chosen range (returns a single float):

from numpy import random

incomeCentroid = random.uniform(20000.0, 200000.0)
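The call above returns one number. To get a whole array from the same range, `uniform` also accepts a `size` argument. A short sketch, assuming five values are wanted; the name `incomeCentroids` and the seed are illustrative additions:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability (assumption)

# size=5 asks for an array of five draws instead of a single float
incomeCentroids = np.random.uniform(20000.0, 200000.0, size=5)

print(incomeCentroids)
```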

 

Generating evenly spaced values (step 0.001; the stop value 3 is excluded):

x = np.arange(-3, 3, 0.001)

[-3.    -2.999 -2.998 ...,  2.997  2.998  2.999]
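When the number of points matters more than the step, or the end value should be included, `np.linspace` is a possible alternative to `np.arange`. The sketch below puts the two side by side; the point count 6001 is an assumption chosen so that the spacing matches the 0.001 step used above.

```python
import numpy as np

# arange: fixed step, stop value excluded
x_step = np.arange(-3, 3, 0.001)

# linspace: fixed number of points, both endpoints included
x_count = np.linspace(-3, 3, 6001)

print(len(x_step), x_step[-1])
print(len(x_count), x_count[-1])
```

With a floating-point step, `arange` can be off by one element in some cases, which is one reason `linspace` is often preferred for plotting grids.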

Generating ones:

data = np.ones(100)

[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]

 

Generating consecutive integers:

print(np.arange(30))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29]

 

Using pylab

Random numbers (standard normal), shown as a scatter plot:

import matplotlib.pyplot as plt
from pylab import randn

X = randn(500)
Y = randn(500)
plt.scatter(X, Y)
plt.show()