[Online Review] Python

Life is short, so I use Python.


Built-in Functions

format()

input('prompt text'): prints the prompt and returns the typed line as a string; assign the returned value to a variable

any(), all(): These logical functions are helpful tools to check whether any or all elements in a container are true.
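
A minimal sketch of both functions; the scores list is hypothetical sample data, picked so that a falsy value (0) is present:

```python
# Hypothetical sample data, chosen only for illustration.
scores = [72, 85, 0, 91]

all_truthy = all(scores)                      # False: 0 is falsy
any_truthy = any(scores)                      # True: at least one non-zero value
all_passing = all(s >= 60 for s in scores)    # False: 0 is below 60
any_excellent = any(s >= 90 for s in scores)  # True: 91 qualifies

# Edge cases on an empty container
empty_all = all([])  # True (nothing contradicts the claim)
empty_any = any([])  # False (nothing satisfies it)
```

Passing a generator expression, as in the last two checks on scores, lets both functions stop at the first deciding element.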

enumerate(): Enumerate adds the ability to have an index that corresponds to each value in a collection and iterate over both. It makes a sequence of tuples that contain the index and the corresponding value.
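
A short illustration, using a made-up fruits list; the start argument is optional and only changes the first index:

```python
fruits = ["apple", "banana", "cherry"]  # hypothetical sample data

# enumerate yields (index, value) tuples
pairs = list(enumerate(fruits))

# Unpack index and value directly in the loop; start=1 gives 1-based numbering
numbered = [f"{i}. {name}" for i, name in enumerate(fruits, start=1)]
```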

isinstance(a,b): checks to see whether a is an instance of, or is of type b.
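
A few cases that may be worth keeping in mind (a sketch with arbitrary values): the second argument can be a tuple of types, and subclasses count as instances of their parent class.

```python
is_int = isinstance(3, int)                # True
is_number = isinstance(3.5, (int, float))  # True: a tuple means "any of these types"
bool_is_int = isinstance(True, int)        # True: bool is a subclass of int
exact_match = type(True) is int            # False: type() compares the exact class
```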

if __name__ == '__main__'

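  • When the .py file is run directly, the code block under if __name__ == '__main__' is executed
  • When the .py file is imported as a module, the code block under if __name__ == '__main__' is not executed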

sorted(): also works on pairs (tuples), which are compared element by element: first item first, then the second item to break ties

pairs = [(2,7),(2,6),(1,4)]
print(sorted(pairs))

output:[(1, 4), (2, 6), (2, 7)]
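
When the default ordering is not the one wanted, sorted() also accepts key and reverse. A sketch reusing the same pairs:

```python
pairs = [(2, 7), (2, 6), (1, 4)]

# Default: compare first items, then second items on ties
by_default = sorted(pairs)

# Sort on the second item only, largest first
by_second_desc = sorted(pairs, key=lambda p: p[1], reverse=True)

# First item descending, second item ascending
mixed = sorted(pairs, key=lambda p: (-p[0], p[1]))
```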

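Deleting list elements (del, list.pop(), list.remove(); lists have no drop() method): never delete elements from a list while looping over it by index! The remaining elements shift left, so the indices no longer match: elements get skipped or an IndexError is raised.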
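
The pitfall, and two common ways around it, can be sketched as follows (the numbers list is hypothetical; the goal is to remove the even values):

```python
numbers = [1, 2, 2, 3, 4, 4, 5]  # hypothetical sample data

# Buggy pattern: deleting while iterating forward skips the element
# that slides into the freed position.
buggy = list(numbers)
for i, value in enumerate(buggy):
    if value % 2 == 0:
        del buggy[i]
# buggy still contains even numbers at this point

# Option 1: build a new list with the elements to keep
kept = [value for value in numbers if value % 2 != 0]

# Option 2: delete in place, walking the indices backwards so that
# deletions never affect the indices still to be visited
in_place = list(numbers)
for i in range(len(in_place) - 1, -1, -1):
    if in_place[i] % 2 == 0:
        del in_place[i]
```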

Unit Testing

primes.py

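def is_prime(number):
    """Return True if *number* is prime."""
    # 0, 1 and negative numbers are not prime
    if number < 2:
        return False
    # Start at 2: range(number) would begin at 0 (ZeroDivisionError)
    # and would include 1, which divides every number
    for element in range(2, number):
        if number % element == 0:
            return False

    return True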

test_primes.py

import unittest
from primes import is_prime

class PrimesTestCase(unittest.TestCase):
    """Tests for `primes.py`."""

    def test_is_five_prime(self):
        """Is five successfully determined to be prime?"""
        self.assertTrue(is_prime(5))

if __name__ == '__main__':
    unittest.main()

The file creates a unit test with a single test case: test_is_five_prime. Using Python’s built-in unittest framework, any member function whose name begins with test in a class deriving from unittest.TestCase will be run, and its assertions checked, when unittest.main() is called.
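
A possible extension, sketched under assumptions: the is_prime below is a stand-in that starts trial division at 2, and the class name PrimesEdgeCaseTests is made up for this example. It checks several inputs per test with subTest and runs the suite through a loader and runner instead of unittest.main(), which would exit the interpreter when it finishes.

```python
import unittest


def is_prime(number):
    """Stand-in implementation (assumption): trial division from 2."""
    if number < 2:
        return False
    for element in range(2, number):
        if number % element == 0:
            return False
    return True


class PrimesEdgeCaseTests(unittest.TestCase):
    """Hypothetical extra test case; any method starting with 'test' is run."""

    def test_small_primes(self):
        for n in (2, 3, 5, 7, 11):
            with self.subTest(n=n):  # reports each failing n separately
                self.assertTrue(is_prime(n))

    def test_non_primes(self):
        for n in (-1, 0, 1, 4, 9):
            with self.subTest(n=n):
                self.assertFalse(is_prime(n))


# Collect and run the tests without calling sys.exit()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrimesEdgeCaseTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```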

Computational Programming

NumPy

Array Features
a is a NumPy array
a.ndim: number of dimensions
a.shape: (row, col) for a 2D array
a.dtype: data type of the elements

len(a): length of 1st dim

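np.unique(row_values, return_counts=False): sorted unique values; with return_counts=True it also returns how often each value occurs
np.where(a==value): returns a tuple holding one index array per dimension; for a 1D array, use [0][0] to get the index of the first match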
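
A small sketch of these attributes and functions on made-up arrays:

```python
import numpy as np

# Hypothetical 2D array: 2 rows, 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])
ndim = a.ndim        # 2
shape = a.shape      # (2, 3)
first_dim = len(a)   # 2, the length of the first dimension

# Hypothetical 1D array with repeated values
row_values = np.array([3, 1, 3, 2, 1, 3])
values, counts = np.unique(row_values, return_counts=True)

# np.where returns a tuple of index arrays, one per dimension
matches = np.where(row_values == 2)
first_index = matches[0][0]
```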

Creating arrays
np.arange(a, b, step_length): [a,b)
np.linspace(a, b, num_of_points): [a,b], num_of_points evenly spaced samples including both endpoints

np.full(shape, value): make an array of the given shape filled with the same value
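
A quick sketch of the three constructors with arbitrary arguments, mainly to show which endpoints are included:

```python
import numpy as np

half_open = np.arange(0, 10, 2)  # stop value 10 is excluded
closed = np.linspace(0, 1, 5)    # 5 samples, both 0 and 1 included
filled = np.full((2, 3), 7)      # shape first, then the fill value
```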

Random number generation
np.random.seed(int)
np.random.rand(d0, d1, ...): uniform on [0, 1), one argument per dimension
np.random.randn(d0, d1, ...): normal (Gaussian), mean 0 and standard deviation 1
np.random.randint(low, high, size): [low, high)
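
A sketch showing that seeding makes the draws repeatable; the seed value 0 and the shapes are arbitrary choices:

```python
import numpy as np

np.random.seed(0)
uniform = np.random.rand(2, 3)              # 2x3 array, values in [0, 1)
normal = np.random.randn(4)                 # 4 standard-normal draws
dice = np.random.randint(1, 7, size=10)     # integers 1..6, upper bound excluded

# Re-seeding with the same value reproduces the same sequence
np.random.seed(0)
uniform_again = np.random.rand(2, 3)
```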

Statistics
np.percentile(x, q): compute the q-th percentile of x, with q in [0, 100]
hist, bin_edges = np.histogram(x, bins=10): hist: array, the number of values in each bin; bin_edges: array, the corresponding bin edges (one more entry than hist)
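
For illustration, both functions applied to the integers 1 to 100 (sample data chosen so the results are easy to check by hand):

```python
import numpy as np

x = np.arange(1, 101)           # 1, 2, ..., 100

median = np.percentile(x, 50)   # 50th percentile, interpolated between 50 and 51
hist, bin_edges = np.histogram(x, bins=10)

total_counted = hist.sum()      # every value lands in exactly one bin
```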

np.meshgrid(): Return coordinate matrices from coordinate vectors

import numpy as np

nvalues = 5
xvalues = np.linspace(-4,4,nvalues)
yvalues = np.linspace(-4,4,nvalues)
np.meshgrid(xvalues,yvalues)

output:
[
array([[-4., -2.,  0.,  2.,  4.],
       [-4., -2.,  0.,  2.,  4.],
       [-4., -2.,  0.,  2.,  4.],
       [-4., -2.,  0.,  2.,  4.],
       [-4., -2.,  0.,  2.,  4.]]),

array([[-4., -4., -4., -4., -4.],
       [-2., -2., -2., -2., -2.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 4.,  4.,  4.,  4.,  4.]])
]
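
One typical use of the two coordinate matrices is evaluating a function on every grid point at once. A sketch, where the function x² + y² is an arbitrary choice for illustration:

```python
import numpy as np

nvalues = 5
xvalues = np.linspace(-4, 4, nvalues)
yvalues = np.linspace(-4, 4, nvalues)

# X varies along columns, Y varies along rows
X, Y = np.meshgrid(xvalues, yvalues)

# Element-wise arithmetic evaluates the function on the whole grid
Z = X**2 + Y**2
```

Z[i, j] is then the function value at x = xvalues[j], y = yvalues[i].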

Indexing and slicing
a[start:end:step]: to reverse a sequence, use a[::-1]
A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory. When modifying the view, the original array is modified as well.
2D array: a[row, col]; each position accepts an index or a slice, e.g. a[0:2, ::-1]

Copy
a[::2].copy()
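
The difference between a view and a copy can be sketched like this (the array contents are arbitrary):

```python
import numpy as np

a = np.arange(6)          # [0 1 2 3 4 5]

view = a[::2]             # a view: shares memory with a
view[0] = 99              # also changes a[0]

independent = a[::2].copy()   # a copy: owns its data
independent[1] = -1           # a[2] is left untouched

view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, independent)
```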

Matrix Operations
Element-wise multiplication: matrix1 * matrix2

Matrix multiplication:
matrix1 @ matrix2
matrix1.dot(matrix2)
np.dot(matrix1,matrix2)

Transpose: x.T

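Outer product: np.outer(x,x). Transposing a 1D array does nothing (x.T has the same shape as x), so a 1D vector cannot be turned into a column vector with .T for matrix multiplication. Use np.outer(x,x) directly,
or use x.reshape(row, 1) to force x into a column vector (x.reshape(-1, 1) lets NumPy work out the row count)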
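
A sketch of these operations on small made-up matrices, including the two routes to the outer product:

```python
import numpy as np

matrix1 = np.array([[1, 2],
                    [3, 4]])
matrix2 = np.array([[0, 1],
                    [1, 0]])

elementwise = matrix1 * matrix2   # multiplies matching entries
product = matrix1 @ matrix2       # true matrix multiplication
same_product = np.dot(matrix1, matrix2)

x = np.array([1, 2, 3])
transposed_shape = x.T.shape      # still (3,): .T is a no-op on 1D arrays

outer = np.outer(x, x)            # 3x3 outer product
column = x.reshape(-1, 1)         # explicit 3x1 column vector
outer_via_reshape = column @ column.T
```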

np.cumsum(): cumulative sum

matplotlib

matplotlib is a plotting library for Python that builds heavily on NumPy. This package can be used to make publication-quality plots and graphics and is highly customizable. While matplotlib is one of the most well-established Python plotting libraries, there are a number of other plotting packages that you may find useful for your work:

  • Seaborn. Creates beautiful plots easily and is built on matplotlib
  • ggplot. For lovers of R’s plotting system, ggplot2, this library translates much of that functionality from R to Python. It’s based on the “Grammar of Graphics” theory of data visualization.

Pyplot tutorial

pandas

From 10 Minutes to pandas

Use date as index

In [45]: s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))

In [46]: s1
Out[46]:
2013-01-02 1
2013-01-03 2
2013-01-04 3
2013-01-05 4
2013-01-06 5
2013-01-07 6
Freq: D, dtype: int64

Features
df.dtypes: data type of each column
df.index: row labels; df.columns: column labels
df.values: the data as a NumPy array

Statistics
df.describe(): count, mean, std, min, 25%, 50%, 75%, max
df.mean(axis=0)

df.T: transpose

Sort
df.sort_index(axis=0, ascending=True): sort by labels; axis=0 sorts the row index, axis=1 the column labels
df.sort_values(by='col_name')

Selection
df.loc[:,['A','B']]: by column name
df.iloc[3:5,0:2]: by position
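
A sketch contrasting the two selectors on a small made-up frame with a date index. One difference that is easy to miss: label slices with loc include the end label, while position slices with iloc exclude the end position.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x4 frame holding the integers 0..23
df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  index=pd.date_range("20130101", periods=6),
                  columns=list("ABCD"))

by_label = df.loc[:, ["A", "B"]]     # all rows, columns A and B
by_position = df.iloc[3:5, 0:2]      # rows 3 and 4, columns 0 and 1

# Label slice: both 2013-01-02 and 2013-01-04 are included
label_slice = df.loc["20130102":"20130104", "A"]
```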

df.shift(): shift the data down or up by a number of periods; the vacated positions are filled with NaN
df.apply(function, axis=0)
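
A sketch of both methods, assuming a made-up price column: shift() lines each row up with the previous one, and apply() runs a function once per column when axis=0.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.0, 15.0]})  # hypothetical data

# Shift down by one row; the first row has no predecessor and becomes NaN
df["previous"] = df["price"].shift(1)
df["change"] = df["price"] - df["previous"]

# axis=0: the function receives each column as a Series
col_range = df[["price"]].apply(lambda col: col.max() - col.min(), axis=0)
```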

pd.Series.value_counts()

str methods
pd.Series.str.lower()
pd.Series.str.len()

pd.Series.str.strip()
pd.Series.str.replace(old, new, case=None)
pd.Series.str.split(split_with): split str into list
pd.Series.str.cat(sep=None, na_rep=None): concatenate strs to a longer str
pd.Series.str.extract(regex)
pd.Series.str.contains(pat)
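
The str accessor applies each method element by element, and calls can be chained. A sketch on a made-up Series of names:

```python
import pandas as pd

s = pd.Series(["  Alice Smith ", "BOB jones", "carol-ann Lee"])  # hypothetical data

clean = s.str.strip().str.lower()           # trim whitespace, then lowercase
lengths = clean.str.len()
first_names = clean.str.split(" ").str[0]   # split into lists, take item 0
has_hyphen = clean.str.contains("-", regex=False)
initials = clean.str.extract(r"^(\w)", expand=False)  # first capture group
joined = clean.str.cat(sep=", ")            # one long string
```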

Merge
pd.concat([a, b], axis=0): concatenate rows(0)/ columns(1)
pd.merge(left, right, how='inner', on=None )
df.append(rows): append rows to a df (deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat([df, rows]) instead)
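
A sketch of concatenating and merging two small made-up frames; the column names key, x and y are placeholders:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Stack rows; ignore_index renumbers the result 0..n-1
stacked = pd.concat([left, left], axis=0, ignore_index=True)

# inner keeps only keys present in both frames, outer keeps all keys
inner = pd.merge(left, right, how="inner", on="key")
outer = pd.merge(left, right, how="outer", on="key")

# Appending rows by concatenation
new_row = pd.DataFrame({"key": ["z"], "x": [9]})
appended = pd.concat([left, new_row], ignore_index=True)
```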

Group
df.groupby(by): can be a list; then we can do sum() or some other aggregations
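
A sketch of grouping by one column and by a list of columns, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": ["one", "one", "two", "two"],
                   "B": ["x", "y", "x", "x"],
                   "D": [1, 2, 3, 4]})  # hypothetical data

by_a = df.groupby("A")["D"].sum()           # one group per value of A
by_ab = df.groupby(["A", "B"])["D"].sum()   # one group per (A, B) pair

# Several aggregations at once
stats = df.groupby("A")["D"].agg(["sum", "mean"])
```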

Reshape
df.stack(): Stack the prescribed level(s) from columns to index.
df.unstack(): reverse operation

>>> df_single_level_cols
     weight height
cat       0      1
dog       2      3

>>> df_single_level_cols.stack()
cat  weight    0
     height    1
dog  weight    2
     height    3
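
The round trip can be sketched by rebuilding the same small frame by hand (its construction here is an assumption; the original snippet only shows the printed result):

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]],
                  index=["cat", "dog"],
                  columns=["weight", "height"])

stacked = df.stack()           # Series indexed by (row label, column label)
restored = stacked.unstack()   # inner index level moves back to the columns
```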

Pivot Tables

In [105]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
   .....:                    'B' : ['A', 'B', 'C'] * 4,
   .....:                    'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
   .....:                    'D' : np.random.randn(12),
   .....:                    'E' : np.random.randn(12)})
   .....:

In [106]: df
Out[106]:
A B C D E
0 one A foo 1.418757 -0.179666
1 one B foo -1.879024 1.291836
2 two C foo 0.536826 -0.009614
3 three A bar 1.006160 0.392149
4 one B bar -0.029716 0.264599
5 one C bar -1.146178 -0.057409
6 two A foo 0.100900 -1.425638
7 three B foo -1.035018 1.024098
8 one C foo 0.314665 -0.106062
9 one A bar -0.773723 1.824375
10 two B bar -1.170653 0.595974
11 three C bar 0.648740 1.167115

In [107]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])

Out[107]:
C             bar       foo
A     B
one   A -0.773723  1.418757
      B -0.029716 -1.879024
      C -1.146178  0.314665
three A  1.006160       NaN
      B       NaN -1.035018
      C  0.648740       NaN
two   A       NaN  0.100900
      B -1.170653       NaN
      C       NaN  0.536826
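
In the table above every (A, B, C) combination occurs at most once, so the aggregation step is invisible. A sketch with made-up data where one cell receives two rows shows that pivot_table averages by default, and that aggfunc can change this:

```python
import pandas as pd

df = pd.DataFrame({"A": ["one", "one", "one", "two"],
                   "C": ["foo", "foo", "bar", "foo"],
                   "D": [1.0, 3.0, 5.0, 7.0]})  # hypothetical data

# Default aggregation is the mean of all rows falling into a cell
table = pd.pivot_table(df, values="D", index="A", columns="C")

# Same layout, but summing instead of averaging
totals = pd.pivot_table(df, values="D", index="A", columns="C", aggfunc="sum")
```

Cells with no matching rows, such as ('two', 'bar') here, come out as NaN.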

Date Functionality

Categoricals

pd.set_option('display.float_format', '{:f}'.format): display floats in full fixed-point notation instead of scientific notation

Reference

Understanding Unit Testing
pandas-docs 0.23.3