Notes on some commonly used APIs

TensorFlow

tf.tile()

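tf.tile() replicates a tensor along its existing axes. A typical case:
given a tensor of shape [width, height], you need a tensor of shape [batch_size, width, height] in which every batch entry is identical to the original. Because tf.tile() does not add axes, the input must first have shape [1, width, height]. Usage: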

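import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.random_normal)
# A variable of shape (1, 3, 2): a single batch entry of size 3 x 2
raw = tf.Variable(tf.random_normal(shape=(1, 3, 2)))
# Repeat twice along axis 0 and once along axes 1 and 2 -> shape (2, 3, 2)
multi = tf.tile(raw, multiples=[2, 1, 1])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(raw.eval())
    print('-----------------------------')
    print(sess.run(multi))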
'''
[[[-0.50027871 -0.48475555]
  [-0.52617502 -0.2396145 ]
  [ 1.74173343 -0.20627949]]]
-----------------------------
[[[-0.50027871 -0.48475555]
  [-0.52617502 -0.2396145 ]
  [ 1.74173343 -0.20627949]]
 [[-0.50027871 -0.48475555]
  [-0.52617502 -0.2396145 ]
  [ 1.74173343 -0.20627949]]]
'''
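
The example above starts from a tensor that already has a leading axis of size 1. For the [width, height] case described earlier, that axis has to be added first, since tiling only repeats existing axes. Below is a minimal sketch of the shape logic, written with NumPy's np.expand_dims and np.tile as stand-ins for tf.expand_dims and tf.tile. The sizes and the name batch_size are illustrative assumptions, not values from these notes.

```python
import numpy as np

width, height, batch_size = 3, 2, 4  # illustrative sizes (assumption)

raw = np.arange(width * height).reshape(width, height)  # shape (3, 2)

# Tiling repeats existing axes only, so add a leading axis of size 1 first.
expanded = np.expand_dims(raw, axis=0)                  # shape (1, 3, 2)

# Repeat batch_size times along axis 0, once along the other two axes.
batched = np.tile(expanded, (batch_size, 1, 1))         # shape (4, 3, 2)

print(batched.shape)
```

Every slice batched[i] holds the same values as raw, which is the behaviour the description asks for.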

tf.reduce_max()

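tf.reduce_max computes the maximum of the elements across the dimensions of a tensor. When no axis is given, it reduces over all dimensions and returns a single value. Example: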

import tensorflow as tf
# No axis is given, so the maximum is taken over all elements
max_value = tf.reduce_max([1, 3, 2])
with tf.Session() as sess:
    max_value = sess.run(max_value)
    print(max_value)
'''
3
'''
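
The example above reduces a 1-D tensor to a single value. With more than one dimension, the axis argument selects which dimension is reduced. Here is a small sketch of that behaviour using NumPy's np.max as a stand-in (tf.reduce_max accepts an axis argument in the same way). The matrix is made up for illustration.

```python
import numpy as np

x = np.array([[1, 3, 2],
              [4, 0, 6]])

max_all = np.max(x)           # reduce over every axis -> scalar
max_cols = np.max(x, axis=0)  # reduce over rows -> one value per column
max_rows = np.max(x, axis=1)  # reduce over columns -> one value per row

print(max_all)    # 6
print(max_cols)   # [4 3 6]
print(max_rows)   # [3 6]
```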

tf.sequence_mask

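tf.sequence_mask builds a boolean mask from a list of sequence lengths: row i is True in its first lengths[i] positions and False in the rest, up to maxlen. Example: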

import tensorflow as tf
mask = tf.sequence_mask([1, 3, 2], 5)
with tf.Session() as sess:
    mask = sess.run(mask)
    print(mask)

[[ True False False False False]
 [ True  True  True False False]
 [ True  True False False False]]

tf.where

tf.where(
    condition,
    x=None,
    y=None,
    name=None
)
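
The signature has two modes. When x and y are both given, the result takes elements from x where condition is True and from y elsewhere. When both are None, the result is the coordinates of the True elements. The sketch below mirrors the two modes with NumPy's np.where and np.argwhere, an assumption made for illustration. Shape requirements on condition, x, and y differ between TensorFlow versions, so check the documentation for yours.

```python
import numpy as np

condition = np.array([[True, False],
                      [False, True]])
x = np.array([[1, 2],
              [3, 4]])
y = np.array([[10, 20],
              [30, 40]])

# Mode 1: pick from x where condition is True, otherwise from y.
selected = np.where(condition, x, y)

# Mode 2: only the condition is given; return the coordinates of the
# True elements, one row per element.
coords = np.argwhere(condition)

print(selected)
# [[ 1 20]
#  [30  4]]
print(coords)
# [[0 0]
#  [1 1]]
```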

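Counter class

The Counter class (collections.Counter) keeps track of how many times each value occurs. It is a dict subclass that stores elements as keys and their counts as values. Counts may be any integer, including zero and negative values. Counter is similar to the bags or multisets of other languages. It makes it easy to count the words in a text and retrieve the n most frequent ones.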

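sum(c.values())  # total of all counts
c.clear()  # reset the Counter (empties it; the object itself is not deleted)
list(c)  # list of the keys in c
set(c)  # set of the keys in c
dict(c)  # convert to a regular dict of key-value pairs
list(c.items())  # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))  # build a Counter from a list of (elem, cnt) pairs
c.most_common()[:-n:-1]  # the n-1 least common elements
c += Counter()  # remove zero and negative counts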
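
The operations above assume an existing Counter named c. Here is a minimal sketch of the word-counting use case, with a made-up sentence as input:

```python
from collections import Counter

# Made-up sample text (assumption, for illustration only)
text = "the quick brown fox jumps over the lazy dog the fox"
c = Counter(text.split())

print(c.most_common(2))   # [('the', 3), ('fox', 2)]
print(sum(c.values()))    # 11
print(c["cat"])           # 0, a missing key counts as zero instead of raising KeyError

# subtract() can leave zero or negative counts; adding an empty Counter drops them.
c.subtract(["dog", "dog"])
print(c["dog"])           # -1
c += Counter()
print("dog" in c)         # False
```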

Notes on enumerate()

list1 = ["this", "is", "a", "test"]
for index, item in enumerate(list1):
    print(index, item)
Output:
0 this
1 is
2 a
3 test
# Count the lines of a large file without reading it into memory at once
count = 0
with open(filepath, 'r') as f:
    for index, line in enumerate(f):
        count += 1

SciPy

SciPy is a library for scientific computing.

scipy.sparse.hstack

>>> from scipy.sparse import coo_matrix, hstack
>>> A = coo_matrix([[1, 2], [3, 4]])
>>> B = coo_matrix([[5], [6]])
>>> hstack([A,B]).toarray()
array([[1, 2, 5],
       [3, 4, 6]])

NumPy

stack()

import numpy as np
a=[[1,2,3,4],
   [5,6,7,8],
   [9,10,11,12]]
print("List a:")
print(a)
# np.stack treats a as a sequence of three 1-D arrays of length 4
print("Stack along a new axis at index 0")
c=np.stack(a,axis=0)
print(c)
print("Stack along a new axis at index 1")
c=np.stack(a,axis=1)
print(c)
Output:
List a:
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
Stack along a new axis at index 0
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Stack along a new axis at index 1
[[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]

import numpy as np
a=[[1,2,3],
   [4,5,6]]
b=[[1,2,3],
   [4,5,6]]
c=[[1,2,3],
   [4,5,6]]
print("a=",a)
print("b=",b)
print("c=",c)
# Three inputs of shape (2, 3); the new axis has length 3
print("Stack along a new axis at index 0")
d=np.stack((a,b,c),axis=0)
print(d)
print("Stack along a new axis at index 1")
d=np.stack((a,b,c),axis=1)
print(d)
print("Stack along a new axis at index 2")
d=np.stack((a,b,c),axis=2)
print(d)
Output:
a= [[1, 2, 3], [4, 5, 6]]
b= [[1, 2, 3], [4, 5, 6]]
c= [[1, 2, 3], [4, 5, 6]]
Stack along a new axis at index 0
[[[1 2 3]
  [4 5 6]]
 [[1 2 3]
  [4 5 6]]
 [[1 2 3]
  [4 5 6]]]
Stack along a new axis at index 1
[[[1 2 3]
  [1 2 3]
  [1 2 3]]
 [[4 5 6]
  [4 5 6]
  [4 5 6]]]
Stack along a new axis at index 2
[[[1 1 1]
  [2 2 2]
  [3 3 3]]
 [[4 4 4]
  [5 5 5]
  [6 6 6]]]

hstack()

import numpy as np
a=[1,2,3]
b=[4,5,6]
print(np.hstack((a,b)))
Output: [1 2 3 4 5 6]
import numpy as np
a=[[1],[2],[3]]
b=[[1],[2],[3]]
c=[[1],[2],[3]]
d=[[1],[2],[3]]
print(np.hstack((a,b,c,d)))
Output:
[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]

vstack()

import numpy as np
a=[1,2,3]
b=[4,5,6]
print(np.vstack((a,b)))
Output:
[[1 2 3]
 [4 5 6]]
import numpy as np
a=[[1],[2],[3]]
b=[[1],[2],[3]]
c=[[1],[2],[3]]
d=[[1],[2],[3]]
print(np.vstack((a,b,c,d)))
Output:
[[1]
 [2]
 [3]
 [1]
 [2]
 [3]
 [1]
 [2]
 [3]
 [1]
 [2]
 [3]]

squeeze()

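## Remove size-1 axes
'''
 1) a is the input array;
 2) axis selects the axes to remove; each selected axis must have size 1, otherwise an error is raised;
 3) axis is optional and may be None, an int, or a tuple of ints. If axis is None, all size-1 axes are removed;
 4) return value: an array;
 5) the original array is not modified;
 Purpose: remove single-dimensional entries from the shape of an array, i.e. drop the axes whose size in shape is 1
'''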
>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=0).shape
(3, 1)
>>> np.squeeze(x, axis=1).shape
Traceback (most recent call last):
...
ValueError: cannot select an axis to squeeze out which has size not equal to one
>>> np.squeeze(x, axis=2).shape
(1, 3)
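
The session above covers axis=None and a single int. The sketch below illustrates the two remaining points from the notes, a tuple passed as axis and the input array keeping its shape, reusing the same x:

```python
import numpy as np

x = np.array([[[0], [1], [2]]])        # shape (1, 3, 1)

# Name both size-1 axes explicitly with a tuple.
squeezed = np.squeeze(x, axis=(0, 2))

print(squeezed.shape)  # (3,)
print(x.shape)         # (1, 3, 1), the input array keeps its shape
```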

argsort()

# argsort returns the indices that would sort the array in ascending order
>>> x = np.array([3, 1, 2])
>>> np.argsort(x) # ascending order
array([1, 2, 0])
>>> np.argsort(-x) # descending order
array([0, 2, 1])
>>> x[np.argsort(x)] # the array sorted via these indices
array([1, 2, 3])
>>> x[np.argsort(-x)]
array([3, 2, 1])
Another way to sort in descending order:
>>> a = x[np.argsort(x)]
>>> a
array([1, 2, 3])
>>> a[::-1]
array([3, 2, 1])

Pandas

cumsum

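#http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.cumsum.html
# cumsum returns the cumulative sum along an axis; NaN values are skipped by default (skipna=True)
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([2, np.nan, 5, -1, 0])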
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
    
>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64
    
>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
    
>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0
>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0