Python - sparse matrix

◎Background

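When doing information retrieval, for example with tf-idf or term-count matrices, we find that these matrices contain only a small number of nonzero values. If the number of documents and words is very large, storing the full matrix takes up too much memory, and there may not even be enough of it.
In that case we can use a sparse matrix, which does not store the zero values.
Here we use lil_matrix from scipy.sparse (it lives in SciPy, not NumPy) together with tocsr to carry out the operations.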
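To make the memory argument concrete, here is a minimal sketch that compares the bytes used by a dense array with the bytes used by the three arrays behind a CSR matrix. The 1000 x 5000 shape and the 20000 random entries are assumptions made up for illustration, not numbers from a real corpus.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical term-count matrix: 1000 documents x 5000 words, mostly zeros.
rng = np.random.default_rng(0)
dense = np.zeros((1000, 5000))
rows = rng.integers(0, 1000, size=20000)
cols = rng.integers(0, 5000, size=20000)
dense[rows, cols] = 1.0  # at most 20000 nonzero cells (duplicates collapse)

sparse = csr_matrix(dense)

# The dense array pays for every cell; CSR only pays for its three arrays.
dense_bytes = dense.nbytes
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print("dense bytes :", dense_bytes)
print("sparse bytes:", sparse_bytes)
```

The exact sparse size depends on the index dtype SciPy picks, so only the order of magnitude matters here.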

◎LIL (list of lists)

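from scipy.sparse import lil_matrix

A = lil_matrix((4, 4))  # empty 4x4 matrix
A[0, 0] = 1
A[1, 2] = 1
A[1, 3] = 2
A[2, 3] = 1
print(A.data)  # nonzero values, one list per row
print(A.rows)  # column indices of those values, one list per row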

[list([1.0]) list([1.0, 2.0]) list([1.0]) list([])]
[list([0]) list([2, 3]) list([3]) list([])]
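To check my reading of the two lists, here is a small sketch that rebuilds the dense matrix by hand from A.rows and A.data. The loop is only for illustration; in practice A.toarray() does the same job.

```python
import numpy as np
from scipy.sparse import lil_matrix

A = lil_matrix((4, 4))
A[0, 0] = 1
A[1, 2] = 1
A[1, 3] = 2
A[2, 3] = 1

# Row i stores its column indices in A.rows[i] and the matching values
# in A.data[i]; pairing them up gives back every nonzero cell.
dense = np.zeros(A.shape)
for i, (cols, vals) in enumerate(zip(A.rows, A.data)):
    for j, v in zip(cols, vals):
        dense[i, j] = v

print(dense)
```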

◎CSR (compressed sparse row)

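B = A.tocsr()
print(B.data)     # nonzero values, stored row by row
print(B.indices)  # column index of each value in B.data
print(B.indptr)   # row i occupies positions indptr[i]:indptr[i+1]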

[ 1.  1.  2.  1.]
[0 2 3 3]
[0 1 3 4 4]
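The indptr array is the part that is easy to misread, so here is a minimal sketch of how one row is looked up from the three arrays. The helper row_entries is a hypothetical name of my own, not part of SciPy.

```python
from scipy.sparse import lil_matrix

A = lil_matrix((4, 4))
A[0, 0] = 1
A[1, 2] = 1
A[1, 3] = 2
A[2, 3] = 1
B = A.tocsr()

def row_entries(csr, i):
    """Return the (column, value) pairs stored for row i (hypothetical helper)."""
    start, end = csr.indptr[i], csr.indptr[i + 1]
    return [(int(c), float(v))
            for c, v in zip(csr.indices[start:end], csr.data[start:end])]

for i in range(B.shape[0]):
    print(i, row_entries(B, i))
```

Row 3 has indptr[3] == indptr[4], so its slice is empty, which is how CSR represents an all-zero row.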

◎LIL + CSR
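
A rough sketch of how the two formats can be combined, assuming the usual division of labor: fill the matrix cell by cell in LIL, which is cheap to modify, then convert once to CSR for arithmetic. The toy corpus and the names docs, vocab and counts are made up for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Hypothetical toy corpus, used only for illustration.
docs = [["sparse", "matrix", "sparse"], ["matrix", "python"], ["python"]]
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}

# Step 1: build the term-count matrix incrementally in LIL format.
counts = lil_matrix((len(docs), len(vocab)))
for d, words in enumerate(docs):
    for w in words:
        counts[d, vocab[w]] += 1

# Step 2: convert once to CSR, then do the arithmetic there.
counts = counts.tocsr()
totals = np.asarray(counts.sum(axis=1)).ravel()  # words per document
similarity = (counts @ counts.T).toarray()       # document-document dot products

print(totals)
print(similarity)
```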

wu's casual notes