In [1]:
## Lecture 4: sparse matrix operations.

#= We are going to look at the Candyland matrix!

See some analysis online:
* <http://www.lscheffer.com/CandyLand.htm>
* <http://datagenetics.com/blog/december12011/index.html>

We are studying a slightly different set of rules
for the sticky states. In the rules analyzed on
those pages, a sticky state holds your character
deterministically for one step. In our set of
rules, a sticky state holds you until you draw a
specific color. So if you see some differences,
that is why.
=#
In [2]:
## We have built the Markov transition matrix for you!
using DelimitedFiles
using SparseArrays
data = readdlm("candyland-matrix.csv",',')
TI = Int.(data[:,1])  # row indices i
TJ = Int.(data[:,2])  # column indices j
TV = data[:,3]        # values: transition probabilities T[i,j] = Pr(j -> i)
T = sparse(TI, TJ, TV, 140, 140)
coords = readdlm("candyland-coords.csv",',')
x = coords[:,1]
y = coords[:,2]
cells = Int.(readdlm("candyland-cells.csv",','))
start,dest = cells[1:2]  # the start and destination squares
Out[2]:
2-element Array{Int64,1}:
 140
 134
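## A detail about the constructor above: sparse() takes triplet (COO)
## arrays and builds CSC storage, summing any duplicate (i,j) entries.
## A tiny generic example of that behavior (not the Candyland data):
sparse([1,1,2], [1,1,2], [0.5,0.5,1.0], 2, 2)  # entry (1,1) becomes 1.0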
In [3]:
## Sanity check: each column should sum to 1 (a column-stochastic chain).
sum(T,dims=1)
Out[3]:
1×140 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  0.0  1.0  1.0  1.0  1.0  1.0  1.0
In [4]:
## Which column doesn't sum to 1? Only state 134, the destination:
## once you win, there are no outgoing transitions.
findall(vec(sum(T,dims=1)) .< 0.9)
Out[4]:
1-element Array{Int64,1}:
 134
In [5]:
## Let's see how Julia stores things
typeof(T)
Out[5]:
SparseMatrixCSC{Float64,Int64}
In [6]:
## What can we do with T?
fieldnames(typeof(T))
Out[6]:
(:m, :n, :colptr, :rowval, :nzval)
In [7]:
## colptr: column j's nonzeros sit in positions colptr[j]:colptr[j+1]-1
T.colptr
Out[7]:
141-element Array{Int64,1}:
    1
   19
   37
   55
   73
   74
   92
  110
  128
  146
  164
  182
  200
    ⋮
 2210
 2220
 2229
 2237
 2244
 2244
 2245
 2246
 2249
 2252
 2255
 2273
In [8]:
## nzval: the stored values, in column-major order
T.nzval
Out[8]:
2272-element Array{Float64,1}:
 0.09375 
 0.09375 
 0.09375 
 0.09375 
 0.09375 
 0.09375 
 0.0625  
 0.015625
 0.0625  
 0.0625  
 0.046875
 0.046875
 0.0625  
 ⋮       
 0.0625  
 0.0625  
 0.015625
 0.0625  
 0.0625  
 0.046875
 0.046875
 0.015625
 0.015625
 0.015625
 0.015625
 0.015625
In [9]:
## rowval: the row index of each stored value
T.rowval
Out[9]:
2272-element Array{Int64,1}:
   2
   3
   4
   5
   6
   7
   8
   9
  10
  11
  12
  13
  14
   ⋮
   7
   8
   9
  10
  11
  12
  13
  20
  42
  69
  92
 102
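## Putting the three fields together: the nonzeros of column j live in
## positions colptr[j]:colptr[j+1]-1 of rowval and nzval. A small helper
## (our name, not part of SparseArrays) to read one column:
function colentries(A::SparseMatrixCSC, j)
    [(A.rowval[nzi], A.nzval[nzi]) for nzi in A.colptr[j]:A.colptr[j+1]-1]
end
colentries(T, 1)  # the 18 possible moves out of square 1 (colptr[2]-colptr[1] == 18)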
In [10]:
## Why sparse matrices?
# The first reason is memory!
print(varinfo(r"T"))
A = Array(T)
print(varinfo(r"A")) # the sparse format needs only about a quarter of the dense memory!
| name |       size | summary                                |
|:---- | ----------:|:-------------------------------------- |
| T    | 36.758 KiB | 140×140 SparseMatrixCSC{Float64,Int64} |
| TI   | 17.789 KiB | 2272-element Array{Int64,1}            |
| TJ   | 17.789 KiB | 2272-element Array{Int64,1}            |
| TV   | 17.789 KiB | 2272-element Array{Float64,1}          |
| name |        size | summary                  |
|:---- | -----------:|:------------------------ |
| A    | 153.164 KiB | 140×140 Array{Float64,2} |
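## These sizes line up with the CSC layout: nnz values (Float64), nnz
## row indices (Int64), and n+1 column pointers (Int64), 8 bytes each,
## plus a few bytes of struct overhead:
(8nnz(T) + 8nnz(T) + 8*(size(T,2)+1))/1024  # ≈ 36.6 KiB, vs. 36.758 reported
8length(A)/1024                             # ≈ 153.1 KiB, matching the dense array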
In [11]:
## The second reason is speed as you'll see on the homework.
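## A quick taste of the speed difference (not the homework); run the
## cell twice and trust the second set of numbers, since the first
## call includes compilation:
xt = randn(140)   # a throwaway test vector for this demo
@time A*xt  # dense matvec touches all 140^2 = 19,600 entries
@time T*xt  # sparse matvec touches only the 2,272 nonzeros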
In [12]:
## The downside is that some operations need a lot more care.
In [13]:
## Let's write our own matvec given the raw arrays
function mymatvec(TI,TJ,TV,x,m)
  y = zeros(m)
  for nzi in 1:length(TI)
    i = TI[nzi]
    j = TJ[nzi]
    v = TV[nzi]
    y[i] += v*x[j]
  end
  # A fancier way:
  # for (i,j,v) in zip(TI,TJ,TV)
  #   y[i] += v*x[j]
  # end
  return y
end
x = randn(140)
mymatvec(TI,TJ,TV,x,140) == T*x
Out[13]:
true
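## Aside: findnz recovers the (i,j,v) triplets from CSC storage, so it
## is essentially the inverse of the sparse() call we started from:
Ti2, Tj2, Tv2 = findnz(T)
sparse(Ti2, Tj2, Tv2, 140, 140) == T  # true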
In [14]:
##
y = mymatvec(TI,TJ,TV,x,140)
z = T*x
Out[14]:
140-element Array{Float64,1}:
  0.06371016857301236 
 -0.020296479972651882
 -0.1449625421168283  
 -0.24110135907572178 
 -0.36956743459549235 
 -0.36956743459549235 
 -0.5753582784096114  
 -0.6618273475359123  
 -0.07198691195557468 
 -0.2998071003106492  
 -0.36486315862614943 
 -0.2080346767048179  
 -0.200573176111561   
  ⋮                   
 -0.12191519280607    
 -0.16089240515977715 
 -0.23860064156506353 
 -0.49914478966491516 
 -0.5989586304958335  
 -5.107819912005116   
  0.023573325927709395
 -1.2306736433728827  
  1.4490833118500825  
  3.3468817404962423  
 -1.786957639795845   
  0.0                 
In [20]:
## Now the same matvec, driven directly by the CSC arrays
function cscmatvec(colptr,rowval,nzval,x,m)
  y = zeros(m)
  for j=1:length(colptr)-1 # for each column ...
    for nzi=colptr[j]:colptr[j+1]-1 # for each entry in the column
      i = rowval[nzi]
      v = nzval[nzi]
      y[i] += v*x[j]
    end
  end
  return y
end
cscmatvec(T.colptr,T.rowval,T.nzval,x,140) == T*x
Out[20]:
true
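## To compare the triplet loop with the CSC loop yourself, @time works
## out of the box; exact numbers are machine-dependent, so run each
## twice and ignore the first (compilation) timing:
@time mymatvec(TI,TJ,TV,x,140)
@time cscmatvec(T.colptr,T.rowval,T.nzval,x,140)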
In [21]:
## Note that getting a random column in T is fast.
# Run this a few times; the first timing in Julia includes
# compilation and tends to be unreliable.
@time T[:,rand(1:size(T,2))]
  0.062172 seconds (94.55 k allocations: 4.546 MiB)
Out[21]:
140-element SparseVector{Float64,Int64} with 18 stored entries:
  [9  ]  =  0.015625
  [20 ]  =  0.015625
  [34 ]  =  0.09375
  [35 ]  =  0.09375
  [36 ]  =  0.09375
  [37 ]  =  0.09375
  [38 ]  =  0.09375
  [39 ]  =  0.09375
  [40 ]  =  0.0625
  [41 ]  =  0.0625
  [42 ]  =  0.015625
  [43 ]  =  0.0625
  [44 ]  =  0.046875
  [45 ]  =  0.046875
  [46 ]  =  0.0625
  [69 ]  =  0.015625
  [92 ]  =  0.015625
  [102]  =  0.015625
In [22]:
## But getting a random row in T is slow
@time T[rand(1:size(T,1)),:]
  0.035623 seconds (90.38 k allocations: 4.309 MiB)
Out[22]:
140-element SparseVector{Float64,Int64} with 12 stored entries:
  [3  ]  =  0.0625
  [4  ]  =  0.0625
  [6  ]  =  0.0625
  [7  ]  =  0.0625
  [8  ]  =  0.0625
  [9  ]  =  0.0625
  [10 ]  =  0.09375
  [11 ]  =  0.09375
  [12 ]  =  0.09375
  [13 ]  =  0.09375
  [14 ]  =  0.09375
  [15 ]  =  0.09375
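## Why the asymmetry? CSC stores each column contiguously, so a column
## slice is one cheap copy, while a row slice must search every column.
## If you need many rows, one standard workaround (not used in this
## lecture) is to materialize the transpose once up front:
Tt = copy(transpose(T))  # rows of T become columns of Tt, still CSC
Tt[:,17]                 # row 17 of T via a fast column slice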
In [23]:
## Okay, maybe we need a bigger matrix!
# 100,000×100,000 with density 10/100,000: about 10 nonzeros
# per column in expectation (~1M stored entries total).
S = sprandn(100000,100000,10/100000)
Out[23]:
100000×100000 SparseMatrixCSC{Float64,Int64} with 999810 stored entries:
  [129   ,      1]  =  0.108705
  [1900  ,      1]  =  0.462637
  [7991  ,      1]  =  -1.63828
  [33105 ,      1]  =  -0.823354
  [45736 ,      1]  =  0.661684
  [53615 ,      1]  =  -1.02866
  [62475 ,      1]  =  1.16481
  [65082 ,      1]  =  -0.929257
  [19815 ,      2]  =  0.167259
  [20866 ,      2]  =  -0.609031
  [36606 ,      2]  =  -0.170267
  [39989 ,      2]  =  0.782436
  ⋮
  [74261 ,  99999]  =  -1.28521
  [80342 ,  99999]  =  -0.323656
  [98510 ,  99999]  =  -0.171019
  [5770  , 100000]  =  0.382809
  [8095  , 100000]  =  -0.0996605
  [46027 , 100000]  =  -0.288339
  [51430 , 100000]  =  -0.249912
  [62715 , 100000]  =  2.815
  [65748 , 100000]  =  0.593881
  [73706 , 100000]  =  0.174726
  [85387 , 100000]  =  0.872983
  [86428 , 100000]  =  -0.809704
In [24]:
## Note that getting a random column in S is fast
@time S[:,rand(1:size(S,2))]
  0.000009 seconds (8 allocations: 608 bytes)
Out[24]:
100000-element SparseVector{Float64,Int64} with 13 stored entries:
  [163   ]  =  -0.524153
  [7936  ]  =  -0.883583
  [10649 ]  =  -1.11708
  [31275 ]  =  1.38168
  [49239 ]  =  0.565219
  [50468 ]  =  -1.76461
  [53327 ]  =  -1.61342
  [56239 ]  =  0.125838
  [63867 ]  =  -1.16887
  [70094 ]  =  0.98508
  [75105 ]  =  0.395899
  [89210 ]  =  -0.126298
  [94645 ]  =  -0.24908
In [25]:
## But getting a random row in S is slow
@time S[rand(1:size(S,1)),:]
  0.000696 seconds (12 allocations: 640 bytes)
Out[25]:
100000-element SparseVector{Float64,Int64} with 5 stored entries:
  [20540 ]  =  0.811427
  [34302 ]  =  0.468191
  [39495 ]  =  1.90373
  [65376 ]  =  0.181673
  [84593 ]  =  0.163289
In [26]:
## Also note that there is NO way we could store S densely.
length(S)*8/1e9 # 80 GB of memory, about 10x what my Mac has.
Out[26]:
80.0
In [27]:
## And we can go even bigger!
# 10,000,000×10,000,000, still ~10 nonzeros per column (~1e8 stored entries).
S = sprandn(10000000,10000000,10/10000000)
Out[27]:
10000000×10000000 SparseMatrixCSC{Float64,Int64} with 100009625 stored entries:
  [170016  ,        1]  =  0.95618
  [3709434 ,        1]  =  0.25972
  [4726074 ,        1]  =  0.479413
  [7326154 ,        1]  =  -0.283622
  [8271451 ,        1]  =  -0.914915
  [9520023 ,        1]  =  -0.808275
  [9873819 ,        1]  =  0.776249
  [61839   ,        2]  =  1.2041
  [2144528 ,        2]  =  -1.72693
  [3336902 ,        2]  =  -0.226241
  [3913567 ,        2]  =  -0.101088
  [4952938 ,        2]  =  1.16258
  ⋮
  [5174587 ,  9999999]  =  -0.571736
  [5471468 ,  9999999]  =  0.467741
  [5608229 ,  9999999]  =  -0.752793
  [6231690 ,  9999999]  =  -0.0658563
  [7843050 ,  9999999]  =  -0.416478
  [9594745 ,  9999999]  =  0.658695
  [3660754 , 10000000]  =  -1.00541
  [4517612 , 10000000]  =  1.05634
  [5718072 , 10000000]  =  -2.43842
  [6131162 , 10000000]  =  0.317956
  [6630531 , 10000000]  =  -0.885944
  [9695588 , 10000000]  =  -0.429376
In [28]:
## Note that getting a random column in S is fast
@time S[:,rand(1:size(S,2))]
  0.000033 seconds (8 allocations: 544 bytes)
Out[28]:
10000000-element SparseVector{Float64,Int64} with 9 stored entries:
  [529634  ]  =  1.73964
  [1212368 ]  =  -1.27239
  [4714160 ]  =  0.0157187
  [4953764 ]  =  -2.49355
  [5070833 ]  =  0.183744
  [6301816 ]  =  1.73803
  [6401559 ]  =  1.25495
  [8062826 ]  =  1.2552
  [9717677 ]  =  -0.927608
In [29]:
## But getting a random row in S is slow
@time S[rand(1:size(S,1)),:]
  0.078867 seconds (14 allocations: 928 bytes)
Out[29]:
10000000-element SparseVector{Float64,Int64} with 15 stored entries:
  [296194  ]  =  0.606406
  [851782  ]  =  0.354075
  [987870  ]  =  0.431335
  [1874937 ]  =  -1.73482
  [2676969 ]  =  1.79189
  [4473983 ]  =  -0.163811
  [5068187 ]  =  -0.71602
  [5427265 ]  =  0.109177
  [5605972 ]  =  0.782018
  [5970655 ]  =  -0.221657
  [7003632 ]  =  -0.600549
  [8186007 ]  =  -1.02635
  [8822910 ]  =  0.232833
  [9017437 ]  =  -0.365962
  [9885553 ]  =  -0.302026
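## Where this is headed: repeated sparse matvecs push the start
## distribution through the chain. A sketch (ours, not the homework):
## since the destination's column is all zero (see Out[4]), probability
## mass that reaches it vanishes on the next multiply, so p[dest] after
## k multiplies is the chance of winning on exactly move k.
function winprob(T, start, dest, nmoves)
    p = zeros(size(T,1)); p[start] = 1.0
    w = 0.0
    for k in 1:nmoves
        p = T*p       # advance one move with a sparse matvec
        w += p[dest]  # chance the game ends on exactly move k
    end
    return w
end
winprob(T, start, dest, 100)  # probability of finishing within 100 moves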