How to Intuitively Understand 1D, 2D, and 3D Convolutions in Convolutional Neural Networks

There is an excellent answer to this question on Stack Overflow; below is a light translation for readers who are less comfortable with English. Questions are welcome in the comments.

↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑

As the name suggests, a 1D convolution slides the kernel in only one direction.

input = [W], filter = [k], output = [W]. For example: input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (up to edge effects: with zero padding the two border values come out slightly smaller).

The output is also just a 1D array.

The example above is a smoothing (blurring) filter in action.
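
To check the arithmetic by hand, here is a one-line NumPy equivalent (an addition to this post, not part of the original answer); with zero padding the two edge values come out as 0.75 rather than 1:

import numpy as np

x = np.array([1., 1., 1., 1., 1.])
k = np.array([0.25, 0.5, 0.25])  # symmetric kernel, so convolution == cross-correlation
print(np.convolve(x, k, mode='same'))  # [0.75 1.   1.   1.   0.75]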


tf.nn.conv1d - Toy Example

import tensorflow as tf 
import numpy as np 

sess = tf.Session() 

ones_1d = np.ones(5) 
weight_1d = np.ones(3) 
strides_1d = 1 

in_1d = tf.constant(ones_1d, dtype=tf.float32) 
filter_1d = tf.constant(weight_1d, dtype=tf.float32) 

in_width = int(in_1d.shape[0]) 
filter_width = int(filter_1d.shape[0]) 

input_1d  = tf.reshape(in_1d, [1, in_width, 1])          # [batch, width, channels]
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])  # [width, in_channels, out_channels]
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))
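
Running this prints [2. 3. 3. 3. 2.]: the all-ones kernel sums a window of width 3, and under SAME padding the edges only see two inputs. For reference, a rough TensorFlow 2 eager-mode equivalent might look like this (a sketch assuming TF >= 2.0; the original answer uses the TF1 session API):

import numpy as np
import tensorflow as tf  # assuming TensorFlow >= 2.0

x = tf.constant(np.ones((1, 5, 1)), dtype=tf.float32)  # [batch, width, channels]
k = tf.constant(np.ones((3, 1, 1)), dtype=tf.float32)  # [width, in, out]
y = tf.squeeze(tf.nn.conv1d(x, k, stride=1, padding='SAME'))
print(y.numpy())  # [2. 3. 3. 3. 2.]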

↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑

A 2D convolution slides the kernel in two directions, and the output is a 2D matrix: input = [W, H], filter = [k,k], output = [W,H].
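
Before the TensorFlow version, a hand-rolled NumPy sketch of the sliding-window computation may help (my addition; stride 1, VALID padding, and, strictly speaking, cross-correlation, which is what tf.nn.conv2d actually computes):

import numpy as np

def xcorr2d_valid(img, k):
    """Slide k over img in two directions; stride 1, no padding."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

print(xcorr2d_valid(np.ones((5, 5)), np.ones((3, 3))))  # 3x3 output, all 9s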


tf.nn.conv2d - Toy Example

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

input_2d  = tf.reshape(in_2d, [1, in_height, in_width, 1])              # [batch, h, w, channels]
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])  # [h, w, in, out]

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))
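
The SAME/VALID output sizes follow the usual TensorFlow formulas, stated here for reference (ignoring dilation):

import math

def out_size(in_size, filter_size, stride, padding):
    # TensorFlow's spatial-size rule for conv ops (no dilation)
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    return math.ceil((in_size - filter_size + 1) / stride)  # VALID

print(out_size(5, 3, 1, 'SAME'))   # 5
print(out_size(5, 3, 1, 'VALID'))  # 3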

↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑

A 3D convolution slides the kernel in three directions, and the output is a 3D volume: input = [W,H,L], filter = [k,k,d], output = [W,H,M]. The key condition is d < L: because the filter is shallower than the input, the kernel can also move along the depth axis, which is what makes this a true 3D convolution.


tf.nn.conv3d - Toy Example

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

input_3d  = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])                  # [batch, d, h, w, channels]
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])  # [d, h, w, in, out]

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))
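
To make the d < L point concrete, here is a variant of the snippet above (my addition) with VALID padding, so the depth axis visibly shrinks instead of being padded back to 5:

# input depth L=5, filter depth d=3 < L  ->  output depth L-d+1 = 3
output_3d_valid = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d,
                                          strides=strides_3d, padding='VALID'))
print(sess.run(output_3d_valid).shape)  # (3, 3, 3)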

↑↑↑↑↑ 2D Convolutions with 3D input ↑↑↑↑↑

Although the input is 3D, the output is not: it is 2D. The filter depth equals the input depth L, so the kernel has no room to move along the depth axis and slides in only two directions, which naturally yields a 2D result. input = [W,H,L], filter = [k,k,L], output = [W,H]. If N such filters are trained, the outputs are stacked along a new axis: stacked 2D = 3D = 2D x N.

conv2d - LeNet, VGG, ... for 1 filter

in_channels = 32 # 3 for RGB, 32, 64, 128, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape with in_channels
weight_3d = np.ones((3,3,in_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d  = tf.reshape(in_3d, [1, in_height, in_width, in_channels])             # [batch, h, w, in]
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])  # [h, w, in, out=1]

output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))


conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3d-shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d  = tf.reshape(in_3d, [1, in_height, in_width, in_channels])                        # [batch, h, w, in]
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])  # [h, w, in, out]

# output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
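
A quick shape check (my addition) confirms the stacking: one conv2d call with N filters produces N 2D feature maps.

print(sess.run(output_3d).shape)  # (1, 5, 5, 64): 64 stacked 5x5 maps (batch dim kept)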


↑↑↑↑↑ Bonus 1x1 conv in CNN  ↑↑↑↑↑

The figure above shows a 1x1 convolution: the input is 3D, input = [W,H,L], filter = [1,1,L], output = [W,H]; with N such filters the stacked output shape is again 3D = 2D x N. This is used in, e.g., GoogLeNet.
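
A useful way to see why 1x1 convolution matters: it is a fully connected layer across channels, applied independently at every pixel. A minimal NumPy sketch of that equivalence (my addition, not from the original answer):

import numpy as np

H, W, C_in, C_out = 5, 5, 32, 64
x = np.random.rand(H, W, C_in)
w = np.random.rand(C_in, C_out)   # the bank of 1x1xC_in filters, flattened

# 1x1 conv == per-pixel matrix multiply over the channel axis
y = (x.reshape(-1, C_in) @ w).reshape(H, W, C_out)
print(y.shape)  # (5, 5, 64)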


tf.nn.conv2d - special case 1x1 conv

in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# a 1x1 filter still spans all in_channels; N filters make the weights 4D
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d  = tf.reshape(in_3d, [1, in_height, in_width, in_channels])                        # [batch, h, w, in]
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])  # [1, 1, in, out]

# output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))

Animation (2D Conv with 3D-inputs)


Below is a 1D convolution applied to a 2D input (a sequence with multiple channels per step):
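
Since the original figure is not reproduced here, a session-style sketch in the spirit of the snippets above (variable names are mine): the kernel spans all channels, so it can only slide along the sequence axis, and the output is 1D.

in_channels = 32
seq = tf.constant(np.ones((5, in_channels)), dtype=tf.float32)    # length-5 sequence
w_seq = tf.constant(np.ones((3, in_channels)), dtype=tf.float32)  # 1D kernel over all channels

input_seq  = tf.reshape(seq, [1, 5, in_channels])    # [batch, width, channels]
kernel_seq = tf.reshape(w_seq, [3, in_channels, 1])  # [width, in, out]

output_seq = tf.squeeze(tf.nn.conv1d(input_seq, kernel_seq, 1, padding='SAME'))
print(sess.run(output_seq))  # [64. 96. 96. 96. 64.] -- a 1D output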

Remember: the dimensionality of a convolution is best understood as the number of directions in which the kernel slides.
