Spending a bit of extra memory gets you roughly a 6x speedup.
% Test data: a 1000x1000 image and a 20x10 filter.
Img = randn(1e3, 1e3);
[h, v] = size(Img);
F = rand(20, 10);
[F_h, F_v] = size(F);
% Version 1 (baseline): two nested loops over every valid window position.
tic
fI = zeros(h-F_h+1, v-F_v+1);
for i=1:(h-F_h+1)
    for j=1:(v-F_v+1)
        R = Img(i:i+F_h-1, j:j+F_v-1);
        fI(i,j) = sum(sum(R .* F));
    end
end
toc
% Elapsed time is 5.834513 seconds.
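as_vector = @(x) x(:);
% Version 2: expand the row dimension, loop over columns only.
tic
% Row indices of every vertical window, stacked: 1..F_h, 2..F_h+1, ...
index_1 = as_vector(repmat(1:F_h, h-F_h+1, 1)' + repmat(0:(h-F_h), F_h, 1));
% One copy of F per vertical window position.
expand_F = repmat(F, size(index_1,1) / F_h, 1);
fI2 = zeros(h-F_h+1, v-F_v+1);
for i = 1:(v-F_v+1)
    fI2(:, i) = sum(reshape(sum(Img(index_1, i:i+F_v-1) .* expand_F, 2), ...
        F_h, []))';
end
toc
% Elapsed time is 0.979537 seconds.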
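% Version 3: expand the column dimension, loop over rows only.
tic
% Column indices of every horizontal window, stacked: 1..F_v, 2..F_v+1, ...
index_2 = as_vector(repmat(1:F_v, v-F_v+1, 1)' + repmat(0:(v-F_v), F_v, 1));
% One copy of F per horizontal window position.
expand_F = repmat(F, 1, size(index_2,1) / F_v);
fI3 = zeros(h-F_h+1, v-F_v+1);
for i = 1:(h-F_h+1)
    fI3(i, :) = sum(reshape(sum(Img(i:i+F_h-1, index_2) .* expand_F), ...
        F_v, []));
end
toc
% Elapsed time is 0.947438 seconds.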
all(all(abs(fI - fI2) < 1e-4)) % true
all(all(abs(fI - fI3) < 1e-4)) % true
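Expanding the whole space (both dimensions at once) becomes very slow, so I don't recommend it.
Expanding just one dimension, so that one loop level goes away, is quite a bit faster.
As for whether to expand the shorter or the longer dimension, my tests came out about the same.
In general, fewer loop iterations should be faster,
but here the two versions differ by only 10 iterations (981 vs. 991), so you can't tell.
OP, feel free to experiment more.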
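For reference, here is a minimal sketch of what "expanding everything" could look like. It is not the code I timed above; the small sizes and the names win, top, patches and fI_full are made up for illustration. Every window becomes one column of a big matrix, so there is no loop at all, but with the sizes in this post that matrix would have 200 x 972171 entries (roughly 1.5 GB of doubles), which may be part of why the fully expanded approach is slow.

```matlab
% Fully expanded variant. Small illustrative sizes only: the patch matrix
% has F_h*F_v rows and (h-F_h+1)*(v-F_v+1) columns.
Img = randn(60, 50);
F = rand(5, 4);
[h, v] = size(Img);
[F_h, F_v] = size(F);
n_h = h - F_h + 1;                   % number of valid row positions
n_v = v - F_v + 1;                   % number of valid column positions
% Linear-index offsets of the window pixels relative to its top-left pixel,
% in the same column-major order as F(:).
[dr, dc] = ndgrid(0:F_h-1, 0:F_v-1);
win = dr(:) + dc(:) * h;             % (F_h*F_v) x 1
% Linear index of the top-left pixel of every window.
[r0, c0] = ndgrid(1:n_h, 1:n_v);
top = (r0(:) + (c0(:) - 1) * h)';    % 1 x (n_h*n_v)
% One column per window. bsxfun keeps this working on older releases;
% on R2016b or later, win + top does the same thing.
patches = Img(bsxfun(@plus, win, top));
% Each output pixel is the dot product of F(:) with one column.
fI_full = reshape(F(:)' * patches, n_h, n_v);
```

The result has the same size and layout as fI above, so it can be checked against the two-loop version in the same way.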
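Also, in my tests the size of F has a very large effect on speed.
The larger F is, the smaller the gap between the two-loop and one-loop versions.
My guess is that the time spent copying data grows faster as F gets larger.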
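If you want to see the effect of the filter size yourself, something like the sweep below might be a starting point. The image size and the three filter sizes are arbitrary picks of mine, not the ones I measured with, and the timings will vary by machine and MATLAB version, so treat it only as a sketch.

```matlab
% Hypothetical timing sweep: image size and filter sizes are illustrative.
Img = randn(200, 200);
[h, v] = size(Img);
sizes = [3 3; 10 5; 40 20];          % each row is one [F_h, F_v] to try
t_two = zeros(size(sizes, 1), 1);    % timings of the two-loop version
t_one = zeros(size(sizes, 1), 1);    % timings of the one-loop version
for k = 1:size(sizes, 1)
    F = rand(sizes(k, 1), sizes(k, 2));
    [F_h, F_v] = size(F);
    n_h = h - F_h + 1;
    n_v = v - F_v + 1;

    % Two nested loops over every window.
    tic
    A = zeros(n_h, n_v);
    for i = 1:n_h
        for j = 1:n_v
            R = Img(i:i+F_h-1, j:j+F_v-1);
            A(i, j) = sum(sum(R .* F));
        end
    end
    t_two(k) = toc;

    % One loop over rows, with the column dimension expanded.
    tic
    idx = bsxfun(@plus, (1:F_v)', 0:(n_v-1));   % F_v x n_v window columns
    idx = idx(:);
    expand_F = repmat(F, 1, n_v);
    B = zeros(n_h, n_v);
    for i = 1:n_h
        B(i, :) = sum(reshape(sum(Img(i:i+F_h-1, idx) .* expand_F, 1), F_v, []), 1);
    end
    t_one(k) = toc;

    % Both versions must agree before the timings mean anything.
    assert(max(abs(A(:) - B(:))) < 1e-8);
    fprintf('F = %2d x %2d: two loops %.3f s, one loop %.3f s, ratio %.1f\n', ...
        F_h, F_v, t_two(k), t_one(k), t_two(k) / t_one(k));
end
```

A single tic/toc per case is noisy; repeating each case a few times and taking the median would give steadier numbers.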
PS: blockproc is very slow here; I don't recommend using it.
blockproc:
% Version 4: same column expansion as version 3, but let blockproc
% (Image Processing Toolbox) walk over the F_h-by-F_v blocks.
tic
index_2 = as_vector(repmat(1:F_v, v-F_v+1, 1)' + repmat(0:(v-F_v), F_v, 1));
fun = @(block_struct) sum(sum(block_struct.data .* F));
fI4 = zeros(h-F_h+1, v-F_v+1);
for i = 1:(h-F_h+1)
    fI4(i, :) = blockproc(Img(i:i+F_h-1, index_2),[F_h, F_v],fun);
end
toc
% Elapsed time is 129.542762 seconds.
all(all(abs(fI - fI4) < 1e-4)) % true
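One more thing that may be worth trying, although I did not time it for this post: the sum computed in the loop is a 2-D correlation, so the built-in conv2 (with F rotated by 180 degrees) or filter2 should give the same values in a single call. The sizes below are small illustrative ones, and fI_conv and fI_filt are names I made up for the comparison.

```matlab
% Cross-check of the loop against built-in conv2 / filter2.
Img = randn(100, 80);
F = rand(7, 5);
[h, v] = size(Img);
[F_h, F_v] = size(F);
% Reference: the two-loop version.
fI = zeros(h-F_h+1, v-F_v+1);
for i = 1:(h-F_h+1)
    for j = 1:(v-F_v+1)
        fI(i, j) = sum(sum(Img(i:i+F_h-1, j:j+F_v-1) .* F));
    end
end
% conv2 flips the kernel, so rotate F by 180 degrees to get correlation.
% 'valid' keeps only the positions where F fits entirely inside Img.
fI_conv = conv2(Img, rot90(F, 2), 'valid');
% filter2 performs the correlation directly, without the manual rotation.
fI_filt = filter2(F, Img, 'valid');
% Largest deviations from the loop result (expected to be round-off only).
fprintf('max diff conv2:   %g\n', max(abs(fI(:) - fI_conv(:))));
fprintf('max diff filter2: %g\n', max(abs(fI(:) - fI_filt(:))));
```

Both calls return a matrix of size (h-F_h+1) x (v-F_v+1), the same as fI.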
※ Quoting Portentera (SupP):
: This is a loop for image filtering.
: Img is input image. (h, v) is image size.
: F is filter matrix. (F_h, F_v) is filter size.
: fI is the filtered image.
: for i=1:h
: for j=1:v
: R = Img( i:i+F_h, j:j+F_v );
: fI(i,j) = sum(sum(R .* F));
: end
: end
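: I've just started learning Matlab and only know how to solve problems with for loops;
: I'd like to learn how to rewrite this as matrix operations. Thanks in advance for the help!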