1. Explicit copies via host
2. Zero-copy shared host array
3. Peer-to-peer memory copy
4. Peer-to-peer memory access
EX1:
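cudaSetDevice(0);
cudaMemcpy(HM, DM1, n, cudaMemcpyDeviceToHost);    // GPU 0 -> host buffer (dst first, n in bytes)
cudaSetDevice(1);
cudaMemcpy(DM2, HM, n, cudaMemcpyHostToDevice);    // host buffer -> GPU 1
or
cudaMemcpy(DM2, DM1, n, cudaMemcpyDeviceToDevice); // single call; relies on unified virtual addressing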
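The EX1 fragment leaves out allocation and error handling. A minimal, self-contained sketch of the same host-staged copy might look like the following. The CHECK macro, the 1 MiB buffer size, and the 0x5A fill pattern are assumptions made for illustration; the buffer names HM, DM1, and DM2 follow the fragment above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the notes): abort on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) { std::printf("need at least two GPUs\n"); return 0; }

    const size_t n = 1 << 20;                 // size in bytes (arbitrary choice)
    unsigned char *HM = nullptr, *DM1 = nullptr, *DM2 = nullptr;

    // Pinned host staging buffer, usable from both device contexts.
    CHECK(cudaHostAlloc((void**)&HM, n, cudaHostAllocPortable));

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void**)&DM1, n));       // source buffer on GPU 0
    CHECK(cudaMemset(DM1, 0x5A, n));          // recognizable test pattern

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void**)&DM2, n));       // destination buffer on GPU 1

    // Step 1: GPU 0 -> host.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpy(HM, DM1, n, cudaMemcpyDeviceToHost));

    // Step 2: host -> GPU 1.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMemcpy(DM2, HM, n, cudaMemcpyHostToDevice));

    std::printf("first staged byte: 0x%02X\n", HM[0]);

    CHECK(cudaFree(DM2));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(DM1));
    CHECK(cudaFreeHost(HM));
    return 0;
}
```

Every byte crosses the bus twice in this scheme, which is why the remaining methods in the list try to avoid the host round trip.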
EX2:
....
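The notes elide EX2. As a hedged illustration only (not the author's example), a zero-copy shared host array could be sketched as below: one pinned, mapped host allocation is written directly by kernels on two GPUs, with no cudaMemcpy at all. The kernel name fill, the CHECK macro, and the array size are hypothetical. The sketch assumes a system where the runtime maps pinned host memory by default.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the notes): abort on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Each GPU writes `value` into its own slice [begin, end) of the host array.
__global__ void fill(int *a, int begin, int end, int value) {
    int i = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end) a[i] = value;
}

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) { std::printf("need at least two GPUs\n"); return 0; }

    const int N = 1 << 20;                    // element count (arbitrary choice)
    int *h = nullptr;
    // Mapped: visible to kernels. Portable: pinned for every device context.
    CHECK(cudaHostAlloc((void**)&h, N * sizeof(int),
                        cudaHostAllocMapped | cudaHostAllocPortable));

    const int half = N / 2, threads = 256;
    for (int d = 0; d < 2; ++d) {
        CHECK(cudaSetDevice(d));
        int *dptr = nullptr;
        // Device-side alias of the host array for the current GPU.
        CHECK(cudaHostGetDevicePointer((void**)&dptr, h, 0));
        const int begin = d * half, end = begin + half;
        fill<<<(half + threads - 1) / threads, threads>>>(dptr, begin, end, d + 1);
        CHECK(cudaGetLastError());
    }
    // Kernel launches are asynchronous; wait for both GPUs before reading h.
    for (int d = 0; d < 2; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaDeviceSynchronize());
    }

    std::printf("h[0] = %d, h[N-1] = %d\n", h[0], h[N - 1]);
    CHECK(cudaFreeHost(h));
    return 0;
}
```

Each kernel access travels over the interconnect to host memory, so this pattern tends to suit data that is touched once rather than reused heavily.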
EX3:
....
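The notes elide EX3. One plausible shape for a peer-to-peer memory copy, offered as a sketch and not as the author's example, uses cudaMemcpyPeer and enables peer access first when the hardware reports support. The CHECK macro, buffer size, and fill pattern are assumptions; DM1 and DM2 reuse the names from EX1.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the notes): abort on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) { std::printf("need at least two GPUs\n"); return 0; }

    const size_t n = 1 << 20;                 // size in bytes (arbitrary choice)
    unsigned char *DM1 = nullptr, *DM2 = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void**)&DM1, n));       // source on GPU 0
    CHECK(cudaMemset(DM1, 0x5A, n));

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void**)&DM2, n));       // destination on GPU 1

    // Ask whether each device can address the other's memory directly.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
    if (can01 && can10) {
        CHECK(cudaSetDevice(0));
        CHECK(cudaDeviceEnablePeerAccess(1, 0));   // flags argument must be 0
        CHECK(cudaSetDevice(1));
        CHECK(cudaDeviceEnablePeerAccess(0, 0));
    }

    // Arguments: dst, dstDevice, src, srcDevice, byte count.
    CHECK(cudaMemcpyPeer(DM2, 1, DM1, 0, n));

    // Read one byte back from GPU 1 to confirm the data arrived.
    unsigned char first = 0;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMemcpy(&first, DM2, 1, cudaMemcpyDeviceToHost));
    std::printf("peer access %s, first byte on GPU 1: 0x%02X\n",
                (can01 && can10) ? "enabled" : "unavailable", first);

    CHECK(cudaFree(DM2));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(DM1));
    return 0;
}
```

cudaMemcpyPeer is valid whether or not peer access is enabled; enabling it only lets the copy take the direct path between the two GPUs where the hardware allows.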
EX4:
....
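The notes elide EX4. As a hedged sketch of peer-to-peer memory access (again, not the author's example), a kernel running on GPU 1 can dereference a pointer allocated on GPU 0 once peer access is enabled. The kernel name scale, the CHECK macro, and the array size are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the notes): abort on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Runs on GPU 1; src lives in GPU 0 memory, dst in GPU 1 memory.
__global__ void scale(const float *src, float *dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = 2.0f * src[i];
}

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) { std::printf("need at least two GPUs\n"); return 0; }

    int canAccess = 0;                        // can device 1 access device 0?
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 1, 0));
    if (!canAccess) { std::printf("peer access not supported\n"); return 0; }

    const int N = 1 << 20;                    // element count (arbitrary choice)
    std::vector<float> host(N, 1.0f);
    float *src = nullptr, *dst = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void**)&src, N * sizeof(float)));
    CHECK(cudaMemcpy(src, host.data(), N * sizeof(float), cudaMemcpyHostToDevice));

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void**)&dst, N * sizeof(float)));
    CHECK(cudaDeviceEnablePeerAccess(0, 0));  // let GPU 1 address GPU 0 memory

    const int threads = 256;
    scale<<<(N + threads - 1) / threads, threads>>>(src, dst, N);
    CHECK(cudaGetLastError());

    // This blocking copy also waits for the kernel on the default stream.
    CHECK(cudaMemcpy(host.data(), dst, N * sizeof(float), cudaMemcpyDeviceToHost));
    std::printf("dst[0] = %.1f\n", host[0]);

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}
```

Unlike the peer copy, this variant has no fallback: if cudaDeviceCanAccessPeer reports 0, the kernel cannot touch the other GPU's memory and one of the copy-based methods is needed instead.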