Sorry, I made a mistake. The code that caused the problem is:
printf("Run kernel on GPU%d, taking source data from GPU%d and writing to GPU%d...\n",
gpuid, gpuid, gpuid);
SimpleKernel<<<blocks, threads>>>(g1, g0);
I've found the reason: CUDA P2P requires the IOMMU to be disabled. My motherboard's default IOMMU setting is Auto, and that caused the crash. I then tried explicitly enabling the IOMMU and hit the same problem again.
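For anyone hitting the same thing, here is a minimal sketch of a P2P sanity check. It is not the code from this thread: the `CHECK` macro, the buffer size, and the choice of devices 0 and 1 are my own assumptions. It only uses standard CUDA runtime calls (`cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess`, `cudaMemcpyPeer`). With a misconfigured IOMMU the program may hang or crash instead of reporting a mismatch, so treat a clean run as a hint, not proof.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("Need at least two GPUs for a P2P test.\n");
        return 0;
    }

    // Assumption: test between devices 0 and 1.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
    if (!can01 || !can10) {
        printf("P2P is not supported between GPU0 and GPU1.\n");
        return 0;
    }

    // Enable peer access in both directions.
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));
    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceEnablePeerAccess(0, 0));

    const size_t count = 1 << 20;  // arbitrary test size
    const size_t bytes = count * sizeof(int);
    std::vector<int> src(count), dst(count, 0);
    for (size_t i = 0; i < count; ++i) src[i] = static_cast<int>(i);

    int *d0 = nullptr, *d1 = nullptr;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&d0, bytes));
    CHECK(cudaMemcpy(d0, src.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&d1, bytes));

    // Copy GPU0 -> GPU1, then read the result back from GPU1.
    CHECK(cudaMemcpyPeer(d1, 1, d0, 0, bytes));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaMemcpy(dst.data(), d1, bytes, cudaMemcpyDeviceToHost));

    // Compare what arrived on GPU1 with what was sent from GPU0.
    size_t mismatches = 0;
    for (size_t i = 0; i < count; ++i)
        if (dst[i] != src[i]) ++mismatches;
    printf("P2P copy GPU0 -> GPU1: %zu mismatching elements\n", mismatches);

    CHECK(cudaFree(d1));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(d0));
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Build it with something like `nvcc p2p_check.cu -o p2p_check` (the file name is arbitrary) and run it once with each IOMMU setting in the BIOS to compare the results.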
Sorry for disturbing you. Many thanks.
MODERATOR NOTE: I am glad to see you have resolved your issue. To keep this post on topic and help others with a similar issue, I am locking this thread.