Here is a combined list of samples I think would be good to include in the Stream SDK, showing how to use some advanced features of the latest hardware (sorry, I may already have asked for some of these in other questions):

*a dual-DMA usage example for Cayman that shows a nice perf improvement over a single-DMA card such as Evergreen

*an example showing concurrent kernels on Cayman


*an example showing how to use zero copy (CPU memory accessible from the GPU)

*a microbenchmark showing Fusion's improved bandwidth between GPU and CPU versus discrete cards over PCIe, and another microbenchmark showing the reduced latency between CPU and GPU, maybe using some host sync between kernels

*an OpenVideo decode example

*efficient unified memory usage on Fusion (maybe this can be the same as the zero copy sample)

*an example showing how to use fetch to LDS to improve perf