With -r, address translation (AT) is disabled, i.e. you can't use 3D streams or 1D streams with more than 8192 elements.
I think that with Catalyst 9.5, AT for 1D streams larger than 8192 elements is broken because of a regression. You can go back to a previous Catalyst version to make it work.
AT has some performance overhead in the form of a few extra ALU operations in the kernel (the calculations that convert a 1D stream address to a 2D buffer address and vice versa). Usually this overhead doesn't matter, because kernels are almost never ALU-limited.
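To give a feel for how small that extra work is, here is a simplified model of the address conversion. It assumes the 1D stream is stored row-major in a 2D buffer whose rows are 8192 elements wide. The exact layout and arithmetic that Brook+ generates are not described here, so treat ROW_WIDTH, to_2d and to_1d as hypothetical illustration, not the compiler's actual code.

```python
# Assumption: a 1D stream is laid out row-major in a 2D buffer whose rows are
# ROW_WIDTH elements wide (8192, matching the limit mentioned above).
# ROW_WIDTH, to_2d and to_1d are hypothetical names for illustration only.
ROW_WIDTH = 8192


def to_2d(index: int) -> tuple[int, int]:
    """1D stream index -> (x, y) buffer coordinate: one divide plus one modulo."""
    y, x = divmod(index, ROW_WIDTH)
    return x, y


def to_1d(x: int, y: int) -> int:
    """(x, y) buffer coordinate -> 1D stream index: one multiply-add."""
    return y * ROW_WIDTH + x


x, y = to_2d(20000)
print((x, y), to_1d(x, y))  # (3616, 2) 20000
```

Each element costs only a couple of integer operations in each direction, which is why the overhead is normally hidden behind memory access time.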
Of course, you can avoid any unnecessary calculation by handling the addressing manually, and then compare the performance of Brook+ AT against your manual code.