PERSISTENT_MEM_AMD is an AMD platform-specific flag. You can take advantage of it on AMD platforms.
It won't work on NVIDIA GPUs. I think your code will not even compile.
If you cannot precalculate, you only have to allocate one chunk and divide it among the workgroups. When a workgroup hits its limit, it writes the values calculated so far to its part of the chunk, along with meta-information on where it left off the computation. A subsequent kernel reads that back and resumes the computation. This has to run like a pipeline, with a memory chunk allocated for each iteration (and probably freed or re-used afterwards).
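To make the control flow concrete, here is a rough host-side simulation in plain Python. It is not OpenCL code and not your algorithm. The names GroupState, run_pass, run_pipeline and produce are made up for illustration, and I am assuming each work item emits a variable number of values that never exceeds one workgroup's share of the chunk.

```python
from dataclasses import dataclass


@dataclass
class GroupState:
    """Meta-information a workgroup leaves behind (hypothetical layout)."""
    next_item: int  # first work item this group has not finished yet
    end_item: int   # one past the last work item assigned to this group


def run_pass(states, slots_per_group, produce):
    """Simulate one kernel launch over a freshly allocated chunk.

    Each workgroup owns one slice of the chunk (at most slots_per_group
    values). It stops as soon as the next item would overflow the slice
    and records where it left off in its GroupState.
    """
    chunk = [[] for _ in states]
    for gid, st in enumerate(states):
        out = chunk[gid]
        while st.next_item < st.end_item:
            values = produce(st.next_item)
            if len(values) > slots_per_group:
                raise ValueError("one item does not fit in a group's slice")
            if len(out) + len(values) > slots_per_group:
                break  # limit hit: resume this item in the next pass
            out.extend(values)
            st.next_item += 1
    return chunk


def run_pipeline(num_items, num_groups, slots_per_group, produce):
    """Launch passes until every workgroup has finished its items."""
    per_group = -(-num_items // num_groups)  # ceiling division
    states = [
        GroupState(min(g * per_group, num_items),
                   min((g + 1) * per_group, num_items))
        for g in range(num_groups)
    ]
    results = [[] for _ in range(num_groups)]
    passes = 0
    while any(st.next_item < st.end_item for st in states):
        chunk = run_pass(states, slots_per_group, produce)
        # The host drains the chunk here; afterwards it can be freed or re-used.
        for gid, part in enumerate(chunk):
            results[gid].extend(part)
        passes += 1
    return results, passes
```

In a real implementation run_pass would be the kernel, the GroupState list would live in a small buffer that the next kernel reads back, and the chunk would be a single buffer indexed by workgroup id times slots_per_group.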
If you describe the algorithm in more detail, I can help you out.