How to enable Lightweight Profiling (LWP) on a multicore system?

I am working on a project on multicore architecture, and LWP looks useful for counting DCache misses. How can I enable LWP on my machine? Do I need to download and install something?

I've looked into the LWP specification document, but it doesn't mention how to enable it.

Thanks in advance!