I modified LLVM (roc-1.6.x) slightly so that it generates code that runs on the AMDGPU-PRO driver. The code runs, but it is more than 10% slower than the output of AMDGPU's online compiler for the same OpenCL code. Are there any flags I can set to tune LLVM's output? Examples would be much appreciated.
- Could you please show your modification? – Michael Lukin Sep 07 '18 at 13:27
- There is an open source project to follow: https://github.com/zawawawa/GCNminC – user1200759 Sep 07 '18 at 16:29
- I changed some local ID and local size values to constants, and now the LLVM code is as fast as AMDGPU's. – user1200759 Oct 26 '18 at 17:32
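
The last comment does not show the actual change, so the following is only a rough sketch of the idea: replacing a run-time `get_local_size(0)` query with a compile-time constant so the index arithmetic can be folded. To keep it compilable as plain C, the OpenCL built-ins are replaced by hypothetical host-side stand-ins; `WG_SIZE`, `scale_dynamic`, `scale_constant`, `launch`, and `variants_agree` are all illustrative names, not anything from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-ins for the OpenCL built-ins, so this
   sketch compiles as plain C. In a real kernel they come from OpenCL C. */
static size_t g_local_id, g_local_size, g_group_id;
static size_t get_local_id(unsigned dim)   { (void)dim; return g_local_id; }
static size_t get_local_size(unsigned dim) { (void)dim; return g_local_size; }
static size_t get_group_id(unsigned dim)   { (void)dim; return g_group_id; }

#define WG_SIZE 64  /* assumed work-group size, fixed at compile time */

/* Variant 1: the local size is queried at run time. */
static void scale_dynamic(const float *in, float *out) {
    size_t gid = get_group_id(0) * get_local_size(0) + get_local_id(0);
    out[gid] = 2.0f * in[gid];
}

/* Variant 2: the local size is a constant, so the multiply can be folded.
   This is only valid if the kernel is always launched with WG_SIZE. */
static void scale_constant(const float *in, float *out) {
    assert(get_local_size(0) == WG_SIZE);
    size_t gid = get_group_id(0) * WG_SIZE + get_local_id(0);
    out[gid] = 2.0f * in[gid];
}

/* Emulates a 1-D launch by looping over work-groups and work-items. */
static void launch(void (*kernel)(const float *, float *),
                   const float *in, float *out, size_t groups) {
    g_local_size = WG_SIZE;
    for (g_group_id = 0; g_group_id < groups; ++g_group_id)
        for (g_local_id = 0; g_local_id < WG_SIZE; ++g_local_id)
            kernel(in, out);
}

/* Returns 1 if both variants write the same, expected results. */
int variants_agree(void) {
    enum { GROUPS = 4, N = GROUPS * WG_SIZE };
    float in[N], a[N], b[N];
    for (size_t i = 0; i < N; ++i) {
        in[i] = (float)i;
        a[i] = b[i] = -1.0f;
    }
    launch(scale_dynamic, in, a, GROUPS);
    launch(scale_constant, in, b, GROUPS);
    for (size_t i = 0; i < N; ++i)
        if (a[i] != b[i] || a[i] != 2.0f * (float)i)
            return 0;
    return 1;
}
```

In OpenCL C itself, the `reqd_work_group_size` kernel attribute is the standard way to declare a fixed work-group size. Whether a particular LLVM build folds the size queries on that basis is something to check in the generated ISA.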