I am trying to transform screen-space (2D) coordinates into world-space (3D) coordinates for point cloud generation, in Python. I am given a projection matrix, a view matrix, and a depth image, and I am trying to follow the steps from "Getting World Position from Depth Buffer Value".
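As I understand it, the math is: build the NDC point ndc = (2*x/width - 1, 2*y/height - 1, 2*d - 1, 1) from the pixel coordinates and a depth value d in [0, 1], compute world = inverse(proj @ view) @ ndc, then divide by the resulting w component.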
So far, I have come up with this code:
import numpy as np

m_points = []
# Matrix multiplication of projection and view, then the inverse of it,
# which maps NDC/clip-space coordinates back to world space
IViewProj = np.linalg.inv(proj @ view)
for y in range(height):
    for x in range(width):
        # Build the 4x1 NDC position: x and y mapped from pixels to [-1, 1],
        # depth (grayscale 0-255) normalized to [0, 1] and then mapped to [-1, 1]
        clipSpaceLocation = np.array([(x / width) * 2 - 1,
                                      (y / height) * 2 - 1,
                                      (depth[y, x] / 255.0) * 2 - 1,
                                      1.0])
        # Unproject: 4x4 @ 4x1 -> 4x1 homogeneous world-space point
        worldSpaceLocation = IViewProj @ clipSpaceLocation
        # Perspective division by w
        worldSpaceLocation /= worldSpaceLocation[-1]
        worldSpaceV3 = worldSpaceLocation[:-1]
        m_points.append(worldSpaceV3)

m_points = np.array(m_points)
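For reference, here is a vectorized sketch of the same unprojection (my own rewrite, not from the tutorial), assuming depth is a uint8 array of shape (height, width) and proj/view are 4x4 column-vector matrices. It also optionally flips the y axis, since image rows grow downward while NDC y grows upward; whether that applies depends on your conventions:

import numpy as np

def unproject_depth(depth, proj, view, flip_y=True):
    # Depth image (uint8, 0-255) -> (N, 3) array of world-space points
    height, width = depth.shape
    inv_view_proj = np.linalg.inv(proj @ view)

    # Pixel-center grid mapped to NDC [-1, 1]
    xs = (np.arange(width) + 0.5) / width * 2 - 1
    ys = (np.arange(height) + 0.5) / height * 2 - 1
    if flip_y:
        ys = -ys  # image rows increase downward, NDC y increases upward
    grid_x, grid_y = np.meshgrid(xs, ys)

    # Grayscale depth normalized to [0, 1], then mapped to NDC [-1, 1]
    ndc_z = depth.astype(np.float64) / 255.0 * 2 - 1

    # Stack into an (N, 4) array of homogeneous NDC points
    ndc = np.stack([grid_x.ravel(), grid_y.ravel(),
                    ndc_z.ravel(), np.ones(depth.size)], axis=1)

    # Row-vector form of inv_view_proj @ p, applied to every point at once
    world = ndc @ inv_view_proj.T
    world /= world[:, 3:4]  # perspective division
    return world[:, :3]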
m_points holds the [x, y, z] positions, which I eventually plot as a point cloud, but it isn't giving the correct result: the output basically looks like the depth image itself rather than a reconstructed scene. Can anyone help me with this?
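One thing I am unsure about is what the depth image actually encodes. If it stores a linear eye-space distance rather than the nonlinear NDC depth that a perspective projection writes to the depth buffer, then mapping it straight to [-1, 1] would distort the cloud. This is a sketch of the conversion I would try in that case, assuming an OpenGL-style projection whose third and fourth rows depend only on z (the function name is my own):

def linear_depth_to_ndc(d_eye, proj):
    # Positive eye-space distance -> NDC depth in [-1, 1],
    # using rows 2 and 3 of a column-vector projection matrix
    z_eye = -d_eye  # the camera looks down -z in OpenGL conventions
    clip_z = proj[2, 2] * z_eye + proj[2, 3]
    clip_w = proj[3, 2] * z_eye + proj[3, 3]
    return clip_z / clip_w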