
Update! A sample demonstrating picking of meshes using DirectX 8 is now available.
Introduction

Direct3D provides the means to project your 3D world onto the screen, but 3D titles often require the reverse: converting screen coordinates into 3D, and determining which objects are visible at a location in the viewport. Such techniques are often used for picking targets or plotting weapon trajectories in 3D games. This article lays the initial groundwork required for establishing a ray pick routine, by providing algorithms and working code to convert a pixel coordinate into a ray in world space.

Overview of Ray Casting

The process of converting screen coordinates to 3D requires that we run through the vertex transformation process in reverse. Basically, this is a five step process:

1. Normalize the screen coordinates.
2. Scale the coordinates to the viewing frustum.
3. Calculate the end points of the ray at the near and far clipping planes.
4. Generate an inverse of the view matrix.
5. Convert the ray to world coordinates.
Normalizing Screen Coordinates

We begin our journey with screen coordinates, corresponding to a pixel on the screen. For the sake of a standard for this discussion, we will assume that coordinates are based on a 640 x 480 resolution. To make use of these coordinates, we must first redefine them in terms of the visible area of the viewing frustum. There are two differences that we must account for to do this:

- Screen coordinates place the origin at the top-left corner, with y increasing downward, while the frustum is centered on the view axis, with y increasing upward.
- Screen coordinates are measured in pixels (0 to 640 and 0 to 480), while the visible area of the frustum runs from -1 to 1 on each axis.
To deal with this, we scale the incoming coordinates and offset them to the center. We must also handle the difference in width and height, to compensate for the aspect ratio of the display:

    #define WIDTH        640.0f
    #define HEIGHT       480.0f
    #define WIDTH_DIV_2  (WIDTH*0.5f)
    #define HEIGHT_DIV_2 (HEIGHT*0.5f)
    #define ASPECT       1.3333f

    dx=(x/WIDTH_DIV_2-1.0f)/ASPECT;
    dy=1.0f-y/HEIGHT_DIV_2;

Scaling Coordinates to the Frustum

The next step is to determine what these coordinates mean in view space. To understand this, we must first take a brief look at the characteristics of the viewing frustum. If you look at the frustum from any side, you can view it as a two dimensional triangle, and we will use this perspective to analyze the problem, as it allows us to deal with one axis at a time. In fact, the horizontal and vertical axes are not interrelated in this problem, so this part of the process really becomes a 2D problem.
Since we defined the frustum, we know the angle at which the sides of the frustum meet: this is the field of view we used to create the projection matrix in the first place. Since we have split the frustum into two halves, each triangle has an angle at the origin of view equal to FOV/2. If we calculate the tangent of this angle, we have a value that corresponds to the ratio between the displacement on the X axis and the distance away from the viewer (the Y axis of our 2D diagram).
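As a concrete check of the first two steps, the normalization and half-FOV tangent scaling can be sketched as a small standalone function. This uses the article's 640 x 480 and FOV 0.8 constants; the function name `pixelToViewRatios` is mine, not part of the article's code:

```cpp
#include <cassert>
#include <cmath>

// Constants from the article: 640x480 display, 4:3 aspect, 0.8 radian FOV.
const float WIDTH_DIV_2  = 320.0f;
const float HEIGHT_DIV_2 = 240.0f;
const float ASPECT       = 1.3333f;
const float FOV          = 0.8f;

// Convert a pixel to view-space x/z and y/z ratios: normalize to [-1,1]
// (flipping y, since screen y grows downward) and scale by tan(FOV/2),
// the frustum's half-extent per unit of depth.
void pixelToViewRatios(int x, int y, float &dx, float &dy)
{
    float t = tanf(FOV * 0.5f);
    dx = t * (x / WIDTH_DIV_2 - 1.0f) / ASPECT;
    dy = t * (1.0f - y / HEIGHT_DIV_2);
}
```

The center pixel (320,240) maps to (0,0), and a pixel on the top edge maps to dy = tan(0.4), the top of the frustum at unit depth.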
With this information in hand, we can now calculate 3D coordinates in view space. Since we know that the center of the screen corresponds to (0,0,Z), and since we know where the edge of the screen is, we can interpolate any point in between. Since we have already normalized our screen values, all we need to do is multiply them by the tangent to find a ratio for this point in the frustum. We can adapt the previous lines of code to include this:

    dx=tanf(FOV*0.5f)*(x/WIDTH_DIV_2-1.0f)/ASPECT;
    dy=tanf(FOV*0.5f)*(1.0f-y/HEIGHT_DIV_2);

Note that this code calculates the tangent each time; this is only for clarity. You can calculate it once at the start of the application, or use a constant if your projection will not change.

Calculating the End Points of the Ray

Next, we can calculate the coordinates of the ray relative to the view, using end points at the near and far clipping planes:

    p1=D3DVECTOR(dx*NEAR,dy*NEAR,NEAR);
    p2=D3DVECTOR(dx*FAR,dy*FAR,FAR);

Generating an Inverse of the View Matrix

To transform our coordinates back to world space, we will need an "inverse matrix" of our view matrix; that is, a matrix that does the reverse of the view matrix, so that given a coordinate that has been multiplied by the view matrix, it will return the original coordinate. Fortunately, there is a handy helper function in the D3DMATH.CPP and D3DMATH.H files of the Direct3D framework that takes care of this for us:

    lpDevice->GetTransform(D3DTRANSFORMSTATE_VIEW,&viewMatrix);
    D3DMath_MatrixInvert(invMatrix,viewMatrix);

Note that not all matrices can be inverted, but in normal use this should not be a problem. It is stipulated, however, that the above function will fail if the last column of the matrix is not 0,0,0,1.

Converting the Ray to World Coordinates

Finally, all that remains is to multiply these vectors by our inverse matrix, and there it is! We have defined a line in 3D world coordinates that corresponds to the screen coordinates we started with.
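For readers working outside the D3D framework, the matrix inversion can be sketched by hand. For a rigid view matrix (a rotation plus a translation, with last column 0,0,0,1, the same restriction noted above) the inverse is the transposed rotation with a re-rotated, negated translation. The struct and function names below are mine; the code uses Direct3D's row-vector convention (point * matrix, translation in the bottom row):

```cpp
#include <cassert>
#include <cmath>

// Minimal 4x4 row-major matrix, row-vector convention as in Direct3D:
// transformed = point * matrix, with the translation stored in row 3.
struct Mat4 { float m[4][4]; };
struct Vec3 { float x, y, z; };

// Invert a rigid transform: transpose the 3x3 rotation block, then
// compute the new translation row as t' = -t * R^T.
Mat4 invertRigid(const Mat4 &a)
{
    Mat4 r = {};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r.m[i][j] = a.m[j][i];                 // transpose rotation
    for (int j = 0; j < 3; ++j)
        r.m[3][j] = -(a.m[3][0]*r.m[0][j] + a.m[3][1]*r.m[1][j] + a.m[3][2]*r.m[2][j]);
    r.m[3][3] = 1.0f;
    return r;
}

// Transform a point by the matrix (w assumed to be 1).
Vec3 transform(const Vec3 &v, const Mat4 &m)
{
    return { v.x*m.m[0][0] + v.y*m.m[1][0] + v.z*m.m[2][0] + m.m[3][0],
             v.x*m.m[0][1] + v.y*m.m[1][1] + v.z*m.m[2][1] + m.m[3][1],
             v.x*m.m[0][2] + v.y*m.m[1][2] + v.z*m.m[2][2] + m.m[3][2] };
}
```

Transforming a point by a matrix and then by its inverse returns the original point, which is exactly the property we rely on to undo the view transform.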
    D3DMath_VectorMatrixMultiply(p1,p1,invMatrix);
    D3DMath_VectorMatrixMultiply(p2,p2,invMatrix);

Working Code Sample for Ray Projection

To make it easier to grab, here is a complete copy of the source code presented above:

    #define NEAR   10.0f
    #define FAR    4000.0f
    #define FOV    0.8f
    #define WIDTH  640.0f
    #define HEIGHT 480.0f
    #define WIDTH_DIV_2  (WIDTH*0.5f)
    #define HEIGHT_DIV_2 (HEIGHT*0.5f)
    #define ASPECT 1.3333f

    void calcRay(int x,int y,D3DVECTOR &p1,D3DVECTOR &p2)
    {
        float dx,dy;
        D3DMATRIX invMatrix,viewMatrix;

        dx=tanf(FOV*0.5f)*(x/WIDTH_DIV_2-1.0f)/ASPECT;
        dy=tanf(FOV*0.5f)*(1.0f-y/HEIGHT_DIV_2);

        lpDevice->GetTransform(D3DTRANSFORMSTATE_VIEW,&viewMatrix);
        D3DMath_MatrixInvert(invMatrix,viewMatrix);

        p1=D3DVECTOR(dx*NEAR,dy*NEAR,NEAR);
        p2=D3DVECTOR(dx*FAR,dy*FAR,FAR);

        D3DMath_VectorMatrixMultiply(p1,p1,invMatrix);
        D3DMath_VectorMatrixMultiply(p2,p2,invMatrix);
    }

Where to Go from Here

If you are using this to retrieve a vector, for example to set the direction for a projectile to travel, then at this point you have what you need. You can subtract the two vectors returned from the above function and normalize the result to get a vector representing direction in world space:

    D3DVECTOR calcDir(int x,int y)
    {
        D3DVECTOR p1,p2;
        calcRay(x,y,p1,p2);
        return Normalize(p2-p1);
    }

Many applications will be more demanding, requiring the ability to determine what is visible at a given screen location. To do this, the ray must be tested for intersection against objects in the scene. There may be multiple points of intersection, and the closest intersection (with a visible poly) to the viewer is the one that will be visible. Unfortunately, there are a lot of variables in implementing this, which is why no method is provided in Direct3D Immediate Mode: any "generic" means of testing object intersection with a ray, working solely from lists of primitives, would require testing every single triangle in every object for intersection.
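For completeness, the single ray/triangle test that such a brute-force approach would repeat for every triangle can be sketched as follows. This is the well-known Möller-Trumbore algorithm, not code from the article, and the names are mine; the ray runs from p1 to p2 as returned by calcRay:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3 &a, const Vec3 &b) { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
static float dot(const Vec3 &a, const Vec3 &b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 cross(const Vec3 &a, const Vec3 &b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// Moller-Trumbore ray/triangle intersection. Returns true if the segment
// p1->p2 hits triangle v0,v1,v2, and sets t to the hit fraction in [0,1].
bool rayTriangle(const Vec3 &p1, const Vec3 &p2,
                 const Vec3 &v0, const Vec3 &v1, const Vec3 &v2, float &t)
{
    const float EPS = 1e-7f;
    Vec3 dir = sub(p2, p1);
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 pv = cross(dir, e2);
    float det = dot(e1, pv);
    if (std::fabs(det) < EPS) return false;        // segment parallel to triangle
    float inv = 1.0f / det;
    Vec3 tv = sub(p1, v0);
    float u = dot(tv, pv) * inv;                   // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 qv = cross(tv, e1);
    float v = dot(dir, qv) * inv;                  // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, qv) * inv;
    return t >= 0.0f && t <= 1.0f;                 // hit between near and far points
}
```

Running this against every triangle and keeping the smallest t gives the closest visible hit, which is exactly the cost the article warns about.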
On the other hand, by knowing information about the scene and the objects residing there, a developer can optimize this routine greatly for a particular application. For example, object bounding boxes may first be tested to see if polygon-based testing is necessary, or bounding box intersection may often be all that is needed to provide acceptable results. Depending on the object shape, bounding spheres can also provide an efficient means of testing, or multiple overlapping spheres can be used to bound more complex objects. These are things that... well, figuratively speaking, only a mother (read: the application's developer) would know. You will need to take a close look at your application to find the best way to handle this.

Come back soon for more articles on this topic.... We will be providing a series of articles on various intersection testing techniques and methods for managing a scene database, as time permits.
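As one example of the cheaper bounding tests mentioned above, a ray/bounding-sphere rejection test can be sketched like this (a minimal sketch with names of my own choosing, not code from the article):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Quick bounding-sphere rejection test: does the segment p1->p2 pass within
// `radius` of `center`? Solves |p1 + t*d - center|^2 = radius^2 for t and
// checks whether the segment overlaps a root interval.
bool raySphere(const Vec3 &p1, const Vec3 &p2, const Vec3 &center, float radius)
{
    Vec3 d  = { p2.x-p1.x, p2.y-p1.y, p2.z-p1.z };       // segment direction
    Vec3 oc = { p1.x-center.x, p1.y-center.y, p1.z-center.z };
    float a = d.x*d.x + d.y*d.y + d.z*d.z;
    float b = 2.0f * (oc.x*d.x + oc.y*d.y + oc.z*d.z);
    float c = oc.x*oc.x + oc.y*oc.y + oc.z*oc.z - radius*radius;
    float disc = b*b - 4.0f*a*c;
    if (disc < 0.0f) return false;                        // line misses the sphere
    float s  = sqrtf(disc);
    float t0 = (-b - s) / (2.0f*a);                       // entry point
    float t1 = (-b + s) / (2.0f*a);                       // exit point
    return (t0 >= 0.0f && t0 <= 1.0f) || (t1 >= 0.0f && t1 <= 1.0f)
        || (t0 < 0.0f && t1 > 1.0f);                      // segment starts inside
}
```

Only objects whose bounding sphere passes this test need the expensive per-polygon intersection work.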

