Batching of Polygons with a Vertex Cache


Written by Robert Dunlop
Microsoft DirectX MVP

Introduction to the Problem

A common performance pitfall in Direct3D is rendering only a few polygons at a time, which results in a large number of calls to DrawPrimitive() or DrawIndexedPrimitive().  Often it is possible to group polygons together into larger batches, resulting in far greater performance.

The worst case scenario occurs when an application is repeatedly rendering one or two triangles per rendering call.  I see this in applications far more frequently than one might expect, and the majority of these occurrences seem to fall into several design categories:

- Applications that use spatial partitioning trees, such as BSPs, are prone to rendering each visible node individually as the scene is traversed.

- Applications ported from OpenGL commonly fall prey to this syndrome.

- Applications involving tile maps, such as isometric RPG style games.

Often such applications will also use DrawPrimitiveUP() or DrawIndexedPrimitiveUP() rather than their more efficient buffered counterparts.

Batching Primitives

The solution is to gather together primitives into larger batches of polygons, and render them together in a single call.  Typically each batch should contain at least 100-200 triangles, though with today's hardware rendering 1000 or more triangles per call is preferred.

Batching triangles together for rendering in a single DrawPrimitive() or DrawIndexedPrimitive() call requires that all of them be rendered with the same render states, including texture, lighting, transformations, etc.  This will require sorting to be performed at some level.  If the application already keeps its polygons segregated in this manner, it will be easy to adapt; otherwise, sorting will have to be performed.
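As a rough illustration of the sorting step, the sketch below sorts a list of pending primitives by texture and merges neighbours that share the same state into one batch. It is plain C++ with no Direct3D calls; DrawItem, Batch, BuildBatches and the integer textureId are hypothetical names used only for this example, and a real application would likely sort on a key covering every render state that differs between primitives.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one small primitive waiting to be drawn.
struct DrawItem {
    int textureId;      // stand-in for the texture bound to stage 0
    int triangleCount;  // triangles in this primitive
};

// One run of primitives that can share a single rendering call.
struct Batch {
    int textureId;
    int triangleCount;
};

// Sort the items by texture, then merge neighbours with equal state.
std::vector<Batch> BuildBatches(std::vector<DrawItem> items)
{
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem &a, const DrawItem &b) {
            return a.textureId < b.textureId;
        });

    std::vector<Batch> batches;
    for (const DrawItem &item : items) {
        assert(item.triangleCount >= 0);
        // start a new batch whenever the state changes
        if (batches.empty() || batches.back().textureId != item.textureId)
            batches.push_back(Batch{item.textureId, 0});
        batches.back().triangleCount += item.triangleCount;
    }
    return batches;
}
```

With this arrangement the number of rendering calls depends on the number of distinct states rather than on the number of primitives submitted.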

Use of a Vertex Cache

At the end of this article we provide source code for a "vertex cache" class, to which your application can stream primitives during a scene so that they are rendered together in larger batches.  Note that the implementation provided is limited to triangle lists and is designed to accept arrays of vertices containing only those vertices to be rendered in a given primitive, but this can be easily modified to suit your needs.

To use the vertex cache class, your application must first create an instance of CVertexCache after D3D initialization.  The constructor takes five parameters:

CVertexCache(UINT maxVertices,UINT maxIndices,UINT stride,DWORD fvf,DWORD processing)

- maxVertices - defines the maximum number of vertices to cache into a single batch

- maxIndices - defines the maximum number of indices to cache into a single batch

- stride - defines the stride of the vertex format

- fvf - specifies a combination of Flexible Vertex Format flags defining the format of the vertices

- processing - defines the vertex processing type (specify either D3DUSAGE_SOFTWAREPROCESSING or 0)
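When choosing values for maxVertices and maxIndices, it may help to check them up front. The sketch below is a hypothetical helper, not part of CVertexCache: it computes the buffer sizes the same way the constructor in the source listing does (maxVertices*stride and maxIndices*sizeof(WORD)) and checks that every cached vertex can be addressed by a 16-bit index. The names CacheSizes, FitsIn16BitIndices and ComputeCacheSizes are assumptions made for this example, and std::uint16_t stands in for WORD so the code compiles without the Direct3D headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type, not part of CVertexCache.
struct CacheSizes {
    std::size_t vertexBytes;  // bytes requested for the vertex buffer
    std::size_t indexBytes;   // bytes requested for the index buffer
};

// True if every cached vertex can be addressed by a 16-bit index (0..65535).
bool FitsIn16BitIndices(unsigned maxVertices)
{
    return maxVertices >= 1 && maxVertices <= 65536u;
}

// Mirrors the size arithmetic used when the buffers are created.
CacheSizes ComputeCacheSizes(unsigned maxVertices, unsigned maxIndices,
                             unsigned stride)
{
    assert(stride > 0);
    assert(FitsIn16BitIndices(maxVertices));
    CacheSizes s;
    s.vertexBytes = static_cast<std::size_t>(maxVertices) * stride;
    s.indexBytes  = static_cast<std::size_t>(maxIndices) * sizeof(std::uint16_t);
    return s;
}
```

For example, a cache of 2000 vertices and 6000 indices with a 24-byte vertex (position, diffuse color, and one pair of texture coordinates) would request 48000 bytes of vertex data and 12000 bytes of index data.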

To render with the vertex cache, three functions are used:

HRESULT Start()

Call this function prior to rendering with the vertex cache to set up the index and vertex streams, as well as the vertex shader.  Note that these states must not be modified until any primitives subsequently rendered have been flushed from the cache.

HRESULT Draw(UINT numVertices,UINT numIndices,const WORD *pIndexData,
const void *pVertexStreamZeroData)

Writes primitives to the cache, copying numVertices vertices from the pVertexStreamZeroData pointer and numIndices indices from the pIndexData pointer.  Note that only 16-bit (WORD) indices are supported in this version, and that the number of indices, rather than the number of primitives, is specified.  The value of numIndices should be three times the number of triangles to be rendered.

Passing NULL for pIndexData causes the vertices to be treated as a non-indexed triangle list.  The value of numIndices should still be set according to the triangle count, and should be equal to numVertices.  The value of numVertices must be evenly divisible by three in this case.
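Because many primitives share one vertex buffer, each primitive's indices have to be offset by the number of vertices already in the cache. The following is a minimal sketch of that rebasing step in plain C++, using std::vector in place of a locked index buffer; AppendRebasedIndices and its parameters are hypothetical names for this example only. A null index pointer produces the sequence 0, 1, 2, ... to mimic the non-indexed case described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends one primitive's indices to a shared index list, offsetting each
// by the number of vertices already cached (baseVertex).
// Passing nullptr for indices generates 0,1,2,... (non-indexed triangle list).
void AppendRebasedIndices(std::vector<std::uint16_t> &out,
                          const std::uint16_t *indices,
                          std::size_t numIndices,
                          std::size_t baseVertex)
{
    assert(numIndices % 3 == 0);  // whole triangles only
    for (std::size_t i = 0; i < numIndices; ++i) {
        std::size_t local = indices ? indices[i] : i;
        assert(local + baseVertex <= 0xFFFFu);  // must still fit in 16 bits
        out.push_back(static_cast<std::uint16_t>(local + baseVertex));
    }
}
```

Appending a four-vertex quad with indices 0,1,2,2,1,3 to a cache that already holds seven vertices would store 7,8,9,9,8,10.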

In this sample class the values of numIndices and numVertices cannot exceed the maximums set on initialization.  This could easily be dealt with, though, by testing for this condition and immediately rendering the passed primitive if larger than the buffer sizes.

If the primitive passed to Draw() would not fit in the space remaining in either the index or vertex buffer, the triangles already in the cache are rendered automatically and the cache is cleared to make room for the new primitive.
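The capacity rules above can be summarized as a three-way decision. The sketch below is one possible way to express it, including the oversized case that the sample class does not handle; CacheAction and ChooseAction are hypothetical names, and the function only decides what to do without touching any Direct3D buffers.

```cpp
#include <cassert>

// Hypothetical outcome of submitting one primitive to the cache.
enum class CacheAction {
    Append,           // fits in the space remaining
    FlushThenAppend,  // fits only once the cache has been emptied
    DrawDirectly      // larger than the cache itself
};

CacheAction ChooseAction(unsigned usedVertices, unsigned usedIndices,
                         unsigned maxVertices, unsigned maxIndices,
                         unsigned numVertices, unsigned numIndices)
{
    assert(usedVertices <= maxVertices && usedIndices <= maxIndices);

    // can never fit, even in an empty cache
    if (numVertices > maxVertices || numIndices > maxIndices)
        return CacheAction::DrawDirectly;

    // does not fit in what is left of either buffer
    if (numVertices > maxVertices - usedVertices ||
        numIndices > maxIndices - usedIndices)
        return CacheAction::FlushThenAppend;

    return CacheAction::Append;
}
```

If the DrawDirectly path were added to the class, the cache would presumably need to be flushed first so that primitives are still rendered in submission order.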

HRESULT Flush()

This function renders any triangles remaining in the cache.  This function is called by Draw() whenever the cache is full, and should also be called prior to changing render states to render primitives requiring a different texture or other state changes.

Basic Usage

The pseudocode below illustrates use of the vertex cache to render groups of polygons sorted by texture:

// g_vertexCache points to a previously created instance of CVertexCache
g_vertexCache->Start();
for (int i=0;i<numTextures;i++) {
    lpDevice->SetTexture(0,textures[i]);
    // stream the triangles that use this texture, one triangle (3 indices) per call
    for (int j=0;j<numTriangles[i];j++) 
        g_vertexCache->Draw(..,3,..,..);
    // render whatever remains in the cache before the texture changes
    g_vertexCache->Flush();
}
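To get a feel for the saving, the sketch below counts the rendering calls that such a loop would issue, assuming one triangle is passed per Draw() call and that the cache holds at most maxTriangles triangles. It is a simulation in plain C++ rather than Direct3D code; CountBatchedDrawCalls and its parameters are hypothetical names for this example.

```cpp
#include <cassert>
#include <vector>

// Counts the rendering calls issued when each texture group is streamed
// through a cache holding at most maxTriangles triangles, one triangle per
// Draw() call, with a Flush() after every group.
unsigned CountBatchedDrawCalls(const std::vector<unsigned> &trianglesPerTexture,
                               unsigned maxTriangles)
{
    assert(maxTriangles > 0);
    unsigned drawCalls = 0;
    for (unsigned count : trianglesPerTexture) {
        unsigned cached = 0;
        for (unsigned j = 0; j < count; ++j) {
            if (cached == maxTriangles) {  // no room left: automatic flush
                ++drawCalls;
                cached = 0;
            }
            ++cached;
        }
        if (cached > 0)                    // explicit Flush() at end of group
            ++drawCalls;
    }
    return drawCalls;
}
```

For three texture groups of 500, 40 and 1200 triangles and a cache of 400 triangles, this gives 6 rendering calls, compared with 1740 if every triangle were rendered individually.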

Source Code

// VertexCache.h: interface for the CVertexCache class.
//
//////////////////////////////////////////////////////////////////////
#pragma once
#include <d3d8.h>
class CVertexCache  
{
public:
	HRESULT Flush();
	HRESULT Start();
	HRESULT Draw(UINT numVertices,UINT numIndices,const WORD *pIndexData,
			const void *pVertexStreamZeroData);
	CVertexCache(UINT maxVertices,UINT maxIndices,UINT stride,DWORD fvf,DWORD processing);
	virtual ~CVertexCache();
	DWORD m_fvf;
	UINT m_maxVertices;
	UINT m_numVertices;
	UINT m_maxIndices;
	UINT m_numIndices;
	IDirect3DVertexBuffer8 *m_vBuf;
	IDirect3DIndexBuffer8 *m_iBuf;
	UINT m_stride;
	BYTE *m_vertPtr;
	WORD *m_indPtr;
};

// VertexCache.cpp: implementation of the CVertexCache class.
//
//////////////////////////////////////////////////////////////////////

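#include <string.h>	// memcpy
#include "VertexCache.h"

// The listing uses lpDevice without declaring it.  It is assumed here to be a
// global pointer to the application's Direct3D device, defined elsewhere.
extern IDirect3DDevice8 *lpDevice;

#define SAFE_RELEASE(x) do { if (x) { (x)->Release(); (x)=NULL; } } while (0)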

//////////////////////////////////////////////////////////////////////
// Construction/Destruction
//////////////////////////////////////////////////////////////////////

CVertexCache::CVertexCache(UINT maxVertices,UINT maxIndices,UINT stride,DWORD fvf,
			DWORD processing)
{
	// create the vertex buffer
	m_vBuf=NULL;
	lpDevice->CreateVertexBuffer(maxVertices*stride,
				 D3DUSAGE_DYNAMIC|D3DUSAGE_WRITEONLY|processing,
				 fvf,D3DPOOL_DEFAULT,&m_vBuf);

	// create the index buffer
	m_iBuf=NULL;
	lpDevice->CreateIndexBuffer(maxIndices*sizeof(WORD),
				D3DUSAGE_DYNAMIC|D3DUSAGE_WRITEONLY|processing,
				D3DFMT_INDEX16,D3DPOOL_DEFAULT,&m_iBuf);

	// clear the vertex and index counters
	m_numVertices=0;
	m_numIndices=0;

	// save buffer sizes, vertex format, and stride
	m_maxVertices=maxVertices;
	m_maxIndices=maxIndices;
	m_stride=stride;
	m_fvf=fvf;

	// clear buffer pointers
	m_indPtr=NULL;
	m_vertPtr=NULL;
}

CVertexCache::~CVertexCache()
{
	// release vertex and index buffers
	SAFE_RELEASE(m_vBuf);
	SAFE_RELEASE(m_iBuf);
}

HRESULT CVertexCache::Draw(UINT numVertices, UINT numIndices, const WORD *pIndexData,
			const void *pVertexStreamZeroData)
{
	HRESULT hr;

	// will this fit in the cache?
	if (m_numVertices+numVertices>m_maxVertices||
		m_numIndices+numIndices>m_maxIndices) 

		// no, flush the cache and report any rendering failure
		if (FAILED(hr=Flush()))
			return hr;

	// check to see if we have pointers into buffers, lock if needed
	if (!m_indPtr)
		if (FAILED(hr=m_iBuf->Lock(0,0,(BYTE **) &m_indPtr,D3DLOCK_DISCARD)))
			return hr;
	if (!m_vertPtr) 
		if (FAILED(hr=m_vBuf->Lock(0,0,&m_vertPtr,D3DLOCK_DISCARD)))
			return hr;

	// copy the vertices into the cache
	memcpy(&m_vertPtr[m_stride*m_numVertices],pVertexStreamZeroData,m_stride*numVertices);

	// save the current vertex count; new indices are offset by this amount
	UINT startInd=m_numVertices;

	// loop through the indices
	for (UINT i=0;i<numIndices;i++) {
		
		// add the index, generating sequential indices if none were supplied
		m_indPtr[m_numIndices]=(WORD)(((pIndexData!=NULL)?pIndexData[i]:i)+startInd);

		// increment the index counter
		m_numIndices++;
	}

	// adjust vertex counter
	m_numVertices+=numVertices;

	// return success
	return S_OK;
}

HRESULT CVertexCache::Flush()
{
	HRESULT hr;

	// unlock the vertex and index buffers
	if (m_indPtr) {
		m_iBuf->Unlock();
		m_indPtr=NULL;
	}
	if (m_vertPtr) {
		m_vBuf->Unlock();
		m_vertPtr=NULL;
	}

	// are there triangles to render?
	hr=S_OK;
	if (m_numIndices&&m_numVertices) 

		// yes, render them
		hr=lpDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
						  0,m_numVertices,
						  0,m_numIndices/3);

	// clear the vertex and index counters, even if rendering failed,
	// so that stale counts are not carried into the next batch
	m_numVertices=0;
	m_numIndices=0;

	// return the result of the draw call (S_OK if nothing was rendered)
	return hr;
}

HRESULT CVertexCache::Start()
{
	HRESULT hr;

	// set the index buffer, vertex buffer, and shader for the device
	lpDevice->SetIndices(m_iBuf,0);
	lpDevice->SetStreamSource(0,m_vBuf,m_stride);
	lpDevice->SetVertexShader(m_fvf);
	
	// clear the vertex and index counters
	m_numVertices=0;
	m_numIndices=0;

	// lock the vertex and index buffers
	m_indPtr=NULL;
	if (FAILED(hr=m_iBuf->Lock(0,0,(BYTE **) &m_indPtr,D3DLOCK_DISCARD)))
		return hr;
	m_vertPtr=NULL;
	if (FAILED(hr=m_vBuf->Lock(0,0,&m_vertPtr,D3DLOCK_DISCARD)))
		return hr;

	// return success
	return S_OK;
}

This site, created by DirectX MVP Robert Dunlop and aided by the work of other volunteers, provides a free on-line resource for DirectX programmers.

Special thanks to WWW.MVPS.ORG, for providing a permanent home for this site.
Last updated: 07/26/05.