ATI's ASHLI (was Re: SIMD compilers)

Author: Andy Sy
To: compsci
Old-Topics: SIMD compilers
Subject: ATI's ASHLI (was Re: SIMD compilers)
ATI's Advanced Shading Language Interface (ASHLI, available as a
download) compiles (A) RenderMan and glslang (OpenGL shading language)
shaders into either (B) OpenGL vertex/fragment programs or DirectX
Shader spec code.

(A) are the so-called high level shading languages, which look
very much like C. Other examples would be Nvidia's Cg, Microsoft's
HLSL (co-developed with Nvidia and nearly identical to Cg), and the
Stanford Shading Language.

(B) are assembly languages that are intended to come very close to
the operations performed in hardware by GPU/VPUs(*).

(*) In most cases, and for portability's sake, these do NOT represent
the native assembly language instruction sets of the GPUs/VPUs. Thus
you have two translation phases for high level shaders: one from the
C-like language to the 'virtual asm byte-code', then one from that to
the GPU's native instruction set, if it has one. I'm actually curious
about this second phase and not really sure what goes on in it: whether
the GPU does have a native instruction set, or whether the graphics
driver breaks this virtual instruction set down into microcode, which
is what gets fed to the GPU.
(See the Comment on Hardware Architecture section below.)

Because ASHLI's interface displays the high level shader code right
next to the generated low level shader code (consisting of
vector-oriented assembly language instructions), this might be a great
learning opportunity to see object code generation for SIMD
instruction sets in action. Shaders are invariably short snippets,
less than a hundred lines long, so their simplicity is another thing
that should aid understanding.
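
To give a feel for what "vector-oriented assembly" means, here is a
minimal sketch in Python of a toy emulator for three instructions named
after the DP3, MAX and MUL operations found in OpenGL fragment programs.
The register names, the run() and shade() helpers and the three-line
DIFFUSE program are all made up for illustration; this is NOT output
from ASHLI, just the kind of listing you might expect for a one-line
diffuse lighting expression.

```python
# Toy emulator for a few vector instructions in the style of OpenGL
# fragment programs. Every register holds four floats (x, y, z, w).
# All names here are illustrative assumptions, not ASHLI output.

def dp3(a, b):
    """DP3: 3-component dot product, replicated to all four lanes."""
    d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return (d, d, d, d)

def vmax(a, b):
    """MAX: lane-wise maximum."""
    return tuple(max(x, y) for x, y in zip(a, b))

def mul(a, b):
    """MUL: lane-wise multiply."""
    return tuple(x * y for x, y in zip(a, b))

OPS = {"DP3": dp3, "MAX": vmax, "MUL": mul}

def run(program, regs):
    """Execute (op, dest, src0, src1) tuples against a register file."""
    regs = dict(regs)  # work on a copy so the caller's registers survive
    for op, dest, src0, src1 in program:
        regs[dest] = OPS[op](regs[src0], regs[src1])
    return regs

# High level form:  out = base * max(dot(normal, light), 0)
DIFFUSE = [
    ("DP3", "tmp", "normal", "light"),
    ("MAX", "tmp", "tmp", "zero"),
    ("MUL", "out", "base", "tmp"),
]

def shade(normal, light, base):
    """Run the hand-written DIFFUSE program for one pixel."""
    regs = {
        "normal": normal,
        "light": light,
        "base": base,
        "zero": (0.0, 0.0, 0.0, 0.0),
    }
    return run(DIFFUSE, regs)["out"]
```

Each instruction works on all four lanes of a register at once, which
is the SIMD part: one MUL scales red, green, blue and alpha together.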

What MS calls pixel shaders, OpenGL calls fragment programs.
What MS calls vertex shaders (a misnomer), OpenGL refers to as
vertex programs.

Pixel shaders/fragment programs apply operations to pixels (these
affect color and transparency), thus calling them shaders is appropriate.

Vertex shaders/vertex programs apply operations to vertices (x,y,z,w
coordinate values), thus they do not strictly perform shading (although
in some cases their values are used to affect a pixel's color).
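
The split between the two kinds of program can be sketched roughly as
follows. This is a simplified Python illustration, assuming the usual
case where a vertex program transforms a position by a 4x4 matrix (four
DP4-style dot products, one per matrix row) and a fragment program
produces an RGBA color. The function names and the modulate operation
are my own choices for the example, not anything taken from ASHLI.

```python
def dp4(a, b):
    """4-component dot product of two (x, y, z, w) tuples."""
    return sum(x * y for x, y in zip(a, b))

def vertex_program(position, matrix_rows):
    """Transform one vertex: one dot product per matrix row."""
    return tuple(dp4(row, position) for row in matrix_rows)

def fragment_program(base_color, tint):
    """Compute one pixel's RGBA by lane-wise modulating two colors."""
    return tuple(c * t for c, t in zip(base_color, tint))

# Hypothetical matrix that translates x by 2 units.
TRANSLATE_X = (
    (1.0, 0.0, 0.0, 2.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
)

moved = vertex_program((1.0, 2.0, 3.0, 1.0), TRANSLATE_X)
pixel = fragment_program((1.0, 0.5, 0.25, 1.0), (0.5, 0.5, 0.5, 1.0))
```

The vertex program only moves coordinates around; the fragment program
is the one that actually decides a color.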

Guide to Ashli files
*.sl    Renderman C-like source code
*.glsl  glslang C-like source code
*.fp    OpenGL fragment program assembly source
*.vp    OpenGL vertex program assembly source
*.psh   DirectX Pixel Shader (2.0) assembly code

*.tsto  scene setup code (?), input to the testoron.exe test harness (?)

Comment on Hardware Architecture
First generation GPU hardware (GeForce 3/DirectX 8 pixel shader spec
1.1) did not even have flow control and was basically a glorified set
of 'register combiners'. Today's 3rd generation GPU hardware (Radeon
9xxx/GeForce FX, meant for DirectX 9 and OpenGL 2.0) is starting to
look like a real coprocessor that can even be used to run non-graphics
algorithms (though I'm not sure how efficiently you will be able to ship
the results back into system memory).
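
As a rough picture of how non-graphics work maps onto this hardware:
the usual trick is to store arrays as textures and run a fragment
program once per output pixel, so each pixel becomes one element of the
result. The Python below only mimics that idea on the CPU; run_kernel()
and saxpy_kernel() are hypothetical names, and a real implementation
would go through OpenGL or DirectX rather than nested lists.

```python
def run_kernel(kernel, inputs, width, height):
    """Call kernel independently at every (x, y) of the output grid,
    the way a fragment program runs once per pixel of a drawn quad."""
    return [[kernel(x, y, inputs) for x in range(width)]
            for y in range(height)]

def saxpy_kernel(x, y, inputs):
    """Per-pixel body: a * X[y][x] + Y[y][x], reading two 'textures'."""
    a, xs, ys = inputs
    return a * xs[y][x] + ys[y][x]

X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[10.0, 20.0], [30.0, 40.0]]
result = run_kernel(saxpy_kernel, (2.0, X, Y), 2, 2)
```

No output element depends on any other, which is what lets the hardware
process many of them in parallel. Getting result back to system memory
is the step the readback concern above applies to.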

It kind of sucks that the GPU/VPU has its own dedicated monster bus
but has to share data with the main CPU over a narrower/slower pipe
(AGP 8x and pretty soon PCI Express). Moreover, afaik, the speed back
and forth is not the same (i.e. it is asymmetrical, like DSL and cable),
with pulling data back to system memory significantly slower.

But an integrated solution seems out of the question. Apparently, the
nature of graphics computations necessitates a very different
architecture: GPU/VPUs are supposed to have hundred-stage pipelines
(super sensitive to stalls) and consume humongous bandwidth (the
GeForce FX 5900 has 30.4GB/sec of onboard bus bandwidth, whereas even a
still-fictional DDR800 on your standard mobo architecture will only
give you 6.4GB/sec).
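
The arithmetic behind figures like these is just bus width times
transfer rate. The sketch below assumes a 64-bit single-channel bus for
the DDR800 case and a 256-bit bus for the video card. The 950 million
transfers/sec used for the card is simply the rate that reproduces the
30.4GB/sec quoted above on a 256-bit bus; treat it as an assumption,
not a confirmed spec.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_sec):
    """Peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9

# Assumed: 64-bit bus, 800 million transfers/sec (the fictional DDR800).
ddr800 = peak_bandwidth_gb_s(64, 800e6)

# Assumed: 256-bit bus, 950 million transfers/sec (chosen to match the
# 30.4GB/sec figure quoted in the post).
card = peak_bandwidth_gb_s(256, 950e6)

ratio = card / ddr800
```

Under those assumptions the card's local memory comes out a bit under
five times faster than the motherboard memory, and most of that gap is
the four-times-wider bus rather than the clock.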

Thus even if Intel or AMD implemented these instruction sets on either
the main CPU or a coprocessor that sat on the mainboard, the bus
architecture would still be all wrong (for 3D at least)... unless it is
possible to cheaply replicate the vidcard's ultrawide bus architecture
on the mobo.

Sigh... when it comes to parallel processing (and this is just SIMD), all
sorts of architectural considerations come into play... it's not just a
question of instruction sets...

True Computer Science Mailing List