Introduction to PC Graphics Architectures
Seminar in Applied Computer Science, Summer Semester 2000
Plattner Stefan 9660070
Outline of the talk
• Talisman architecture concept
• Nvidia GeForce 3D accelerator
• The transformation process
• OpenGL programming interface
• DDR memory: bandwidth problems of the PC

Codename "Talisman"
• Revolutionary multimedia architecture for the PC world
• First presented in 1996 (SIGGRAPH)
• First implementation: "Escalante/Touchstone"
• Participating companies:
  • Microsoft
  • Samsung
  • Silicon Engineering
  • Philips Microelectronics
  • Cirrus Logic
  • Fujitsu Microelectronics

Goals of the architecture
• Develop a comprehensive, efficient multimedia architecture for the PC
• Take into account the limits on
  • memory
  • bandwidth
  • cost
• Remove the severe restrictions on
  • image resolution
  • frame rate
  • color depth
  • scene complexity

Key concepts
• Compositing Image Layers
• Compression
• Chunking
• Multipass Rendering

3D graphics pipeline
• Application tasks: control object movement and the interaction between objects
• Scene level tasks: object-level culling, level of detail
• Transform: compute the currently valid scene; project from 3D to 2D
• Lighting: compute lighting and reflection effects
• Triangle setup & clipping: build the triangles from the vertex data; clip the triangles
• Rendering: break the model down into pixels and compute the color values

Escalante System Partitioning
[Block diagram. Escalante VLSI components: Media DSP, Polygon Object Processor, Image Layer Compositor, Compositing DAC. Standard components: commodity DRAM, 2M RDRAM. External connections: PCI bus, RGB video, stereo audio, modem.]

Samsung MSP-1 Processor
[Block diagram. Blocks: ARM7 RISC CPU; vector coprocessor; cache subsystem (5KB Dcache, 2KB Icache, 16KB IROM); customer ASIC; four timer modules; codec interface; full-duplex UART; interrupt controller; bitstream processor; customer-specific interface (mediabus) for video, audio and phone; PCI bus interface (32-bit PCI bus); memory controller (32/64-bit memory bus); DMA controller. Internal data paths are labelled 32, 64 and 256 bits.]

Polygon Object Processor
[Block diagram. Blocks: command and memory controller (connected to the Media DSP and the Rambus channels); primitive queue; primitive register; prerasterizer; initial evaluation unit; rasterizer; texture read queue; texture cache controller; cache address map; compressed texture cache; decompressor; texture cache; texture filter engine; pixel queue; pixel engine; depth/stencil/priority buffer; fragment buffer; fragment resolve; color buffers; compressor.]

Image Layer Compositor
[Block diagram. Blocks: tiler; interface controller; primitive queue; primitive register; prerasterizer; initial evaluation unit; rasterizer; texture read queue; image layer controller; cache address map; compressed texture cache; decompressor; image layer texture cache; image layer filter engine; compositing buffer controller. Output goes to the Compositing DAC.]

Compositing DAC
[Block diagram. Inputs: alpha data from the Image Layer Compositor; data from the Media DSP over the media bus interface. Blocks: memory controller; two scan line buffers (1,344x32x8); alpha buffer (1,344x32x8); compositing logic; multiplexer; three 256x8 color LUTs; DACs; clock generator (system clocks); CRT controller (HSync, VSync). Output: RGB video.]

Evolution of PC 3D accelerators

Nvidia GeForce256 DDR
Processor            Transistors
Intel Pentium III    ~9.5M
AMD Athlon (K7)      ~22M
Nvidia GeForce       ~23M

Feature Overview (1)
• Integrated T&L unit
• 4 parallel rendering pipelines
• 15-25 million triangles/sec
• 480 million pixels/sec
• 256-bit 2D rendering engine
• AGP 4X with fast writes
• High-speed memory interface
• 350MHz RAMDAC
• .22u enhanced 5LM process
[Block diagram. Blocks: HDTV video processor; 4-pipeline 3D graphics engine; 2D graphics acceleration; transform, clipping and lighting; host interface with DMA; digital interface for TV encoder; interface for digital flat panel with scaling and centering; VGA; RAMDAC with video overlay; 366MHz, 128-bit memory interface (SDRAM/SGRAM, SDR/DDR, up to 128MB); VIP 2.0 port.]

Feature Overview (2)
(+ supported, - not supported)
Feature                                       Value
Manufacturing process, um                     0.22
Default chip clocking, MHz                    120
Default memory clocking, MHz                  166
Videomemory interface, bits                   128
Videomemory bandwidth, GB/sec                 2.6
Maximum on-board RAM, MB                      64
Pipelines                                     4
Fillrate in single texturing mode, MPX/sec    480
Fillrate in dual texturing, MPX/sec           240
Throughput, million triangles/sec             15
Transformation & lighting                     +
Truecolor in 3D                               +
Maximum z-buffering depth                     24
Z-biasing                                     +
Stenciling, bits                              8
LOD biasing                                   +
Trilinear filtering                           +
Multitexturing                                +
Multitexturing + trilinear filtering          +
Maximum simultaneous textures                 2
Anisotropic filtering                         +
Embossing bumpmapping                         +
Dot product bumpmapping                       +
Environment mapped bumpmapping                -
Cubic environment mapping                     +
Maximum texture size                          2kx2k
32-bit texture support                        +
Vertex blending                               +
Maximum blended matrices                      2
DXTC texture compression                      +
S3TC texture compression in OpenGL            +
AGP-texturing                                 +
AGP-texturing in OpenGL                       -
AGP 2X support                                +
AGP 4X support                                +
AGP Fast Writes support                       +
Edge antialiasing                             -
Full screen antialiasing                      +

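The fill-rate and bandwidth rows of the table follow from the clock rates and unit counts listed next to them. The following C sketch reproduces that arithmetic. The function names are illustrative and not part of any Nvidia API, and the sketch assumes that each pipeline emits one pixel per clock, that dual texturing costs two passes per pixel, and that the memory does one transfer per clock.

```c
#include <stdio.h>

/* Peak fill rate in megapixels per second. Assumption: every pipeline
 * emits one pixel per clock, divided by the passes needed per pixel. */
double fill_rate_mpx(int pipelines, double core_mhz, int passes_per_pixel)
{
    return pipelines * core_mhz / passes_per_pixel;
}

/* Peak memory bandwidth in GB/s (1 GB = 1e9 bytes).
 * transfers_per_clock is 1 for SDR and 2 for DDR memory. */
double mem_bandwidth_gbs(int bus_bits, double mem_mhz, int transfers_per_clock)
{
    return (bus_bits / 8.0) * mem_mhz * 1e6 * transfers_per_clock / 1e9;
}

/* Prints the peak values for the figures in the feature table. */
void print_geforce_peaks(void)
{
    printf("single texturing: %.0f MPX/sec\n", fill_rate_mpx(4, 120.0, 1));
    printf("dual texturing:   %.0f MPX/sec\n", fill_rate_mpx(4, 120.0, 2));
    printf("memory bandwidth: %.2f GB/sec\n", mem_bandwidth_gbs(128, 166.0, 1));
}
```

With 4 pipelines at 120 MHz this gives 480 and 240 MPX/sec, and a 128-bit interface at 166 MHz gives about 2.66 GB/sec, which matches the table's 2.6 GB/sec if that figure was rounded down.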
“The Industry’s First GPU”
[Block diagram, labelled "4x4 Architecture": transform engine, lighting engine, setup engine, rendering engine with pixel pipes 1 to 4. Further labels on the slide: RIVA TNT2, HDTV processor.]

Transform & Lighting
[Diagram: work per frame (1/60 sec) divided between CPU and graphics processor, shown for a third-generation 3D accelerator and for a graphics processor. Stages shown: AI, physics, game play; transform & lighting; rendering.]

What does the T&L engine deliver?
[Image comparison: ~3500 polygons vs. ~400 polygons]

What does the T&L engine deliver? (2)

How do you use the T&L engine?

Benchmarks of current 3D chips

T&L benchmarks (Q3)

The transformation process
Viewing transformation
Modeling transformation
Projection transformation
Viewport transformation

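The last two stages can be made concrete with a little arithmetic. The sketch below assumes the OpenGL convention: after the projection transformation a vertex is in clip coordinates (x, y, z, w), division by w gives normalized device coordinates in the range -1 to 1, and the viewport transformation maps those to window pixels. The type and function names are illustrative only.

```c
typedef struct { double x, y, z, w; } Vec4;
typedef struct { double x, y; } Vec2;

/* Perspective division followed by the viewport transformation for a
 * viewport with lower-left corner (vx, vy) and size width x height.
 * Assumes clip.w is non-zero (the vertex has already been clipped). */
Vec2 clip_to_window(Vec4 clip, int vx, int vy, int width, int height)
{
    double ndc_x = clip.x / clip.w;   /* normalized device coordinates */
    double ndc_y = clip.y / clip.w;
    Vec2 win;
    win.x = (ndc_x + 1.0) * width / 2.0 + vx;
    win.y = (ndc_y + 1.0) * height / 2.0 + vy;
    return win;
}
```

For a 500 x 500 viewport at the origin, a vertex in the middle of the view volume lands at pixel (250, 250).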
Geometric transformations
• Translation
• Scaling

Geometric transformations (2)
• Rotation
P' = P + T  →  translation
P' = P * S  →  scaling
P' = P * R  →  rotation

Homogeneous coordinate system
P( x, y ) → P( W*x, W*y, W )
P( X, Y, W ) = P( X/W, Y/W )
• Translation
• Scaling
• Rotation

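In homogeneous coordinates a translation also becomes a matrix multiplication. The sketch below shows this for 2D points with 3x3 matrices. It assumes the row-vector convention P' = P * M; the names are illustrative.

```c
typedef struct { double m[3][3]; } Mat3;
typedef struct { double X, Y, W; } HPoint;

/* Translation matrix for row vectors: the offsets sit in the last row. */
Mat3 translation(double tx, double ty)
{
    Mat3 t = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    t.m[2][0] = tx;
    t.m[2][1] = ty;
    return t;
}

/* P' = P * M for a homogeneous row vector P = (X, Y, W). */
HPoint transform(HPoint p, Mat3 a)
{
    HPoint r;
    r.X = p.X * a.m[0][0] + p.Y * a.m[1][0] + p.W * a.m[2][0];
    r.Y = p.X * a.m[0][1] + p.Y * a.m[1][1] + p.W * a.m[2][1];
    r.W = p.X * a.m[0][2] + p.Y * a.m[1][2] + p.W * a.m[2][2];
    return r;
}

/* Back to Cartesian coordinates: (X, Y, W) -> (X/W, Y/W), W non-zero. */
void to_cartesian(HPoint p, double *x, double *y)
{
    *x = p.X / p.W;
    *y = p.Y / p.W;
}
```

The point (3, 4) can be stored as (6, 8, 2). Translating it by (1, -1) gives (8, 6, 2), which is the Cartesian point (4, 3).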
Composition of transformations

3D transformation matrices

Complexity of transformations
16 multiplications + 12 additions for a single transform operation

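The count follows from multiplying a homogeneous 4-vector by a 4x4 matrix: each of the four result components needs four multiplications and three additions. The sketch below counts the operations while performing the product. It assumes the row-vector form out = v * M; the names are illustrative.

```c
typedef struct { long mults, adds; } OpCount;

/* Computes out = v * m for a 4x4 matrix and counts the arithmetic
 * operations that a straightforward implementation performs. */
OpCount transform_counted(double m[4][4], double v[4], double out[4])
{
    OpCount ops = {0, 0};
    int row, col;
    for (col = 0; col < 4; col++) {
        double sum = v[0] * m[0][col];
        ops.mults++;
        for (row = 1; row < 4; row++) {
            sum += v[row] * m[row][col];
            ops.mults++;
            ops.adds++;
        }
        out[col] = sum;
    }
    return ops;
}
```

This cost is paid once per vertex and per frame, which is why moving the transform stage from the CPU into dedicated hardware pays off for scenes with many polygons.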
What is OpenGL?
• Software interface to graphics hardware
• ~120 commands for describing 3D models
• Network-capable (client, server)
• The interface is platform-independent
• OpenGL contains no commands for user input
• No commands for windowing tasks
• OpenGL is deliberately minimal (more sophisticated libraries build on it)
  • GL Utility Library (GLU)
  • Open Inventor
• State machine ( glEnable(), glDisable() )

OpenGL geometric primitives

OpenGL polygon restrictions

OpenGL geometry commands
/* Filled green polygon with five vertices */
glBegin(GL_POLYGON);
    glColor3f(0.0, 1.0, 0.0);
    glVertex2f(0.0, 0.0);
    glVertex2f(0.0, 3.0);
    glVertex2f(3.0, 3.0);
    glVertex2f(4.0, 1.5);
    glVertex2f(3.0, 0.0);
glEnd();

/* Unit circle approximated by a closed line loop (needs <math.h>) */
#define PI 3.1415926535897
GLint circle_points = 100;
GLint i;
GLfloat angle;
glBegin(GL_LINE_LOOP);
for (i = 0; i < circle_points; i++) {
    angle = 2 * PI * i / circle_points;
    glVertex2f(cos(angle), sin(angle));
}
glEnd();

OpenGL example: solar system

OpenGL example (2)
#include <GL/gl.h>
#include <GL/glu.h>
#include "aux.h"

static int year = 5, day = 10;

void myinit(void)
{
    glShadeModel(GL_FLAT);
}

void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    glColor3f(1.0, 1.0, 1.0);
    glPushMatrix();
    auxWireSphere(1.0);                        /* draw sun */
    glRotatef((GLfloat) year, 0.0, 1.0, 0.0);  /* position on the orbit */
    glTranslatef(2.0, 0.0, 0.0);               /* orbit radius */
    glRotatef((GLfloat) day, 0.0, 1.0, 0.0);   /* rotation about own axis */
    auxWireSphere(0.2);                        /* draw smaller planet */
    glPopMatrix();
    glFlush();
}

void myReshape(GLsizei w, GLsizei h)
{
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, (GLfloat) w / (GLfloat) h, 1.0, 20.0);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glTranslatef(0.0, 0.0, -5.0);              /* move the scene into view */
}

int main(int argc, char** argv)
{
    auxInitDisplayMode(AUX_SINGLE | AUX_RGBA);
    auxInitPosition(0, 0, 500, 500);
    auxInitWindow(argv[0]);
    myinit();
    auxReshapeFunc(myReshape);
    auxMainLoop(display);
    return 0;
}

Memory bandwidth
° Processor
• Logic capacity: about 30% per year
• Clock rate: about 20% per year
° Memory
• DRAM capacity: about 60% per year (4x every 3 years)
• Memory speed: about 10% per year
• Cost per bit: improves about 25% per year

Bandwidth requirements for demanding 3D applications
[Chart: required bandwidth in GB/s (axis 0 to 14) for the resolutions 640x480, 800x600, 1024x768, 1280x1024 and 1600x1200 at 16 bpp and 32 bpp. A marked line shows the upper limit of 2.9 GB/sec.]

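Why the demand grows with resolution and color depth can be seen from a rough estimate of frame-buffer traffic. The sketch below is not the model behind the chart, whose inputs are not given. The function name and every parameter value in the note after it are assumptions for illustration, and texture reads are left out.

```c
/* Estimated frame-buffer traffic in GB/s (1 GB = 1e9 bytes).
 * accesses_per_pixel: buffer accesses per drawn pixel, for example
 *                     color write + Z read + Z write = 3 (assumption)
 * overdraw:           average number of times each pixel is drawn */
double framebuffer_traffic_gbs(int width, int height, int bits_per_pixel,
                               double accesses_per_pixel, double overdraw,
                               double frames_per_sec)
{
    double bytes_per_pixel = bits_per_pixel / 8.0;
    return (double)width * height * bytes_per_pixel
           * accesses_per_pixel * overdraw * frames_per_sec / 1e9;
}
```

With the assumed values of 3 accesses per pixel, an overdraw of 3 and 60 frames per second, 1600x1200 at 32 bpp comes to roughly 4.1 GB/s, already above the 2.9 GB/sec limit marked in the chart. Halving the color depth to 16 bpp halves the estimate.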
Comparison of SDR and DDR memory

Performance: GeForce DDR vs. SDR
[Bar chart: relative performance (axis 0% to 200%) in Quake III, Quake II, Unreal and Expendable at 1600x1200x32 and 1024x768x16.]

Literature & sources
• Talisman Multimedia for the PC – Martin Randall; IEEE 1997
• Fundamentals of Interactive Computer Graphics – J. D. Foley, A. van Dam; Addison-Wesley 1984
• OpenGL Programming Guide – Jackie Neider, Tom Davis, Mason Woo; Addison-Wesley 1993
• OpenGL Reference Manual; Addison-Wesley 1992
• http://www.nvidia.com/Developer.nsf
• http://www.anandtech.com
• http://www.tomshardware.com/mainboard/00q1/000315/
• http://www.ixbt-labs.com/mainboard/ddr-sdram.shtml
• http://www.sgi.com
